Researchers affiliated with the Deep-time Data Driven Discovery project (4D) are interested in recycling – but not plastic and paper. They want to reuse data from decades of research on Earth, its minerals, life forms, and history to make new discoveries using data science tools.
By analyzing and visualizing existing data in new ways, 4D scientists aim to understand the interactions between life and the physical world, make predictions about Earth’s evolution, and uncover laws of planetary formation. Several DCO members are actively involved in the coalition, including DCO Executive Director Robert Hazen, Shaunna Morrison, and Chao Liu (all at the Carnegie Institution for Science, USA), Peter Fox (Rensselaer Polytechnic Institute, USA), Donato Giovannelli (University of Naples Federico II, Italy), and Sabin Zahirovic and Dietmar Müller (both at the University of Sydney, Australia).
Inspired by DCO work on carbon mineral evolution, the 4D initiative grew out of two efforts: the Deep-Time Data Infrastructure project, sponsored by the Keck and Alfred P. Sloan Foundations and the DCO, and the 4D Workshop: Deep-time Data Driven Discovery and the Evolution of Earth, held in June 2018. The workshop brought together Earth, space, life, and data scientists to integrate their fields through a data-driven perspective. Members of 4D are still looking for long-term funding to advance the research collaborations developed at the launch meeting.
By drawing on data science analysis and visualization techniques like machine learning and network analysis, researchers in the Earth and life sciences can make better use of their data. Traditional two-dimensional graphs are easy for the human brain to understand, but they often leave out a lot of information. “Visualizations, like networks, allow you to display hundreds of dimensions at once and overlay multiple different parameters so you can get a holistic picture of what’s happening in a system,” said Morrison. “Data science is a way to not throw anything away. It’s an opportunity to make data more accessible.”
Among the projects in the works, Liu and Simone Runyon (University of Wyoming, USA) are examining geochemical and mineral data from the supercontinent Rodinia to explain why it was such an oddball compared to other supercontinents. Morrison and Hazen, meanwhile, are performing cluster analysis on the minerals in the dust clouds of stars to see what they reveal about each star and the evolution of its planets.
Giovannelli and colleagues are looking for relationships between microbial genomes and environmental conditions at different locations to learn how the geochemical environment influences microbial diversity, and vice versa. The researchers also plan to look backward in time to see whether the changing availability of different metals, which are key parts of many proteins, has affected protein function throughout Earth’s history.
“Looking at the interactions between geochemistry and microbial populations is not new in microbial ecology,” said Giovannelli, “and so perhaps our biggest contribution comes from the scale. We are leveraging publicly available datasets, both in the geo and life sciences, and linking them back to the geological settings.”
Giovannelli and his colleagues will be using data from Biology Meets Subduction, a multidisciplinary DCO investigation into a Costa Rican subduction zone, to understand connections between biology, volcanic systems, and element cycling. “The Biology Meets Subduction project provides a great dataset to look for geo-bio correlations, given the high volume of microbiological, geochemical, and mineralogical data collected synoptically. We are finding all sorts of new, exciting results,” he said, “and we are testing new ways to visualize and analyze the data.”
Morrison is using a machine learning technique, similar to the algorithm Amazon uses to recommend additional purchases, to analyze mineral occurrence data worldwide. Currently the researchers don’t have a computer powerful enough to crunch their entire data set. “We crashed a supercomputer trying to do it,” said Morrison. But with the data from mindat.org, a database of minerals, rocks and associated information, she performed analyses of pairs of minerals, which predicted that wulfenite, a colorful mineral with yellow to red-orange crystals, should occur near Cookes Peak in New Mexico. Following the prediction, Erin Delventhal, a member of the mindat.org management team, went out and found a sample of the mineral at that location.
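For readers curious about what an analysis of mineral pairs can look like, here is a minimal Python sketch of a generic "market basket" calculation, the family of methods behind retail recommendations. It is not the 4D team's actual method or code, which the article does not describe. The locality names, mineral lists, and function names (`pair_lift`, `predict_missing`) are invented for illustration and are not mindat.org data. The sketch scores each mineral pair by its lift, meaning how much more often the two occur together than chance would predict, and then suggests minerals that may be unreported at a site.

```python
from collections import Counter
from itertools import combinations

# Hypothetical toy data (NOT from mindat.org): locality -> minerals reported there.
localities = {
    "site_a": {"galena", "cerussite", "wulfenite"},
    "site_b": {"galena", "cerussite", "wulfenite"},
    "site_c": {"galena", "cerussite"},
    "site_d": {"quartz", "pyrite"},
    "site_e": {"quartz", "galena"},
}


def pair_lift(localities):
    """Return {frozenset({a, b}): lift} for every mineral pair seen together.

    lift = P(a and b) / (P(a) * P(b)); values above 1 mean the pair
    co-occurs more often than independent occurrence would predict.
    """
    n = len(localities)
    single = Counter()
    pair = Counter()
    for minerals in localities.values():
        single.update(minerals)
        pair.update(frozenset(p) for p in combinations(sorted(minerals), 2))
    lift = {}
    for p, count in pair.items():
        a, b = tuple(p)
        lift[p] = (count / n) / ((single[a] / n) * (single[b] / n))
    return lift


def predict_missing(localities, site, lift, top=3):
    """Rank minerals not yet reported at `site` by their best pair lift
    with any mineral that is already reported there."""
    present = localities[site]
    scores = {}
    for p, value in lift.items():
        if len(p & present) == 1:  # exactly one member of the pair is known
            (candidate,) = p - present
            scores[candidate] = max(scores.get(candidate, 0.0), value)
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))[:top]


lift = pair_lift(localities)
predictions = predict_missing(localities, "site_c", lift)
print(predictions)
```

In this toy example, wulfenite ranks first for `site_c` because it turns up alongside cerussite and galena at the other sites. A real analysis would involve far more localities and mineral species, which is where the computing limits Morrison describes come in.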
“I’m most excited about making my field, mineralogy and crystallography, predictive,” said Morrison. In the past, mineralogists have primarily described the minerals they’ve found. “We’re moving the field into the future, which is really cool.”
Main image: A collector found this sample of yellow wulfenite crystals because an algorithm, similar to the one that Amazon uses to recommend additional products, predicted that the mineral should occur in the Cookes Peak region in New Mexico. Credit: Jerry Cone/mindat.org