Cleaning Up Dirty Datasets from the Deep

Subsurface environments present many challenges to researchers who sample microbial DNA to identify the local microbes. The low numbers of microbial cells in many subsurface environments make these samples especially susceptible to even low levels of contamination. Extraneous DNA sequences can come from drilling fluids during sampling, from kits used to isolate and sequence the DNA in the lab, and even from microbes on the researchers’ skin. But with proper screening techniques, scientists can safeguard their data sets from these unwanted sequences.


A group of DCO Deep Life Community members has developed guidelines to reduce contamination while collecting and sequencing subsurface samples and to remove interloping sequences. Cody Sheik (University of Minnesota Duluth, USA), Brandi Kiel Reese (Texas A&M University, USA), Katrina Twing (University of Utah, USA), Jason Sylvan (Texas A&M University, USA), Sharon Grim (University of Michigan, USA), Matthew Schrenk (Michigan State University, USA), Mitchell Sogin (Marine Biological Laboratory, USA), and Frederick Colwell (Oregon State University, USA) published these guidelines in a new paper in Frontiers in Microbiology [1]. Through these screening methods, the researchers discovered that more than one quarter of the sequences in the Census of Deep Life database are potential contaminants, some of which have the potential to alter how scientists interpret microbial carbon cycling in the subsurface.

The new study grew out of the spring 2016 Synthesis Deep Life workshop in Redondo Beach, California. Attendees were attempting to compile tens of millions of subsurface DNA sequences researchers had submitted to the Census of Deep Life database into a single data set. “We realized that before we could synthesize the data, we needed to go through some quality control,” said Kiel Reese. “We thought that if we had these questions then other people would have the same questions about how to properly check the data for quality and remove contaminants.”

The researchers drew from previous studies of contamination and existing sequence analysis tools to develop a list of guidelines and demonstrated some protocols using the Census of Deep Life database. The DCO Deep Life Community funded several sequencing projects that contributed to this central repository of subsurface microbial sequences, hosted through the Marine Biological Laboratory VAMPS website. 

Reducing the unwanted sequences in a data set starts with good experimental design that includes sequencing of “blank” samples (e.g., testing for microbial contaminants in unused drilling fluid) collected throughout the experiment to track sources of contamination. Common contaminants, many of which came from supposedly sterile reagents in lab kits, accounted for 27% of the Census of Deep Life database. “Seeing that kits are significantly adding to the contamination in some samples was pretty striking,” said Sheik.

After sequencing, researchers can choose from several tools that identify potential contaminants by comparing sequences from samples and blanks. There is no single tool that works perfectly, so using multiple tools can remove likely contaminants while also preventing a researcher from tossing legitimate sequences that happen to be related to common contaminants. “We’ve always said, if you see Propionibacterium, it’s a skin contaminant and remove it,” said Kiel Reese. “It turns out that some of them could potentially be subsurface microbes we should investigate further.” 
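The idea of comparing samples against blanks can be illustrated with a minimal Python sketch. It is not one of the tools used in the paper; the function names, the `min_ratio` threshold, and the taxon labels and read counts are all hypothetical, chosen only to show the general logic. The sketch flags a taxon for review when its relative abundance in the blank is at least as high as in the sample.

```python
def relative_abundance(counts):
    """Convert raw read counts per taxon into fractions of the total."""
    total = sum(counts.values())
    if total == 0:
        return {taxon: 0.0 for taxon in counts}
    return {taxon: n / total for taxon, n in counts.items()}


def flag_potential_contaminants(sample_counts, blank_counts, min_ratio=1.0):
    """Return taxa found in both sample and blank whose relative abundance
    in the blank is at least min_ratio times that in the sample.

    Flagged taxa are candidates for closer review, not automatic removal.
    min_ratio is a hypothetical tuning parameter for this illustration.
    """
    sample_frac = relative_abundance(sample_counts)
    blank_frac = relative_abundance(blank_counts)
    flagged = set()
    for taxon, blank_share in blank_frac.items():
        sample_share = sample_frac.get(taxon, 0.0)
        # Only consider taxa actually observed in both data sets.
        if blank_share > 0.0 and sample_share > 0.0:
            if blank_share >= min_ratio * sample_share:
                flagged.add(taxon)
    return flagged


# Hypothetical read counts, invented for illustration only.
sample = {"Taxon_A": 900, "Taxon_B": 80, "Taxon_C": 20}
blank = {"Taxon_B": 5, "Taxon_C": 95}

print(sorted(flag_potential_contaminants(sample, blank)))
```

Because no single criterion is reliable, a flag like this is best treated as a prompt to check the taxon with other tools rather than as grounds for deleting it outright.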

Some of the samples also contained archaeal contaminants, including methanogens, though most experiments do not include checks for archaeal contamination. Methanogenesis is an important type of subsurface metabolism, and so these contaminants could potentially skew interpretations of carbon cycling. 

These new guidelines can improve future data sets and maintain the legacy of the Census of Deep Life as a high-quality data resource. A fully synthesized and quality-controlled data set from the Census of Deep Life will soon be published by another group of researchers.

Expeditions to explore subsurface environments, such as those undertaken by the International Ocean Discovery Program and the International Continental Scientific Drilling Program, represent multi-million dollar investments. The researchers hope that their guidelines will help protect these investments, as well as future investigations into remote locations on Earth and even on other planetary bodies. “Once you go to these places that no one has ever been before, you want to make sure you’re gathering the best quality data,” said Sheik.

While identifying microbes from subsurface environments, like these mats from the Soudan Iron Mine in northern Minnesota, scientists must take great care to minimize contamination, not only at the time of collection, but also during DNA extraction and sequencing. Credit: Josh Knackert, University of Wisconsin
