DCO Data Science Team Takes Steps to Improve Data Curation and Research Reproducibility

The Deep Carbon Observatory Data Science Team, based at Rensselaer Polytechnic Institute, recently won a grant from the Research Data Alliance (RDA) US office to explore adoption of two RDA Recommendations.

These grants support teams in adopting technologies and best practices, as articulated in RDA Recommendations, and then in capturing the lessons from that adoption. The DCO Data Science Team proposed incorporating the Scalable Dynamic Data Citation (DDC) and “Scholix” Recommendations into the DCO Data Portal infrastructure, with the aim of making DCO data and publications more widely findable, available, and interconnected as part of a larger scholarly network.

The Research Data Alliance is an international organization that aims to remove barriers to data sharing by inviting both data users and providers in any field to solve data sharing issues as a team. RDA Working Groups develop the Recommendations as viable solutions to well-articulated, specific data-sharing problems. After community vetting and endorsement, the Recommendations are available for adoption by others in similar communities inside or outside of RDA. More information is available on the RDA website.

The RDA DDC Recommendation, which is being integrated into the DCO Data Portal, provides a method for persistently referencing a specific subset of dynamically changing data by using persistent identifiers. This allows researchers and other data users to link precisely to the data used in a study or in a particular provenance chain (a record of every action performed on the data set). This precise, immutable reference increases the reproducibility and validity of the resulting work.
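The idea of citing a subset of changing data can be illustrated with a small sketch. It assumes the general pattern associated with dynamic data citation: data are versioned rather than overwritten, the subset query is stored with a timestamp and a persistent identifier, and resolving the identifier re-runs the query against the data as they stood at that time. This is not the DCO Data Portal's implementation; the class names, the `example-pid:` prefix, the `depth_km` field, and the sample records are all hypothetical.

```python
import hashlib
import json
from typing import Optional


def fingerprint(payload) -> str:
    """Hash a JSON-serializable value so it can be compared later."""
    encoded = json.dumps(payload, sort_keys=True).encode("utf-8")
    return hashlib.sha256(encoded).hexdigest()


class VersionedStore:
    """Append-only store: updates and deletions never overwrite history."""

    def __init__(self):
        self._history = []  # entries of (timestamp, record_id, value or None)
        self.clock = 0      # logical timestamp standing in for wall-clock time

    def put(self, record_id: str, value: Optional[dict]) -> int:
        self.clock += 1
        self._history.append((self.clock, record_id, value))
        return self.clock

    def delete(self, record_id: str) -> int:
        # Record a tombstone instead of physically removing the record.
        return self.put(record_id, None)

    def snapshot(self, timestamp: int) -> dict:
        """Rebuild the data exactly as they stood at `timestamp`."""
        state = {}
        for ts, record_id, value in self._history:
            if ts <= timestamp:
                state[record_id] = value
        return {rid: val for rid, val in state.items() if val is not None}


class QueryStore:
    """Keeps each cited query with its timestamp and a result fingerprint."""

    def __init__(self, store: VersionedStore):
        self._store = store
        self._citations = {}

    def cite(self, field: str, minimum: float) -> str:
        """Register a subset query and return a (hypothetical) identifier."""
        query = {"field": field, "minimum": minimum,
                 "timestamp": self._store.clock}
        pid = "example-pid:" + fingerprint(query)[:12]
        self._citations[pid] = (query, fingerprint(self._run(query)))
        return pid

    def resolve(self, pid: str) -> list:
        """Re-execute the stored query and verify the subset is unchanged."""
        query, expected = self._citations[pid]
        rows = self._run(query)
        if fingerprint(rows) != expected:
            raise ValueError("resolved subset differs from the cited subset")
        return rows

    def _run(self, query: dict) -> list:
        state = self._store.snapshot(query["timestamp"])
        return [
            [rid, rec] for rid, rec in sorted(state.items())
            if rec.get(query["field"], float("-inf")) >= query["minimum"]
        ]


# Hypothetical sample records, for illustration only.
store = VersionedStore()
store.put("sample-1", {"depth_km": 10})
store.put("sample-2", {"depth_km": 150})
queries = QueryStore(store)
pid = queries.cite("depth_km", 100)

# The data keep changing after the citation is made.
store.put("sample-3", {"depth_km": 300})
store.delete("sample-2")
print(queries.resolve(pid))
```

Even though a record is added and another deleted after the citation, resolving the identifier still returns the subset as it was when cited, which is the property that supports reproducibility.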

In a similar vein, the RDA Scholix Recommendation provides a high-level framework for exchanging links and basic metadata between scholarly literature and data. The goal is to enable a better understanding of what data underpin the literature.
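A literature-to-data link of this kind can be pictured as a small record naming a source object, a target object, and the relationship between them. The sketch below is illustrative only: the field names loosely follow the general shape of a Scholix link record and should be checked against the published Scholix schema before use, and the identifiers, provider name, and date are placeholders rather than real DCO records.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class ScholarlyObject:
    identifier: str   # e.g. a DOI
    id_scheme: str    # e.g. "doi"
    object_type: str  # "literature" or "dataset"


@dataclass(frozen=True)
class Link:
    source: ScholarlyObject
    target: ScholarlyObject
    relationship: str  # how the source relates to the target
    provider: str      # who asserts the link
    published: str     # date the link was published (ISO 8601)


# Assumed relationship vocabulary, used here to state a link in both directions.
INVERSE = {
    "References": "IsReferencedBy",
    "IsReferencedBy": "References",
    "IsRelatedTo": "IsRelatedTo",
}


def invert(link: Link) -> Link:
    """Express the same link from the target's point of view."""
    return Link(
        source=link.target,
        target=link.source,
        relationship=INVERSE[link.relationship],
        provider=link.provider,
        published=link.published,
    )


def _describe(obj: ScholarlyObject) -> dict:
    return {
        "Identifier": {"ID": obj.identifier, "IDScheme": obj.id_scheme},
        "Type": {"Name": obj.object_type},
    }


def to_record(link: Link) -> dict:
    """Serialize a link into a nested, exchangeable record."""
    return {
        "LinkPublicationDate": link.published,
        "LinkProvider": [{"Name": link.provider}],
        "RelationshipType": {"Name": link.relationship},
        "Source": _describe(link.source),
        "Target": _describe(link.target),
    }


# Placeholder identifiers, not real publications or data sets.
paper = ScholarlyObject("10.5555/example-article", "doi", "literature")
dataset = ScholarlyObject("10.5555/example-dataset", "doi", "dataset")
link = Link(paper, dataset, "References", "Example Link Hub", "2018-01-01")

print(json.dumps(to_record(invert(link)), indent=2))
```

Stating each link in both directions lets a reader start from either the article or the data set and find the other, which is what makes the underlying data discoverable from the literature.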

The Data Science Team’s goal for this adoption project is to deploy the RDA DDC in the DCO Data Portal to facilitate precise reference to data held at the Rensselaer Polytechnic Institute (RPI) Tetherless World Constellation and elsewhere within the DCO data legacies. These and other related links between data and literature in the Scholix framework will then be shared with the wider community. Together, if broadly adopted, the DDC and Scholix Recommendations could significantly improve the traceability of data use and reuse, and advance immediate researcher needs ranging from the reproducibility of research to credit mechanisms for data providers and curators.

The Data Science Team will also explore improving referencing and connection for the four DCO data legacies developed in collaboration with DCO’s four science communities:

  • DL1: Super-sized Sample Inventory, Inventory of “Deep Carbon” Instrumentation, Inventory of “Deep Carbon” Field Sites
  • DL2: Census of Deep Microbial Life (CoDL), Thermodynamic Parameters for High-Pressure, High-Temperature Physics and Chemistry Modeling, Global map of Carbonate Lithologies of Earth
  • DL3: Global Earth Minerals Inventory, Global Abiotic Fluid Distribution, Inventory of Dynamics and Physics of Deep Fluids, Inventory of Geochemical Models, Geo Sample Curation
  • DL4: Inventory of Diamonds with Inclusions + Derived products, Carbon Cycle, Flux of Volcanic Systems (magmatic, ...), State of High P and T Carbon and Related Materials

The results of this project are expected to increase the recognition and reuse of these legacy products, and to help connect DCO-related data and literature with other disciplines through the various Scholix aggregators.

The DCO Data Portal acts as a central hub for access to DCO data and other research products. It is, therefore, a logical adopter of these Recommendations, but adoption will require negotiation and mediation with our various partner repositories and data providers. The Data Science Team looks forward to extending its collaboration with DCO partners and the Data Science Advisory Committee. We welcome your feedback (contact: Mark Parsons).

DCO Data Science Team members Peter Fox, Mark Parsons, Ahmed Eleish (PhD student), Brenda Norton Thomson (PhD student), Kathy Fontaine, and John Erickson (all at RPI) are currently participating in this effort. The team anticipates sharing its initial results at the 11th RDA Plenary in Berlin, Germany in late March 2018, and its final results during the 12th RDA Plenary at International Data Week in Gaborone, Botswana in November 2018. The results will also be shared with the entire DCO Science Network.
