On 5 June 2014, 40 DCO scientists met at Rensselaer Polytechnic Institute, Troy, NY, USA, to discuss the role of Data Science in the DCO community. The primary goal of Data Science Day was to provide an opportunity for in-depth discussions of data management and data science activities across a broad range of scientific disciplines. All four DCO Science Communities were represented, allowing for the technical challenges and specific big data needs of each to be thoroughly explored.
The day began with a series of invited talks. Fran Berman (Rensselaer Polytechnic Institute/Research Data Alliance) opened by outlining the technical and social infrastructure necessary for effective data-driven discovery, with an emphasis on sustainable data stewardship and availability. Mark Ghiorso (OFM Research Inc.) then spoke about how innovations in database curation will specifically benefit the geoscience community through the generation of flexible models that can adapt quickly to account for new data or respond to researcher needs. Kerstin Lehnert (Lamont-Doherty Earth Observatory, Columbia University), who has been intimately involved in developing the International GeoSample Number (IGSN) system for sample databases, addressed many of the challenges that face the Earth science community, including adequate funding and the integration of existing sample databases. Bruce Watson (Rensselaer Polytechnic Institute) focused on his experience using data to make discoveries in deep carbon science.
DCO’s Executive Director, Robert Hazen (Carnegie Institution of Washington), who has long championed the importance of data-driven discovery and abduction, focused his talk on the challenges of studying deep carbon through deep time. With so many different datasets to consider, from mineralogy and petrology data to proteomics data to geochemistry data and modeling, DCO needs coherent and open data storage and analysis capabilities.
After lunch, each Community split into breakout sessions to discuss their particular needs and concerns. A common theme across these sessions was basic web-based infrastructure, something the DCO Data Science Team has been working on for some time. Indeed, the Data Science Team offered an optional workshop on the morning of 6 June, in which many DCO scientists participated. At this workshop the scientists were introduced to the online DCO Data Portal and its current capabilities. The Data Science Team encouraged feedback, keen to ensure the Data Portal meets the varied needs of the DCO community.
In his concluding remarks, DCO Data Science Team Leader Peter Fox spoke of data as a “first class citizen,” not an afterthought. He also noted the central importance of data to DCO’s success, and suggested future data-focused events and sessions.
Ultimately, data are one of the lasting legacies of the Deep Carbon Observatory. With such expertise and enthusiasm for sustainable data management already widespread within the community, it will be a legacy that will benefit many scientific disciplines for years to come.