The Data Science Team has created a Deep Carbon Virtual Observatory (DCvO), an open-access, searchable network of information that they hope will serve as a model for other open science initiatives.
Xiaogang Ma (formerly at Rensselaer Polytechnic Institute, now University of Idaho, USA), and other members of the DCO Data Science Team, including Patrick West, Stephan Zednik, John Erickson, Ahmed Eleish, Yu Chen, Han Wang, Hao Zhong, and Peter Fox (all currently or formerly at the Tetherless World Constellation at Rensselaer Polytechnic Institute, USA) have applied the latest information technologies and online resources to develop a web portal for the more than 1,000 researchers in the DCO community. The portal provides open access to information on data sets, sample collections, field sites, publications, and instruments, and links these components in a network to promote collaboration and spur new ideas in deep carbon science. The researchers describe the creation of this network in a new paper in the journal Frontiers in Earth Science.
To develop the data portal, the group used a technology called the Semantic Web to link data and support searches that return richer, more connected results. “Conventionally we think each website as a web of documents, but there’s a second level,” said Ma. “We want to make it a data portal so we can publish data sets and their metadata with a structure, especially a machine-readable structure.” The new techniques involve structuring and tagging information, resources, and web pages in such a way that they can be read directly by computers.
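As a rough illustration of what "machine-readable structure" can mean, the sketch below describes a data set as a JSON-LD record and then reads the tagged entities back out with a short program. It is not the portal's actual schema: the schema.org vocabulary, the example.org identifier, all names and values, and the helper functions `to_jsonld` and `typed_names` are assumptions made for this example.

```python
import json

# Hypothetical record: every identifier and value here is a placeholder,
# and schema.org is an assumed vocabulary, not the portal's own ontology.
dataset_record = {
    "@context": "https://schema.org/",
    "@type": "Dataset",
    "@id": "https://example.org/dataset/001",
    "name": "Example deep carbon data set",
    "creator": {"@type": "Person", "name": "A. Researcher"},
    "isPartOf": {"@type": "DataCatalog", "name": "Example data portal"},
}


def to_jsonld(record):
    """Serialize the record as JSON-LD text that could be embedded in a web page."""
    return json.dumps(record, indent=2)


def typed_names(node):
    """Collect (type, name) pairs from every tagged entity in the record."""
    found = []
    if isinstance(node, dict):
        if "@type" in node and "name" in node:
            found.append((node["@type"], node["name"]))
        for value in node.values():
            found.extend(typed_names(value))
    elif isinstance(node, list):
        for item in node:
            found.extend(typed_names(item))
    return found


if __name__ == "__main__":
    print(to_jsonld(dataset_record))
    for kind, name in typed_names(dataset_record):
        print(f"{kind}: {name}")
```

Because each entity carries an explicit type, a program can tell the data set from its creator without parsing free text, which is the difference between a web of documents and a web of data that Ma describes.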
“It can be a little tedious for new users,” cautions Ma, because adding a new entry, such as a recent paper, requires additional details, including creating a record for each co-author and tagging the journal. But as entries accumulate, the process becomes easier. “The process is like weaving a knowledge network for all the information of deep carbon science,” said Ma. As the network grows, not only does data input become easier, but the network also facilitates the development of innovative data science functions.
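One plausible reading of why entry gets easier is that records created for earlier entries, such as co-authors and journals, can be reused by later ones. The sketch below models that idea with a small in-memory network of subject-predicate-object triples. The `KnowledgeNetwork` class, its methods, the predicate names, and the sample papers are all hypothetical and do not reflect the portal's implementation.

```python
class KnowledgeNetwork:
    """Toy knowledge network: records are nodes, triples are the links."""

    def __init__(self):
        self.records = {}    # (kind, name) -> identifier
        self.triples = set() # (subject, predicate, object)

    def get_or_create(self, kind, name):
        """Return (identifier, created), reusing a record if one exists."""
        key = (kind, name)
        created = key not in self.records
        if created:
            self.records[key] = f"{kind}/{len(self.records) + 1}"
        return self.records[key], created

    def add_paper(self, title, authors, journal):
        """Link a paper to its authors and journal; return how many records were new."""
        new_records = 0
        paper_id, created = self.get_or_create("paper", title)
        new_records += int(created)
        for name in authors:
            author_id, created = self.get_or_create("person", name)
            new_records += int(created)
            self.triples.add((paper_id, "hasAuthor", author_id))
        journal_id, created = self.get_or_create("journal", journal)
        new_records += int(created)
        self.triples.add((paper_id, "publishedIn", journal_id))
        return new_records


# Placeholder papers: the second shares one author and the journal with the first.
network = KnowledgeNetwork()
first = network.add_paper("Paper A", ["Author One", "Author Two"], "Journal X")
second = network.add_paper("Paper B", ["Author Two", "Author Three"], "Journal X")
print(first, second, len(network.triples))
```

In this toy run the first paper requires a new record for the paper, each author, and the journal, while the second needs new records only for the paper and the one author not already in the network.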
Once the team put the structure in place, they built a user-friendly front end with a search engine to access the information. All their code is open-source and incorporates state-of-the-art data management platforms, so it can be shared with other research groups interested in forming their own online communities for networking and open data sharing.
“I think our paper shows a really good case study of open data and open science and presents a platform to support these efforts in a collaborative program like the DCO,” said Ma.
Besides new data sets, the portal also contains archived data sets from previous studies. The team worked with Mark Ghiorso (OFM Research, USA), an Extreme Physics and Chemistry Community member, to establish a system for rescuing legacy data sets from older publications that cannot be read by machines because they exist as scanned images. They extracted, reviewed and entered the data sets into the DCO dataset browser, where researchers worldwide can access them.
The group envisions the portal as a lasting resource, open to anyone interested in deep carbon science. Data scientists can work with researchers to draw new conclusions from the accumulated data. Ideally, the portal will enable deep carbon scientists to continue making discoveries long after the DCO culminates in 2019.