Over the past century, an enormous amount of data has been produced, archived, and published across the geoscience community. Advances in experimental devices, analytical tools, and scientific methods have been the driving forces behind the accelerating growth in the quantity, and improvement in the quality, of geoscience-related data. In recent decades, this growth has enabled new, data-intensive approaches to research questions that were previously unanswerable for lack of data, and has encouraged the pursuit of many new discoveries.
Data science has therefore become more instrumental than ever in geoscience research. Numerous high-quality, comprehensive data sources are available to geoscientists, such as EarthChem, Mindat.org, Visualization and Analysis of Microbial Population Structures (VAMPS), and PetDB. Many challenges nevertheless remain in areas such as dark data acquisition, integration, quality management, processing, and analytics. Understanding and adopting suitable data science practices throughout the data life cycle is essential to maximize the utility of existing data and unleash the full potential of data-driven discovery in geoscience.
This webinar will kick off our four-part webinar series on data science in the geosciences, presented by the DCO Data Science Team at the Tetherless World Constellation, Rensselaer Polytechnic Institute. The series will cover the data life cycle in order: data acquisition, data processing, and data analysis. In this episode we will discuss general data acquisition in geoscience, featuring a recent example of legacy data rescue and management by the Data Science Team. Demonstrations will cover optical character recognition and spreadsheet-processing software, as well as Jupyter notebooks for running the R statistical language and Python.