Fang Huang, Rensselaer Polytechnic Institute, USA
Owing to advances in experimental and analytical equipment and methods, large amounts of data are being produced in labs all over the world. In the last few decades, a number of high-quality, comprehensive data resources have become available to geoscientists, including but not limited to EarthChem (geochemistry), mindat.org (mineralogy), Visualization and Analysis of Microbial Population Structures (VAMPS) (geobiology), and PetDB (petrology). The rapidly increasing volume and variety of geoscience-related data give researchers opportunities to answer scientific questions that cannot be solved using traditional methods. That is where big data analytic techniques come into play.
It is often said that 80% of data analysis time is spent cleaning and preparing the data. Moreover, data cleaning is not a one-time job – it is an ever-present need while performing data analysis.
In this webinar, we will mainly focus on data processing. We will start by introducing the rules that define a tidy dataset. Bearing these rules in mind, I will show how to use relatively simple Python code to process and visualize geoscience data. The last part of the webinar will highlight an ongoing project on mineral inclusions in diamonds. The webinar should be of interest to any researcher working on data science-related projects.
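As a rough illustration of what tidying can look like in practice, here is a minimal sketch. It assumes the rules in question are the commonly cited tidy-data ones (each variable is a column, each observation is a row) and uses the pandas library. The sample names, oxide columns, and values are invented for illustration and are not taken from the webinar or from any of the databases named above.

```python
import pandas as pd

# Hypothetical wide-format table: one row per sample, one column per oxide.
# Sample names and values are made up for illustration only.
wide = pd.DataFrame({
    "sample": ["S1", "S2"],
    "SiO2": [50.1, 48.7],
    "MgO": [7.9, None],  # None marks a missing measurement
})

# Reshape to a tidy (long) layout: one row per (sample, oxide) measurement.
tidy = wide.melt(id_vars="sample", var_name="oxide", value_name="wt_percent")

# Drop rows where no measurement was recorded.
tidy = tidy.dropna(subset=["wt_percent"]).reset_index(drop=True)

print(tidy)
```

In the long layout, filtering, grouping, and plotting by oxide need no knowledge of which columns happen to exist, which is one reason this shape is often preferred for analysis.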