Challenges in combining heterogeneous materials repositories and datasets into a homogeneous database

Dr Melisande Julia Fischer1, Dr Amanda Barnard1

1CSIRO Data61, Docklands, Australia


In materials science, a single material can be described from many perspectives. There are computed or experimentally measured properties, experimental or simulated spectra and graphs, and extensive metadata on the experimental conditions and software or on the computational methods, code and parameters. All this information is stored in separate repositories and datasets across the world, making it challenging to access, combine and use the data. Each repository uses a different storage system, programming language and scheme of unique identifiers, so the same data point may exist in multiple places under different identifiers. For a specific group of materials, namely perovskites, we have identified and merged some of these freely available repositories and stored the data in one database, converting this heterogeneous information into a homogeneous resource. The resulting JSON-based database is flexible and suitable for comprehensive analysis and machine learning.
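
As an illustration only, the merging step can be sketched in Python as below. This is a minimal sketch under stated assumptions, not the pipeline used for the database described here: the repository names, field names, identifiers and numeric values are hypothetical placeholders, and using the chemical formula as the shared key is an assumption made for brevity (a real merge would likely also need structure or polymorph information).

```python
import json

# Hypothetical records from two repositories. All field names, identifiers
# and values are illustrative placeholders, not real repository data.
repo_a = [{"id": "A-001", "formula": "CaTiO3", "property_x": 1.0}]
repo_b = [{"entry": "b42", "composition": "CaTiO3", "method": "DFT"}]

# Assumed per-source mapping from source field names to one shared schema.
FIELD_MAPS = {
    "repo_a": {"id": "source_id", "formula": "formula", "property_x": "property_x"},
    "repo_b": {"entry": "source_id", "composition": "formula", "method": "method"},
}


def normalise(record, source):
    """Rename the fields of one source record to the shared schema."""
    mapping = FIELD_MAPS[source]
    return {mapping[key]: value for key, value in record.items() if key in mapping}


def merge(sources):
    """Group normalised records from all sources under one key per material."""
    database = {}
    for source, records in sources.items():
        for record in records:
            rec = normalise(record, source)
            formula = rec.pop("formula")  # assumed canonical key
            entry = database.setdefault(formula, {"formula": formula, "sources": {}})
            # Keep each source's own identifier so the origin stays traceable.
            entry["sources"][source] = rec
    return database


database = merge({"repo_a": repo_a, "repo_b": repo_b})
print(json.dumps(database, indent=2))
```

Keeping the original identifier of every source inside the merged entry means a data point that exists in several places under different schemes appears once in the database while remaining traceable to each repository.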


Julia Melisande Fischer is a member of the Applied Machine Learning group at Data61, part of the Commonwealth Scientific and Industrial Research Organisation (CSIRO).

She completed her Bachelor and Master of Science in Chemistry at Ulm University, Germany. Afterward, she received an International Postgraduate Scholarship from the University of Queensland (UQ). She graduated from the Australian Institute for Bioengineering and Nanotechnology (AIBN) at UQ with a Doctor of Philosophy in Chemistry in 2018. Her research combines sustainable energy applications, materials science, and data science.



