Dr Jens Klump1, Mr Doug Fils2, Mr Jess Robertson4, Dr Anusuriya Devaraju3, Dr Adam Leadbetter5
1CSIRO, Kensington, Australia, 2Ocean Leadership, Washington, USA, 3University of Bremen, Bremen, Germany, 4Ministry of Business, Innovation and Employment, Wellington, New Zealand, 5Marine Institute, Oranmore, Ireland
Automation in data curation will make large volumes of data available, and this increase in volume will also bring more variety in metadata. How can we best address the challenge of scaling metadata to giga-scale while at the same time accommodating greater variety?
The technologies to syndicate metadata and repository catalogues were developed alongside the emergence of the internet, and the present-day mechanisms used for the dissemination of metadata in research data infrastructures are based on harvesting catalogues formatted in dialects of the Extensible Markup Language (XML).
A JSON-LD representation of metadata that incorporates formal vocabularies allows machines to understand semantic descriptions of the metadata, and thus gives access to the semantic web and to ways of encoding the context around data. This makes building a multi-domain network far easier, reducing it to a web architecture exercise. In addition, the use of web architecture approaches means that third parties such as Google, Bing, DataONE and others are free to access, use and provide offerings based on this open, well-known architecture.
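As an illustration, a dataset description using the schema.org vocabulary in JSON-LD might look like the following minimal Python sketch. All names, identifiers and keywords here are placeholders for illustration only, not taken from the work described above.

```python
import json

# A minimal schema.org "Dataset" description expressed as JSON-LD.
# All values below are illustrative placeholders.
dataset = {
    "@context": "https://schema.org/",   # formal vocabulary the terms come from
    "@type": "Dataset",
    "name": "Example geochemical survey",                  # placeholder title
    "identifier": "https://doi.org/10.5072/example",       # placeholder DOI
    "keywords": ["geochemistry", "minerals exploration"],
    "publisher": {"@type": "Organization", "name": "Example Repository"},
}

# Serialise to JSON-LD text; web crawlers typically read this from a
# <script type="application/ld+json"> element in a dataset landing page.
jsonld = json.dumps(dataset, indent=2)
print(jsonld)
```

Because the `@context` ties each key to a term in a shared vocabulary, a harvester from any domain can interpret the record without prior agreement on a catalogue format.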
This presentation will report on work done in a network of activities to make metadata available at giga-scale, and on experiments to test the supporting system architecture and gauge its operational costs in a cloud-native implementation.
Jens Klump is a geochemist by training and leads the Geoscience Analytics Team in CSIRO Mineral Resources based in Perth, Western Australia. In his work on data infrastructures, Jens covers the entire chain of digital value creation from data acquisition to data analysis with a focus on data in minerals exploration. This includes automated data and metadata capture, sensor data integration, both in the field and in the laboratory, data processing workflows, and data provenance, but also data analysis by statistical methods, machine learning and numerical modelling.