Using Web Architectures for Gigascale Metadata Syndication

Dr Jens Klump1, Mr Doug Fils2, Mr Jess Robertson4, Dr Anusuriya Devaraju3, Dr Adam Leadbetter5

1CSIRO, Kensington, Australia, 2Ocean Leadership, Washington, USA, 3University of Bremen, Bremen, Germany, 4Ministry of Business, Innovation and Employment, Wellington, New Zealand, 5Marine Institute, Oranmore, Ireland


Automation in data curation is going to make large volumes of data available. This increase in volume will also bring more variety in metadata. How can we best address the challenge of scaling metadata up to giga-scale while at the same time accommodating more variety?

The technologies used to syndicate metadata and repository catalogues were developed alongside the emergence of the internet. The present-day mechanisms used to disseminate metadata in research data infrastructures are based on harvesting catalogues formatted in dialects of Extensible Markup Language (XML).

Indexing the internet at large led to the development of more lightweight encodings based on JavaScript Object Notation for Linked Data (JSON-LD). Commercial search engine operators introduced schema.org as a lightweight structured data format for metadata syndication, which has now become the basis of services like Google Dataset Search.
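
As an illustration of this kind of lightweight encoding, the following minimal sketch builds a dataset description as JSON-LD and wraps it in the script element that web crawlers typically look for on a landing page. It assumes the schema.org vocabulary and its Dataset type; the function names, the example.org identifier and the sample values are hypothetical and are not taken from the work reported here.

```python
import json


def make_dataset_record(identifier, name, description, keywords):
    """Build a minimal dataset description as a JSON-LD dictionary.

    Assumption: the schema.org vocabulary and its Dataset type are used.
    """
    return {
        "@context": "https://schema.org/",
        "@type": "Dataset",
        "@id": identifier,
        "name": name,
        "description": description,
        "keywords": list(keywords),
    }


def to_script_tag(record):
    """Serialise a record for embedding in an HTML landing page."""
    payload = json.dumps(record, indent=2)
    # Escape "</" so that a value containing "</script>" cannot close the
    # element early; "\/" is a valid JSON escape for "/".
    payload = payload.replace("</", "<\\/")
    return f'<script type="application/ld+json">\n{payload}\n</script>'


if __name__ == "__main__":
    # Hypothetical sample values, for illustration only.
    record = make_dataset_record(
        "https://example.org/dataset/123",
        "Example geochemistry samples",
        "Hypothetical dataset used to illustrate the encoding.",
        ["geochemistry", "samples"],
    )
    print(to_script_tag(record))
```

Because the record is plain JSON, the same document can be published on a web page, stored in an object store or loaded into an index without conversion.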

A JSON-LD representation of metadata that incorporates formal vocabularies allows machines to interpret the semantic descriptions in the metadata. It thus gives access to the semantic web and to ways of encoding the context around data. This makes building a multi-domain network far easier and turns it into a web architecture exercise. In addition, using web architecture approaches means that third parties such as Google, Bing, DataONE and others are free to access, use and build offerings on the open, well-known architecture.
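
To show why an open web architecture lets any third party take part, here is a minimal sketch of a harvester step that pulls embedded JSON-LD out of a landing page using only the Python standard library. The class and function names are hypothetical, and the sketch assumes that records are embedded in script elements of type application/ld+json and typed as Dataset; it is not the architecture tested in this work.

```python
import json
from html.parser import HTMLParser


class JsonLdExtractor(HTMLParser):
    """Collect JSON-LD records embedded in an HTML page."""

    def __init__(self):
        super().__init__()
        self._in_jsonld = False
        self._buffer = []
        self.records = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_jsonld = True
            self._buffer = []

    def handle_data(self, data):
        # Script content may arrive in several chunks, so buffer it.
        if self._in_jsonld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self._in_jsonld = False
            try:
                parsed = json.loads("".join(self._buffer))
            except json.JSONDecodeError:
                return  # skip malformed blocks rather than fail the harvest
            # A block may hold a single record or a list of records.
            self.records.extend(parsed if isinstance(parsed, list) else [parsed])


def extract_jsonld(html):
    """Return all JSON-LD records found in an HTML string."""
    parser = JsonLdExtractor()
    parser.feed(html)
    parser.close()
    return parser.records


def datasets(records):
    """Keep only the records whose @type includes Dataset."""
    selected = []
    for record in records:
        if not isinstance(record, dict):
            continue
        types = record.get("@type") or []
        if isinstance(types, str):
            types = [types]
        if "Dataset" in types:
            selected.append(record)
    return selected
```

A crawler would call extract_jsonld on each fetched page and pass the result to datasets before indexing; fetching, scheduling and storage are deliberately left out of this sketch.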

This presentation will report on work done in a network of activities to make metadata available at giga-scale, and on experiments to test the supporting system architecture and gauge its operational costs in a cloud-native implementation.


Jens Klump is a geochemist by training and leads the Geoscience Analytics Team in CSIRO Mineral Resources based in Perth, Western Australia. In his work on data infrastructures, Jens covers the entire chain of digital value creation from data acquisition to data analysis with a focus on data in minerals exploration. This includes automated data and metadata capture, sensor data integration, both in the field and in the laboratory, data processing workflows, and data provenance, but also data analysis by statistical methods, machine learning and numerical modelling.


