Mr Alex Ip1, Mr Andrew Turner1, Dr Yvette Poudjom-Djomani1, Dr Ross Brodie1, Mr Phillip Wynne1, Dr Kelsey Druken2, Dr Neil Symington1, Dr Carina Kemp3
1Geoscience Australia, Symonston, Australia, 2National Computational Infrastructure (Australia), Acton, Australia, 3AARNet, Yarralumla, Australia
Large geophysical datasets have traditionally been difficult to manage in a consistent, open, and efficient manner. The demands of modern, large-scale computing techniques, coupled with the need for sound data and metadata management, mean that established data formats and access methods are no longer adequate.
Geoscience Australia (GA) has been working with its partners to leverage and extend existing data standards to represent various geophysical data in modern scientific container formats, including netCDF and HDF. The new encodings support rapid and efficient data subsetting, either directly from a file or remotely via web services, and will underpin GA's future data delivery pipelines for Australian government-funded geophysical data.
NetCDF efficiently handles multivariate raster, line, and point data, as well as the n-dimensional data structures required by more demanding applications such as airborne electromagnetic (AEM) and airborne gravity surveys. Structural and metadata standards deliver interoperability, and existing and emerging data types are supported without loss of precision or other information.
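The efficiency claim above rests on hyperslab access: container formats such as netCDF and HDF let a reader fetch only the requested slice of a large on-disk array rather than the whole file. The stdlib-only Python sketch below illustrates that access pattern with a synthetic row-major binary grid; the file layout, dimensions, and function names are illustrative assumptions, not GA's pipeline or the netCDF API itself.

```python
# Sketch of the access pattern that container formats like netCDF enable:
# reading only a requested hyperslab from a large on-disk array instead of
# loading the whole file. Layout, sizes, and names here are illustrative.
import mmap
import os
import struct
import tempfile

NROWS, NCOLS = 1000, 500      # illustrative grid dimensions
CELL = struct.calcsize("<d")  # 8-byte little-endian doubles

# Write a synthetic row-major grid to disk: value = row * NCOLS + col.
path = os.path.join(tempfile.mkdtemp(), "grid.bin")
with open(path, "wb") as f:
    for r in range(NROWS):
        f.write(struct.pack(f"<{NCOLS}d",
                            *(float(r * NCOLS + c) for c in range(NCOLS))))

def read_subset(path, row, col_start, col_stop):
    """Read grid[row, col_start:col_stop] without loading the whole file."""
    with open(path, "rb") as f, \
            mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
        offset = (row * NCOLS + col_start) * CELL
        n = col_stop - col_start
        return struct.unpack(f"<{n}d", mm[offset:offset + n * CELL])

# Only 5 of the grid's 500,000 cells are actually read from disk.
print(read_subset(path, row=42, col_start=10, col_stop=15))
```

In a real netCDF4/HDF5 file the same idea is generalised: variables are chunked and self-describing, so a slice request maps to a handful of chunk reads, and OPeNDAP-style web services extend the identical subsetting semantics to remote access.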
This presentation will cover:
- The rationale for migrating GA's geophysical data holdings to modern, open-standard container formats
- An outline of the netCDF4 file format and associated tools, and some of the benefits they provide
- The open-source tools and methodology used to translate grid, line, point, and other data into netCDF4, and to perform metadata synchronisation
- Live use cases exploiting web services
Andrew is a Data Engineer working in the High Performance Data team at Geoscience Australia. He holds a Bachelor of Science from the Australian National University and a Master of Technology from the University of Canberra. His work focuses on building software tools and data pipelines to make data more Findable, Accessible, Interoperable and Reusable (FAIR).