Use of high performance computing and high performance data formats in magnetotelluric data processing and inversion

Alison Kirkby1

1Geoscience Australia, Canberra ACT

Geoscience Australia (GA) has had an ongoing magnetotelluric (MT) program for over a decade. The software available for processing, analysing and inverting MT data has evolved significantly over this time, partly to take advantage of the increased availability of high performance computing facilities.

GA is a major contributor to the Australian Lithospheric Architecture Magnetotelluric Program (AusLAMP), which aims to collect long-period MT data on a 0.5 degree grid across the Australian continent. Data and resistivity models from this program will image the conductivity of the Australian lithosphere at depths of 10 to 100 km. Given the continental scale and the large number of stations being collected, inverting these data to obtain resistivity models of the Australian lithosphere relies on specialised software running on high performance computing facilities.

GA is also working to increase the accessibility and usability of raw (time series) MT data. To date, large data volumes and inconsistent formats have limited the accessibility of these data, making it difficult to record a transparent workflow from raw to processed data, and from processed data to modelling and inversion products. GA is working to make use of high performance data formats to facilitate access to and visualisation of these data.

This presentation will detail some of the MT work being done by GA and how the availability of high performance computing facilities is helping to increase the impact and use of MT as a key dataset for understanding the Australian lithosphere.


Biography:

Dr Alison Kirkby came to Geoscience Australia as a graduate in 2008 and then joined the Geothermal Section. She commenced a PhD in geophysics at the University of Adelaide in 2013, which she completed in 2016, and has since worked in the Magnetotelluric team on data processing, modelling and interpretation.

Bridging science delivery from on premises to cloud

Benjamin Vanzino1

1Geoscience Australia

 

Maintaining relevance and improving time to market for science data products is a long-standing challenge that requires a complete understanding of the science and information technology workflows involved. The goal is to standardise and automate delivery pipelines for these science data products. We cover current working platforms and delve into the data management space, from the use of conventions and defined schemas through to how these translate into automated systems for the creation and delivery of Geoscience Australia products. Data management is the key to standardisation and automation for real-time, near-real-time and scheduled product updates. Our move to cloud-based systems has dramatically reduced time to market, improved the uptime and resilience of systems, improved project agility by allowing resources to be acquired on demand, and increased the value of existing systems. Taking an enterprise approach to ensure the scalability of systems has helped manage their organic growth as we bridge the gap from traditional IT to modern cloud-based delivery platforms.


Biography: To be confirmed

HiperSeis: Supercharging Seismic Workflows on High Performance Computing Platforms

Dr Rakib Hassan1, Dr Babak Hejrani1, Dr Alexei Gorbatov1, Dr Fei Zhang1

1Geoscience Australia, Symonston, Australia

 

Geoscience Australia (GA) maintains a collection of permanent seismic stations scattered across continental Australia. GA also deploys temporary arrays of seismic stations, progressively spanning the entire continent, which acquire data at greater spatial resolution but over shorter time periods. In addition, GA has access to historical temporary deployments carried out by partner institutions.

Much of this data is stored on traditional file systems in legacy formats and is not amenable to data- and compute-intensive seismic workflows, such as the detection of earthquake phase arrivals for body wave tomography and the computation of cross-correlations for ambient noise tomography.

HiperSeis is a collection of software programs developed to convert and process seismic waveform data. It comprises scripts for converting seismic waveform data into the Adaptable Seismic Data Format (ASDF), which is amenable to highly scalable parallel file systems, e.g. the Lustre file system available at the National Computational Infrastructure (NCI). It also contains parallelised modules for detecting earthquake phase arrivals and computing cross-correlations between waveform data from pairs of seismic stations.
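
As a hedged illustration of the ASDF conversion step (a minimal sketch using ObsPy and pyasdf, not the HiperSeis code itself; the file paths, inventory and waveform tag below are hypothetical):

```python
# Minimal sketch: converting miniSEED waveforms into an ASDF container
# using ObsPy and pyasdf. File paths, the station inventory and the
# waveform tag are hypothetical placeholders.
import glob

import obspy
import pyasdf

# Create (or open) an ASDF volume backed by HDF5.
ds = pyasdf.ASDFDataSet("waveforms.h5", compression="gzip-3")

# Attach station metadata so the waveforms are self-describing.
inventory = obspy.read_inventory("stations.xml")
ds.add_stationxml(inventory)

# Ingest every miniSEED file found under a legacy directory tree.
for path in glob.glob("legacy_archive/**/*.mseed", recursive=True):
    stream = obspy.read(path)
    ds.add_waveforms(stream, tag="raw_recording")
```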

Current results from parallel earthquake phase arrival detection, run on 336 cores over more than 20 TB of combined waveform data, suggest a speed-up by a factor of ~100. An exercise that would otherwise have taken on the order of three months can now be completed overnight. We expect similar, potentially better, speed-ups for the more computationally intensive cross-correlation workflow. The short turnaround times of these workflows facilitate experimentation with enhanced algorithms for seismic data analysis.


Biography:

to be advised

Resolving the seismic velocity structure of the Australian lithosphere

Marcus Haynes1, Alexei Gorbatov1, Babak Hejrani1, Rakib Hassan1, Fei Zhang1

1Geoscience Australia, Canberra, ACT

 

Tomographic inversion of seismic data enables geophysical imaging of otherwise-inaccessible regions of Earth's lithosphere and mantle. Such models can provide important constraints on structure and composition with depth. For instance, the relationship between regional seismic wave-speeds and tectonics has long been known. More recently, however, it has been recognised that lithospheric structure can also allow spatial inferences to be made about the systems responsible for generating economic mineral deposits. Given this, high-resolution 3D seismic imaging of the Australian lithosphere has been identified as a high priority for improving mineral exploration.

Seismic velocities in the lithosphere can be inferred from the relative travel-times between earthquakes and seismic stations. Australia’s intra-continental setting records relatively few earthquakes in comparison to the subducting oceanic slabs to the north and east, and hence passive-seismic imaging of the lithosphere requires the deployment of dense arrays of seismometers. This leads to heterogeneous data coverage across the continent and, as such, the coarseness with which we can infer lithospheric seismic velocities varies spatially.

Model resolution analysis characterises the degree to which individual model parameters can be independently predicted. We use the results of resolution analysis to directly guide the construction of an irregular grid mesh across our model domain. This effectively alters the regularisation of our inversions and allows the 3D seismic velocity structure to be inferred across a range of spatial scales corresponding to the amount of information available.


Biography:

Marcus joined Geoscience Australia in 2007 as a cadet and has worked across the agency in various roles. He currently works as a geophysicist in the Mineral Potential section, where his role involves geophysical imaging of the lithosphere for mineral system assessments. Marcus is also concurrently completing a PhD at the Australian National University and is in the final stages of writing up his thesis, which examines the conductive flow of heat through the Australian continental crust.

Fully Homomorphic Encryption and k-Nearest Neighbour Classification

Kiowa Scott-Hurley1, Chris Watkins1

1CSIRO, Clayton South, VIC

 

The ability to perform computation directly on encrypted data enables a range of cloud- and edge-based computing solutions to be applied to sensitive data, either at scale or closer to the data source. We implemented a k-nearest neighbours classifier in a homomorphically encrypted space using the Microsoft SEAL library. To motivate the use of encrypted computation, the scheme imagines a user-and-cloud scenario in which multiple users cooperatively train a classifier on their combined encrypted data without sharing the data with one another. We demonstrate near-linear performance on large datasets (16,000 points) across a range of model parameters. This implementation illustrates that fully homomorphically encrypted machine learning is no longer prohibitively slow, and opens a pathway to encrypting other machine learning techniques in the future.
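
The authors' implementation uses the Microsoft SEAL library (C++). As a hedged, plaintext illustration only (not the authors' code), the sketch below shows the part of k-NN that maps naturally onto homomorphic arithmetic: squared distances built solely from additions and multiplications, the operations natively supported by levelled schemes such as CKKS. The ranking and voting steps, shown here in plaintext NumPy, are precisely the operations that need protocol-level treatment (or client-side decryption) in the encrypted setting.

```python
# Illustrative sketch only (not the authors' SEAL implementation):
# the distance step of k-NN written using additions and multiplications
# only, i.e. operations a homomorphic scheme can evaluate directly.
import numpy as np

def squared_distances(query, points):
    """Squared Euclidean distances, using add/multiply only (no sqrt)."""
    diff = points - query                # element-wise subtraction
    return np.sum(diff * diff, axis=1)   # element-wise product + sum

def knn_classify(query, points, labels, k=3):
    d2 = squared_distances(query, points)
    nearest = np.argsort(d2)[:k]         # plaintext-only step in this sketch
    values, counts = np.unique(labels[nearest], return_counts=True)
    return values[np.argmax(counts)]

# Toy usage: two clusters of labelled 2D points.
rng = np.random.default_rng(0)
pts = np.vstack([rng.normal(0, 1, (50, 2)), rng.normal(5, 1, (50, 2))])
lab = np.array([0] * 50 + [1] * 50)
print(knn_classify(np.array([4.5, 5.2]), pts, lab, k=5))  # expect 1
```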


Biography:

Kiowa Scott-Hurley is a cadet with the Scientific Computing team at CSIRO. A student of philosophy and pure mathematics, Kiowa has been applying her high-level reasoning and complex problem-solving skills to challenges in modern post-quantum-safe cryptography and machine learning.

Corrfunc: Blazing Fast Correlation Functions on the CPU

Manodeep Sinha1

1Centre for Astrophysics & Supercomputing, Swinburne University Of Technology, Hawthorn, VIC, Australia

 

How galaxies are distributed in space is determined by a combination of universal cosmological parameters, gravity, and the physics of galaxy formation. Quantifying galaxy clustering requires computing pair-wise separations, an inherently quadratic process. Consequently, comparing the observed clustering of galaxies to theoretical predictions is both useful for advancing our understanding of physics and technically challenging. Here I present Corrfunc, a suite of OpenMP-parallelized clustering codes that target current CPU micro-architectures with custom Advanced Vector Extensions (AVX512F, AVX) and Streaming SIMD Extensions (SSE) intrinsics. By design, Corrfunc is highly optimized and is at least a factor of a few faster than all existing public galaxy clustering correlation function routines. While Corrfunc was developed primarily with astrophysical applications in mind, the basic algorithm within Corrfunc can easily be extended to applications that require looping over neighbours up to a certain maximum spatial extent. For instance, molecular dynamics simulations, game development for modeling flocking behavior, and cross-matching between multiple datasets based on spatial separation can all potentially benefit from the Corrfunc algorithms. Corrfunc is covered by a suite of tests and extensive documentation, and is publicly available at https://github.com/manodeep/Corrfunc.
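
A brief, hedged usage sketch follows, assuming the Corrfunc Python interface (here Corrfunc.theory.DD for pair counts in a periodic box); the box size, particle numbers and separation bins are illustrative values only.

```python
# Minimal usage sketch of the Corrfunc Python interface (assumed here:
# Corrfunc.theory.DD). All numerical values are illustrative.
import numpy as np
from Corrfunc.theory import DD

boxsize = 250.0                              # Mpc/h, illustrative
npart = 100_000
rng = np.random.default_rng(42)
x, y, z = rng.uniform(0, boxsize, (3, npart))

rbins = np.logspace(-1, np.log10(25.0), 15)  # separation bin edges

# Auto-correlation pair counts, OpenMP-parallel over 4 threads.
results = DD(autocorr=1, nthreads=4, binfile=rbins,
             X1=x, Y1=y, Z1=z,
             periodic=True, boxsize=boxsize)

for row in results:
    print(row['rmin'], row['rmax'], row['npairs'])
```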


Biography:

Dr. Manodeep Sinha is a computational astrophysicist based at the Centre for Astrophysics & Supercomputing at Swinburne University, Melbourne. Dr. Sinha completed his PhD in Astronomy at The Pennsylvania State University and is currently a Senior Research Software Scientist working with the ARC Centre of Excellence for All Sky Astrophysics in 3 Dimensions (ASTRO 3D). Dr. Sinha works at the intersection of astrophysics, statistics, high-performance computing and software engineering.

Increasing scientific productivity through scalable computation and data

Dr Ben Evans1

1NCI Australia, Canberra, Australia

 

Since 2008, a series of collaborative NCRIS programs has supported a number of significant scientific research activities at NCI involving scalable computing, data analysis and a large collection of FAIR reference datasets. In particular, these programs have provided the computing capability to enable significant improvements to the outcomes of climate, weather, geophysics and environmental science, and have become a central infrastructure used by the climate community, many NCRIS capabilities (e.g., IMOS, TERN, AuScope, AAL) and government programs (e.g., weather prediction, Digital Earth Australia). The result has been increased productivity of scientific outcomes and greater impact on the national priorities enabled by these investments.

However, the high performance computing sector has known for some time that performance is no longer advancing according to Moore’s law and that the future of scalable computing will need to evolve in new ways. This has required us to re-examine our software and algorithms, increase our effort on improvements and scalability, and reconsider some old assumptions about data precision and reproducibility.

The current success of co-locating scientific datasets with HPC computational infrastructure did not happen overnight: it required a long and steady effort to progressively deepen the alignment of data and compute and demonstrate the success of the approach. At the same time, standards for interoperability and interconnectivity between scientific fields have been slowly maturing, and in many cases transdisciplinary science is now a reality.

In this talk I will discuss the journey so far and challenges ahead.


Biography:

Dr Ben Evans is NCI’s Associate Director for Research Engagements and Initiatives and is responsible for driving innovation and development of future capabilities for HPC and data-driven science with NCI major stakeholders, including NCI’s partners, research communities, and national and international collaborators. He leads a team of specialists in data science and data management, computational model and software development, and scientific visualisations, with the focus on harnessing the full power of NCI’s National Tier 1 high-performance capabilities, digital assets, and new innovations to address the ongoing challenges of high-performance computing and data-driven research.

Ben has developed NCI’s strategic programs in Climate, Weather, Environment, and Geoscience, which support research and national/international collaborations across the university and government scientific community. The goal has been to develop the computational and data-intensive science expertise to improve the performance of high-resolution models and provide sustainable research platforms which support both the novice user as well as more advanced science and innovative data analysis techniques.

GSKY – A scalable geospatial data service

Dr Ben Evans1, Dr Qurat Tariq1, Mr Edison Guo1, Dr Sivaprasad1, Mr Chris Allen1, Dr Kelsey Druken1, Dr Nigel Rees1

1NCI Australia, Canberra, Australia

 

For researchers analysing, transforming, and integrating large geospatial datasets, the traditional approach has been either to download a relevant part of the data and analyse these subsets in an ad-hoc manner, or to batch process large numbers of files for further analysis, which requires specialised skills. For many of our national datasets this has become infeasible due to the volume of storage space and the work needed to wrangle data at this scale. However, recent developments in significant data repositories with integrated data processing infrastructure open the door to new ways of processing data on demand.

NCI has developed a scalable geospatial data server called GSKY, which provides a new capability for high performance data analysis. GSKY is currently being used in national and international initiatives, providing fast access to programs and tools over the network and allowing researchers to analyse NCI’s multi-petabyte, nationally significant research data collections: satellite data products, climate and weather simulations, and rich geophysics data.

GSKY supports on-demand processing of data and provides interactive data exploration through OGC standards-compliant services: Web Map Service (WMS), Web Processing Service (WPS) and Web Coverage Service (WCS). GSKY dynamically and efficiently distributes the requisite computations among computational nodes, providing a scalable analysis framework for high performance, real-time access to these petabyte-scale datasets.
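
As a hedged illustration of how such OGC services are typically consumed, the sketch below uses OWSLib to query a WMS endpoint; the service URL, layer name and time value are hypothetical placeholders rather than actual GSKY endpoints.

```python
# Illustrative client-side sketch: requesting a map from an OGC Web Map
# Service using OWSLib. The endpoint URL, layer name and time below are
# hypothetical placeholders, not an actual GSKY service.
from owslib.wms import WebMapService

wms = WebMapService("https://example.org/ows", version="1.3.0")

# Discover what the service advertises.
print(list(wms.contents))

# Request a rendered map for a bounding box over Australia.
img = wms.getmap(layers=["example_layer"],
                 srs="EPSG:4326",
                 bbox=(110.0, -45.0, 155.0, -10.0),
                 size=(1024, 768),
                 format="image/png",
                 time="2017-06-01")
with open("map.png", "wb") as f:
    f.write(img.read())
```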

GSKY represents a new class of intelligent data services that provides a new research capability by harnessing the power of real-time high performance services and big data.


Biography:

Dr Ben Evans is NCI’s Associate Director for Research Engagements and Initiatives. He is responsible for driving innovation in future HPC and data-driven science with NCI major stakeholders in prioritised strategic activities, including NCI’s partners, research communities, and national and international collaborators.

Ben has developed NCI’s strategic programs in Climate, Weather, Environment, and Geoscience, which support research and national/international collaborations across the university and government scientific community. The goal has been to develop the computational and data-intensive science expertise to improve the performance of high-resolution models and provide sustainable research platforms which support both the novice user as well as more advanced science and innovative data analysis techniques.

 

A visualization workflow for high dimensional spatio-temporal datasets

Dr Abeer Mazher1, Dr Luk Peeters2

1CSIRO, Perth, Australia, 2CSIRO, Adelaide, Australia

 

Numerical modelling increasingly generates massive, high dimensional spatio-temporal datasets. Exploring such datasets relies on effective visualization. This study presents a generic workflow to (i) project high dimensional spatio-temporal data onto a two-dimensional (2D) plane in a computationally efficient manner, such that distances between data points in the high dimensional space are accurately preserved in 2D, and (ii) represent the 2D projection spatially using a two-dimensional, perceptually uniform background color map.

Machine Learning (ML) based Dimensionality Reduction Techniques (DRTs) for data visualization, i.e. t-Distributed Stochastic Neighbor Embedding (t-SNE) and Uniform Manifold Approximation and Projection (UMAP), are compared with traditional Principal Component Analysis (PCA), in combination with a perceptually uniform color scheme, in terms of accuracy, resolution and computational efficiency. Accuracy is evaluated using a DRT-independent quality metric based on the co-ranking framework.

The workflow is applied to an output dataset of the Australian Water Resource Assessment (AWRA) model for Tasmania, Australia. The dataset consists of daily time series of nine components of the water balance at a 5 km grid cell resolution for the year 2017. The case study shows that PCA provides rapid visualization of global data structure, while the more computationally demanding t-SNE provides a more accurate representation of local trends and variations. UMAP, however, preserves more global structure with superior runtime performance compared to t-SNE. The spatial visualization workflow, coupling low dimensional projection with perceptually uniform color maps, allows expert visual interpretation of high dimensional datasets and is expected to perform well for earth science applications.
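
A generic, hedged sketch of such a DRT comparison is shown below (not the authors' AWRA workflow); it projects a synthetic array with nine columns to 2D using PCA and t-SNE from scikit-learn and UMAP from the umap-learn package.

```python
# Generic sketch (not the authors' AWRA workflow): projecting a
# high-dimensional array to 2D with PCA, t-SNE and UMAP for comparison.
# The data array here is synthetic.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.manifold import TSNE
import umap  # umap-learn package

rng = np.random.default_rng(0)
X = rng.normal(size=(5000, 9))       # e.g. nine water-balance components

embeddings = {
    "PCA":   PCA(n_components=2).fit_transform(X),
    "t-SNE": TSNE(n_components=2, init="pca").fit_transform(X),
    "UMAP":  umap.UMAP(n_components=2).fit_transform(X),
}

for name, emb in embeddings.items():
    print(name, emb.shape)           # each embedding is (5000, 2)
```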


Biography:

As an applied statistician, I have extensive experience in developing statistical algorithms for applications in Earth Sciences, Econometrics and Remote Sensing, providing potential solutions to real world problems. I work with a diverse team of researchers to explore pattern recognition, machine/deep learning and visualization techniques for Remote Sensing and Earth Science applications.

Two interpolation methods for vector fields that conserve fluxes and line integrals

Dr Alexander Pletzer1, Dr Wolfgang Hayek1, Dr Samantha Adams2

1NeSI/NIWA, Hataitai, New Zealand, 2UK Met Office, Exeter, United Kingdom

 

Interpolation was invented by the Old Babylonians in 2000-1700 BC, but it is only since 1999 that conservative interpolation has been introduced and applied to the earth sciences. Due to the need to preserve the total amount of water, energy, etc., conservative interpolation has become one of the most widely used regridding methods. We show that conservative interpolation is just one member of a larger family of so-called mimetic interpolation methods, which conserve volume, area and line integrals. Area- and line-conserving interpolation is applicable to vector fields with face- and edge-centred staggering, respectively. Using grids from the next-generation weather and climate prediction code LFRic, we demonstrate the benefit of mimetic interpolation in the case of unstructured grids with arbitrary quadrilateral cells. In addition to conserving vorticity and fluxes, mimetic methods are immune to pole-like singularities and can be extended to work with partially masked cells, which often arise in the earth sciences.
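
As a hedged, toy illustration of the conservation property that mimetic interpolation generalises to areas, fluxes and line integrals, the sketch below performs a simple 1D conservative remap; it is illustrative only and is not the LFRic or mimetic vector-field implementation.

```python
# Toy 1D sketch of conservative (integral-preserving) remapping.
import numpy as np

def conservative_remap_1d(src_edges, src_values, dst_edges):
    """Remap cell-averaged values so the total integral is preserved."""
    dst_values = np.zeros(len(dst_edges) - 1)
    for i in range(len(dst_edges) - 1):
        lo, hi = dst_edges[i], dst_edges[i + 1]
        total = 0.0
        for j in range(len(src_edges) - 1):
            # Length of overlap between destination cell i and source cell j.
            overlap = max(0.0, min(hi, src_edges[j + 1]) - max(lo, src_edges[j]))
            total += overlap * src_values[j]
        dst_values[i] = total / (hi - lo)
    return dst_values

src_edges = np.linspace(0.0, 1.0, 11)          # 10 source cells
src_vals = np.sin(np.pi * 0.5 * (src_edges[:-1] + src_edges[1:]))
dst_edges = np.linspace(0.0, 1.0, 4)           # 3 destination cells

dst_vals = conservative_remap_1d(src_edges, src_vals, dst_edges)

# The integral (sum of value * cell width) is preserved exactly.
print(np.sum(src_vals * np.diff(src_edges)),
      np.sum(dst_vals * np.diff(dst_edges)))
```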


Biography:

Alex Pletzer is a physicist who drifted towards computational science and is now a research software engineer for New Zealand eScience Infrastructure (NeSI) at the National Institute of Water and Atmospheric Research (NIWA).

ABOUT AeRO

AeRO is the industry association focused on eResearch in Australasia. We play a critical coordination role for our members, who are actively transforming research via Information Technology. Organisations join AeRO to advance their own capabilities and services, to collaborate, and to network with peers. AeRO believes researchers and the sector benefit significantly from greater communication, coordination and sharing among the increasingly diverse and evolving service providers.
