HiperSeis: Supercharging Seismic Workflows on High Performance Computing Platforms

Dr Rakib Hassan1, Dr Babak Hejrani1, Dr Alexei Gorbatov1, Dr Fei Zhang1

1Geoscience Australia, Symonston, Australia

 

Geoscience Australia (GA) maintains a collection of permanent seismic stations scattered across continental Australia. GA also deploys temporary arrays of seismic stations, progressively spanning the entire continent, which acquire data at greater spatial resolution but over shorter time periods. In addition, GA has access to historical temporary deployments carried out by partner institutions.

Most of these data are stored on traditional file systems in legacy formats and are not amenable to data- and compute-intensive seismic workflows, e.g. the detection of earthquake phase arrivals for body wave tomography and the computation of cross-correlations for ambient noise tomography.

HiperSeis is a collection of software programs developed to convert and process seismic waveform data. It comprises scripts for converting seismic waveform data into the Adaptable Seismic Data Format (ASDF), which is amenable to highly scalable parallel file systems such as the Lustre file system available at the National Computational Infrastructure (NCI). It also contains parallelised modules for detecting earthquake phase arrivals and computing cross-correlations between waveform data from pairs of seismic stations.
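The cross-correlation workload grows quadratically with the number of station pairs, which is what makes parallelisation worthwhile. The sketch below is a hypothetical simplification (the function and station names are illustrative, not HiperSeis's actual API) of one way such a workload can be divided among MPI ranks, e.g. under mpi4py:

```python
# Round-robin partitioning of station pairs across parallel workers.
# Hypothetical names; a minimal sketch, not HiperSeis's real scheduler.
from itertools import combinations

def partition_pairs(stations, rank, nprocs):
    """Return the station pairs assigned to one rank (round-robin)."""
    pairs = list(combinations(sorted(stations), 2))
    return [p for i, p in enumerate(pairs) if i % nprocs == rank]

stations = ["AU.ARMA", "AU.CMSA", "AU.EIDS", "AU.INKA"]
for rank in range(3):
    print(rank, partition_pairs(stations, rank, 3))
```

Each rank then computes cross-correlations only for its own pairs; the union over all ranks covers every pair exactly once, with no coordination needed beyond the rank index.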

Current results from parallel earthquake phase arrival detection, run on 336 cores over more than 20 TB of combined waveform data, suggest a speed-up by a factor of ~100. An exercise that would otherwise have taken on the order of three months can now be completed overnight. We expect similar, potentially better, speed-ups for the more computationally intensive cross-correlation workflow. The short turnaround times of these workflows facilitate experimentation with enhanced algorithms for seismic data analysis.


Biography:

to be advised

Quantum Computing: Available Platforms and Software for Investigating Quantum Algorithms

Dr Fanel Donea1, William Mead1, Thomas Leatham1

1CSIRO Advanced Scientific Computing, Clayton, Australia

 

The poster presents results from investigating a selection of currently available platforms for running quantum computing algorithms, including direct access to publicly available quantum computers, third-party quantum emulators running via the web, and quantum emulators installed on local computers. It also presents work done on building software for a quantum computer emulator. The poster includes work done within the framework of two vacation scholarships offered by CSIRO in 2018-2019.


Biography:

To be confirmed

Classifying and predicting the electron affinity of diamond nanoparticles using machine learning

Dr Chris Feigl1, Dr Benjamin Motevalli1, Dr Baichun Sun1, Dr Amanda Parker1, Dr Amanda Barnard1

1CSIRO, Docklands, Australia

 

Using a combination of electronic structure simulations and machine learning, we have shown that the characteristic negative electron affinity (NEA) of hydrogenated diamond nanoparticles exhibits a class-dependent structure/property relationship. Using a random forest classifier, we find that the NEA is either consistent with bulk diamond surfaces or much higher than the bulk diamond value; and using class-specific random forest regressors with extra trees, we find that these classes are either size-dependent or anisotropy-dependent, respectively. This suggests that the purification or screening of nanodiamond samples to remove strained, heterogeneous or anisotropic particles may be undertaken based on the negative electron affinity.
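The classify-then-regress pattern described above can be sketched in a few lines of scikit-learn. Everything below is a toy stand-in: the descriptors (particle diameter, shape anisotropy) and the labelling rule are hypothetical, not the study's actual DFT-derived features.

```python
# Illustrative sketch of a random-forest classification step, assuming
# hypothetical descriptors; not the study's real features or labels.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
# Columns: particle diameter (nm), shape anisotropy (dimensionless).
X = rng.uniform(low=[1.0, 0.0], high=[10.0, 1.0], size=(200, 2))
# Toy rule standing in for the two NEA classes:
# 0 = "bulk-like" NEA, 1 = "much higher than bulk".
y = (X[:, 1] > 0.5).astype(int)

clf = RandomForestClassifier(n_estimators=100, random_state=0)
clf.fit(X, y)
print(clf.score(X, y))  # training accuracy on this separable toy problem
```

In the study's workflow, a separate extra-trees regressor would then be fitted per class to model the NEA value within each class.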


Biography:

Dr Chris Feigl is a Research Scientist working within the Materials and Molecular Modelling team of Data61, CSIRO.  He completed his PhD in Theoretical Condensed Matter Physics from RMIT University in 2012, after which he went into executive management for education and training and humanitarian aid organisations in the middle-east region.  Since returning to Australia, Chris’s research has re-focused on applying machine learning methods to the prediction and characterisation of nanomaterial properties.

Australia’s Marine National Facility: A floating sensor platform for big data

Ms Katherine Tattersall1, Dr Chris Jackett1, Mr Ian Hawkes1

1CSIRO Oceans & Atmosphere, Battery Point, Australia

 

Australia’s Marine National Facility (MNF) is hosted by CSIRO Oceans & Atmosphere and manages the operation of the state-of-the-art RV Investigator, an Australian government blue-water research vessel dedicated to supporting Australia’s atmospheric, oceanographic, biological and geosciences research. The RV Investigator is equipped with a multitude of sensors that map the ocean floor, measure and sample the water column and collect atmospheric measurements. The vessel is also a platform for a vast array of other marine research instruments and equipment. Ship time aboard the MNF is available to researchers Australia-wide through an annual application process.

Managing this large working platform is a complex task. One major challenge is to efficiently, elegantly and robustly handle the many data streams captured by sensors and equipment and to quickly make high quality data available to researchers. We follow streamlined data management processes including on-board data aggregation and metadata capture, secure end-of-voyage data storage and archiving, publicly available data catalogues and portals and a team dedicated to data acquisition and processing. Data management and distribution is the responsibility of the O&A Information and Data Centre (IDC). The MNF follows open data objectives as outlined in the FAIR principles (Findable, Accessible, Interoperable and Reusable) and is an advocate of data and metadata standards and associated software tools and processes.

This poster illustrates important elements of the current MNF/CSIRO data workflow from acquisition to publication. Key processes and data access points are highlighted to convey the broad capabilities of the MNF. More information is online at https://research.csiro.au/oa-idc/marine-national-facility-datasets/.


Biography:

Katherine Tattersall is a research data specialist and data architect with over a decade of experience in the marine data domain and a firm grounding in the elements of meticulous and innovative research data management more broadly. Her background in marine physical, ecosystem, fisheries and geospatial research equipped her with an understanding of what researchers need from data infrastructure and tools.

Chris Jackett is a software engineer with experience developing scientific data processing systems. He has a background in marine science and remote sensing, and has worked on data storage solutions for drone-based multispectral imagery, computational systems for the optimisation of aquacultural planning, and web application development using modern JavaScript frameworks, tools and techniques.

Ian Hawkes manages the Marine National Facility research vessel RV Investigator’s Information and Communications Technology (ICT) systems: providing seagoing computing support, operating various vessel scientific data acquisition systems, and overseeing quality control and processing of key datasets and the delivery of associated data products. Ian has an Honours degree in Physics and three decades of experience as a Systems and Software Engineer. He has worked on all phases of projects, including proposals, identifying and analysing requirements, object-oriented design, coding in C++, testing and documentation.

 

QuickThermo: A software to perform ab initio thermodynamic calculations

Dr Benyamin Motevalli1, Dr Amanda Barnard1

1CSIRO Data61, Docklands, Australia

 

The energies obtained using first-principles methods can be used to study the thermodynamic stability of complex systems, particularly where experimental measurements are difficult or limited. However, the calculated energies only account for the ground-state (temperature T ≈ 0 K, pressure P = 0 Pa) electronic energies, E. One practical way to extend ground-state energies to finite temperatures and pressures is the first-principles (ab initio) thermodynamics method, which combines results calculated from first principles at the ground state with the extensive thermochemical data measured at the standard state. This method can also serve as an effective technique to expand datasets in a more meaningful way by calculating probabilities as a function of temperature, pressure, and other environmental conditions such as humidity.
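Once finite-temperature free energies are in hand, the relative probability of competing structures follows from Boltzmann statistics. The sketch below illustrates that final step only (the function name and the example energies are hypothetical; constructing G(T, P) from DFT energies plus thermochemical corrections is the harder part that QuickThermo automates):

```python
# Boltzmann probabilities of competing structures from their free
# energies; a minimal sketch with hypothetical inputs.
import math

K_B = 8.617333262e-5  # Boltzmann constant in eV/K

def boltzmann_probabilities(free_energies_eV, temperature_K):
    """Normalised probability of each structure at a given temperature."""
    g_min = min(free_energies_eV)  # shift for numerical stability
    weights = [math.exp(-(g - g_min) / (K_B * temperature_K))
               for g in free_energies_eV]
    total = sum(weights)
    return [w / total for w in weights]

# Two hypothetical structures 0.05 eV apart, at 300 K and 1000 K:
print(boltzmann_probabilities([0.00, 0.05], 300.0))
print(boltzmann_probabilities([0.00, 0.05], 1000.0))
```

As expected, raising the temperature flattens the distribution, so the low-energy structure dominates less strongly at 1000 K than at 300 K.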

QuickThermo is a software package that enables such calculations. It has a user-friendly interface developed in C# using WPF, backed by a database implemented in SQLite. The database includes a number of predefined elements with corresponding measured thermochemical data, and can be extended by users. The interface provides convenient tools to define elements and structures and to calculate thermodynamic probabilities under various environmental conditions such as temperature, pressure, and humidity. The software also provides batch-run capabilities, where users can load any number of structures and perform calculations over a range of environmental conditions, and a range of interactive plots is embedded to display the results.


Biography:

Dr. Benyamin Motevalli is a Postdoctoral Fellow in Data61 at CSIRO. He has years of experience in developing and employing computational and numerical analysis techniques to establish a fundamental understanding of novel intelligent nanomaterials. His current research focuses on the rational design of materials through innovative data-driven models that offer the advantage of fusing complex experimental and computational data for a higher-level understanding of structure-processing-property relationships.

Online time series sensor data cleaning system: A case study in water quality

Dr Yifan Zhang1, Dr Peter Thorburn1, Mr Peter Fitch2

1CSIRO, Brisbane, Australia
2CSIRO, Canberra, Australia

 

High-frequency water quality monitoring offers comprehensive and improved insight into the temporal and spatial variability of the target ecosystem. However, most monitoring systems lack sensor data quality control. Missing sensor data, background noise and signal interference have long been major obstacles to understanding and analysing sensor data, making its utilisation inefficient.

We therefore present an online data cleaning system for water quality sensor data. After collecting the raw sensor data, the system applies different data filters to the corresponding water quality sensor streams, so that variable-specific environmental effects can be considered separately. Cleaned data streams are then sent to web-based frontend interfaces for end users.

The system performs two main tasks: detecting and removing water quality outliers, and recovering missing sensor data. For the first task, water quality filters are built from variable-specific thresholds, rates of change and statistical distributions. For the second, machine learning algorithms such as k-nearest neighbours (KNN) are applied to fill gaps in the monitoring streams.
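The two tasks can be illustrated with a minimal stand-in: a range and rate-of-change filter followed by gap filling. Note the thresholds are hypothetical and the gap filling here uses simple linear interpolation rather than the paper's KNN approach.

```python
# Toy outlier filter + gap fill for a water-quality time series.
# Thresholds are hypothetical; interpolation stands in for KNN filling.
def clean_series(values, lo, hi, max_step):
    """Flag out-of-range or implausibly fast-changing readings, then
    fill interior gaps by linear interpolation between valid points."""
    flagged, prev = [], None
    for v in values:
        bad = v is None or not (lo <= v <= hi) or (
            prev is not None and abs(v - prev) > max_step)
        flagged.append(None if bad else v)
        if not bad:
            prev = v
    out, n = list(flagged), len(flagged)
    for i, v in enumerate(out):
        if v is None:
            left = next((k for k in range(i - 1, -1, -1)
                         if out[k] is not None), None)
            right = next((k for k in range(i + 1, n)
                          if out[k] is not None), None)
            if left is not None and right is not None:
                out[i] = out[left] + (out[right] - out[left]) * \
                    (i - left) / (right - left)
    return out

# A turbidity-like stream with a spike (99.0) and a dropout (None):
print(clean_series([1.0, 1.2, 99.0, 1.4, None, 1.8], 0.0, 10.0, 5.0))
```

In the production system each water quality variable gets its own filter parameters, which is why the streams are processed separately.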

The prototype system frees end users from tedious data cleaning work and shows a significant improvement in the readability of water quality sensor data. In the next stage, further neural-network-based algorithms will be tested and integrated to provide more reliable and accurate data cleaning results.


Biography:

Yi-Fan Zhang is a Postdoctoral fellow in Agriculture & Food, CSIRO. He received a PhD in data science from Queensland University of Technology in 2016. His work focuses on deep learning for agriculture decision making and management, with an emphasis on time series modelling and forecasting.

Machine Learning for Rapid Material Characterisation

Mr Alex Pitt1, Mr Paul McPhee1, Dr Chad Hargrave1

1CSIRO, Pullenvale, Australia

 

Accurate characterisation of the microscopic structure of mined resources such as coal is fundamental to understanding their utility and environmental impact. Such information is also critical to understanding the provenance and potentially hazardous nature of dust, sediments and other environmental samples. Microscopic analysis of this kind typically requires an expert petrographer or environmental scientist to characterise samples manually, a time-consuming and expensive process.

CSIRO has developed an automated Component Grain Analysis (CGA) system to reduce the time required to segment and characterise coal and dust samples. The system processes 300 GB sample images at sub-micron pixel resolution, using automated processing to handle this data-intensive task. CGA provides reliable statistics on the distribution of maceral types and impurities in coal samples, and of component materials in dust samples.

A recent breakthrough is the incorporation of machine learning (ML) algorithms into the complex task of particle segmentation. Specifically, convolutional neural networks (CNNs) have been employed for their demonstrated success in the semantic segmentation of natural images, their capacity for learning texture, their computational efficiency, and their suitability for distributed computation. State-of-the-art CNN models were trained on microscopy images and ground-truth labels (provided by CSIRO petrographers) and consistently converged to segmentation accuracies on validation data of more than 95%, exceeding the estimated noise in the expert labelling.

Further steps for the development of the ML system include hyper-parameter refinement, label noise reduction, and model augmentation for the automatic characterisation of component materials and intra-particle segmentation for complex particles.


Biography:

Alex Pitt is a software engineer in the Energy division at CSIRO. He graduated from the University of Queensland in 2011 with a B.E. (Software), and worked on data-engineering products for Microsoft in Redmond, Washington. After returning to Australia, he joined the CSIRO where he has worked for the last 6 years on sensor integration, signal processing and computer vision.

 

End User HPC Systems Overview

Dr Ahmed Shamsul Arefin1

1CSIRO, Canberra, Australia

 

CSIRO’s HPC cluster systems comprise 500+ compute nodes with varying strengths and features: some nodes are equipped with the world’s fastest GPUs, while others carry more than a terabyte of DRAM. The whole Linux computing facility is managed from a single SLES software image, rolled out and set up according to each target node profile. A commercial HPC management tool, Bright Cluster Manager (BCM), is used to handle HPC administration and monitoring workloads. In this work, we briefly describe the management framework of CSIRO’s HPC systems and introduce an upcoming end-user monitoring feature called the User Portal.


Biography:

Dr Ahmed Arefin is a Computation Scientist working within the HPC Systems Team, Scientific Computing Platforms, CSIRO. He completed his PhD in Computer Science (Data-Parallel Computing & GPUs) from the University of Newcastle, Australia and worked as a Postdoctoral Researcher (Parallel Data Mining) at the Centre for Bioinformatics, Biomarker Discovery & Information-Based Medicine (CIBM), The University of Newcastle, Australia. His research interest focuses on the application of HPC in data mining, graphs and visualization.

Speak of the (Geosciences) DeVL – she’s extending!

Dr Lesley Wyborn1, Dr Carsten Friedrich2, Dr Ben Evans1, Dr Nigel Rees1, Professor Graham Heinson3, Dennis Conway3, Dr Michelle Salmon4, Dr Meghan Miller4, Julia Martin5, Dr Jens Klump6, Dr Mingfang Wu7, Mr Ryan Fraser6, Dr Tim Rawling8

1NCI, ANU, Canberra, Australia
2Data 61, Canberra, Australia
3The University of Adelaide, Adelaide, Australia
4Research School of Earth Sciences, ANU, Canberra, Australia
5Australian Research Data Commons, Canberra, Australia
6Mineral Resources, CSIRO, Perth, Australia
7ARDC, Melbourne, Australia
8AuScope, Melbourne, Australia

 

Since 2017 the Australian Research Data Commons (ARDC) has co-funded the Geosciences Data-enhanced Virtual Laboratory (DeVL) project in collaboration with AuScope, National Computational Infrastructure (NCI), CSIRO, the Research School of Earth Sciences (RSES) of the ANU, The University of Adelaide, and Curtin University. The project is a first step in realising the AuScope Virtual Research Environment (AVRE) as part of a strategic goal to develop a data assimilation and geoscientific discovery and analytics platform for the Australian continent. The Geosciences DeVL has four work packages: Magnetotellurics (MT), Passive Seismic (PS), International Geo Sample Number (IGSN), and the AVRE platform and portals.

The University of Adelaide MT collection is being curated through collaboration with NCI, the Geological Survey of South Australia, and The University of Adelaide. Datasets, including time-series and processed data, are now discoverable and accessible through NCI’s catalogue and data services, and available for download and further HPC processing and data analysis through AVRE.

The PS work package is progressively releasing the RSES PS collection through the AusPass portal (supported by funding from other sources).

A collaboration between ARDC, CSIRO, and Curtin University has developed an IGSN minting service to allocate globally unique identifiers for academic geochemistry samples: this service is now being extended to researchers from other disciplines.

The AVRE component, led by CSIRO, is focused on improving access to geophysics data through a common AuScope portal infrastructure, and with ARDC, on improving the description, discovery, and execution of software, workflows and scientific solutions relevant to Australian geoscience.


Biography:

Lesley Wyborn is an Adjunct Fellow at the National Computational Infrastructure and RSES at ANU and works part time for the Australian Research Data Commons. She previously had 42 years’ experience in scientific research (geochemistry and mineral systems research)  and in geoscientific data management in Geoscience Australia from 1972 to 2014. In geoinformatics her main interests are developing international standards that support the integration of Earth science datasets into transdisciplinary research projects and in developing seamless high performance data sets that can be used in high performance computing environments. She is currently Chair of the Australian Academy of Science ‘National Data in Science Committee’ and is on the American Geophysical Union Data Management Board. She was awarded the Australian Government Public Service Medal in 2014, the 2015 Geological Society of America Career Achievement Award in Geoinformatics and the 2019 US Earth Science Information Partners Martha Maiden Award.

The AuScope Virtual Research Environment (AVRE): A Platform for Open, Service-Oriented Science to ‘Understand the Earth’

Ryan Fraser2, Dr Tim Rawling3, Dr Lesley Wyborn1, Dr Carsten Friedrich4, Dr Ben Evans1, Associate Professor Meghan Miller5, Professor Brent McInnes6, Professor Louis Moresi5, Dr Carsten Laukamp2, Nicholas Brown7

1NCI, ANU, Canberra, Australia
2Mineral Resources, Perth, Australia
3AuScope Limited, Melbourne, Australia
4Data 61, Canberra, Australia
5Research School of Earth Sciences, ANU, Canberra, Australia
6Curtin University, Bentley, Australia
7Geoscience Australia, Canberra, Australia

 

Since 2006, NCRIS projects (AuScope, NCI, ANDS, Nectar, RDS), and Government Agencies (GA, State/Territory Geological Surveys) have collaborated on building a suite of data portals, tools, software and virtual laboratories to support a diverse community of Earth scientists operating on a range of computational facilities including HPC, cloud, on-premise servers and desktops.

The 2016 National Research Infrastructure Roadmap noted that, to secure global leadership for the Earth Sciences over the next decade, Australia must now “enhance integration of existing data and mathematical modelling across large geographical areas to establish the next generation of ‘inward looking telescopes’ to better understand the evolution of Earth’s crust and the resources contained within it”.

In 2017 the AuScope Virtual Research Environment (AVRE) was launched to support this new ambition by enabling FAIR access to valuable academic research data collections and software, and improving coordination of existing national infrastructures. The aspiration is to realise a service-oriented science platform that will empower data assimilation and modelling across three networks: geophysics, geochemistry and geology.

AVRE cannot operate in isolation within the Australian Earth science community and will seek to ensure it can link to, and interoperate with, data and services from other national NCRIS research infrastructure initiatives (ARDC, NCI, TERN, IMOS, etc.). AVRE will also take a coordinated approach to optimising international partnerships and strategic collaborations with equivalent eResearch infrastructures globally. The end goal is to ensure Australian Earth science data and analytics can play a leadership role in next-generation transdisciplinary research to more comprehensively ‘Understand the Earth’.


Biography:

Ryan Fraser is now Program Leader for the AuScope Virtual Research Environments and has a long history with the former AuScope Grid program. He also led the development of many collaborative eResearch projects with NeCTAR, ANDS and RDS, including the Virtual Geophysics Laboratory and the AuScope Scientific Software Solutions Centre.

Within CSIRO he is a skilled Portfolio Manager, with over 15 years of experience working in R&D, commercialisation of products and delivery to both government and industry using agile methodologies. Ryan has led large, complex technology projects, managing sizeable software and interdisciplinary teams. His key focus is developing and fostering highly collaborative teams to deliver programs of work and continually grow capability for future work.

Ryan possesses specialist knowledge in decision science, data analytics, spatial information infrastructures, disaster management and emergency response, cloud computing, data management, and interoperability, and has extensive experience in managing and successfully delivering programs.

ABOUT AeRO

AeRO is the industry association focused on eResearch in Australasia. We play a critical coordination role for our members, who are actively transforming research via Information Technology. Organisations join AeRO to advance their own capabilities and services, to collaborate and to network with peers. AeRO believes researchers and the sector significantly benefit from greater communication, coordination and sharing among the increasingly different and evolving service providers.
