Bridging science delivery from on-premises to cloud

Benjamin Vanzino1

1Geoscience Australia

 

Maintaining relevance and improving time to market for science data products is a long-standing challenge that requires a complete understanding of both the science and the Information Technology workflows involved. The goal is to standardise and automate the delivery pipelines for these science data products.  We cover current working platforms and examine the data management space, from the use of conventions and defined schemas through to how these translate into automated systems for the creation and delivery of Geoscience Australia products. Data management is the key to standardising and automating real-time, near-real-time and scheduled product updates. Our move to cloud-based systems has dramatically reduced time to market, improved the uptime and resilience of systems, increased project agility by allowing resources to be acquired on demand, and added value to existing systems. Taking an enterprise approach to the scalability of systems has helped manage their organic growth as we bridge the gap from traditional IT to modern cloud-based delivery platforms.


Biography: To be confirmed

Visualisation at DST

Miss Kristina Johnson1

1Defence Science And Technology Group, Fishermans Bend, Australia

 

Visualisation allows us to combine human capabilities and perception with computer algorithms, and it is an important tool for understanding and communicating data and analysing results.  Computational science can create very large datasets that are difficult to interpret.  The ability to see data is therefore more important than ever.

This presentation will showcase data being visualised by researchers at DST.  It will also include a brief overview of new visualisation capabilities being developed within the organisation.


Biography:

HPC Data Visualisation Specialist at DST

Factors affecting ENSO predictability in an empirical model of tropical air-sea interactions

Harun Rashid1

1CSIRO, Melbourne, VIC

 

El Niño‒Southern Oscillation (ENSO) is the dominant mode of tropical interannual climate variability, with a large influence on global weather and climate. Here we construct an empirical dynamical model of tropical Pacific air-sea interactions to investigate various factors affecting ENSO prediction skill. A hierarchy of models of increasing complexity was constructed using data for 1958-1990, and retrospective forecasts were made for 1991-2017 with each of the models. The model with the best ENSO prediction skill was then chosen as the reference model. The reference model's predictability limit, defined here as the forecast lead month at which the anomaly correlation (AC) drops to 0.5, is around 11 months. After establishing the suitability of this model by comparing its simulated ENSO properties with those observed, we use it to determine the relative importance of several factors affecting the model's ENSO prediction skill. In particular, we examine the extent to which the main atmosphere-ocean interaction processes (the thermocline and zonal wind feedbacks and the zonal wind forcing) affect ENSO predictability. We find that all of these processes significantly affect ENSO predictability, extending the predictability limit by up to five months, with the largest effect coming from the thermocline feedback. The other processes, in order of decreasing effect, are the total zonal wind forcing, the zonal wind feedback and the external zonal wind forcing. This result suggests that dynamical seasonal prediction models must represent the major ENSO processes well in order to achieve good ENSO prediction skill.
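To make the skill metric concrete: the predictability limit above is simply the number of lead months for which the anomaly correlation between hindcast and observed anomalies stays at or above 0.5. A minimal numpy sketch of that calculation follows; the array layout and the centred (Pearson) form of the AC are illustrative assumptions, not the author's code.

```python
import numpy as np

def predictability_limit(fcst, obs, threshold=0.5):
    """Number of lead months with anomaly correlation (AC) >= threshold.
    fcst, obs: anomaly arrays of shape (n_start_dates, n_lead_months)."""
    n_leads = fcst.shape[1]
    ac = np.array([np.corrcoef(fcst[:, m], obs[:, m])[0, 1]
                   for m in range(n_leads)])
    below = np.nonzero(ac < threshold)[0]
    # if the AC never drops below the threshold, skill extends past the window
    return n_leads if below.size == 0 else int(below[0])
```

With an 11-month limit, the first 11 columns of the hindcast array correlate with observations at 0.5 or better.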


Biography:

Dr Harun Rashid is a senior research scientist in the CSIRO Climate Science Centre of the Oceans and Atmosphere Business Unit. His expertise is in understanding and predicting climate variability using dynamical and empirical models of the Earth's climate system. His current interest is modelling the El Niño‒Southern Oscillation (ENSO) using Coupled Global Climate Models (CGCMs) and Empirical Dynamical Models (EDMs).

Resolving the seismic velocity structure of the Australian lithosphere

Marcus Haynes1, Alexei Gorbatov1, Babak Hejrani1, Rakib Hassan1, Fei Zhang1

1Geoscience Australia, Canberra, ACT

 

Tomographic inversion of seismic data enables geophysical imaging of otherwise-inaccessible regions of the Earth's lithosphere and mantle. Such models can provide important constraints on structure and composition with depth. For instance, the relationship between regional seismic wave-speeds and tectonics has long been known. More recently, however, it has been recognised that lithospheric structure also allows spatial inferences to be made about the systems responsible for generating economic mineral deposits. Given this, high-resolution 3D seismic imaging of the Australian lithosphere has been identified as a high priority for improving mineral exploration.

Seismic velocities in the lithosphere can be inferred from the relative travel times of seismic waves between earthquakes and recording stations. Australia's intra-continental setting produces relatively few earthquakes compared with the subducting oceanic slabs to the north and east, so passive-seismic imaging of the lithosphere requires the deployment of dense arrays of seismometers. This leads to heterogeneous data coverage across the continent and, as such, the coarseness with which we can infer lithospheric seismic velocities varies spatially.

Model resolution analysis characterises the degree to which individual model parameters can be independently predicted. We use the results of resolution analysis to directly guide the construction of an irregular grid mesh across our model domain. This effectively alters the regularisation of our inversions and allows the 3D seismic velocity structure to be inferred across a range of spatial scales corresponding to the amount of information available.
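As a concrete illustration of that workflow, the hedged numpy sketch below computes the diagonal of the model resolution matrix for a damped least-squares inversion and flags poorly resolved cells as candidates for merging into coarser blocks; the sensitivity matrix G, the damping factor and the 0.5 cutoff are placeholders, not values from our inversions.

```python
import numpy as np

def resolution_diagonal(G, damping):
    """Diagonal of the model resolution matrix
    R = (G^T G + damping * I)^-1 G^T G for a damped least-squares
    inversion with sensitivity matrix G. Entries near 1 indicate
    independently resolved cells; entries near 0, unresolved ones."""
    GtG = G.T @ G
    R = np.linalg.solve(GtG + damping * np.eye(GtG.shape[0]), GtG)
    return np.diag(R)

def cells_to_coarsen(G, damping, cutoff=0.5):
    """Indices of cells resolved below `cutoff`; in an irregular-mesh
    scheme these are merged with neighbours into larger blocks."""
    return np.nonzero(resolution_diagonal(G, damping) < cutoff)[0]
```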


Biography:

Marcus joined Geoscience Australia in 2007 as a cadet and has worked across the agency in various roles. He currently works as a geophysicist in the Mineral Potential section, where his role involves geophysical imaging of the lithosphere for mineral system assessments. Marcus is also concurrently completing a PhD at the Australian National University and is in the final stages of writing up his thesis, which examines the conductive flow of heat through the Australian continental crust.

Fully Homomorphic Encryption and k-Nearest Neighbour Classification

Kiowa Scott-Hurley1, Chris Watkins1

1CSIRO, Clayton South, VIC

 

The ability to perform computation directly on encrypted data enables a range of cloud- and edge-based computing solutions to be applied to sensitive data, either at scale or closer to the data source. We implemented a k-nearest neighbours classifier in a homomorphically encrypted space using the Microsoft SEAL library. The scheme envisages a user-and-cloud scenario in which multiple users cooperatively train a classifier on their combined encrypted data without sharing the data with one another, motivating the use of encrypted computation. We demonstrate near-linear performance on large datasets (16,000 points) across a range of model parameters. This implementation illustrates that fully homomorphically encrypted machine learning is no longer prohibitively slow, and it opens a pathway to encrypting other machine learning techniques in the future.
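The property that makes k-NN amenable to homomorphic evaluation is that squared Euclidean distance requires only subtraction, multiplication and addition, exactly the ciphertext operations a scheme like CKKS (implemented in Microsoft SEAL) provides. The plain-Python sketch below mirrors that structure in the clear; it is an illustrative stand-in rather than the authors' SEAL implementation, and the sort-and-vote step is placed client-side because comparisons are not natively available under FHE.

```python
import numpy as np

def knn_classify(query, points, labels, k=3):
    """Cleartext stand-in for the encrypted k-NN pipeline. Encrypted,
    `query` and `points` would be ciphertexts and the subtract / square /
    sum steps would be server-side Evaluator calls; sorting distances and
    voting require comparisons, so they run on the client after
    decryption."""
    diff = points - query                    # homomorphic subtraction
    sq_dist = (diff * diff).sum(axis=1)      # homomorphic square and sum
    nearest = np.argsort(sq_dist)[:k]        # client-side, after decryption
    return np.bincount(labels[nearest]).argmax()  # majority vote
```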


Biography:

Kiowa Scott-Hurley is a cadet with the Scientific Computing team at CSIRO. A student of philosophy and pure mathematics, Kiowa has been applying her high-level reasoning and complex problem-solving skills to challenges in modern post-quantum cryptography and machine learning.

A semi-supervised framework for the classification and collation of complex radio galaxies using rotationally invariant self-organizing maps

Tim Galvin1

1CSIRO, Perth, WA, Australia

 

The Australian Square Kilometre Array Pathfinder (ASKAP) is a next-generation radio telescope located in Western Australia. Once fully commissioned, one of its key science projects, the Evolutionary Map of the Universe (EMU), is expected to detect upwards of 70 million radio objects over five years. This enormous data volume requires new approaches to classifying complex objects in order to extract the maximum amount of scientific knowledge. We investigate how rotationally invariant self-organizing maps (SOMs) can be used both as a tool to identify the predominant morphological shapes of sources and as a tool to transfer knowledge from labelled datasets to unlabelled subjects. Further, by exploiting the transform information learnt by the SOM, we construct an approach that can identify the individual components of complex sources across multiple wavelengths without requiring large training sets with known labels. As this approach requires only source positions and no labels for training, it is ideal for EMU and similar deep, all-sky radio surveys from the next generation of radio telescopes.
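The rotational invariance lives in the distance computation itself: each candidate prototype is compared against many trial rotations of the input image, and only the best-aligned distance is kept. Below is a minimal, unoptimised sketch of such a best-matching-unit search; the 36-angle grid and brute-force loops are illustrative assumptions, not the production implementation.

```python
import numpy as np
from scipy.ndimage import rotate

def best_matching_unit(image, prototypes, n_rotations=36):
    """Rotation-invariant BMU search for a SOM: score each prototype by the
    minimum Euclidean distance over trial rotations of the input image, so
    a source matches a prototype regardless of its orientation on the sky.
    Returns (prototype_index, best_angle_deg); the winning angle is the
    'transform information' later used to decompose complex sources."""
    best_idx, best_angle, best_dist = -1, 0.0, np.inf
    for angle in np.linspace(0.0, 360.0, n_rotations, endpoint=False):
        rotated = rotate(image, angle, reshape=False, order=1)
        for idx, proto in enumerate(prototypes):
            dist = np.linalg.norm(rotated - proto)
            if dist < best_dist:
                best_idx, best_angle, best_dist = idx, angle, dist
    return best_idx, best_angle
```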


Biography:

Postdoctoral researcher at CSIRO Astronomy and Space Science, investigating applications of machine learning methods for the classification of radio objects.

Scaling Physical Sample Identifiers across all Research Domains within the Research Ecosystem

Jens Klump1, Kerstin Lehnert2, Sarah Ramdeen3, Lesley Wyborn4

1CSIRO, Kensington, WA, Australia

2Columbia University, Palisades, New York, USA

3Ronin Institute, Huntsville, Alabama, USA

4Australian National University, Canberra, ACT, Australia

 

Samples taken from nature or produced in laboratory experiments have always been at the heart of scientific research. Over the past two centuries, we have collected hundreds of millions of samples, and we are still collecting more. However, while infrastructures for scientific literature and data have evolved into a networked and searchable research information ecosystem, online access to sample information has lagged far behind, and often we cannot even unambiguously identify which samples were the basis of which dataset and publication.

In the geosciences, the International Geo Sample Number Implementation Organization (IGSN e.V.) has built a persistent identifier and catalogue infrastructure that gives access to millions of sample records. The underlying infrastructure can be used in other science disciplines such as biology, archaeology and materials science. Scaling this infrastructure to billions of samples, interconnected with a comparable number of datasets and their related publications, requires a redesign of both the organisational model and the technical architecture of current persistent identifier infrastructures. Growing the scale of persistent identifier systems also needs coordination across the key identifier systems such as ORCID, DataCite and Crossref. This contribution will present the current state of the discussion on how to link physical samples to the research ecosystem and the record of science, and how to ensure attribution to those who originally collected the samples.


Biography:

Jens Klump is a geochemist by training and Geoscience Analytics Team Leader in the Mineral Resources unit of CSIRO. Jens holds a PhD in Marine Geology from the University of Bremen in Germany. His involvement in the development of publication and citation of research data through Digital Object Identifiers (DOI) sparked further work on research data infrastructures. Jens’ current work focuses on data in minerals exploration, both from a data analysis and from a data logistics perspective. Jens is the Vice-President of the IGSN e.V. and the Vice-President of the Earth and Space Sciences Division of the European Geosciences Union.

Corrfunc: Blazing Fast Correlation Functions on the CPU

Manodeep Sinha1

1Centre for Astrophysics & Supercomputing, Swinburne University Of Technology, Hawthorn, VIC, Australia

 

How galaxies are distributed in space is determined by a combination of universal cosmological parameters, gravity, and the physics of galaxy formation. Quantifying galaxy clustering requires computing pair-wise separations, an inherently quadratic process. Consequently, comparing the observed clustering of galaxies to theoretical predictions is both useful for advancing our understanding of physics and technically challenging. Here I present Corrfunc, a suite of OpenMP-parallelized clustering codes that target current CPU micro-architectures with custom Advanced Vector Extensions (AVX512F, AVX) and Streaming SIMD Extensions (SSE) intrinsics. By design, Corrfunc is highly optimized and is at least a factor of a few faster than all existing public galaxy clustering correlation function routines. While Corrfunc was developed primarily with astrophysical applications in mind, its basic algorithm can easily be extended to any application that loops over neighbours out to some maximum spatial extent. For instance, molecular dynamics simulations, flocking-behaviour models in game development, and cross-matching between datasets based on spatial separation could all potentially benefit from the Corrfunc algorithms. Corrfunc is covered by a suite of tests and extensive documentation, and is publicly available at https://github.com/manodeep/Corrfunc.
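To make the neighbour-looping idea concrete, here is a hedged, pure-Python sketch of the cell-list (chaining-mesh) technique that underlies fast pair counters: points are binned into cells no smaller than the maximum separation, so each point is compared only against its own and adjacent cells instead of the full quadratic set. Corrfunc itself implements this in C with SIMD intrinsics and OpenMP; nothing below is its actual API.

```python
import numpy as np
from collections import defaultdict

def count_pairs(pos, rmax, boxsize):
    """Count pairs with separation < rmax via a chaining mesh.
    pos: (N, 3) array with coordinates in [0, boxsize).
    Non-periodic and single-bin, purely to show the algorithm."""
    ncell = max(1, int(boxsize // rmax))          # cell side >= rmax
    cells = defaultdict(list)
    for i, p in enumerate(pos):
        cells[tuple((p * ncell // boxsize).astype(int))].append(i)
    npairs = 0
    for (cx, cy, cz), members in cells.items():
        for dx in (-1, 0, 1):
            for dy in (-1, 0, 1):
                for dz in (-1, 0, 1):
                    for j in cells.get((cx + dx, cy + dy, cz + dz), ()):
                        for i in members:
                            # the i < j ordering counts each pair once
                            if i < j and np.linalg.norm(pos[i] - pos[j]) < rmax:
                                npairs += 1
    return npairs
```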


Biography:

Dr. Manodeep Sinha is a computational astrophysicist based at the Centre for Astrophysics & Supercomputing at Swinburne University, Melbourne. Dr. Sinha completed his PhD in Astronomy at The Pennsylvania State University and is currently a Senior Research Software Scientist working with the ARC Centre of Excellence for All-Sky Astrophysics in 3D (ASTRO 3D). Dr. Sinha works at the intersection of astrophysics, statistics, high-performance computing and software engineering.

From a Data Rivulet to a River: Lessons learnt from upgrading the Deterministic Seven-Day Streamflow Forecast System to provide Probabilistic Flow Ensembles at the Bureau of Meteorology

Patrick Sunter1, Daehyok Shin1, Prasantha Hapuarachchi1, Maree Carroll1, Sophie Zhang1

1Australian Bureau Of Meteorology, Melbourne, VIC, Australia

 

This presentation will discuss the challenges faced, and how we addressed them, in a multi-year project to upgrade the Australian Bureau of Meteorology’s (BoM) Seven-Day Streamflow Forecasting service to provide ensemble probabilistic forecasts.

The project involved integrating many new statistical approaches, algorithms and data sources – several of which originated in collaborative research with the CSIRO and Australian universities – into a production-ready system able to publish results daily for several hundred locations on the Bureau’s website.

We will discuss the ways this work challenged our existing systems and how we addressed those challenges, including:
• data management and provenance: new approaches to handle version control of much larger data artefacts and model representations, including a move to Git Large File Storage (LFS) for managing hydrological model configuration and verification data (see the sketch after this list);
• performance and scalability: updating our Python software, which previously worked effectively on deterministic Numerical Weather Prediction (NWP) grids, to handle higher-resolution ensemble forecasts;
• system integration: the challenge of integrating new R&D software into production architectures, including dealing with legacy systems;
• redesigning outputs for better scientific communication: updating graphical plots to communicate the extra information in probabilistic forecasts without overwhelming generalist readers.
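The Git LFS change in the first bullet follows the standard LFS workflow: large binaries are replaced in the repository by small pointer files, with the content held in an LFS store. A minimal sketch, using illustrative file patterns rather than the Bureau's actual ones:

```
git lfs install                            # enable the LFS filters (once per machine)
git lfs track "*.nc" "verification/*.csv"  # illustrative patterns only
git add .gitattributes                     # the tracking rules are versioned too
git add model_config.nc
git commit -m "Store large model artefacts as Git LFS pointers"
```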
Finally, we will attempt to draw out the lessons from this project that are most relevant to other eResearch practitioners and scientific software engineers.


Biography:

Patrick Sunter has worked in the field of software engineering of scientific computing applications for more than a decade, participating in multiple collaborative projects in research and industry. Building on a base of software engineering post-graduate training, he has worked across the domains of geophysics, materials science, and spatial information to develop software to support modelling and analysis of complex problems.

Patrick joined the Australian Bureau of Meteorology’s Water Forecasting Services section in 2016, and since then has worked on upgrades to the software and information systems that underpin the Bureau’s seasonal and short-term streamflow forecasting services.

Scaling Agile for SKA: Adoption of SAFe as the large-scale agile methodology for the construction of the SKA software systems

Juan Carlos Guzman1

1CSIRO, Bentley, WA, Australia

 

The Square Kilometre Array (SKA) project has completed most of its design work and has started to prepare for construction, due to commence at the end of 2020. A large fraction of the construction effort will be dedicated to software development, estimated at around 600+ FTE over the 6 years of construction and distributed across multiple teams around the globe. To tackle this large-scale distributed development effort, the SKA Office decided to adopt the Scaled Agile Framework (SAFe). SAFe is a proven, publicly-facing framework for applying Lean and Agile practices at enterprise scale, and one of the most popular large-scale agile methodologies in the market.

To gain experience with this new methodology, the SKA has started to use it in the context of the “bridging” activities, that is, the period between the end of design and the start of construction. We are currently in the second increment, with 12 active agile teams continuing prototyping work and addressing key areas for the upcoming System Critical Design Review (CDR) scheduled for the end of this year.

This talk will introduce SAFe, explain why it was chosen for the SKA, and share some early results from applying this new large-scale agile software methodology to a big research project.


Biography:

Juan Carlos (JC) Guzman is the Head of the Software and Computing Group at CSIRO Astronomy and Space Science. He joined CSIRO in 2007 and has been working mainly on the ASKAP project, fulfilling many roles including developer, architect, and team and group leadership. He has also been contributing to the SKA project since 2012. Before joining CSIRO he worked at the European Southern Observatory (ESO) in Chile, developing monitoring and control system software for several optical telescopes located in the Chilean desert.
