Mr Robert Davy¹, Dr Ron Hoeke², Ms Claire Trenham², Dr Julian O’Grady², Dr Mark Hemer², Ms Rebecca Gregory²
¹CSIRO Information Management & Technology, Canberra, Australia
²CSIRO Oceans and Atmosphere, Aspendale, Australia
A CSIRO – Bureau of Meteorology partnership has been running gridded wave hindcast models at hourly time steps to produce estimates of historical ocean wave heights, fluxes and energy. As with many other gridded models, the output is optimised for spatial extracts at a given time step. With the data in this native form, constructing a 30+ year hourly time series at a single grid point can take around 90 minutes, and large-scale spatial analysis of time series extreme values is not practical.
An eResearch Collaboration Project was initiated with the aim of streamlining access to this data for time series analysis. Large speedups were achieved by reorganising the data into spatial tiles, concatenating in time, then applying NetCDF chunking along the time dimension. Because of the large memory requirements, processing is performed with a number of Bash/Python scripts on Ruby, CSIRO’s large-memory multiprocessor, and the reorganised dataset can be updated as new data comes in. As a result, retrieval of the time series at a random grid point now takes around 0.1 seconds. Extreme value analysis of the entire Australian coastal domain can be done on the Pearcey cluster (using job parallelism) in around 10 minutes.
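The effect of chunking along the time dimension can be illustrated with a small counting exercise. The sketch below is not the project's code: the grid size, tile size and chunk shapes are hypothetical values chosen for illustration, and `chunks_touched` is a helper introduced here. It simply counts how many storage chunks must be read to satisfy a request under each layout.

```python
def chunks_touched(selection, chunk_shape):
    """Count the chunks intersected by a hyperslab.

    selection   -- one (start, stop) pair per dimension, stop exclusive
    chunk_shape -- chunk length along each dimension
    """
    count = 1
    for (start, stop), size in zip(selection, chunk_shape):
        first = start // size          # index of first chunk touched
        last = (stop - 1) // size      # index of last chunk touched
        count *= last - first + 1
    return count


# Hypothetical dimensions (assumptions, not taken from the hindcast):
N_TIME = 30 * 8760                     # 30 years of hourly steps, leap days ignored
N_LAT, N_LON = 100, 100                # one spatial tile

# Layout 1: one chunk per time step covering the whole tile (map-oriented).
native_chunks = (1, N_LAT, N_LON)
# Layout 2: chunks spanning all time steps over a small spatial patch.
time_chunks = (N_TIME, 10, 10)

# Request A: the full time series at a single grid point.
point_series = ((0, N_TIME), (42, 43), (17, 18))
# Request B: the full spatial field at a single time step.
single_map = ((0, 1), (0, N_LAT), (0, N_LON))

native_reads = chunks_touched(point_series, native_chunks)
rechunked_reads = chunks_touched(point_series, time_chunks)
map_reads_rechunked = chunks_touched(single_map, time_chunks)

print("point series, native layout:   ", native_reads)
print("point series, time-chunked:    ", rechunked_reads)
print("single map, time-chunked:      ", map_reads_rechunked)
```

Under these assumed sizes, the point time series touches one chunk per time step in the map-oriented layout but only a single chunk in the time-chunked layout, while a single spatial map becomes somewhat more expensive to read from the time-chunked layout. The actual chunk sizes used in the project are not stated here, so the counts indicate the trade-off only, not the measured timings.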
An example is presented showing the science that this has enabled. We examine two historical storm-wave events: one at Sydney’s northern beaches (Collaroy-Narrabeen), Australia, and the other along the southern coastline (Coral Coast) of Viti Levu, Fiji. Both events resulted in significant damage to coastal structures.
Robert Davy is a scientific software engineer at CSIRO Information Management & Technology. He is a member of the Scientific Computing Data Processing Services team. His focus is on the use of data processing pipelines, DevOps tools and statistical analysis to unlock the latent value contained in large datasets. He has provided short- and medium-term support to a number of science teams via the IMT eResearch Collaboration Projects. He also has a background in quantitative analysis for renewable energy applications, and has co-authored a number of journal publications in this area.