Preparation for CMIP6: how to deal with a multi-petabyte climate data collection

Ms Claire Trenham1, Mr Tim Erwin2, Dr Aurel Moise3, Dr Paola Petrelli4, Dr Kate Snow5, Dr Louise Wilson3, Dr Vanessa Hernaman2, Ms Clare Richards5

1CSIRO, Black Mountain, Australia, 2CSIRO, Aspendale, Australia, 3Bureau of Meteorology, Melbourne, Australia, 4Centre of Excellence for Climate Extremes (UTas), Hobart, Australia, 5National Computational Infrastructure, Acton, Australia


The Coupled Model Intercomparison Project phase 6 (CMIP6) represents the largest collection of climate and weather data to date, with an expected total volume of around 30 PB. To work effectively with this data in Australia, the community needs a local replica of commonly used datasets, a means of finding the data relevant to each researcher’s needs, and tools for working with very large spatiotemporal datasets.

The National Computational Infrastructure (NCI) has established a mechanism to automatically download data for requested variables from the Earth System Grid Federation (ESGF), and a database that indexes this data. NCI supports collaborators from the Centre of Excellence for Climate Extremes (CleX), CSIRO and the Bureau of Meteorology to build and assess tools that enable effective community use of this data as it becomes available over the coming months.

The “CleF” tool has been developed by CleX to search for data stored locally at NCI and to check for additional data available on the ESGF. CSIRO and the BoM are working together to review the processing pipeline tool that was developed using CMIP5, observational and reanalysis data. We will identify what is needed to update “the pipeline” for Python 3 and make it compatible with the CleF search tool.
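
The idea of searching a local replica first and then checking a remote catalogue for anything missing can be illustrated with a minimal sketch. This is not the CleF implementation or its interface: the `DatasetKey` type, the `split_by_availability` function and the model names are all hypothetical, and the two "indexes" are plain in-memory sets standing in for the NCI database and an ESGF listing.

```python
from typing import Iterable, List, NamedTuple, Set, Tuple


class DatasetKey(NamedTuple):
    """Hypothetical identifier for one dataset (illustrative fields only)."""
    model: str
    experiment: str
    variable: str


def split_by_availability(
    requested: Iterable[DatasetKey],
    local_index: Set[DatasetKey],
    remote_catalogue: Set[DatasetKey],
) -> Tuple[List[DatasetKey], List[DatasetKey], List[DatasetKey]]:
    """Sort requested datasets into three groups:
    held locally, available only remotely, and not found anywhere."""
    wanted = set(requested)
    local = wanted & local_index
    remote_only = (wanted - local_index) & remote_catalogue
    unavailable = wanted - local_index - remote_catalogue
    return sorted(local), sorted(remote_only), sorted(unavailable)


# Assumed example data: stand-ins for a local index and a remote listing.
local_index = {
    DatasetKey("MODEL-A", "historical", "tas"),
    DatasetKey("MODEL-A", "historical", "pr"),
}
remote_catalogue = local_index | {DatasetKey("MODEL-B", "historical", "tas")}

requested = [
    DatasetKey("MODEL-A", "historical", "tas"),
    DatasetKey("MODEL-B", "historical", "tas"),
    DatasetKey("MODEL-B", "historical", "pr"),
]

local, remote_only, unavailable = split_by_availability(
    requested, local_index, remote_catalogue
)
```

In this sketch the "remote only" group is what a download request to the replication mechanism would cover, while the "unavailable" group signals data that has not yet been published.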

We report on the collaboration between key organisations to prepare for the deluge of CMIP6 data. We believe we are far better prepared for this dataset than we were for the ~1 PB CMIP5 dataset in 2012.


Claire works in the sea level, waves and coastal extremes team within CSIRO’s Climate Science Centre. Claire’s background spans mathematics, astrophysics, ocean wave modelling, high performance data services, data management and climate modelling. Claire is heavily involved in climate data and preparation for CMIP6, as well as regional and coastal climate modelling, data processing, making improvements to data and software to enhance science capabilities, and participating in STEM engagement activities with school students.




© 2019 Conference Design Pty Ltd