Pangeo Performance: Benchmarking Interactive Big Data Climate Science

Dr. James Munroe


Climate simulations that couple atmospheric, ocean, land, and ice models generate very large datasets. Dozens of three-dimensional variables are output at daily intervals over years of model time. Multi-year forecasts are also started from rolling monthly initial times, which effectively gives the dataset two time dimensions. Each simulation is only one realization of an inherently stochastic system, so we run ensembles of many simulations to obtain meaningful statistics, resulting in a large six-dimensional dataset.
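To give a feel for how quickly such a dataset grows, the following is a minimal sketch that estimates the dense size of a single variable. The dimension names and lengths are hypothetical, chosen only for illustration (three spatial dimensions, two time dimensions, one ensemble dimension); they are not the sizes used in this work.

```python
from math import prod

# Hypothetical dimension lengths, for illustration only:
# one ensemble dimension, two time dimensions, three spatial dimensions.
dims = {
    "member": 10,       # ensemble members
    "init_time": 12,    # rolling monthly forecast start times
    "lead_time": 365,   # daily output along each forecast
    "level": 30,        # vertical levels
    "lat": 180,
    "lon": 360,
}


def dataset_bytes(dims, bytes_per_value=4):
    """Return the size in bytes of one dense variable (float32 by default)."""
    return prod(dims.values()) * bytes_per_value


size = dataset_bytes(dims)
print(f"{len(dims)} dimensions, {size / 1e9:.1f} GB per variable")
```

Multiplying by dozens of variables shows why aggregations over such data are sensitive to storage throughput.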

While HPC systems can run these large models, we also need to be able to analyze and interrogate the resulting datasets. Using the Pangeo platform, we have been investigating storage platforms (e.g. BeeGFS, Amazon S3) that have the potential to scale to operations that aggregate significant subsets of model output. The objective is to run these analysis operations in an interactive environment, allowing scientists to respond to results in iteration cycles of seconds to minutes instead of hours to days. Benchmark strategies and results based on common workflows, such as calculating means and anomalies, will be discussed.
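As an illustration of the kind of workflow being benchmarked, here is a minimal sketch that times an ensemble mean and an anomaly calculation on a small synthetic in-memory array. It uses plain NumPy rather than the Pangeo stack, so it does not touch any storage backend. The array shape, the function names, and the definition of the climatology (a mean over members and initial times) are assumptions made for this example, not the benchmark used in this work.

```python
import time

import numpy as np

rng = np.random.default_rng(0)

# Hypothetical, deliberately tiny shape:
# (member, init_time, lead_time, level, lat, lon)
data = rng.standard_normal((4, 3, 10, 2, 8, 16)).astype(np.float32)


def ensemble_mean(x):
    """Average over the ensemble-member axis (axis 0)."""
    return x.mean(axis=0)


def anomalies(x):
    """Subtract an assumed climatology: the mean over members and initial times."""
    climatology = x.mean(axis=(0, 1), keepdims=True)
    return x - climatology


def time_it(func, x, repeats=3):
    """Return the best wall-clock time in seconds over a few repeats."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        func(x)
        best = min(best, time.perf_counter() - start)
    return best


for name, func in [("ensemble mean", ensemble_mean), ("anomalies", anomalies)]:
    print(f"{name}: {time_it(func, data) * 1e3:.3f} ms")
```

In a real benchmark the array would be read lazily from the storage platform under test, so the measured time would include I/O as well as computation.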


