Dr. James Munroe
Climate simulations that couple atmospheric, ocean, land, and ice models generate very large datasets. Each run produces dozens of three-dimensional variables, with years of model output at daily intervals. Multi-year forecasts are also started from rolling monthly initial times, which effectively gives the dataset two different time dimensions. Because each simulation is only one realization of an inherently stochastic system, we generate ensembles of many simulations to obtain meaningful statistics, and the result is a large six-dimensional dataset.
While HPC systems can run these large models, we also need to be able to analyze and interrogate the resulting datasets. Using the Pangeo platform, we have been investigating different storage platforms (e.g., BeeGFS, Amazon S3) that have the potential to scale up to operations that aggregate significant subsets of model output. The objective is to run these analysis operations in an interactive environment, allowing scientists to respond to the results in iteration cycles of seconds to minutes instead of hours to days. Benchmark strategies and results based on common workflows, such as calculating means and anomalies, will be discussed.
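
To make the "means and anomalies" workflow concrete, here is a minimal sketch on a small synthetic array. The dimension order (member, initial time, lead time, level, latitude, longitude), the array sizes, and the variable names are all assumptions chosen for illustration; they are not taken from the benchmarks themselves, and plain NumPy stands in for whatever tooling the actual analysis uses.

```python
import numpy as np

# Hypothetical six-dimensional layout (an assumption for illustration):
# (member, init_time, lead_time, level, lat, lon)
rng = np.random.default_rng(0)
shape = (4, 6, 10, 3, 8, 16)
temperature = rng.normal(loc=280.0, scale=5.0, size=shape)

# Ensemble mean: average over the member axis (axis 0).
ensemble_mean = temperature.mean(axis=0)

# Climatology: average the ensemble mean over initial times, separately
# for each lead time, level, and grid point. keepdims=True keeps a
# length-1 init_time axis so the result broadcasts in the subtraction.
climatology = ensemble_mean.mean(axis=0, keepdims=True)

# Anomaly: departure of each forecast from the climatology.
anomaly = ensemble_mean - climatology

print(ensemble_mean.shape, climatology.shape, anomaly.shape)
```

Both reductions touch every value along the reduced axis, which is why operations of this kind stress the storage layer once the arrays no longer fit in memory. On a Pangeo deployment the same reductions would more likely be written with xarray on Dask-backed, chunked arrays so that they are evaluated lazily and in parallel.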