Met Office Pangeo: Giving Scientists Back Their Flow

Kevin Donkers

Analysis, Visualisation & Data Team – UK Met Office

 

The UK Met Office is both the UK’s weather forecasting service and its research institute for weather and climate science. With weather model output ever increasing in volume and velocity, it is becoming increasingly difficult for scientists to analyse so much data with the tools available. This can lead to compromised investigative analysis due to constraints on time and tools. The Met Office Informatics Lab set out to address this with an elastic, cloud-based platform on which analysts could build tools to work with such high-momentum data. This work started with Jade and became Pangeo. This talk is a brief history of the Lab’s work with Pangeo, how it has been used and where it is going next.

Pangeo at the National Center for Atmospheric Research

Julia Kent

Computational and Information Systems Laboratory – NCAR

 

With the rise of Big Earth Data, geoscientists (and scientists in general) are facing the daunting challenge of performing analysis on datasets that are increasingly large and unwieldy. The Pangeo platform, however, shows tremendous promise for scalable geoscientific data analysis. Built on core technologies such as Dask and Xarray, the Pangeo framework offers the potential for interactive scientific analysis in a way that has never existed before. Recognizing this promise, the National Center for Atmospheric Research (NCAR) in Boulder, Colorado, USA, has invested in the development of the Pangeo platform for analysis at the NCAR-Wyoming Supercomputing Center (NWSC). In this presentation, we provide an overview of the activities surrounding NCAR’s efforts with Pangeo, including benchmarking and scaling studies of the Pangeo platform at NWSC, considerations on future computing infrastructure at the NWSC in support of Pangeo, software contributions to the Pangeo ecosystem, and educational efforts to entrain scientists in Python, Pangeo, and similar approaches.

Directly computing against public and research cloud object stores

Paul Branson

Coastal Research Scientist – CSIRO

 

Tired of mirroring data? The complete archive of the Australian Integrated Marine Observing System (IMOS) is now available on a publicly accessible Amazon S3 bucket. In addition, AARNet has recently provided 1 TB of storage to all users of the not-for-profit National Research and Education Network, which includes all the national universities and their undergraduate students. Oceanographic research often requires analysis or sub-setting of large earth observation or numerical model datasets where it may be impractical to mirror the complete archive. This presentation evaluates the use of a curated software container, run from HPC and the research cloud (Pawsey Nimbus), to directly access NetCDF data from IMOS (Amazon S3) and AARNet (MinIO S3). It makes use of the Pangeo framework to evaluate the IO scalability of direct access to research and public cloud object stores compared to access via the AODN THREDDS service. Finally, it builds on previous work to demonstrate the benefit of converting to cloud-optimised storage formats (Zarr) when data is transferred to the cloud.

Pangeo Performance: Benchmarking Interactive Big Data Climate Science

Dr. James Munroe

 

Climate simulations, with coupled atmospheric, ocean, land, and ice models, generate very large datasets. Dozens of three-dimensional variables are produced, with years of model data at daily time intervals, in addition to multi-year forecasts starting from rolling monthly initial times, which effectively gives the dataset two different time dimensions. Each simulation is but one realization of an inherently stochastic system, so we generate ensembles of many simulations to obtain meaningful statistics from a large six-dimensional dataset.
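To give a feel for how quickly such a six-dimensional dataset grows, the arithmetic can be sketched as below. The six axes are read from the description above (three spatial dimensions, two time dimensions, one ensemble dimension); every dimension name and size is a hypothetical placeholder rather than a figure from the actual simulations, and 4-byte floating-point storage is likewise an assumption.

```python
from math import prod

# Hypothetical dimension sizes, for illustration only (not from the abstract).
dims = {
    "longitude": 360,
    "latitude": 180,
    "level": 50,
    "initial_time": 12,   # rolling monthly start dates
    "lead_time": 365,     # daily output over one forecast year
    "member": 10,         # ensemble realizations
}

BYTES_PER_VALUE = 4  # assumption: 32-bit floats

n_values = prod(dims.values())          # number of values in one variable
n_bytes = n_values * BYTES_PER_VALUE    # uncompressed size of one variable
n_terabytes = n_bytes / 1e12            # decimal terabytes

print(f"{n_values:,} values, about {n_terabytes:.2f} TB per variable")
```

Under these assumed sizes a single variable is already over half a terabyte uncompressed, before multiplying by the dozens of variables mentioned above.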

While HPC is able to run these large models, we also need to be able to analyze and interrogate the resulting datasets. Using the Pangeo platform, we have been investigating different storage platforms (e.g. BeeGFS, Amazon S3) that have the potential to scale up to operations that require aggregating significant subsets of model output. The objective is to run these analysis operations in an interactive environment, allowing scientists to respond to the results in iteration cycles of seconds to minutes instead of hours to days. Benchmark strategies and results based on common workflows, such as calculating means and anomalies, will be discussed.
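As a rough illustration of the "means and anomalies" workflow, the sketch below computes a monthly climatology and the anomaly of each sample relative to it. It is a minimal pure-Python example of the arithmetic only, not the benchmark code from the talk; the function names and the toy series are hypothetical. On a Pangeo deployment the same operation would more likely be expressed with Xarray group-by operations over Dask-backed arrays.

```python
from collections import defaultdict
from statistics import fmean


def monthly_climatology(samples):
    """Return {month: mean value} for an iterable of (month, value) pairs."""
    by_month = defaultdict(list)
    for month, value in samples:
        by_month[month].append(value)
    return {month: fmean(values) for month, values in by_month.items()}


def anomalies(samples):
    """Return each value minus the climatological mean of its month."""
    climatology = monthly_climatology(samples)
    return [(month, value - climatology[month]) for month, value in samples]


# Hypothetical toy series: (month, temperature) for January and July of two years.
toy_series = [(1, 10.0), (7, 20.0), (1, 12.0), (7, 24.0)]
toy_climatology = monthly_climatology(toy_series)
toy_anomalies = anomalies(toy_series)

print(toy_climatology)
print(toy_anomalies)
```

The anomaly step needs the full climatology before any output can be produced, which is why this kind of workflow stresses storage throughput: every sample has to be read at least once for the aggregation.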

Facing up to the climate (data) crisis

Dr. Thomas Moore

Decadal Climate Forecasting Project – CSIRO

Scientists in climate research are struggling with both model output and observed data that are exploding in size. Our nation’s supercomputers assimilate geoscience observations into coupled simulations of the oceans, atmosphere, and cryosphere that can generate ensembles upwards of 50 TB for a single experiment, or a 30-year reanalysis on the order of half a PB. So, how does the research scientist interactively explore these results?

Presented are the challenges facing a small, 20-member research team in the CSIRO Climate Science Centre, and how we are leveraging the Pangeo stack to unchain scientific discovery with an interactive data exploration platform that can scale to meet the demands of our datasets.
