Directly computing against public and research cloud object stores

Paul Branson

Coastal Research Scientist – CSIRO

 

Tired of mirroring data? The complete archive of the Australian Integrated Marine Observing System (IMOS) is now available in a publicly accessible Amazon S3 bucket. In addition, AARNET recently provided 1 TB of storage to all users of the not-for-profit National Research and Education Network, which includes all national universities and their undergraduate students. Oceanographic research often requires analysing or subsetting large earth observation or numerical model datasets, for which mirroring the complete archive may be impractical. This presentation evaluates the use of a curated software container, run from HPC and the research cloud (Pawsey Nimbus), to directly access NetCDF data from IMOS (Amazon S3) and AARNET (Minio S3). It uses the Pangeo framework to evaluate the I/O scalability of direct access to research and public cloud object stores compared with access via the AODN THREDDS service. Finally, it builds on previous work to demonstrate the benefit of converting data to a cloud-optimised storage format (Zarr) when it is transferred to the cloud.
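As a rough illustration of the access pattern described above, the following Python sketch opens a NetCDF file directly from an S3-compatible object store and rewrites it as Zarr. It is not the presentation's actual code. It assumes the s3fs, xarray, h5netcdf, dask and zarr packages from the Pangeo stack are installed, and the helper names (storage_options, open_netcdf, netcdf_to_zarr) as well as any bucket paths, endpoints and credentials passed to them are hypothetical.

```python
def storage_options(endpoint_url=None, key=None, secret=None):
    """Build keyword arguments for s3fs.S3FileSystem (hypothetical helper).

    With no key the store is accessed anonymously (e.g. a public bucket);
    endpoint_url points at a non-AWS S3 service such as a Minio gateway.
    """
    opts = {"anon": key is None}
    if key is not None:
        opts.update(key=key, secret=secret)
    if endpoint_url is not None:
        opts["client_kwargs"] = {"endpoint_url": endpoint_url}
    return opts


def open_netcdf(url, **opts):
    """Lazily open one NetCDF4 object from an object store, without mirroring it."""
    import s3fs          # imported here so the helper above has no dependencies
    import xarray as xr

    fs = s3fs.S3FileSystem(**opts)
    # h5netcdf can read from a file-like object; chunks={} requests dask arrays
    return xr.open_dataset(fs.open(url, "rb"), engine="h5netcdf", chunks={})


def netcdf_to_zarr(src_url, dst_url, src_opts, dst_opts):
    """Copy a NetCDF object into a Zarr store on a (possibly different) object store."""
    import s3fs

    ds = open_netcdf(src_url, **src_opts)
    store = s3fs.S3FileSystem(**dst_opts).get_mapper(dst_url)
    ds.to_zarr(store, mode="w", consolidated=True)
```

A public bucket would be read with open_netcdf(url, **storage_options()), while a Minio-style store would additionally need an endpoint URL and credentials supplied by the user.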

© 2019 Conference Design Pty Ltd