A data quality framework for high performance datasets

Dr Kelsey Druken1, Dr Ben Evans1, Sean Pringle1, Kashif Gohar1, Dr Nigel Rees1, Clare Richards1, Dr Jingbo Wang1

1NCI Australia, Canberra, Australia

 

In the next few years, there will be a major increase in computational power through upgraded HPC systems and further uptake of cloud-based platforms. At the same time, an enormous amount of new digital data will come online from many science domains. However, the two do not simply come together. In many cases, the data needs to be better organised to make it more tractable to process at scale and to make it programmatically accessible for a broader range of use cases.

Over the last several years, NCI has focused on improving computational access to some major national reference geospatial datasets. The data at NCI has also been used extensively by the wider community via remote-access data services, including server-side data processing that takes advantage of the colocation of data and computational processing power. With the data too big or too complex to move, we are now in the post-download era.

The challenge is to ensure that the quality of this data makes it usable and interoperable for a range of techniques across multiple domains. This necessitates an increased focus on the “FAIR data” principles: Findable, Accessible, Interoperable and Reusable. FAIR is underpinned by concerted international efforts to develop community-agreed standards that enable seamless programmatic access to data in high performance environments across multiple domains.

While this places additional requirements on the suppliers of both the data and metadata, the result is that the data becomes even more accessible for primary use, for secondary use, and for citation through publication processes.


Biography:

Kelsey Druken manages the petascale data repository at NCI and her interests lie in data management, services and informatics. Prior to joining NCI in 2015, Kelsey was a researcher at the Research School of Earth Sciences at the Australian National University in Canberra and a postdoctoral fellow at the Carnegie Institution for Science in Washington, DC. She holds a PhD in Oceanography from the University of Rhode Island.
