Dr Simon Cox1, Dr Jonathan Yu1
1CSIRO, Clayton, Australia
Finding, using and trusting high-quality datasets in any discipline is a “grand challenge”. Datasets are often not curated in a way that allows users or machines to decide whether they are fit for purpose. Data providers, in turn, lack tooling and guidance for assessing the quality of the datasets they publish.
To address this gap, we have developed a 5-star data rating system that considers data quality criteria based on the FAIR data principles. Our rating system includes specific metrics (or examples) of how the FAIR principles could be met, to serve as concrete goals for data providers to aim for. This is particularly useful for FAIR’s Interoperable and Reusable principles, which we break down into loadable, useable, and comprehensible. We make suggestions around formats and technologies, drawn from our experience with geospatial data. The CSIRO 5-star Data Ratings criteria cover the FAIR principles and add three qualities that FAIR does not cover: published, updated/maintained, and trusted.
We have also developed a companion 5-star Data Rating tool that lets users self-assess a dataset against the above qualities and assign a rating to each quality. The tool rates the data according to its current state. The questions presented to users also serve as tangible targets, showing how they can improve their data publication. See http://oznome.csiro.au/5star/
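As a rough illustration of the self-assessment idea, the sketch below records a star rating per quality and lists the qualities that still fall short. It is not the tool's implementation: the quality names are taken from this abstract, while the 0 to 5 scale per quality, the function name improvement_targets, and the rule that unrated qualities count as zero are assumptions made for the example.

```python
# Quality names as listed in the abstract; the tool's actual criteria
# and scoring may differ.
QUALITIES = [
    "published", "findable", "accessible", "loadable",
    "useable", "comprehensible", "updated/maintained", "trusted",
]
MAX_STARS = 5  # assumed scale: 0 to 5 stars per quality


def improvement_targets(ratings: dict[str, int]) -> list[str]:
    """Return the qualities rated below MAX_STARS, lowest rated first.

    Qualities missing from `ratings` are treated as 0 stars (assumption).
    """
    for name, stars in ratings.items():
        if name not in QUALITIES:
            raise ValueError(f"unknown quality: {name}")
        if not 0 <= stars <= MAX_STARS:
            raise ValueError(f"rating out of range for {name}: {stars}")
    below = [q for q in QUALITIES if ratings.get(q, 0) < MAX_STARS]
    # sorted() is stable, so ties keep the order used in QUALITIES.
    return sorted(below, key=lambda q: ratings.get(q, 0))


# Hypothetical self-assessment of one dataset.
example = {"published": 5, "findable": 3, "loadable": 1}
targets = improvement_targets(example)
```

Here targets starts with the unrated qualities and ends with "findable", which gives a data provider an ordered list of what to work on next.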
We present examples in the context of Australian government and research data showing how we can use this tool to assess data quality and suggest improvements.
Simon has been researching standards for publication and transfer of earth and environmental science data since the emergence of the World Wide Web. Starting in geophysics and mineral exploration, he has engaged with most areas of environmental science, including water resources, marine data, meteorology, soil, ecology and biodiversity. He is principal or co-author of a number of international standards, including Geography Markup Language and Observations & Measurements, that have been broadly adopted in Australia and internationally. Their value lies in enabling data from multiple origins and disciplines to be combined more effectively, which is essential in tackling most contemporary problems in science and society. His current work focuses on aligning science information with semantic web technologies and linked open data principles, and on the formalization, publication and maintenance of controlled vocabularies and similar reference data.