Challenges to research integrity from data-intensive research and how to mitigate them
Prof. Michael Barber1
1Australian Academy of Science, Canberra, Australia
Data-intensive research and the new tools supplied by data science are changing the nature of research. They are opening up new opportunities that, in turn, create greater impact. It is doubtful that a COVID-19 vaccine could have been developed in record-breaking time without genomics and data technologies. Such exciting developments span the entire research spectrum. However, they are accompanied by serious challenges and risks.
Critical issues such as transparency, the integrity and representativeness of data sets, reproducibility and replicability, shoddy research practices, and, unfortunately, research fraud are neither new nor restricted to data-intensive research. However, data-intensive research and developments in data science are exacerbating concerns about these issues. The risks also increase as tools such as machine learning become ‘commoditised’ and are used by scientists with limited awareness of their potential pitfalls. Many of these challenges go to the heart of the scientific method and, if not addressed, have the potential not only to discredit data-intensive research but also to erode trust in science.
In this talk, I argue that the data-intensive research community has a vital interest in addressing these challenges and mitigating the associated risks, and I propose some principles to guide this response.
Emeritus Professor Michael Barber AO, FAA, FTSE, is an internationally recognised applied mathematician and computational scientist, a former vice-chancellor of Flinders University and a former senior executive in CSIRO. From 2016 to 2020, he was Chair of the National Computational Infrastructure. He has a particular interest in improving research integrity.