Big data curation to enhance health and biomedical research

Ms Priyanka Pillai1

1The University Of Melbourne, Parkville, Australia


Background: The health data ecosystem comprises information from biosciences and medical research, health services, surveillance, behavioural and social determinants, population health and the environment. This ecosystem has the potential to build an enhanced evidence base for better interventions and to improve healthcare service delivery through data-empowered decision making. Improving the management of health and biomedical research data throughout its lifecycle is essential to extract maximum value through high-throughput analysis, machine learning and visualisation.

Method: A literature review was undertaken to summarise challenges in managing health and biomedical research data. The scoping work also included expert contributions from biomedical sciences, bioinformatics, health research and informatics.

Results: There are numerous challenges in collaboratively using data from health and biomedical research settings. National and international strategic plans on health and biomedical research emphasise enhanced data collection, efficient reporting systems and advanced infrastructure. There is global support for making health and biomedical research data available under the FAIR (Findable, Accessible, Interoperable and Reusable) Principles to support knowledge discovery, retrieval and integration. The Five Safes Framework (Projects, People, Data, Outputs and Settings) is an accountability framework to inform decisions about data-related operations.

Conclusion: The challenges in using, sharing and aggregating data from health and biomedical research can be addressed by building trust among data custodians, promoting collaboration and implementing biocuration practices. The solutions to leverage the big data ecosystem should be agile, comply with ethical and legal requirements, facilitate equitable data access and discovery, expedite innovation and promote collaboration.


Priyanka Pillai is a bioinformatician and software programmer by training and works as a Research Data Steward in the Melbourne Data Analytics Platform (MDAP). Priyanka also works as a Health Informatics Specialist for the Australian Partnership for Preparedness Research on Infectious Disease Emergencies (APPRISE) Centre of Research Excellence (CRE) based at the Doherty Institute. Priyanka’s role with MDAP supports the uplift of data management capabilities at the University and also involves collaborating with academics on data-intensive research such as bioinformatics and machine learning. Her role as a health informatician for the APPRISE CRE supports a geographically distributed network of data holders and researchers and provides strategic advice to facilitate national and international information sharing. Priyanka has also been involved with science mentoring programs at the University and is an advocate for the inclusion of women in science.


AeRO is the industry association focused on eResearch in Australasia. We play a critical coordination role for our members, who are actively transforming research via Information Technology. Organisations join AeRO to advance their own capabilities and services, to collaborate and to network with peers. AeRO believes researchers and the sector significantly benefit from greater communication, coordination and sharing among the increasingly different and evolving service providers.