Building AgReFed: A reusable socio-technical design for community governance and data stewardship

Mr Paul Box2, Dr Megan Wong1, Mr Bruce Simons1, Ms Kerry Levett3, Dr Ashlin Lee2, Assoc Prof Helen Thompson1, Mr Joel Epstein1, Mr Andrew MacLeod1

1Centre For eResearch And Digital Innovation – CeRDI, Ballarat, Aust, 2CSIRO Environmental Informatics, Alexandria, Aust, 3Australian Research Data Commons (ARDC), Adelaide, Aust



We present the reusable socio-technical architecture of the Agricultural Research Federation (AgReFed), including the roles, policies and processes that aim to provide a scalable, reusable and potentially networked approach (i.e. one that interacts with other governance structures) to collective direction setting and decision making.


The collaboratively developed framework was based on four key concepts:

Communities – independent and autonomous Data Provider Communities acting collectively;

Roles – the roles performed by community members to govern (steer) and contribute to (row) AgReFed;

Policies – the materials (including standards, agreed levels of FAIRness for data and other agreements) that are produced through governance mechanisms that guide collective and individual actions; and

Alignment processes – processes that align individual data providers’ data and repositories with agreed collective direction based on FAIR principles and CoreTrustSeal Trusted Repository Requirements.


The AgReFed community collaboratively developed, implemented and tested the governance and stewardship model, moving from project phase to a sustained community. Learnings included the value of incentives, developing joint value propositions, shared vision, and the importance of co-operative principles and policies including decision rights allocation and the maintenance of provider communities’ autonomy and independence.


The learnings and challenges of moving from a project to a self-sustaining distributed data federation, along with the reusable design patterns (including policy, process and role descriptions) presented here, provide a resource for other communities focused on sustained delivery of FAIR and trusted data.


Megan has a broad background in the science research, natural resource management and education sectors. After completing her PhD in vegetation-soil ecology at Monash University in 2014, her passion has been increasing collaboration and knowledge transfer between sectors, particularly environment and agriculture. As a Research Associate (2018 to present), Megan works with data custodians and communities to help make their data more re-usable, with a focus on agricultural and soils data.

Indigenous Data Governance in Australia

Dr James Rose1, Dr Kalinda Griffiths2, Mr Darren Clinch3

1Indigenous Studies Unit, Melbourne School of Population and Global Health, The University Of Melbourne, 2Centre for Big Data Research in Health, University of New South Wales, 3System Intelligence and Analytics, Department of Health and Human Services, Victoria



Since launching in February 2019, the Indigenous Data Network (IDN) at the Melbourne School of Population and Global Health, University of Melbourne, has implemented a nationwide program to unify the governance of data assets generated by and about Indigenous Australians. Engaging simultaneously with Commonwealth, state, and territory government departments and agencies, Indigenous-controlled research organisations, and the university sector, the IDN has established an integrated set of terms and definitions around which a coherent national approach to Indigenous data governance is being realised.

Australia’s 230-year colonial administrative history has generated an extensive data ecosystem documenting the health, education, employment, policing, environmental management and cultural heritage of Indigenous Australian individuals and communities. Independently, Indigenous-controlled research organisations, including land councils, community health services, native title service providers and think tanks, have emerged as major contributors to this data ecosystem, often with data of a much higher specificity and reliability than that generated by government and universities.

This emerging situation has generated a need to resolve questions around how ownership, custodianship, and stewardship of Indigenous data is defined and regulated. Currently, Australia’s unique lack of formal treaties between Indigenous peoples and colonial authorities means that an approach based on the assertion of data sovereignty is tied to the as-yet unrealised constitutional recognition of Indigenous peoples. In the interim, governance of Indigenous data remains open to negotiation. This paper outlines the IDN’s framework for a national Indigenous Australian data governance strategy, and provides an update on expanding institutional and community partnerships.



Dr James Rose

Dr James Rose is Senior Research Fellow with the Indigenous Studies Unit, Melbourne School of Population and Global Health, University of Melbourne, and National Coordinator of the Indigenous Data Network. James specialises in large-scale forensic population modelling, social and kinship network analysis, geographic information systems, and relational database systems design and implementation. He holds degrees in population health and social anthropology, and has over 15 years’ experience working with Indigenous-controlled research organisations across Southeast, Central, and Northern Australia.

Dr Kalinda Griffiths

Kalinda is a Yawuru woman and epidemiologist. She is a Scientia Fellow at the Centre for Big Data Research in Health at the University of New South Wales and a Superstar of STEM with Science and Technology Australia. Kalinda’s work addresses complex health disparities through using existing linked-administrative data. Her research currently addresses issues of quality and the utilisation of Indigenous data with a focus on data governance, measurement, and cancer care and outcomes.

Mr Darren Clinch

Darren Clinch is a Senior Analyst with the System Intelligence and Analytics branch of the Victorian Department of Health and Human Services, where he provides GIS and data visualisation solutions for program areas requiring dynamic exploration of government datasets. With a background in modelling, analytics and health policy relating to Aboriginal Victorians, Darren has contributed heavily to a range of major Indigenous health policy and service delivery programs, developing approaches to using Indigenous status in linked data across 30+ datasets.

Harnessing the Digital Revolution to address major societal challenges: a multi-organizational response

Dr Simon Cox1, Dr Simon Hodson2, Dr Erin Robinson3, Dr Lesley Wyborn1, Dr Ben Evans4, Dr Tim Rawling5

1CSIRO, Clayton, Australia, 2CODATA, Paris, France, 3Earth Science Information Partners, Boulder, United States of America, 4National Computational Infrastructure, ANU, Canberra, Australia, 5AuScope Ltd, Melbourne, Australia


Developing evidence-based responses to major societal challenges depends on the integration of data from globally distributed, heterogeneous sources. The International Science Council’s (ISC) Action Plan for 2019-2021 identified data-driven interdisciplinarity as one of twelve projects of critical importance to science and society, and one of two components of the ‘Digital Revolution Domain of Impact’.

The United Nations (UN) Sustainable Development Goals (SDGs) are an exemplary challenge, with target deliverables due by 2030. Making significant progress on data integration in support of the SDGs will require global collaboration, with specific attention being paid to technology and expertise transfer between the Global North and the Global South.

CODATA and the International Science Council (ISC), UNESCO, the United Nations Office for Disaster Risk Reduction (UNDRR), Earth Science Information Partners (ESIP), CSIRO, the Research Data Alliance and multiple international research infrastructures have separately initiated work on various aspects of both the technical and organizational solutions needed for effective data-driven interdisciplinarity. Each is investigating key topics such as the application of the FAIR principles, disaster data standardization, ‘fitness for use’ criteria, Operational Readiness Levels, alignment of metadata specifications, risk assessment, social awareness, and organizational drivers and constraints. Combining efforts will contribute more effectively towards the solutions needed to provide an evidence-based framework to accelerate the achievement of the SDGs by 2030. As a focal point, we are targeting three case studies (resilient cities, infectious diseases and disaster risk reduction) to explore how these organisations can better collaborate on the overall problem.


Simon has been researching standards for the publication and transfer of Earth and environmental science data since the emergence of the world wide web. He is principal- or co-author of a number of international standards, including Geography Markup Language and Observations & Measurements, that have been broadly adopted in Australia and internationally. Their value lies in enabling data from multiple origins and disciplines to be combined more effectively, which is essential in tackling most contemporary problems in science and society. His current work focuses on aligning science information with semantic web technologies and linked open data principles, and on the formalization, publication and maintenance of controlled vocabularies and similar reference data.

Simon was awarded the 2006 Gardels Medal by the Open Geospatial Consortium, and was selected to present the 2013 Leptoukh Lecture for the American Geophysical Union.

Inverting the Scholarly System: Community Ownership, Governance and Transparency for how we measure Contribution and Participation

Dr Richard Hosking1

1Curtin University, Perth, Australia


Despite a vast quantity of work, and the development of entire disciplines devoted to measuring scholarship and its dissemination, we remain fundamentally unable to ask basic questions about it. At the core of this is a lack of detailed examination of the effectiveness of communication and community building in favour of a narrow observation of countable signals.

The Curtin Open Knowledge Initiative (COKI) team is developing a broad program of work on the theme of ‘Open Knowledge Institutions’. The goal of our project is to develop tools and data to enable universities to understand how effectively they are operating as open knowledge institutions, enabling strategic change in higher education and research.

To date we have collected a multi-trillion point dataset, which is updated daily and has global reach. Within these hundreds of terabytes of data we cover traditional research outputs, their linkages, institutional websites, social media, library feeds and, crucially, information on the diversity of who is making this knowledge.

This talk is also an invitation to discuss the potential of a community owned and governed resource, both in the support of policy-ready tools and datasets and to also enable cutting edge research on the practice of research into the future.


Richard is the Lead Data Scientist for the Curtin Open Knowledge Initiative, with one foot in the Curtin Institute for Computation and the other in the Centre for Culture and Technology. He holds a PhD in Computer Science from the University of Auckland and has worked across both Academia and Industry building data intensive and machine learning systems.

Collaboration in computation at Curtin University: Investigating human activity recognition and joint kinematics in ballet

Dr Kathryn Napier1, Dr Kevin Chai1, Ms Danica Hendry2, Dr Richard Hosking1, Dr Amity Campbell2, Prof Leon Straker2, Dr Luke Hopper3, Prof Tele Tan4, Professor Peter O’Sullivan2

1Curtin Institute For Computation, Curtin University, Perth, Australia, 2School of Physiotherapy and Exercise Science, Curtin University, Perth, Australia, 3Western Australian Academy of Performing Arts, Edith Cowan University, Perth, Australia, 4School of Civil and Mechanical Engineering, Curtin University, Perth, Australia



The Curtin Institute for Computation (CIC) and the School of Physiotherapy and Exercise Science have collaborated to develop a novel machine learning based approach to accurately measure ballet dancer training load and joint kinematics in a real-world training setting. Currently, training load is estimated from dancers’ written diary entries, while the investigation of joint kinematics requires sophisticated and expensive laboratory-based optical motion capture systems. By combining expertise in both physiotherapy and computation, we have developed a novel methodology utilising affordable wearable sensors that can accurately measure dancer training load and joint kinematics in real-world ballet classes.


Two separate studies were performed on female ballet dancers fitted with wearable sensors. The first study developed a human activity recognition (HAR) convolutional neural network (CNN) model to classify six different jumping and leg lift ballet movements. The second study developed a joint kinematics recurrent neural network (RNN) prediction model for leg lift movements predicting thigh angle as a measurement of leg height.


The HAR CNN model achieved 83% classification accuracy, and the joint kinematics RNN model predicted peak thigh angle with a mean error of 5.9% on the experimental datasets.


The models developed were robust enough to identify jumping and leg lifting movements and to identify how often and how high dancers lifted their legs during leg lifts in a real-world ballet training class. This methodology will assist in providing further insight into the factors influencing a dancer’s pain and injury risk.
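The abstract does not include the models themselves; as a rough illustration of the kind of preprocessing such wearable-sensor activity recognition models typically require, the sketch below segments a continuous sensor stream into fixed-length overlapping windows suitable for a 1-D CNN. The function name, sampling rate and window parameters are assumptions for illustration, not details from the study.

```python
import numpy as np

def make_windows(stream, win_len, stride):
    """Slice a (timesteps, channels) sensor stream into overlapping fixed-length windows."""
    n_windows = (stream.shape[0] - win_len) // stride + 1
    return np.stack([stream[i * stride : i * stride + win_len] for i in range(n_windows)])

# Hypothetical numbers: 10 s of tri-axial accelerometer data sampled at 100 Hz,
# cut into 2 s windows with 50% overlap before classification.
stream = np.random.randn(1000, 3)                        # (timesteps, sensor channels)
windows = make_windows(stream, win_len=200, stride=100)  # (n_windows, win_len, channels)
print(windows.shape)                                     # (9, 200, 3)
```

Each window then becomes one training example labelled with the ballet movement it contains, which is the standard framing for this kind of classifier.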


Kathryn Napier is a Senior Data Scientist with the Curtin Institute for Computation (CIC) at Curtin University. The CIC was founded in 2015 to initiate and foster collaborative, interdisciplinary research that applies computational methods. Kathryn collaborates with Curtin University researchers and project partners to assist with data, computational, analytics and visualisation problems. Prior to joining the Curtin Institute for Computation in late 2018, Kathryn worked as a Research Associate at the Centre for Comparative Genomics at Murdoch University in the fields of Bioinformatics and Health Informatics.

Unlocking industry data in research/industry partnerships and working towards a FAIR data future

Mr John Hille1, Mr Sam Bradley1

1CSIRO, Kensington, Australia


To produce meaningful research outcomes in partnership with industry, industry must share data with their prospective research partners. Industry partners tend to be risk averse and protective of their data, which is often at odds with the Findable, Accessible, Interoperable and Reusable (FAIR) data principles that are more conducive to quality research outcomes.

This creates particular challenges for researchers whose projects depend on industry data. The challenge is exacerbated by the different approaches industry partners may take to data risk assessment, and by the differing approval processes each may have, which can be difficult for an individual researcher to navigate.

The Australian Research Council (ARC) Training Centre for Transforming Maintenance through Data Science (CTMTDS) has worked with its industry partners to formulate a data life cycle framework with associated tooling that incorporates a standard data risk assessment, an industry standard data repository along with appropriate approval mechanisms to satisfy the more stringent industry requirements and alleviate industry concerns around data management, data governance and data security.

While some of the data governed by the framework may not ultimately satisfy all of the requirements to be considered truly FAIR, by designing a framework from the outset with FAIR principles in mind the barriers for data to be made available to the wider research community can be significantly reduced.


John Hille is a Software Developer with over a decade of professional experience working primarily with web, collaboration and cloud technologies. Currently he works as a support engineer within CSIRO’s Mineral Resources business unit working on a range of research and industry projects.

Developing an Effective Research Data Culture

Dr Rhys Francis1, Ms Ai-Lin Soo2, Dr Andrew Jenke4, Mr Luc BetBeder-Matibet5, Dr Stephen Giugni3, Dr Steve Quenette2

1eResearch Futures P/L, Diamond Creek, Australia, 2Monash University, Melbourne, Australia, 3University of Melbourne, Melbourne, Australia, 4University of Sydney, Sydney, Australia, 5University of New South Wales, Sydney, Australia


Research-intensive universities are working to articulate the nature of an affordable and effective research data culture, under the banner of the Research Data Culture (RDC) group. The RDC has observed an exponential increase in research data that is not matched by improvements in technology, and which is associated with increased labour costs of support and management. A model for best practice is sought, in which expenditure on research data can be optimised against the motivators and measures of research reputation, quality and impact.

RDC organises meetings at regional and national levels, and at conferences where possible, to progress its interests. While the motivation came from four research-intensive universities identifying common ground, outreach is currently underway to grow the set of participants.

An approach to Research Data Management Plans, termed RDMP-2.0, is needed which:

– Involves all support “pillars” within a university, including archives, eResearch, library, IT, records and the research office

– Covers the “Yin and Yang” of data: (preservation, sharing and reuse) and (resourcing, sensitivity and disposal).

The RDC engages all “pillars” in its meetings, fostering those engagements within institutions. Examples of current best practice are being collated.

The Australian Research Data Commons has expressed an interest in how the RDC agenda would intersect with national investments in “distributed national collections” and a “National FAIR safety net”. The discussion between the six “pillars” on these questions is underway.

This talk will cover the background to the RDC, its learnings and the responses to the ARDC’s questions.


Rhys spent the first decade of his career as an academic researcher in parallel and distributed computing. The next decade and a half included roles as a senior principal researcher, research programme manager and strategic leader in information and communication technologies in the Commonwealth Scientific and Industrial Research Organisation (CSIRO). From 2006 Rhys worked within the Australian Government’s National Collaborative Research Infrastructure Strategy, developing its investment plan in eResearch, and subsequently as the Executive Director of the Australian eResearch Infrastructure Council. In that role he shaped the foundation of Australia’s national e-infrastructure landscape visible today. Since then, through a series of engagements, he has continued to work to harness advancing information and communication technologies to the benefit of Australian research.

Developing Capability in Bioimaging Research Software Engineering with National Programs and Partnerships

Dr Paula Andrea Martinez1,2,3,4

1National Imaging Facility, Brisbane, Australia, 2The University of Queensland, Brisbane, Australia, 3Centre for Advanced Imaging, Brisbane, Australia, 4The Characterisation Community, Australia


The Characterisation Community is a project that underpins national capability by connecting national infrastructure and expertise. The project lead partners are three Australian national institutions (NIF, Microscopy Australia and ANSTO), plus six Australian universities (UQ, UoM, Monash, UWA, SydUni and UoW). This project is co-funded by the Australian Research Data Commons (ARDC).

The Characterisation Community faces data-intensive science, in which informatics infrastructure, expertise and best practice are essential to turning data into new discoveries. As a collective, the community has been planning and implementing strategies to overcome these challenges (first strategy drafted in 2017).

Last year’s program (2019) developed and improved effective and networked relationships across Australia. By advocating a collaborative climate of innovation, we ran a series of events to reach users at beginner, intermediate and advanced levels. The event highlighted in this presentation is an intermediate-to-advanced tutorial titled “Automating Neuroimaging Workflows”, targeting research software engineers and neuroscientists.

Through this training we accomplished our goals and gathered lessons learned that are worth sharing with the community:

– Automating workflows is still a key requirement from the community to advance science. The training not only provided examples of existing workflows but also encouraged attendees to develop their own workflows with the community.

– Giving participants access to computational capacity, through containers and access to the Characterisation Virtual Laboratory (CVL).

– Building community and collaboration amongst practitioners and end-users by providing opportunities for professional development. We delivered in-person training connecting members of the Victorian Biomedical Imaging Capability (VBIC) and guests.


Dr Paula Andrea Martinez has led the National Training Program for the Characterisation Community since 2019. She works for the National Imaging Facility (NIF), which is present in ten Australian universities and research institutes. Previously she worked at ELIXIR Europe, coordinating the Bioinformatics and Data Science training program in Belgium and collaborating with multiple ELIXIR nodes on the development of software best practices. Her career, spanning Sweden, Australia and Belgium, has built six years of experience in Bioinformatics and research software development for complex, data-intensive science. She started her career in Computer Science, later moved into research methods development, and now focuses on outreach and advocacy in data and software best practices.


Infrastructure as the ingredient of interdisciplinary team success

Dr Alex Codoreanu1

1Swinburne University, Astronomy Data and Computing Services


Contemporary analytics, insights and data science teams are composed of, and work within, multidisciplinary environments. Their success and longevity depend on how well they integrate within existing structures, how quickly they can add value, and how they can build on past success stories to deliver new ones. As more academic and government entities begin to deal with Big Data challenges, understanding how to integrate and grow a data science team becomes a crucial component of delivery.

In this talk, I will discuss my experience working with multidisciplinary teams within academia, local state government and international corporate collaborations. I will highlight the necessary digital and human infrastructure ingredients for these transitions and will describe, through my real-world experiences, how a combination of flexible data/computing architecture paired with a horizontal reporting structure can lead to interdisciplinary success outcomes.


Dr. Codoreanu is a research scientist with experience in designing and implementing distributed architecture solutions. He has experience in creating custom databases, cleaning large datasets and developing custom software products. He is interested in natural language processing techniques, language semantic analysis, graph databases, GPU acceleration and machine learning algorithms.

Harnessing the Data Revolution: The US National Science Foundation’s “Big Ideas”

Dr Alexis Lewis1, Dr Eva Campo1

1National Science Foundation, Alexandria, USA


In 2017, the US National Science Foundation (NSF) published “Ten Big Ideas” for Future Investment. Harnessing the Data Revolution (HDR) is a national-scale activity designed to enable new modes of data-driven discovery to address fundamental questions at the frontiers of science and engineering. The HDR vision is realized through an interrelated set of efforts in the foundations of data science, data-intensive science and engineering, data cyberinfrastructure, and education and workforce development.

We will discuss the NSF programs that support efforts around Harnessing the Data Revolution, spanning the range from large, nationwide Institutes for Data-Intensive Research in Science and Engineering to individual efforts to support community engagement across all scientific disciplines. Of particular interest is the potential to collaborate with agencies outside of the US to broaden the impact of these activities and engage the worldwide scientific research and education communities.


Alexis Lewis is the Program Director for Data Initiatives in the Division of Civil, Mechanical and Manufacturing Innovation, Directorate for Engineering, at the US National Science Foundation (NSF). Her programs support research and community activities that promote, develop, and employ data-driven approaches to scientific discovery, robust data infrastructure, and sound data management practices. She holds SB, MSE and PhD degrees in Materials Science and Engineering, and worked as a Materials Research Engineer prior to joining NSF in 2014.


AeRO is the industry association focused on eResearch in Australasia. We play a critical coordination role for our members, who are actively transforming research via Information Technology. Organisations join AeRO to advance their own capabilities and services, to collaborate and to network with peers. AeRO believes researchers and the sector significantly benefit from greater communication, coordination and sharing among the increasingly diverse and evolving service providers.

© 2019 Conference Design Pty Ltd