Helping researchers who work on ‘Understanding the Earth’ to better understand the new FAIR publication requirements

Dr Lesley Wyborn1, Julia Martin2, Dr Mingfang Wu3, Shelley Stall4, Dr Natasha Simons5, Dr Ben Evans1, Dr Adrian Burton2, Dr Tim Rawling6

1NCI, ANU, Acton, Australia, 2Australian Research Data Commons, Canberra, Australia, 3Australian Research Data Commons, Melbourne, Australia, 4American Geophysical Union, Washington, United States of America, 5Australian Research Data Commons, Brisbane, Australia, 6AuScope Ltd, Melbourne, Australia

 

A diverse range of researchers work on ‘Understanding the Earth’, studying phenomena from the outer atmosphere to the inner core. Although scholarly publications from their research are based on datasets, software, and physical samples, access to many of these input artefacts has been a persistent frustration. This access is critical to ensuring the integrity of published research and to facilitating reuse of the artefacts.

In 2017, a grant from the Laura and John Arnold Foundation enabled the American Geophysical Union (AGU) and other partners (including AuScope, the National Computational Infrastructure (NCI) and the Australian Research Data Commons (ARDC)) to significantly improve the interconnection between literature and datasets, software and samples, based on the Findable, Accessible, Interoperable, Reusable (FAIR) principles.

The project mobilized a community of more than 300 international stakeholders, drawn from publishers, funders, repositories, researchers, professional societies and others, to ensure that input data, physical samples, and software are accessible and referenceable as first-class research products in the modern research ecosystem. The project developed a Commitment Statement that reflects the distinct stakeholder perspectives and defines goals for each group that collectively support open and FAIR principles. Over 100 publishers, repositories, organizations and individuals have now signed the statement.

To help Australian researchers meet these new requirements, ARDC have published a set of online resources that support the citation and unique identification of data, software and physical samples. ARDC have also established a support network at research institutions across Australia. Making more resources open and FAIR will also enable more transparent research for those who work on ‘Understanding the Earth’.


Biography:

Lesley Wyborn is an Adjunct Fellow at the National Computational Infrastructure and RSES at ANU and works part time for the Australian Research Data Commons. She previously had 42 years’ experience in scientific research (geochemistry and mineral systems research) and in geoscientific data management at Geoscience Australia from 1972 to 2014. In geoinformatics her main interests are developing international standards that support the integration of Earth science datasets into transdisciplinary research projects, and developing seamless high performance data sets that can be used in high performance computing environments. She is currently Chair of the Australian Academy of Science ‘National Data in Science Committee’ and is on the American Geophysical Union Data Management Board. She was awarded the Australian Government Public Service Medal in 2014, the 2015 Geological Society of America Career Achievement Award in Geoinformatics and the 2019 US Earth Science Information Partners Martha Maiden Award.

 

Improving the Discovery of Environmental Research Data With Recommendations

Dr Anusuriya Devaraju1, Uwe Schindler1, Dr Michael Diepenbroek1

1MARUM, University of Bremen, Germany

 

With an increasing number of open research datasets in data portals, users find it difficult to discover new, relevant, and interesting datasets from these portals. Conventional data search, for example full-text search, may produce results that are either too broad or too narrow, depending on the richness of the metadata and how well the search engine is tuned to generate the desired results. Further, this search mechanism is primarily suitable for users who can clearly express their information needs, but is inadequate for users who simply want to browse or discover potential datasets for developing or testing their applications. The PANGAEA data portal holds more than 380,000 environmental datasets published with DOIs. We present a data recommender system to improve the discovery of PANGAEA datasets. The system was developed based on text analytics and usage mining. It produces two types of recommendations, ‘Datasets with similar metadata’ and ‘Users that were interested in this dataset were also interested in’. The first type of recommendation resembles text-matching search; however, it uses the metadata of a target dataset (e.g., title, author, location, time, publication, etc.) to identify similar datasets. The second type of recommendation uses user interactions such as data page views, clicks and downloads to produce ‘new’ data recommendations. We evaluated the system online; the results highlight its effectiveness in improving user engagement (e.g., click-through rates) on the data portal.
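
The two recommendation types described above can be illustrated with a minimal Python sketch. It is not the PANGAEA implementation: the abstract does not say which similarity measure or usage model was used, so Jaccard overlap of metadata terms and simple co-occurrence counts within a session stand in for them here, and all function names and dataset identifiers are hypothetical.

```python
from collections import Counter
from itertools import combinations


def jaccard(a: set, b: set) -> float:
    """Share of metadata terms that two datasets have in common."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def similar_by_metadata(target: str, metadata: dict, top_n: int = 3) -> list:
    """'Datasets with similar metadata': rank other datasets by term overlap."""
    scores = {
        other: jaccard(metadata[target], terms)
        for other, terms in metadata.items()
        if other != target
    }
    # Highest score first; ties broken by identifier for a stable order.
    return sorted(scores, key=lambda d: (-scores[d], d))[:top_n]


def also_interested(target: str, sessions: list, top_n: int = 3) -> list:
    """'Users ... were also interested in': rank by co-occurrence in sessions."""
    related = Counter()
    for session in sessions:
        # Each session is the set of datasets one user viewed or downloaded.
        for a, b in combinations(sorted(set(session)), 2):
            if a == target:
                related[b] += 1
            elif b == target:
                related[a] += 1
    ranked = sorted(related.items(), key=lambda kv: (-kv[1], kv[0]))
    return [dataset for dataset, _ in ranked[:top_n]]


if __name__ == "__main__":
    # Made-up metadata terms and usage sessions, for illustration only.
    metadata = {
        "ds1": {"sediment", "arctic", "core"},
        "ds2": {"sediment", "arctic", "salinity"},
        "ds3": {"plankton", "atlantic"},
    }
    sessions = [["ds1", "ds3"], ["ds1", "ds3", "ds2"], ["ds2"]]
    print(similar_by_metadata("ds1", metadata))
    print(also_interested("ds1", sessions))
```

Note that the usage-based list can surface a dataset (here `ds3`) whose metadata shares nothing with the target, which is the sense in which the second type of recommendation is ‘new’.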


Biography:

Dr. Anusuriya Devaraju is a data scientist holding a PhD in Geoinformatics from the University of Muenster, with a specialization in semantic integration of geospatial information. She has a Master’s degree in System Design for Internet Applications from Newcastle University, UK. Currently, she is working at the Centre for Marine Environmental Sciences (MARUM), University of Bremen, where she is responsible for developing and implementing techniques based on data mining and machine learning to improve the discovery of PANGAEA datasets. Prior to joining the research centre, she worked at the Commonwealth Scientific and Industrial Research Organisation (CSIRO), Australia, and developed information models and tools to improve the discovery of CSIRO research assets. During her postdoctoral time at Forschungszentrum Jülich, she contributed significantly to TERENO data management and integration.

Making Earth and environmental science data accessible via machine-to-machine services: where are they at and where are they going.

Dr. Adrian Burton2, Mr James Gallagher1, Mr. Joseph Abhayaratna6, Dr. Ben Evans3, Dr. Lesley Wyborn3, Dr. Justin Freeman5, Mr. Aaron Sedgman7, Dr. Gareth Williams4, Dr. Kelsey Druken3, Ms Melanie Barlow2, Dr. Mingfang Wu2

1OPeNDAP, U.S.A
2Australian Research Data Commons, Australia
3National Computational Infrastructure, Canberra, Australia
4CSIRO, Australia
5Bureau of Meteorology, Melbourne, Australia
6PSMA, Canberra, Australia
7Geoscience Australia, Canberra, Australia

 

Machine-to-machine data services have become an integral part of the research, government and industry sectors. They provide automated functions for the creation, access, processing and analysis of data. The development of data-focused services is steadily increasing in Australia across the NCRIS capabilities, CSIRO, and government agencies, all of whom are moving to more formal data publishing through services and are making their data findable, accessible and interoperable to increase reusability for wider communities.

This one-day workshop will cover:

  • Current usages of a suite of web data services technologies including DAP (e.g., THREDDS, OPeNDAP, Hyrax, ERDDAP or PyDAP), CHORDS, ThingSpeak, etc.;
  • Protocols and standards for service discovery and use (e.g., OGC standards, FDSN Web Service Specifications, OpenAPI) and metadata that describes them (e.g., ISO 19115, ISO 19119); and
  • Horizon scans (e.g., the redesign of OGC web service standards to be more resource-oriented and use OpenAPI/Swagger toolsets).

The aim is to then ask how to more efficiently use data services to meet both the challenges of today and those of the future.

This workshop will be of interest to:

  • Data service providers, for exchanging the latest practices and technologies for providing data services;
  • Data service consumers, for raising data usage requirements and exploring how to make best use of data services; and
  • Technology and standards communities, for communicating with and getting feedback from both providers and consumers.

Note: The technologies and standards discussed above also apply to data beyond the earth and environment domain; other communities are more than welcome.


Biography:

To be advised

Building an Enterprise Research Data Management ecosystem focusing on improving researcher productivity in digitally driven science.

Mr John Morrissey1, Ms Cynthia Love1

1CSIRO IMT, Canberra, Australia

 

For the last 10-12 years the Australian eResearch community has been working on how to manage the impending research data deluge. Australian research organisations have responded to this challenge in a variety of ways depending on available resourcing and internal cultures and ideology. In CSIRO we adopted the approach of beginning our investments by creating a robust and secure enterprise data portal (the CSIRO Data Access Portal) and storage infrastructure. The cultural issues of organisational policy and governance were planned to follow after the tools to enable good practice were built.

In the last 1-2 years we have seen a real change in demand from both researchers and the CSIRO executive for not only well-defined governance structures but also better data management platforms that reach deeper into the research projects and facilities across CSIRO.

In this workshop we will talk about and invite feedback on 6 major data management programs currently underway in CSIRO, including:

  1. Managed Data Ecosystem Update
  2. Research Data Planner tool
  3. CSIRO Data Governance update
  4. ORCID, IGSNs and other identifiers
  5. Collection Management system
  6. eLab Notebooks

We must make it easier for researchers to deal with the data deluge on a day-to-day basis, so they have more time to do core research. To do this we must develop a smarter, more interconnected data management ecosystem that is less intrusive on researcher workflows over time.


Biography:

To be confirmed

A cloud-based Well Log Database Prototype

Mrs Chitra Viswanathan1, Dr Irina Emelyanova1, Dr Ben Clennell1, Ms Stacey Maslin

1CSIRO, Kensington, Australia

 

Geoscience data including seismic, well log, sensor and core measurements are fundamental for petroleum exploration. Due to recent advancements in sensor and computer technologies, the volume of this data is constantly increasing. Having a unified repository for data of various types, structure and complexity is crucial for maintaining data integrity. This study addresses petroleum exploration data integrity issues. Current trends in data management technologies and current data practices in petroleum geoscience are explored, and a practical data management solution to facilitate data access, storage and sharing is recommended. A prototype of a well log database was developed to demonstrate a common repository for downloaded and sanitized data, so that petrophysicists avoid duplicate downloads from public websites and data use becomes more efficient within a particular organisation. The prototype was developed using cloud-based technology and the Pawsey supercomputing facility (a joint venture of CSIRO with Western Australian universities) for storing both the raw (.las and .DLIS files) and the sanitized well log datasets from the Bonaparte Basin. A PostgreSQL database was used to store the sanitized well log data, metadata and links to raw data. PostgreSQL was selected for its ability to support advanced data types (arrays, JSON etc.), plug in to languages like Python, and link to PostGIS, a spatial database extender. A web-based graphical user interface was developed to view, upload and download well log data. In addition, meaningful metadata standards were established in collaboration with expert petrophysicists.
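
The idea of a common repository that stores sanitized records, metadata and links to raw files while preventing duplicate entries can be sketched as follows. This is only an illustration under stated assumptions: the prototype used PostgreSQL, whereas the sketch uses Python's built-in `sqlite3` module as a stand-in so that it runs without a database server, and the table layout, column names, well name and file path are all hypothetical rather than taken from the prototype.

```python
import json
import sqlite3


def create_store() -> sqlite3.Connection:
    """Create an in-memory store (SQLite stand-in for the PostgreSQL database)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """
        CREATE TABLE well_log (
            id           INTEGER PRIMARY KEY,
            well_name    TEXT NOT NULL,
            basin        TEXT NOT NULL,
            raw_file_uri TEXT NOT NULL,  -- link to the raw .las/.DLIS file
            metadata     TEXT NOT NULL,  -- JSON document (hypothetical layout)
            UNIQUE (well_name, raw_file_uri)
        )
        """
    )
    return conn


def add_log(conn, well_name, basin, raw_file_uri, metadata) -> bool:
    """Register a well log; return False if it is already in the repository."""
    exists = conn.execute(
        "SELECT 1 FROM well_log WHERE well_name = ? AND raw_file_uri = ?",
        (well_name, raw_file_uri),
    ).fetchone()
    if exists:
        return False  # already downloaded and sanitized by a colleague
    conn.execute(
        "INSERT INTO well_log (well_name, basin, raw_file_uri, metadata) "
        "VALUES (?, ?, ?, ?)",
        (well_name, basin, raw_file_uri, json.dumps(metadata)),
    )
    conn.commit()
    return True


def find_by_basin(conn, basin) -> list:
    """Return (well name, raw file link, metadata dict) for every log in a basin."""
    rows = conn.execute(
        "SELECT well_name, raw_file_uri, metadata FROM well_log "
        "WHERE basin = ? ORDER BY well_name",
        (basin,),
    ).fetchall()
    return [(name, uri, json.loads(meta)) for name, uri, meta in rows]


if __name__ == "__main__":
    conn = create_store()
    record = ("Well-A", "Bonaparte", "raw/well-a.las", {"curves": ["GR", "RHOB"]})
    print(add_log(conn, *record))  # first registration succeeds
    print(add_log(conn, *record))  # second attempt is reported as a duplicate
    print(find_by_basin(conn, "Bonaparte"))
```

In PostgreSQL the metadata column could use a native JSON type instead of serialized text, which is one of the advanced data types the abstract mentions.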


Biography:

Mrs Chitra Viswanathan has a background in Mathematics and Computer Science. She has worked as a software developer in CSIRO energy division for more than 15 years. She developed a suite of software tools to handle sand management issues for a major oil and gas company who have successfully deployed the software tools in their studies.

She is currently exploring new technologies for managing geoscience data within the energy business unit.

Barnacles to browsers: full-stack coastal monitoring

Ms Sharon Tickell1, Mr Jonathan Hodge1, Mr Erin Kenna1, Mr Daniel Wild1, Mr Geoffrey Carlin1, Mr Fry Gary1, Mr Brendon Dando1

1CSIRO, St Lucia, Australia

 

Combining the increasing ubiquity of internet-enabled, solar powered sensors with an apparently limitless need for observational data to feed into coastal models, the CSIRO Oceans and Atmosphere Coastal Informatics team has implemented a full-stack coastal monitoring system that can deliver water quality data collected by remote sensors all the way through to public facing web applications.

This monitoring stack began life with three sensor deployments on the Logan and Albert rivers as part of the TERN supersites program. It now has more than 5,000 data and metadata streams contributing water quality observations from locations all along the Queensland coast.

The implementation of this monitoring stack presented several challenges, from the logistical issues that come with maintaining sensor hardware in often remote tropical areas, to difficulties visualising time-series data in a memory-limited, browser-based web application.

I will give an overview of our current and planned technology choices, of the types of data that are currently available, and will discuss what it might take to expand the current coastal monitoring stack into a production-grade data service that is able to feed predictive environmental models.


Biography:

Sharon is a software engineer who specializes in distributed application development, and DevOps for research data services.  She has been a system administrator for several ongoing projects, including eReefs, and the ACEF and AusCover facilities of TERN.

Make the connection with persistent identifiers

Dr Adrian Burton1, Dr Amir Aryani1, Ms Gerry Ryder1

1Australian National Data Service

 

Increasingly, the research community, including funders and publishers, is recognising the power of ‘connected up’ research to facilitate reuse, reproducibility and transparency of research.

Persistent identifiers (PIDs) are critical enablers for identifying and linking related research objects including datasets, people, grants, concepts, places, projects and publications.  The Australian National Data Service (ANDS), in collaboration with other national agencies, is involved in global initiatives to exploit the power of PIDs.

Scholix, an initiative of the Research Data Alliance, provides links between scholarly literature and research data as well as between data and data.  These links significantly aid the scientific method by improving discovery of and access to related knowledge and underpinning observations.

The Research Data Switchboard connects datasets and related information across research data repositories and infrastructures using information about co-authorship and jointly funded projects.  The data from the Switchboard software is captured in a distributed network of scholarly works called Research Graph; this data is also available in well-known graph file formats such as GraphML for analysis and visualisation.
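
As an illustration of what a GraphML export of connected research objects might look like, the following Python sketch links a dataset, a researcher and a grant and serializes them with the standard library. It is a sketch only: it does not reproduce the Research Graph schema, the `type` attribute is an assumption, and the identifiers are placeholders rather than real DOIs, ORCID iDs or grant numbers.

```python
import xml.etree.ElementTree as ET

GRAPHML_NS = "http://graphml.graphdrawing.org/xmlns"


def to_graphml(nodes: dict, edges: list) -> str:
    """Serialize nodes ({id: type}) and edges ([(source, target)]) as GraphML."""
    ET.register_namespace("", GRAPHML_NS)
    root = ET.Element(f"{{{GRAPHML_NS}}}graphml")
    # Declare one string attribute on nodes holding the kind of research object.
    ET.SubElement(
        root,
        f"{{{GRAPHML_NS}}}key",
        {"id": "type", "for": "node", "attr.name": "type", "attr.type": "string"},
    )
    graph = ET.SubElement(
        root, f"{{{GRAPHML_NS}}}graph", {"id": "G", "edgedefault": "undirected"}
    )
    for node_id, node_type in nodes.items():
        node = ET.SubElement(graph, f"{{{GRAPHML_NS}}}node", {"id": node_id})
        data = ET.SubElement(node, f"{{{GRAPHML_NS}}}data", {"key": "type"})
        data.text = node_type
    for source, target in edges:
        ET.SubElement(
            graph, f"{{{GRAPHML_NS}}}edge", {"source": source, "target": target}
        )
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # Placeholder identifiers, for illustration only.
    nodes = {
        "doi:10.0000/example-dataset": "dataset",
        "orcid:0000-0000-0000-0000": "researcher",
        "grant:EXAMPLE-001": "grant",
    }
    edges = [
        ("orcid:0000-0000-0000-0000", "doi:10.0000/example-dataset"),
        ("grant:EXAMPLE-001", "orcid:0000-0000-0000-0000"),
    ]
    print(to_graphml(nodes, edges))
```

Because every node is keyed by a persistent identifier, two repositories that describe the same dataset or person produce the same node ID, which is what allows their graphs to be joined.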

This presentation will briefly describe the ANDS data connections strategy and the identifier types involved such as DOI, ORCID and RAID.  We will demonstrate Scholix and Research Graph and explain how you can ensure your research outputs are ‘connection ready’.

For researchers and research organisations to take advantage of the emerging global and distributed research information ecosystem, persistent identifiers are essential as globally standard references to datasets, publications, people, grants, scientific concepts, places, projects etc.


Biography:

Dr Adrian Burton is Director, Services at the Australian National Data Service.

ReDBox 2 and the Data Life Cycle

Mr Gavin Kennedy1, Dr Peter Sefton2, Mr Andrew Brazzatti1

1Queensland Cyber Infrastructure Foundation, St Lucia, Australia,

2University of Technology Sydney, Ultimo, Australia

 

Research data management is key to good research and is required by policies, codes of practice and research funding agencies, but is often seen by researchers as a compliance matter, with little immediate benefit. Increasing the functionality of research data management tools helps drive uptake while delivering value to researchers and administrators alike.

In this presentation we will describe and demonstrate recent collaborative work between UTS and QCIF to improve the functionality of the ReDBox software to provide an active catalogue of integrated provisionable services to researchers.

We’ve added the concept of a workspace to the existing ReDBox Data Management Planning (RDMP) tool. A workspace can be simple storage such as a file share, a project on a version control system such as GitLab or GitHub, or an electronic lab notebook: any place where research data is created or consumed.

Workspaces integrate with the Provisioner, allowing researchers to request, find and link workspaces from a growing variety of services. The Provisioner ‘injects’ metadata into each workspace so that it can be found and automatically archived at the appropriate time. This provides an immediate benefit to researchers (a new workspace) with clear follow-on benefits, including reduced work to manage depositing or archiving datasets.

With these features ReDBox supports the research data life cycle through enhanced data management services. This allows ReDBox to fulfil the role of a single metadata repository and data archive that multiple institutions can publish to, and to scale up to the level of a national research data archive.


Biographies:

Dr Peter Sefton

Peter Sefton is the Manager, eResearch Support at the University of Technology, Sydney (UTS). Before that he was in a similar role at the University of Western Sydney (UWS). Previously he ran the Software Research and Development Laboratory at the Australian Digital Futures Institute at the University of Southern Queensland. Following a PhD in computational linguistics in the mid-nineties he has gained extensive experience in the higher education sector in leading the development of IT and business systems to support both learning and research.

At UTS Peter is leading a team which is working with key stakeholders to implement university-wide eResearch infrastructure, including an institutional data repository, as well as collaborating widely with research communities at the institution on specific research challenges. His research interests include repositories, digital libraries, and the use of the Web in scholarly communication.

Gavin Kennedy

Gavin Kennedy is an IT research and solutions expert and is the head of Outreach and Engineering Services at the Queensland Cyber Infrastructure Foundation (QCIF). Gavin leads QCIF’s Data Innovation Services team, the key developer of ReDBox, the popular research data management and publishing platform. Gavin is a passionate advocate for Open Source platforms to support open research and the FAIR data principles. Gavin has over 30 years of IT experience in organisations as diverse as CSIRO, General Electric and British Telecom.

Mauritius Ocean Observatory platform for Marine Spatial Planning

Mr Erin Kenna1, Ms Sharon Tickell1, Mr Jonathan Hodge1

1CSIRO, Brisbane, Australia

 

The Department for Continental Shelf, Maritime Zones Administration and Exploration, with the expertise of CSIRO Oceans & Atmosphere, has developed an Ocean Observatory Database platform under the Indian Ocean Rim Association (IORA) project “Developing an Enhanced Ocean Observatory in support of Ocean Exploration and Development”. The Ocean Observatory platform is designed to support the Marine Spatial Planning initiative of the Republic of Mauritius by providing a platform to collect, store, organise and provide access to spatio-temporal data relevant to ocean exploration and development. The platform will ensure that data meeting the needs of industry and government authorities can be easily accessed and analysed. By providing relevant information, the database will also help to sustainably manage the maritime zones of Mauritius through informed policy decisions.

The Ocean Observatory platform uses GeoNode, an open source Geospatial Content Management System, which allows data to be loaded into a geospatial database alongside connected metadata and document resources. The relationships between data, metadata and documents are maintained to ensure that it is easy for end users to discover and access data resources. In addition, GeoNode includes a web-based interface to allow simple discovery and management of spatial data and metadata as well as interactive mapping. GeoNode leverages existing security frameworks, is scalable and includes an administration console which makes it easy to manage resources and front end settings.

This project has developed Ansible playbooks for automated, production-grade installation and configuration of the Ocean Observatory as a template for other developing nations.


Biography:

Erin Kenna is a spatial scientist and data manager with the Commonwealth Scientific and Industrial Research Organisation (CSIRO) in Brisbane, Australia. Erin has over 12 years’ experience working with environmental and spatial data for State and Federal government and the private sector. Erin joined CSIRO in 2016 and is responsible for quality assurance, collation and synthesis of data on a range of CSIRO projects. Prior to joining CSIRO, Erin worked with the Queensland State government for 10 years on programs assessing biodiversity and developing data and assessment frameworks to integrate environmental science into products for management, spatial planning and policy purposes.

Implementation of a Hybrid Recommender System in CSIRO’s Data Access Portal

Dr Anusuriya Devaraju1, Mr Dominic Hogan2

1CSIRO Mineral Resources, Perth, Australia,

2CSIRO, Brisbane, Australia

 

With an increase in the rate of data publication on CSIRO’s Data Access Portal (DAP) comes the challenge of helping users to discover relevant datasets. Research data repositories typically allow searching via keywords and faceted navigation; both approaches, whether searching or browsing by subject, work best for users who are already familiar with the content. Research was conducted into the feasibility of a hybrid recommendation approach that presents users with recommended datasets. The approach leverages content-based similarity and usage patterns, tuned to a feature weighting model obtained through a survey involving real users. The results of the model were then evaluated in a user study, indicating which ranks of recommended results were deemed relevant. Following this research, the model was implemented in the DAP and released to the public in March 2018. To our knowledge this is the first implementation of such a recommendation system for research datasets. We present a preliminary analysis of the use of these recommendations by general users of the DAP, discussing the proportion of users following recommendations and their activity compared to the wider population of DAP users.
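
One common way to combine content-based similarity with usage patterns is a weighted sum of the two scores, sketched below in Python. This is an assumption about the general technique, not the DAP implementation: the abstract does not give the combination rule or the surveyed feature weights, so the weight value, the function names and the example scores are hypothetical.

```python
def hybrid_scores(content: dict, usage: dict, content_weight: float = 0.6) -> dict:
    """Blend two score tables (values assumed to lie in [0, 1]) per dataset."""
    if not 0.0 <= content_weight <= 1.0:
        raise ValueError("content_weight must be between 0 and 1")
    candidates = set(content) | set(usage)
    return {
        dataset: content_weight * content.get(dataset, 0.0)
        + (1.0 - content_weight) * usage.get(dataset, 0.0)
        for dataset in candidates
    }


def recommend(content: dict, usage: dict, content_weight: float = 0.6,
              top_n: int = 3) -> list:
    """Return the top-ranked dataset identifiers under the blended score."""
    scores = hybrid_scores(content, usage, content_weight)
    # Highest score first; ties broken by identifier for a stable order.
    return sorted(scores, key=lambda d: (-scores[d], d))[:top_n]


if __name__ == "__main__":
    # Made-up scores for three candidate datasets.
    content = {"a": 0.9, "b": 0.2}
    usage = {"b": 1.0, "c": 0.5}
    print(recommend(content, usage, content_weight=0.5))
```

A dataset missing from one table simply contributes zero for that component, so newly published datasets with no usage history can still be recommended on content similarity alone.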


Biography:

Dominic Hogan is a business analyst at the Commonwealth Scientific and Industrial Research Organisation.  From 2012 to 2017 he worked as a data librarian supporting data management across CSIRO, and has been heavily involved in development work for CSIRO’s Data Access Portal.  He has supported work in various research domains, including terrestrial ecology, marine research, computer visualisation and materials science.  Recently he has contributed to a project implementing a recommender system for research datasets in CSIRO’s data repository.

© 2019 Conference Design Pty Ltd