Creating Value with Data and AI in Healthcare

Milan Petković1

1Head of the Data Science department, Philips



Today our healthcare systems face serious challenges: an ageing population with a high percentage of chronically ill people, often with multi-morbidities, and shortages of health workforce and care places at a time when we are facing serious epidemics. Data and Artificial Intelligence (AI) technologies can lead the transformation of the healthcare sector that is needed to deal with these challenges. This is not merely a future vision, but a reality that we are shaping today. In this talk, I will address specific challenges in applying data and AI technologies in healthcare, show several proof points of beneficial applications, and provide an outlook on upcoming initiatives and partnerships in these fields.


Prof. Dr. Milan Petković is the head of the Data Science department at Philips, which conducts innovation projects in the domain of AI and data science in healthcare.

He is also a part-time full professor at the Eindhoven University of Technology. Prof. Petković is also a vice president of the BDVA association and he serves in several advisory groups to the European Commission on data sharing and security subjects.

On behalf of Philips he participated in several standardization initiatives and led several Dutch national and EU projects. He recently published a book on Data Science in Healthcare.

Scalable Distributed Infrastructure for Data Intensive Science

David Abramson1

1University of Oxford



Modern research intensive organisations face challenges storing and preserving the increasing amounts of data generated by scientific instruments and high performance computers. Data must be delivered in a variety of modes depending on the end use, ranging from Web portals through to supercomputers. Building infrastructure to meet this need is complex and expensive. There is a need for mechanisms that support both managed and unmanaged data in a coherent and scalable way, often over a physically distributed multi-campus environment.

In this talk I will discuss the ways we are delivering such infrastructure at the University of Queensland. Long-term hierarchical storage, and many of the computing systems, are housed in a commercial Tier 3 data centre 20 km from the main campus in St Lucia. Some high performance machines and desktops, and all scientific instruments, are housed on campus. University researchers work with local, national and international collaborators and need to share data securely and efficiently across a variety of scales. Our COTS-based “MeDiCI data fabric” provides seamless access to data in such an environment. To improve standards of management, curation and preservation of data, a locally developed metadata management service called RDM provides a single point of access for storage requests. Recent work on the CAMERA environment links unmanaged collections to managed repositories in a flexible and efficient manner. Finally, the fabric delivers data to a range of commodity and novel computing platforms such as the FlashLite data intensive cluster and the Wiener GPU supercomputer.


David has been involved in computer architecture and high performance computing research since 1979. He has held appointments at Griffith University, CSIRO, RMIT and Monash University. Prior to joining UQ, he was the Director of the Monash e-Education Centre, Science Director of the Monash e-Research Centre, and a Professor of Computer Science in the Faculty of Information Technology at Monash. From 2007 to 2011 he was an Australian Research Council Professorial Fellow. David has expertise in High Performance Computing, distributed and parallel computing, computer architecture and software engineering. He has produced in excess of 200 research publications, and some of his work has also been integrated in commercial products. One of these, Nimrod, has been used widely in research and academia globally, and is also available as a commercial product, called EnFuzion, from Axceleon. His world-leading work in parallel debugging is sold and marketed by Cray Inc, one of the world’s leading supercomputing vendors, as a product called ccdb. David is a Fellow of the Association for Computing Machinery (ACM), the Institute of Electrical and Electronic Engineers (IEEE), the Australian Academy of Technology and Engineering (ATSE), and the Australian Computer Society (ACS). He is currently a visiting Professor in the Oxford e-Research Centre at the University of Oxford.

The Power of the Prototype

Annie Burgess1

1Earth Science Information Partners



Imagine your fright if, during the course of a mature research project, you needed to learn a new programming language, learn what a workflow management system is and implement one, and do all of this in a thing called “the cloud”. Daunting as it is, this is often the message researchers hear, and for good reason. Open, reproducible, and scalable computing is the most efficient means to a better scientific understanding of our planet. But an all-at-once shift in researchers’ computing methodology often falls flat. This talk will provide an alternative. Through small projects, prototyping, and community input, researchers can experiment in a low-stakes environment while gaining the skills that will enable them to participate in the growing open, scalable Earth science computing revolution.


Dr. Annie Burgess’ career has bridged Earth Science and informatics. During her graduate work at the University of Utah, Annie managed, analyzed, and distributed an immense amount of data related to her research in snow hydrology. The primary data product created during her Ph.D. work is currently distributed through NASA/JPL.

As a post-doc at the University of Southern California, she developed software for the unique needs of the Polar science community. Annie knows the power of connecting Earth scientists with technical and collaborative infrastructure.

At ESIP, she utilizes her technical savvy and networking skills to run their innovation program, also known as the ESIP Lab.

Pangeo, Processing and Providence: Key Python technologies that you should try.

Mr Nick Mortimer1

1CSIRO Oceans and Atmosphere, Crawley, Australia


Since making the move to Python and discovering the Pangeo community, Nick has been on a journey of collaboration, working with the National Center for Atmospheric Research in Boulder, Colorado and the Met Office’s Informatics Lab in Exeter, UK.

In this workshop, Nick hopes to present ways of working in Python using open source community tools that encourage collaboration.

Tired of working with old closed source tools? Ready to embrace the Python ecosystem? This workshop is designed to introduce some key Python technologies that will help you deliver analysis-ready datasets combined with scalable processing, ready to tackle just about any size of dataset.

This workshop is designed to give you what you need to implement workflows in Jupyter notebooks with a focus on scalability and provenance.

First, we will start with an introduction to the Pangeo environment:

JupyterLab: Web-based Python processing in the cloud

Dask: Write scalable analysis code in Python

Xarray: N-dimensional labelled arrays and datasets in Python, with a focus on Zarr and cloud storage

Intake: A lightweight set of tools for loading and sharing data in data science projects

Papermill: Parameterize and run your Jupyter notebooks

Followed by examples covering some use cases from this list:

  1. Can you please process this 50 GB CSV file for me?
  2. I have 18,000 Argo float netCDF files that I want to aggregate
  3. I have some legacy FORTRAN and I’d like to be able to use it with scalable Python tools
  4. I’d like to create and share an analysis-ready dataset using Intake
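
The first use case in this list comes down to out-of-core processing: the file is read in bounded pieces so that memory use stays flat however large the file is. The sketch below illustrates that idea using only the Python standard library. It is not taken from the workshop material, and it runs serially, whereas Dask applies a similar partitioning idea in parallel. The function name `chunked_column_mean`, the column name `temp`, the chunk size and the sample data are all hypothetical.

```python
import csv
import io
from itertools import islice


def chunked_column_mean(handle, column, chunk_rows=100_000):
    """Return the mean of `column`, reading `chunk_rows` rows at a time.

    Only one chunk of rows is held in memory at once, so the input
    can be much larger than the available RAM.
    """
    reader = csv.DictReader(handle)
    total = 0.0
    count = 0
    while True:
        # Pull the next bounded slice of rows from the file.
        chunk = list(islice(reader, chunk_rows))
        if not chunk:
            break
        # Skip blank or missing cells rather than failing on them.
        values = [float(row[column]) for row in chunk
                  if row[column] not in ("", None)]
        total += sum(values)
        count += len(values)
    if count == 0:
        raise ValueError(f"no numeric values found in column {column!r}")
    return total / count


# Tiny in-memory stand-in for a large file (hypothetical data).
sample = io.StringIO("station,temp\nA,10.0\nB,\nC,20.0\nD,30.0\n")
print(chunked_column_mean(sample, "temp", chunk_rows=2))
```

For a real file, the `io.StringIO` object would be replaced by `open(path, newline="")`.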


Nick Mortimer stopped using MATLAB and moved to Python nearly five years ago after a conscious decision to change his career path from endlessly cleaning CSV files to capturing and preparing analysis-ready datasets in near real-time.

Improving Shareability and Maintainability for Short Training Workshops

Dr Mark Crowe1, Dr Justin Scott1, Dr Anne Bernard1

1QCIF, Brisbane, Australia


The QCIF training program has evolved gradually, with material produced and added in response to the developing needs of our member organisations. As a result, much of the training content has been developed ad hoc and is not optimised for sustainability: it is typically developed by the individuals who will deliver it, lacks documentation to support hand-off to other instructors, and consists of a mix of PDFs, PowerPoint presentations, Markdown scripts and other formats, making it hard to maintain.

QCIF aims to grow its training capacity substantially and to engage actively with the rest of the Australian eResearch training community. To support these goals, we are adapting our in-house training material to a format similar to that used by The Carpentries training community: modular content, hosted on GitHub in Markdown format, containing a balance of theory, formative evaluation, and live-coding training activities. This model also encourages community contribution and reuse, enabling a more consistent and higher-quality training experience.

In this poster we will review some of the challenges of ‘retrofitting’ training content to a sustainable format. We will also discuss the benefits that achieving this format provides towards maintaining and updating material and simplifying the process of inducting new trainers.


Mark Crowe is the Training Manager for QCIF, an organisation founded collaboratively by Queensland universities to provide HPC and data science research support. The QCIF training program provides short workshop-format training to research students and staff across the seven Queensland universities, reaching around 2000 trainees with over 100 workshops each year.

Prior to working with QCIF, Mark was Training and Engagement Manager for QFAB Bioinformatics, Next-Generation Sequencing Manager for the Australian Genome Research Facility, and Operations Manager for the startup biotech company Catapult Genetics. He is also a Pom, the owner of a San Churro chocolate cafe, and the proud father of two boys.

Galaxy for Scientists – Advanced Hands-on Tutorials

Dr Gareth Price1, Mr Derek Benson2, Dr Tim Ho3

1Queensland Facility for Advanced Bioinformatics (QFAB), St Lucia, Australia, 2CSIRO, Pullenvale, Australia, 3CSIRO, Clayton, Australia


Galaxy is a popular web-based scientific analysis platform used by tens of thousands of scientists across the world to analyse large datasets from areas such as genomics and metagenomics.

This workshop will be of interest to Galaxy users who want to find out more about the use of Galaxy in areas such as:

– Genome annotation

– Metagenomics

– Sequence analysis

– Statistics and machine learning

We expect workshop attendees to have some knowledge of Galaxy and already have access to


Gareth Price has been a Genomics Scientist for over 15 years. He has been involved in experimental design, assay performance, data QC, data analysis and data interpretation, from early printed microarrays to cartridge-based GeneChips through to multiple Next Gen platforms. This work has involved a variety of model organisms, from microorganisms, fruit flies and mice to humans.


Gareth’s view is that research, clinical research, and healthcare are at their best when coupled with the most accurate, highest-throughput and most innovative technology and analysis. He uses this view to motivate the use of innovation to reduce the time between data generation and data summarisation, ready for the important phase of data interpretation and the discoveries that result.

Introduction to the Galaxy workflow tool

Dr Gareth Price1, Mr Derek Benson2, Dr Tim Ho3

1Queensland Facility for Advanced Bioinformatics (QFAB), St Lucia, Australia, 2CSIRO, Pullenvale, Australia, 3CSIRO, Clayton, Australia


This workshop will introduce the Galaxy platform for performing genetic and genomics analysis. This platform provides a workflow engine supported with a large number of tools covering common tasks such as DNA/RNA manipulation, mapping, filtering, ranking, annotation as well as phylogenetics and metagenomics.

This workshop will be of interest to anyone seeking practical strategies and tools that they can use to investigate their own genomic data. The workshop aims to demonstrate the power of linking tools into an analysis pipeline, called a workflow in Galaxy terminology. Through a small number of hands-on tutorials, workshop attendees will construct, edit and chain workflows to demonstrate the reproducibility and reduced hands-on time that come with managing analysis through Galaxy workflows.


Gareth Price has been a Genomics Scientist for over 15 years. He has been involved in experimental design, assay performance, data QC, data analysis and data interpretation, from early printed microarrays to cartridge-based GeneChips through to multiple Next Gen platforms. This work has involved a variety of model organisms, from microorganisms, fruit flies and mice to humans.

Gareth’s view is that research, clinical research, and healthcare are at their best when coupled with the most accurate, highest-throughput and most innovative technology and analysis. He uses this view to motivate the use of innovation to reduce the time between data generation and data summarisation, ready for the important phase of data interpretation and the discoveries that result.

Two auto-tuning methods for hybrid parallelization using OpenCL for processors with integrated graphics

Dr. Akiyoshi Wakatani1

1Konan University, Kobe, Japan


Performance tuning adjusts the parameters of a system to improve its performance. For large-scale numerical simulations, performance tuning is as important a technique as the use of high-speed computers. Some recent processors integrate a GPU alongside multiple CPU cores, so parallelism across the CPU cores and parallelism on the GPU can be exploited simultaneously.

Because the optimal load balance between CPU and GPU is hard to determine, the authors previously proposed an on-the-fly auto-tuning method, which determines the optimal load balance at runtime by using a pre-computation of fixed data size (AutoRatio).

However, to cope with cases where the value of AutoRatio is not determined in advance, we propose a new auto-tuning method that determines the load balance at runtime by repeating small pre-computations. Because it determines the optimal size of the pre-computation incrementally, this method does not require the pre-computation size to be fixed in advance. The effectiveness of our approach is confirmed empirically using four applications.
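
The general idea of growing a pre-computation until the measured CPU/GPU balance settles can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' algorithm: the doubling schedule, the stopping rule (`tol`) and the name `tune_cpu_share` are hypothetical, and the two timing functions are simple cost models standing in for real OpenCL kernel measurements.

```python
def tune_cpu_share(time_cpu, time_gpu, start=64, tol=0.02, max_size=1 << 20):
    """Estimate the fraction of work to give the CPU.

    `time_cpu(n)` and `time_gpu(n)` return the time taken to process
    n items on each device. The trial size doubles until two successive
    estimates of the CPU share differ by at most `tol`.
    Returns (cpu_share, final_trial_size).
    """
    size = start
    previous = None
    while True:
        cpu_rate = size / time_cpu(size)  # items per second on the CPU
        gpu_rate = size / time_gpu(size)  # items per second on the GPU
        share = cpu_rate / (cpu_rate + gpu_rate)
        settled = previous is not None and abs(share - previous) <= tol
        if settled or size * 2 > max_size:
            return share, size
        previous = share
        size *= 2


# Hypothetical cost models: the GPU is four times faster per item
# but pays a fixed launch overhead on every call.
def model_cpu(n):
    return 1e-6 * n


def model_gpu(n):
    return 5e-4 + 2.5e-7 * n


share, trial_size = tune_cpu_share(model_cpu, model_gpu)
print(f"CPU share {share:.3f} found with a trial of {trial_size} items")
```

With these models, small trials overstate the CPU share because the GPU launch overhead dominates, which is why a single fixed trial size can be misleading and an incremental search is attractive.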


Akiyoshi Wakatani received the B. Eng. degree from the Department of Applied Mathematics and Physics, Faculty of Engineering, Kyoto University, Kyoto, Japan, in 1984. He received the M. Eng. degree from the Division of Applied Systems Science, Faculty of Engineering, Kyoto University in 1986. He also received the Dr. Eng. degree from the Division of Information Engineering, Faculty of Engineering, Kyoto University in 1996. He was with Matsushita Electric Industrial (currently Panasonic) from 1986 to 2000, as a researcher. From 2000 to 2006, he was an Associate Professor of the Department of Information Science and Systems Engineering, Faculty of Science and Engineering, Konan University, Kobe, Japan. Since 2006, he has been a Full Professor of the same university. His research interests include parallel processing and programming education.

Building FAIR training material and networks: “FAIR literacy vectors” gaining momentum within the Research Data Alliance (RDA) community

Shelley Stall1, Romain David2,3,4, Laurence Mabile5,6, Anne-Sophie Archambeau7,8,9, Sophie Aubin10, Michele De Rosa11, Xavier Engels12, Yvan Le Bras14, Maggie Hellström13, Hana Pergl15, Erik Schultes15, Ben Schaap16, Alison Specht17, Sarah Stryeck18, Mogens Thomsen5,6, Silvia Wissel19, Mohamed Yahia20, Anne Cambon-Thomsen5,6

1American Geophysical Union, Washington, United States, 2U.M.R. MISTEA, INRAE, Montpellier, France, 3Montpellier SupAgro, Montpellier, France, 4Université de Montpellier, Montpellier, France, 5INSERM, Toulouse, France, 6Université Paul Sabatier Toulouse III, Toulouse, France, 7IRD, Paris, France, 8UMS PatriNat, Paris, France, 9GBIF France, Paris, France, 10DIST, INRAE, Versailles, France, 11BONSAI, Aalborg, Denmark, 12ANR, Boulogne Billancourt, France, 13ICOS Carbon Portal, Lund University, Sweden, 14PNDB, MNHN, France, 15GO FAIR International Support and Coordination Office, Leiden, The Netherlands, 16GODAN, WUR, Wageningen, Netherlands, 17SEES-TERN, the University of Queensland, St Lucia South, Australia, 18Graz University of Technology, Institute for Interactive Systems and Data Science, Graz, Austria, 19GO FAIR Initiative, ZBW – Leibniz Information Centre for Economics, Leibniz, Germany, 20INIST-CNRS, France


Introduction: The optimisation of data reuse, the reproducibility of research and the openness of research results (if possible) are inseparable parts of research integrity. This has profound ethical roots that need to be part of FAIR literacy and training and emphasized in an international context.

Identifying the requirements for FAIR literacy in support of the emerging practices around the FAIRification of research data and services is a key step for open science development. By better defining FAIR literacy, it will be possible to offer rewards and credits that incentivize FAIR skills, such as accreditation of competence, awards recognising support staff, conference organisers and trainers, diplomas for trainees, and so on.

Method: This poster presents the method for identifying communication channels – ‘the vectors’ and associated rewards proposed by the RDA-SHARC and RDA-GOFAIR interest groups as:

(i) form of communication for the ‘message’,

(ii) attributes of the communication, e.g. tools,

(iii) acknowledgement that the message has been received and understood, with appropriate feedback.

Result: The choice of vector needs to be aligned with the preferences of the target audience receiving the message and provide options for feedback to facilitate the adoption of FAIR practices. Each type of communication (letter, action sheet, MOOC, conference, practicals, continuous education, success stories, experience sharing …) and the method of sending it should be adaptable to different levels of skill sets and needs.

Conclusion: Optimally, each type of communication used should be a community-approved process in FAIRification in the short, medium and long term.


Shelley Stall is the Senior Director for the American Geophysical Union’s Data Leadership Program. She works with AGU’s members, their organizations, and the broader research community to improve data and digital object practices with the ultimate goal of elevating how research data is managed and valued. Better data management results in better science. Shelley’s diverse experience working as a program and project manager, software architect, database architect, performance and optimization analyst, data product provider, and data integration architect for international communities, both nonprofit and commercial, provides her with a core capability to guide development of practical and sustainable data policies and practices ready for adoption and adapting by the broad research community.

Effective Research Software Verification

Mr David Benn1

1CSIRO, Adelaide, Australia


Research Software Engineers (RSEs) often work alone or in small teams, potentially on multiple concurrent projects and may be time poor. Both verification (building it correctly) and validation (building the right thing) are important. Limiting the focus to verification, what methods make sense for the varieties of application types in a research context? Concerns peculiar to research software and scientific computing in particular such as numerical tolerance, reproducibility, and the determination of parallelised and serial code equivalence are also important considerations.
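
Two of these concerns, numerical tolerance and serial/parallel equivalence, can be illustrated with a short check. Floating-point addition is not associative, so a parallel reduction that sums in a different order may differ from the serial result in its last bits; the comparison therefore uses a tolerance rather than exact equality. This is a minimal sketch, not material from the talk: the function names and the tolerance values are hypothetical choices.

```python
import math
from concurrent.futures import ThreadPoolExecutor


def serial_sum(values):
    """Reference implementation: sum left to right."""
    total = 0.0
    for v in values:
        total += v
    return total


def parallel_sum(values, workers=4):
    """Sum chunks concurrently, then combine the partial sums."""
    size = max(1, math.ceil(len(values) / workers))
    chunks = [values[i:i + size] for i in range(0, len(values), size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(serial_sum, chunks))
    return serial_sum(partials)


def equivalent(a, b, rel_tol=1e-9, abs_tol=1e-12):
    """Compare two results within a stated numerical tolerance."""
    return math.isclose(a, b, rel_tol=rel_tol, abs_tol=abs_tol)


data = [0.1 * i for i in range(10_000)]
assert equivalent(serial_sum(data), parallel_sum(data))
```

The tolerance is itself part of the specification: it should be chosen from the accuracy the science requires, not tightened or loosened until the test passes.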

Verification resources for programming languages including C++, Python and R are being collected. A repository for case studies and patterns derived from experience is being created for development activities such as porting and parallelising, in conjunction with methods such as reference testing, TDD, and property-based testing. The emphasis here is on answering the question: what approaches are most effective for a given research software application type? A one-day Python testing workshop was developed and delivered for the CSIRO Ag & Food data school, followed by a Software Carpentry-style Python testing episode with an emphasis on test-driven development.

All software requires verification and validation. Determining the appropriate approach to verification is crucial to the fitness, reliability and ongoing maintenance of research software. Organising a set of resources, training materials, and shared experience can only be of benefit to a community of software development practitioners and their beneficiaries.


David is a member of CSIRO IM&T’s Scientific Computing Research Software Engineering team, working with scientists to enhance and accelerate research through software development and high performance computing.

He is interested in the intersection of Science and software development, the publication of research data and software, approaches to verification, reproducibility, and programming paradigms.

In his spare time, David is an amateur astronomer with an interest in variable star observing.



