Bio-geo analytics in the face of real data: sparse, heterogeneous, multi-dimensional – network analysis to the rescue?

Peter Fox1

1Rensselaer Polytechnic Institute


From the outside, contemporary analytics for science seems straightforward, following a simple science workflow: start with a research question or hypothesis, get the data, do the analyses, interpret and publish. As more complex science workflows are needed, and as sophisticated analytic software packages and online services become available, there is an even greater need to critically examine every aspect of the analytics workflow. If the science investigation is interdisciplinary, the data multi-modal, or the research questions vague or ill-formed, then such undertakings are considered socio-technical systems. That is, they involve a mixture of people with differing science and technical backgrounds, and a need for both in-person and remote or asynchronous collaboration. Also in the mix is the reality of the available data: variable quality, completeness and documentation. Over the last decade our studies have become increasingly integrative across discipline, location, time, and many other dimensions (quantities) of the data acquisition. To achieve useful integration for scientific discovery, close attention is given to the relations among the data structures of source datasets, and especially to those structures used in computing and analysis platforms in languages such as R and Python, and in technical environments, e.g. Jupyter notebooks. Mathematical representations as graphs, while effective for data integration, have less support in the computing environments and open-source software packages we utilize.

The aforementioned considerations are presented and discussed in the context of three research projects involving many institutions and previously loosely connected disciplines. Accordingly, data varied from well-curated sources to dark data rescued from published literature and other “grey” sources. Network analysis is introduced in the context of the projects and key results are outlined. Along the way, the socio-technical aspects of the collaborations are discussed. Conclusions are drawn in regard to broader applicability.




Peter Fox is Tetherless World Constellation Chair, Professor of Earth and Environmental Science, Computer Science and Cognitive Science, and Director of the Information Technology and Web Science Program at Rensselaer Polytechnic Institute. Fox has a B.Sc. (hons) and Ph.D. in Applied Mathematics (physics and computer science) from Monash University. Fox's research includes computational and computer science; ocean and environmental informatics; and distributed semantic data frameworks, with applications to large-scale distributed data science investigations. Fox served as President of the Federation of Earth Science Information Partners (ESIP; 2014-2016), and as chair of the International Union of Geodesy and Geophysics Union Commission on Data and Information (2007-2015). Fox is Editor-in-Chief of AGU’s Earth and Space Science journal. In 2012, Fox was awarded the European Geosciences Union's Ian McHarg / Earth and Space Science Informatics (ESSI) Medal, and ESIP’s Martha Maiden Lifetime Achievement award for service to the Earth Sciences Information communities. In 2015, Fox was elected as the first ESSI fellow of the American Geophysical Union, and in 2018 as an AAAS fellow.

From Data to Insights: Shift toward Data Analytics

Thomas Huang1

1Jet Propulsion Laboratory, California Institute of Technology


JPL has a long history of building innovative solutions for onboard instruments, ground operations and data systems, and archive and distribution for our missions. As the rate of data generated by our missions continues to increase, and is expected to rise significantly in the near future, JPL is engaging data science and artificial intelligence technologies and methodologies for mission operations and to enable science. In recent years, JPL has made significant advances in Earth science through machine learning, intelligent search, data fusion, interactive visualization and analytics. This talk presents some of the data science highlights from JPL’s ongoing effort to deliver operation-quality analytics solutions to mission operations and our science communities.




Thomas Huang is a Technical Group Supervisor for JPL’s Computer Science for Data-Intensive Applications group. He is also the Strategic Lead for Interactive Analytics for the National Space Technology Applications Program Office, the Principal Investigator on several NASA Cloud-based big data analytic projects, and the System Architect for NASA’s Sea Level Change Portal. As an expert in large-scale, distributed intelligent data systems, Thomas has led planetary, Earth data system, and defense research projects. Thomas was the Project Technologist for NASA’s Physical Oceanography Distributed Active Archive Center (PO.DAAC). As an advocate for free and open source software, Thomas has led the open sourcing of many NASA-funded technologies. He recently established the Apache Science Data Analytics Platform (SDAP) as a community-driven, Cloud-based Analytic Center Framework. Thomas is a Computer Science lecturer at the California State Polytechnic University, Pomona, and a member of its Industry Advisory Board.

Machine Learning and Artificial Intelligence Future Science Platform (MLAI FSP, CSIRO)

Cheng Soon Ong1

1Data61, CSIRO

Scientific research is an iterative process alternating between a set of laws about the natural world (domain knowledge) and a set of measurements of the phenomenon (data). One key part of the process is the creativity of the research scientist. As computing, machine learning and artificial intelligence become more common, many parts of modern life are affected. The current scientific method (which goes back 400 years to Francis Bacon) is likely to change, and CSIRO is well placed to drive this change through the MLAI FSP.

Instead of aiming to replace the scientist, the MLAI FSP aims to augment the abilities of a scientist by using machine learning and artificial intelligence to improve both extracting knowledge from data and using domain knowledge to generate better data. This talk invites you to imagine what it means to embed computing into the scientific process, and what it means to do research in the natural and social sciences by taking advantage of the advances in data collection, computing infrastructure, and intelligent systems.


Cheng Soon Ong is a Principal Research Scientist at the Machine Learning Research Group. He is also an adjunct associate professor at the Australian National University, and an honorary research fellow at the University of Melbourne. Cheng Soon Ong completed his PhD in Computer Science at the Australian National University in 2005. He was then a postdoc at the Max Planck Institute for Biological Cybernetics and the Friedrich Miescher Laboratory in Tübingen, Germany. From 2008 to 2011, he was a lecturer in the Department of Computer Science at ETH Zürich, and in 2012 and 2013 he worked in the Diagnostic Genomics Team at NICTA in Melbourne. Since 2014, Cheng Soon Ong has been doing research with the Machine Learning Group in NICTA Canberra. Prior to his PhD, he researched and built search engine and Bahasa Malaysia technologies at Mimos Berhad, Malaysia. Cheng Soon Ong obtained his B.E. (Information Systems) and B.Sc. (Computer Science) from the University of Sydney, Australia.

Data-driven approaches to mineral exploration

Dave Cole1

1CSIRO, Docklands, Australia


The rate of new major mineral discoveries within Australia is decreasing, as most deposits easily identifiable through existing techniques have already been found. At the same time, the amount of available information relevant to mineral exploration is increasing. Machine learning techniques have the potential to address both of these problems by incorporating all available data and identifying complex patterns not easily discernible by traditional approaches. Such techniques can provide additional insight to geologists to complement their existing knowledge and expertise, and can help inform better decision making. This talk will give a brief overview of some examples of data-driven approaches applied to problems relating to mineral exploration.


Dave Cole is a software developer and research engineer with an interest in large-scale systems of automation, data fusion, and predictive modelling. He has a Ph.D. from the Australian Centre for Field Robotics at the University of Sydney and over a decade's experience developing solutions to complex problems of automation involving sensor data in outdoor environments. His work includes developing control systems and sensor fusion algorithms for networked robotic systems; developing mine automation and visualisation software for controlling autonomous mining equipment, mine safety monitoring, asset optimisation, and real-time data integration; and applying machine learning algorithms to various geoscience problems, from small-scale rock characterisation to large-scale geological modelling. He currently works within CSIRO’s Data61, where his focus is on applying data-driven algorithms to problems within the mining and exploration industry.

Interoperable machine learning for Earth observation and climate in federated cyberinfrastructures

Tom Landry1

1Computer Research Institute of Montreal


Earth Observations (EO) enable scientific research such as the study of meteorology and climate, ecosystems and forests, hydrology and marine life. Machine Learning (ML) techniques are key to solving these complex multidisciplinary problems. ML comes with difficulties associated with managing and processing large, heterogeneous datasets in increasingly distributed infrastructures. This is the case for EO data produced by the Copernicus Sentinel missions, as well as CMIP6 climate data managed by the Earth System Grid Federation (ESGF).

Aside from the raw data, researchers also need access to annotated training data and to services to tune, re-train, discover and run ML models. New formats such as Open Neural Network Exchange (ONNX) enable model sharing between different Deep Learning frameworks. Software containers can be used to package and deploy algorithms and frameworks into standardized services, applications and workflows. An example of this is the European Space Agency (ESA) Thematic Exploitation Platform (TEP) open architecture, which has recently advanced standardisation and best practices using Open Geospatial Consortium (OGC) Web Services (OWS).


Tom possesses more than 20 years of experience in a variety of computer science fields, including E-Learning, geomatics, industrial automation, E-Commerce and sports science. His main interests are software architectures, project management, open innovation, remote sensing, big data and machine learning. As a product manager for geospatial platforms at CRIM, Tom leads several applied research projects joining Earth observation and climate. He is a member of CANARIE’s technical advisory committee and CRIM’s official liaison with the Open Geospatial Consortium (OGC). Since 2016, he has been involved in the Earth System Grid Federation (ESGF) as a member of its executive committee.


