Rensselaer Polytechnic Institute
From the outside, contemporary analytics for science seems straightforward, following a simple science workflow: start with a research question or hypothesis, get the data, do the analyses, interpret, and publish. As more complex science workflows are needed, and as sophisticated analytic software packages and online services become available, there is an even greater need to critically examine every aspect of the analytics workflow. If the science investigation is interdisciplinary, the data multi-modal, or the research questions vague or ill-formed, then such undertakings are considered socio-technical systems. That is, there is a mixture of people with differing science and technical backgrounds, and a need for both in-person and remote/asynchronous collaboration. Also in the mix is the reality of the available data: variable quality, completeness, and documentation. Over the last decade our studies have become increasingly integrative across discipline, location, time, and many other dimensions (quantities) of the data acquisition. To achieve useful integration for scientific discovery, close attention is given to the relations among the data structures of source datasets, but especially to the structures used in computing and analysis platforms, in languages such as R and Python, and in technical environments, e.g. Jupyter notebooks. Mathematical representations as graphs, while effective for data integration, have less support in the computing environments and open-source software packages we utilize.
These considerations are presented and discussed in the context of three research projects involving many institutions and previously loosely connected disciplines. Accordingly, the data varied from well-curated sources to dark data rescue from published literature and other “grey” sources. Network analysis is introduced in the context of the projects, and key results are outlined. Along the way, the socio-technical aspects of the collaborations are discussed. Conclusions are drawn regarding broader applicability.
Peter Fox is Tetherless World Constellation Chair, Professor of Earth and Environmental Science, Computer Science and Cognitive Science, and Director of the Information Technology and Web Science Program at Rensselaer Polytechnic Institute. Fox has a B.Sc. (hons) and Ph.D. in Applied Mathematics (physics and computer science) from Monash University. Fox's research includes computational and computer science; ocean and environmental informatics; and distributed semantic data frameworks, with applications to large-scale distributed data science investigations. Fox served as President of the Federation of Earth Science Information Partners (ESIP; 2014-2016), and as chair of the International Union of Geodesy and Geophysics Union Commission on Data and Information (2007-2015). Fox is Editor-in-Chief of AGU’s Earth and Space Science journal. In 2012, Fox was awarded the European Geosciences Union Ian McHarg/Earth and Space Science Informatics (ESSI) Medal, and ESIP’s Martha Maiden Lifetime Achievement award for service to the Earth Sciences Information communities. In 2015, Fox was elected as the first ESSI fellow of the American Geophysical Union, and in 2018 as an AAAS fellow. http://tw.rpi.edu/web/person/PeterFox