Scalable Distributed Infrastructure for Data Intensive Science

David Abramson1

1University of Oxford



Modern research-intensive organisations face challenges storing and preserving the increasing amounts of data generated by scientific instruments and high-performance computers. Data must be delivered in a variety of modes depending on the end use, ranging from Web portals through to supercomputers. Building infrastructure to meet this need is complex and expensive. It requires mechanisms that support both managed and unmanaged data in a coherent and scalable way, often over a physically distributed multi-campus environment.

In this talk I will discuss the ways we are delivering such infrastructure at the University of Queensland. Long-term hierarchical storage, and many of the computing systems, are housed in a commercial Tier 3 data centre 20 km from the main campus in St Lucia. Some high-performance machines and desktops, and all scientific instruments, are housed on campus. University researchers work with local, national and international collaborators, and so need to share data securely and efficiently across a variety of scales. Our COTS-based “MeDiCI data fabric” provides seamless access to data in such an environment. To improve standards of management, curation and preservation of data, a locally developed metadata management service called RDM provides a single point of access for storage requests. Recent work on the CAMERA environment links unmanaged collections to managed repositories in a flexible and efficient manner. Finally, the fabric delivers data to a range of commodity and novel computing platforms such as the FlashLite data-intensive cluster and the Wiener GPU supercomputer.


David has been involved in computer architecture and high performance computing research since 1979. He has held appointments at Griffith University, CSIRO, RMIT and Monash University. Prior to joining UQ, he was the Director of the Monash e-Education Centre, Science Director of the Monash e-Research Centre, and a Professor of Computer Science in the Faculty of Information Technology at Monash. From 2007 to 2011 he was an Australian Research Council Professorial Fellow. David has expertise in High Performance Computing, distributed and parallel computing, computer architecture and software engineering. He has produced more than 200 research publications, and some of his work has been integrated into commercial products. One of these, Nimrod, has been used widely in research and academia globally, and is also available as a commercial product, called EnFuzion, from Axceleon. His world-leading work in parallel debugging is sold and marketed by Cray Inc, one of the world’s leading supercomputing vendors, as a product called ccdb. David is a Fellow of the Association for Computing Machinery (ACM), the Institute of Electrical and Electronics Engineers (IEEE), the Australian Academy of Technology and Engineering (ATSE), and the Australian Computer Society (ACS). He is currently a visiting Professor in the Oxford e-Research Centre at the University of Oxford.

