Professor Andreas Wicenec1, James Strauss1
1International Centre for Radio Astronomy Research, Crawley/Perth, Australia
As in other sciences, astronomical data processing workflows are becoming more and more complex. At the same time, the amount of data, and thus the required processing parallelism, keeps growing. Here we present a system we have developed to allow scientists, software engineers and HPC experts to work within their respective fields of expertise and collectively arrive at an integrated, executable workflow. Both the development and the execution of workflows are managed and supported by an integrated GitHub versioning system. The complex partitioning and scheduling of these workflows have also been implemented and tested, generating execution graphs of up to several tens of millions of tasks. We have executed workflows on many small and large scale systems, including Pawsey, Bracewell, Tianhe-2 and Summit (the complete system!), as well as on various cloud systems. We are currently transitioning this system from a prototype to an operational system. We are also investigating automated ways to track and trace workflows from development to execution in order to seamlessly support repeatability and reproducibility.
Andreas leads the Data Intensive Astronomy Program at the International Centre for Radio Astronomy Research. He has many years of experience in designing and implementing systems to manage and process astronomical data from the largest optical and radio observatories in the world, as well as from satellites. His current role involves him in many projects, including ASKAP and the SKA.