Mr Christopher Watkins1, Dr Janet Newman1
1CSIRO, Clayton, Australia
The hullabaloo surrounding the recent successes of deep learning in image processing is often not matched by the results obtained on real datasets, at least not without significant development effort on the part of the machine learning practitioner. Accurate and reliable predictions are possible, but not without a mindful approach to pipeline development.
Building machine learning pipelines that combine the various technologies available to today’s data scientist in a robust and repeatable manner is the core requirement when deploying automated image processing software. But with so many options, how can we ensure that our data pipeline is accurate and our deep learning models are reliable?
This presentation will explore the development of a protein crystal image classification pipeline that has been autonomously deployed at CSIRO’s Collaborative Crystallisation Centre. Protein crystallisation is at the heart of understanding a protein’s structure and function. As such, it is a core piece in the development of new drugs and vaccines, as well as in understanding the inner workings of larger biological systems. We will look at the trade-offs made when choosing machine learning models and the techniques used to discriminate between them, and outline an approach to reduce the effects of overfitting.
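The abstract does not specify which overfitting-reduction approach is used; one widely used technique is early stopping, where training halts once the loss on a held-out validation set stops improving. A minimal sketch of the stopping rule (all names and the loss curve below are illustrative assumptions, not the authors' pipeline):

```python
def early_stop_epoch(val_losses, patience=3):
    """Return the epoch at which training would stop: the first epoch
    where the validation loss has not improved for `patience` epochs.
    (Illustrative helper, not part of the pipeline described above.)"""
    best = float("inf")
    best_epoch = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best:
            best, best_epoch = loss, epoch
        elif epoch - best_epoch >= patience:
            return epoch  # stop here; the weights from best_epoch are kept
    return len(val_losses) - 1  # never triggered: train to the end

# Simulated validation-loss curve: improves, then rises as the model overfits.
losses = [0.9, 0.7, 0.6, 0.55, 0.56, 0.58, 0.60, 0.63]
print(early_stop_epoch(losses))  # prints 6: three epochs past the minimum at epoch 3
```

In practice the same rule is available out of the box in most deep learning frameworks, typically together with an option to restore the best-scoring weights.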
Chris works mostly on machine learning applications in the scientific computing team within IMT.