Ms Mengyan Zhang1, Dr Maciej Holowko2, Dr Cheng Soon Ong1
1Data61, CSIRO and Department of Computer Science, Australian National University, Canberra, Australia, 2CSIRO Synthetic Biology Future Science Platform, Canberra, Australia
Abstract:
Introduction
Synthetic biology is on the verge of a leap into high-throughput data generation, for which new methods of data handling and analysis will have to be developed. In this work, we show how machine learning can be used to analyse and predict the performance of the ribosome binding site (RBS) of E. coli, one of the main genetic elements controlling protein expression. We also show how to design RBS sequences sequentially so as to find a design with high protein expression in as few rounds as possible.
Methods
We build a Gaussian process regression model to predict the translation initiation rate (TIR) of each gene under different RBS designs. We formalize sequential experiment design as a multi-armed bandit problem: all possible unique RBS sequences form the decision set, and the algorithm recommends design choices in each round. The experimental validation uses synthetic biology, with a plasmid inserted into E. coli. We compare our experimental design with random selection in terms of cumulative regret, that is, the accumulated loss from not choosing the optimal sequence.
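As an illustration of the setup above, the following is a minimal sketch and not the authors' implementation. It assumes a one-hot sequence encoding, a squared-exponential kernel, and an upper-confidence-bound (UCB) selection rule, none of which are specified in this abstract. The 3-nucleotide design space, the `target` motif, and the synthetic `true_tir` values are hypothetical stand-ins for experimental measurements, used only to make the example runnable.

```python
import itertools
import numpy as np

BASES = "ACGT"

def one_hot(seq):
    """Encode a nucleotide string as a flat one-hot vector (assumed encoding)."""
    vec = np.zeros((len(seq), len(BASES)))
    for i, base in enumerate(seq):
        vec[i, BASES.index(base)] = 1.0
    return vec.ravel()

def rbf_kernel(A, B, length_scale=1.5):
    """Squared-exponential kernel between the rows of A and the rows of B."""
    sq = (A**2).sum(1)[:, None] + (B**2).sum(1)[None, :] - 2 * A @ B.T
    return np.exp(-0.5 * np.maximum(sq, 0.0) / length_scale**2)

def gp_posterior(X_train, y_train, X_all, noise=0.1):
    """GP regression posterior mean and std at X_all (zero prior mean, unit prior variance)."""
    K = rbf_kernel(X_train, X_train) + noise**2 * np.eye(len(X_train))
    K_s = rbf_kernel(X_all, X_train)
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_train))
    mean = K_s @ alpha
    v = np.linalg.solve(L, K_s.T)
    var = 1.0 - (v**2).sum(0)
    return mean, np.sqrt(np.maximum(var, 1e-12))

def run_bandit(true_tir, X_all, n_rounds, policy, rng, beta=2.0, noise=0.1):
    """Return the cumulative regret after each round for the 'ucb' or 'random' policy."""
    chosen, observed, regret = [], [], []
    best = true_tir.max()
    for _ in range(n_rounds):
        if policy == "random" or not chosen:
            # Random baseline, and also the first UCB round (no data yet).
            arm = int(rng.integers(len(X_all)))
        else:
            mean, std = gp_posterior(X_all[chosen], np.array(observed), X_all, noise)
            arm = int(np.argmax(mean + beta * std))  # optimistic choice
        chosen.append(arm)
        observed.append(true_tir[arm] + noise * rng.normal())  # noisy "measurement"
        regret.append(best - true_tir[arm])
    return np.cumsum(regret)

# Decision set: every sequence of length 3 (hypothetical, kept small for speed).
arms = ["".join(p) for p in itertools.product(BASES, repeat=3)]
X_all = np.array([one_hot(s) for s in arms])

# Synthetic ground truth: fraction of positions matching a hypothetical motif.
target = "AGG"
true_tir = np.array([sum(a == b for a, b in zip(s, target)) / 3 for s in arms])

ucb_regret = run_bandit(true_tir, X_all, 30, "ucb", np.random.default_rng(0))
random_regret = run_bandit(true_tir, X_all, 30, "random", np.random.default_rng(0))
print("final cumulative regret (UCB):   ", ucb_regret[-1])
print("final cumulative regret (random):", random_regret[-1])
```

Cumulative regret here is the running sum of the gap between the best achievable TIR and the TIR of the sequence chosen in each round, so a lower curve indicates a better design strategy. In a real experiment the true TIR values are unknown, and regret can only be assessed against the best design found.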
Results
We have analysed a number of datasets available in the literature to guide our choice of algorithms and encoding methods. We discuss the generation and analysis of custom data produced in the CSIRO-UQ BioFoundry.
Summary
Machine learning is seeing increasing use in synthetic biology, where it guides more and more design decisions. In this instance, we have shown how a Gaussian process regression model can be used to predict the TIR of an E. coli RBS.
Biography:
Mengyan Zhang is a Ph.D. candidate at the Australian National University and Data61 under the supervision of Cheng Soon Ong, Lexing Xie and Eduardo Eyras. She received her Bachelor’s degree from ANU in 2018. Her research interests are online experiment design under uncertainty and multi-armed bandits.