Optimized Experimental Design for Translation Initiation using Machine Learning

Ms Mengyan Zhang1, Dr Maciej Holowko2, Dr Cheng Soon Ong1

1Data61, CSIRO and Department of Computer Science, Australian National University, Canberra, Australia, 2CSIRO Synthetic Biology Future Science Platform, Canberra, Australia



Synthetic biology is on the verge of a leap into high-throughput data generation, for which new methods of data handling and analysis will have to be developed. In this work, we show how machine learning can be used to analyse and predict the performance of the ribosome binding site (RBS) of E. coli, one of the main genetic elements controlling protein expression. We also show how to design RBS sequences sequentially so as to find the optimal sequence, the one with the highest protein expression, as quickly as possible.


We build a Gaussian process regression model to predict the translation initiation rate (TIR) of each gene for different RBS designs. We formalize sequential experimental design as a multi-armed bandit problem: all possible unique RBS sequences form the decision set, and the algorithm recommends design choices in each round. The experimental validation uses synthetic biology techniques, with a plasmid inserted into E. coli. We compare our experimental design with random selection in terms of the cumulative regret caused by not choosing the optimal sequence.
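The loop described above can be illustrated with a minimal sketch. Everything specific in it is an assumption made for illustration and is not taken from this work: the one-hot encoding, the RBF kernel and its length scale, the upper confidence bound (UCB) selection rule, the 4-base design region, and the toy reward function `toy_tir` with its hypothetical target `TOY_TARGET`, which stands in for a wet-lab TIR measurement. The abstract does not state which kernel or bandit rule was used.

```python
import itertools
import numpy as np

BASES = "ACGU"
TOY_TARGET = "GGAG"  # hypothetical optimum, used only to simulate measurements


def all_sequences(length):
    """Enumerate every possible sequence of the given length (the decision set)."""
    return ["".join(p) for p in itertools.product(BASES, repeat=length)]


def one_hot(seq):
    """Encode a sequence as a flat one-hot vector (one assumed encoding)."""
    vec = np.zeros((len(seq), len(BASES)))
    for i, base in enumerate(seq):
        vec[i, BASES.index(base)] = 1.0
    return vec.ravel()


def rbf_kernel(A, B, length_scale=1.5):
    """Squared-exponential kernel between the rows of A and the rows of B."""
    sq = np.sum(A**2, 1)[:, None] + np.sum(B**2, 1)[None, :] - 2 * A @ B.T
    return np.exp(-0.5 * sq / length_scale**2)


def gp_posterior(X_train, y_train, X_all, noise=0.1):
    """Zero-mean GP regression: posterior mean and std at every row of X_all."""
    K = rbf_kernel(X_train, X_train) + noise**2 * np.eye(len(X_train))
    K_s = rbf_kernel(X_all, X_train)
    mean = K_s @ np.linalg.solve(K, y_train)
    var = 1.0 - np.sum(K_s * np.linalg.solve(K, K_s.T).T, axis=1)
    return mean, np.sqrt(np.clip(var, 0.0, None))


def toy_tir(seq):
    """Toy stand-in for a measured TIR: fraction of positions matching TOY_TARGET."""
    return sum(a == b for a, b in zip(seq, TOY_TARGET)) / len(TOY_TARGET)


def run_bandit(arms, reward_fn, rounds, policy, rng, beta=2.0):
    """Pick one arm per round; return the cumulative regret after each round."""
    X_all = np.array([one_hot(s) for s in arms])
    best = max(reward_fn(s) for s in arms)  # known only because this is a simulation
    chosen, rewards, regret, total = [], [], [], 0.0
    for _ in range(rounds):
        if policy == "random" or not chosen:
            idx = int(rng.integers(len(arms)))
        else:
            mean, std = gp_posterior(X_all[chosen], np.array(rewards), X_all)
            idx = int(np.argmax(mean + beta * std))  # UCB: favour high mean or high uncertainty
        chosen.append(idx)
        rewards.append(reward_fn(arms[idx]) + rng.normal(0.0, 0.05))  # noisy observation
        total += best - reward_fn(arms[idx])
        regret.append(total)
    return regret


if __name__ == "__main__":
    arms = all_sequences(4)
    for policy in ("ucb", "random"):
        regret = run_bandit(arms, toy_tir, 30, policy, np.random.default_rng(0))
        print(policy, "cumulative regret after 30 rounds:", round(regret[-1], 2))
```

In a real campaign the optimum is unknown, so regret can only be computed in simulation or in hindsight, and each round would typically recommend a batch of designs rather than a single sequence.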


We have analysed a number of datasets available from the literature, which guided our choice of algorithms and encoding methods. We also discuss the generation and analysis of custom data produced in the CSIRO-UQ BioFoundry.


Machine learning is seeing increasing use in synthetic biology, where it guides more and more design decisions. In this instance, we have shown how a Gaussian process regression model can be used to predict the TIR of an E. coli RBS.


Mengyan Zhang is a Ph.D. candidate at the Australian National University and Data61 under the supervision of Cheng Soon Ong, Lexing Xie and Eduardo Eyras. She received her Bachelor’s degree from ANU in 2018. Her research interests are online experimental design under uncertainty and multi-armed bandits.

