Python-based visual exploration of enriched motifs from panning experiments

Dr Yi Jin Liew¹, Dr Jason Ross¹, Dr Kathy Surinya², Dr Maxime Francois², Dr Simon Puttick³, Dr Stephen Rose³

¹CSIRO, North Ryde, Australia; ²CSIRO, Adelaide, Australia; ³CSIRO, Herston, Australia



Some tumours have proteins in their cell membranes that are absent in healthy cells. In developing potential treatments, we want to identify novel peptide binders to these proteins. To do so, phage display libraries are panned against these proteins over several rounds, so that the retained sequences should be enriched for real binders.

Our team consists of biologists with extensive panning experience, chemists with protein know-how, and bioinformaticians with experience in analysing next-generation sequencing data. To ease data exploration by all parties, the bioinformaticians built an interactive dashboard in Bokeh to visualise changes in motif frequencies between any two phage display experiments. Ultimately, these plots help separate biased motifs (which are enriched due to technical factors) from true binders (which are enriched because they bind the protein of interest).
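As a rough illustration of what "changes in motif frequencies" between two experiments could mean, the sketch below computes a log2 fold change of relative frequencies. It is not the dashboard's actual calculation: the function names, the pseudocount, and the toy motifs are all assumptions made for this example.

```python
import math
from collections import Counter


def motif_frequencies(motifs):
    """Relative frequency of each motif in one experiment."""
    counts = Counter(motifs)
    total = sum(counts.values()) or 1  # avoid dividing by zero on empty input
    return {motif: count / total for motif, count in counts.items()}


def log2_enrichment(before, after, pseudocount=1e-6):
    """Log2 fold change in motif frequency from `before` to `after`.

    The pseudocount (an assumed value) keeps motifs that are absent
    from one experiment from producing a division by zero or log(0).
    """
    freq_before = motif_frequencies(before)
    freq_after = motif_frequencies(after)
    return {
        motif: math.log2(
            (freq_after.get(motif, 0.0) + pseudocount)
            / (freq_before.get(motif, 0.0) + pseudocount)
        )
        for motif in set(freq_before) | set(freq_after)
    }


# Hypothetical toy data: motifs observed in an early and a late panning round.
early_round = ["AAA", "BBB", "BBB", "CCC"]
late_round = ["AAA", "AAA", "AAA", "BBB"]
enrichment = log2_enrichment(early_round, late_round)
```

Positive values mark motifs that became more frequent in the later round. On its own, a positive value cannot distinguish a true binder from a technically biased motif, which is why the comparison is explored visually.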


We coupled a k-mer counting strategy with a custom distance matrix to cluster similar peptide sequences in the main plot. Each point's size was proportional to the peptide's frequency, so abundant peptides stood out in the plot. Based on iterative feedback from the team, we devised four colour schemes that emphasise different aspects of the data. We extended one key feature of Bokeh, its lasso-selection tool, so that selecting a region displays a table of detailed information about the selected peptides and a sequence logo for spotting general motifs. The extensible nature of the dashboard allows other informative subplots to be added.
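The general idea of pairing k-mer counts with a pairwise distance matrix can be sketched as follows. The abstract does not specify the custom distance, so a Bray-Curtis-style dissimilarity on k-mer counts stands in for it here. The choice of k = 3, the function names, and the example peptides are likewise assumptions.

```python
from collections import Counter
from itertools import combinations


def kmer_counts(peptide, k=3):
    """Count the overlapping k-mers in one peptide sequence."""
    return Counter(peptide[i:i + k] for i in range(len(peptide) - k + 1))


def kmer_distance(a, b, k=3):
    """Bray-Curtis-style dissimilarity between two k-mer profiles.

    Returns 0.0 for identical profiles and 1.0 when no k-mers are shared.
    This is a stand-in, not the custom distance used in the dashboard.
    """
    counts_a, counts_b = kmer_counts(a, k), kmer_counts(b, k)
    shared = sum((counts_a & counts_b).values())  # multiset intersection
    total = sum(counts_a.values()) + sum(counts_b.values())
    return 1.0 - 2.0 * shared / total if total else 0.0


def distance_matrix(peptides, k=3):
    """Symmetric matrix of pairwise k-mer distances."""
    n = len(peptides)
    matrix = [[0.0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        matrix[i][j] = matrix[j][i] = kmer_distance(peptides[i], peptides[j], k)
    return matrix


# Hypothetical peptides: the first two differ only at the last residue.
peptides = ["ACDEFGH", "ACDEFGK", "WWWWWWW"]
distances = distance_matrix(peptides)
```

A matrix like this could then be passed to a clustering or dimensionality-reduction step to place similar peptides near each other in a scatter plot.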


Based on observations from the plots and additional calculations, we chose a few promising candidates for further testing.


Yi Jin is currently a Research Scientist in the Molecular Diagnostics Solutions group at CSIRO, attempting to squeeze promising cancer biomarkers out of public datasets. He thinks that well-visualised data speaks ten thousand words.

Prior to that, he was a postdoc at the King Abdullah University of Science and Technology in Saudi Arabia, where he studied DNA methylation in corals (and regrets never mastering diving despite working on corals AND living by the Red Sea). He graduated with a PhD in Genetics from the University of Cambridge, but has, over the years, swapped the pipette for a keyboard.

