Letitia Sng, Natalie Twine, Aidan Tay, Laurence Wilson, Michael Kuiper, Cameron Hosking and Denis Bauer
Using cloud-based machine-learning GWAS platform VariantSpark to build cardiovascular disease risk models
Cardiovascular disease (CVD) is the leading cause of death globally, with one Australian having a heart attack every 10 minutes. CVD is a multifactorial disease influenced by both lifestyle and genetic factors, so successful prevention depends on effectively identifying an individual’s risk profile. Recent studies (Khera et al. 2018; Inouye et al. 2018) have demonstrated the superior performance of polygenic risk scores (PRSs) over traditional risk factors (e.g. BMI, smoking). However, because CVD is a polygenic disease, PRS models must account for the interactions between genomic variants to fully capture an individual’s genomic risk. Existing PRS models rely on traditional approaches, such as logistic regression, which do not fully capture these interactions.
VariantSpark is a cloud-based machine-learning platform with a demonstrated capacity to efficiently identify complex interactions between millions of genetic variants across thousands of samples. We have applied VariantSpark to detect genetic variants associated with CVD within the UK Biobank dataset. Using these variants, we are building genetic risk models that are predictive of CVD risk and measuring their performance in independent cardiovascular disease cohorts. Ultimately, these risk models will incorporate cardiac imaging biomarkers to further predict clinical outcomes and risk factors.
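
The limitation of purely additive scores mentioned in this abstract can be illustrated with a toy example. The sketch below is not VariantSpark code and uses no real data: the two-variant genotypes, the weights and the XOR-style phenotype are all hypothetical, chosen only to show a case where no threshold on a weighted sum of risk alleles can separate cases from controls.

```python
from itertools import product


def additive_prs(genotype, weights):
    """Additive score: weighted sum of risk-allele indicators per variant."""
    return sum(w * g for w, g in zip(weights, genotype))


def toy_epistatic_risk(genotype):
    """Hypothetical phenotype: affected only if exactly one variant is present."""
    a, b = genotype
    return int((a > 0) != (b > 0))


def threshold_separates(scores, labels):
    """True if a single cut-off on the score splits cases from controls."""
    cases = [s for s, y in zip(scores, labels) if y == 1]
    controls = [s for s, y in zip(scores, labels) if y == 0]
    return max(controls) < min(cases) or max(cases) < min(controls)


weights = [0.5, 0.5]  # illustrative effect sizes, not estimated from data
genotypes = list(product([0, 1], repeat=2))
scores = [additive_prs(g, weights) for g in genotypes]
labels = [toy_epistatic_risk(g) for g in genotypes]

for g, s, y in zip(genotypes, scores, labels):
    print(g, s, y)
print("separable by one threshold:", threshold_separates(scores, labels))
```

In this toy setting the carriers of both variants and the carriers of neither sit on opposite sides of the affected group, so the additive score cannot rank them correctly, whereas a model that represents the interaction explicitly could.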
|Natalie Twine||Lightning talk|
|Aidan Tay||Developing a computational safeguard to detect Gene Drives in the wild||Genome editing technologies such as CRISPR-Cas9 have made it possible to engineer gene drive systems for spreading desirable traits throughout wild populations. These systems rely on the fact that some genetic elements have a higher chance of being inherited, thereby allowing them to ‘drive’ through a population over many generations. By releasing genetically modified individuals containing gene drive systems and allowing them to breed with wild individuals, desirable traits for managing wild populations, such as invasive species or disease vectors, can be propagated. However, the use of gene drive systems for managing wild populations remains hampered by the potential risks of releasing these modified individuals. To help minimise those risks and improve the traceability of gene drive systems, we developed a computational approach for detecting the presence of gene drive systems within a genome. This is done by analysing the characteristic frequency of oligonucleotide sequences (i.e., the genomic signature). Different organisms display unique genomic signatures, which can be used to differentiate DNA originating from different species. By analysing how the genomic signature changes between locations along a sequence, the native DNA sequence can be distinguished from that of a gene drive. We demonstrate how gene drive systems can be detected in whole genome sequencing data derived from an experimental sequencing library for yeast and a theoretical sequencing library for the Cas9 gene. Importantly, this approach requires no prior knowledge of the genomic sequence and no alignment to a reference sequence, meaning it can be readily applied to poorly characterized organisms.|
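
The idea of comparing local oligonucleotide frequencies with a genome-wide signature can be sketched as follows. This is a minimal illustration, not the method presented in the talk: the choice of dinucleotides (`k=2`), the Manhattan distance, the window size and the synthetic sequences are all assumptions made for the example.

```python
from collections import Counter
from itertools import product


def kmer_signature(seq, k=2):
    """Relative frequency of every k-mer over the ACGT alphabet."""
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    total = max(1, len(seq) - k + 1)
    return [counts[km] / total for km in kmers]


def signature_distance(sig_a, sig_b):
    """Manhattan distance between two signatures (an assumed metric)."""
    return sum(abs(a - b) for a, b in zip(sig_a, sig_b))


def scan_windows(seq, window=100, step=100, k=2):
    """Distance of each window's signature from the whole-sequence signature."""
    background = kmer_signature(seq, k)
    return [
        (start, signature_distance(kmer_signature(seq[start:start + window], k), background))
        for start in range(0, len(seq) - window + 1, step)
    ]


# Synthetic data: an AT-rich "host" with a GC-rich foreign segment at position 200.
host = "AT" * 200
insert = "GC" * 50
genome = host[:200] + insert + host[200:]

scores = scan_windows(genome)
most_atypical_start = max(scores, key=lambda s: s[1])[0]
print(most_atypical_start)  # 200 for this toy input
```

No reference genome or alignment is used here: the background signature is computed from the sequence itself, which mirrors the alignment-free property described in the abstract. Real sequencing data would need longer k-mers, handling of ambiguous bases and a statistical threshold rather than a simple maximum.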
|Laurence Wilson||Invited Talk||
COVID genomes – PathsBeacon/StrEpiFun
The abundance of patient and pathogen genomic data provides many opportunities for tracking and modelling disease progression, but it brings new challenges in handling and processing the data. This is exemplified by the current COVID-19 pandemic. The global sequencing effort has provided researchers and clinicians with a wealth of genomic data, contributing to the development of vaccines and treatments as well as enabling genomic tracing of local outbreaks. However, with over 1.8 million viral sequences now available, traditional databases and platforms struggle to keep up. To address this, we have developed PathsBeacon, a cloud-native platform that uses our sBeacon protocol to enable the fast and secure sharing of genomic data. Using this, researchers can rapidly search the almost 2 million sequences available to track mutations, identify similar strains and model the rate of spread of different variants. While originally developed to support COVID-19 research, PathsBeacon can be readily applied to a variety of fields, including antimicrobial resistance, biosecurity and agriculture.
|Michael Kuiper||Molecular modelling and Visualisation||The advent of cheap sequencing technologies and compute power has enabled new understandings in genomic research. The genomic sequences obtained, however, often represent proteins expressed by the organisms, which have three-dimensional structures and biochemical functions. Molecular modelling and visualisation help provide insight into, and interpretation of, the biological function of the findings uncovered by genomic technologies, bringing digital genomic data closer to its real-world counterpart.|
|Cameron Hosking||Detection of recombination amongst SARS-CoV-2 strains||The emergence and spread of the SARS-CoV-2 virus has been accompanied by a significant increase in the genetic diversity of the virus, with multiple phylogenetic clusters emerging. Contributing to this diversity is the potential for recombination events to occur between distinct viral strains. When two strains co-infect the same host, they may exchange genetic information during replication, creating new strains with characteristics of both parents. While a recombination event between zoonotic coronavirus strains has been postulated as the origin of the current pandemic virus, so far there has been little research into whether recombination is occurring among the human-specific SARS-CoV-2 strains currently circulating. We sought to undertake a thorough analysis of all viral genomes currently available (100,000+) to investigate whether the virus is undergoing active recombination. Examining existing phylogenetic trees, we find that many strains are poorly explained by just one parent. To examine whether these can be better explained by recombination events, we developed an algorithm to find possible recombination events. We first used phylogenetic analysis to calculate the average mutation frequency distribution of the virus. Using this, we simulated both mutational evolution of the virus and recombination, varying the frequency and extent of the events. Based on these simulated datasets, we developed and benchmarked an algorithm for detecting recombination between SARS-CoV-2 strains, which we then applied to viral sequences downloaded from GISAID. Using this approach, we identified a number of recombination events amongst circulating strains, characterizing their distribution in terms of both time and geography.
Based on our analysis, we believe that there is evidence SARS-CoV-2 is undergoing recombination and that this is contributing to the genetic diversity being observed.|
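
The question of whether a strain is "better explained" by two parents than by one can be made concrete with a small sketch. The code below is not the algorithm described in the talk; it assumes pre-aligned sequences of equal length, a single breakpoint and a plain mismatch count, and the sequences are made up. It ignores the mutation frequency distribution that the abstract uses for simulation and benchmarking.

```python
def mismatches(a, b):
    """Number of aligned positions at which two sequences differ."""
    return sum(x != y for x, y in zip(a, b))


def best_single_breakpoint(child, parent_a, parent_b):
    """Find the breakpoint b minimising mismatches of child against
    parent_a[:b] + parent_b[b:]. Returns (breakpoint, mismatch_count)."""
    n = len(child)
    # left[b]: mismatches of child[:b] against parent_a[:b]
    left = [0] * (n + 1)
    for i in range(n):
        left[i + 1] = left[i] + (child[i] != parent_a[i])
    # right[b]: mismatches of child[b:] against parent_b[b:]
    right = [0] * (n + 1)
    for i in range(n - 1, -1, -1):
        right[i] = right[i + 1] + (child[i] != parent_b[i])
    costs = [left[b] + right[b] for b in range(n + 1)]
    best = min(range(n + 1), key=costs.__getitem__)
    return best, costs[best]


# Made-up aligned sequences: the child matches parent_a up to position 4
# and parent_b afterwards.
parent_a = "AAAAAAAAAA"
parent_b = "CCCCCCCCCC"
child = "AAAACCCCCC"

single_parent_cost = min(mismatches(child, parent_a), mismatches(child, parent_b))
breakpoint_pos, recombinant_cost = best_single_breakpoint(child, parent_a, parent_b)
print(single_parent_cost, breakpoint_pos, recombinant_cost)
```

A recombinant explanation is only interesting when its mismatch count is much lower than that of the best single parent, and deciding how much lower is enough is where a simulated null model, such as the one described in the abstract, would be needed.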
|Denis Bauer||Cloud-based bioinformatics|