Long-read sequencing (LRS) is a critical tool for understanding disease and genetic variation in populations ranging from humans to major food crops. LRS is particularly suited to identifying structural variation, because a single long read captures far more information about the arrangement of the genome than a short one. However, Oxford Nanopore Technologies sequencers need not sequence every molecule in full. Adaptive sampling using direct base calling, first demonstrated in our laboratory in Nottingham, enables dynamic selection of individual molecules during sequencing (Nature Methods, 2016; Nature Biotech, 2020). An accepted molecule is sequenced along its entire length, whereas a rejected molecule yields only a short read. The result is a mixture of long and short reads from the same sample. Consequently, targeted regions, for which long reads are generated, end up with higher coverage than unwanted regions, enabling enrichment of, for example, a cancer gene panel. We have previously shown that much more information can be obtained from this type of mixed experiment, including structural variants (SVs) of clinical importance and changes in copy number (CNVs) throughout genomes of interest. For example, binned read counts can be used to infer CNVs regardless of read length, whilst long reads can be used to resolve complex SVs. A critical application of this approach is in medicine, where rapid identification of CNVs and SVs can aid the diagnosis of a variety of tumour types and other disease states. Currently, these analyses take weeks or months.
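The binned read-count idea can be illustrated with a minimal Python sketch. It assumes reads have already been aligned so that each has a mapped start position on a single contig, uses fixed-width bins, and takes a diploid baseline. The function names `bin_read_counts` and `estimate_copy_number` are hypothetical and are not part of minoTour or any existing tool. Each read contributes one count however long it is, which is why short rejected reads and long accepted reads can be pooled. Real CNV callers also correct for GC content and mappability, which this sketch omits.

```python
from statistics import median


def bin_read_counts(read_starts, genome_length, bin_size):
    """Count reads per fixed-width bin using each read's mapped start position.

    Each read adds exactly one count regardless of its length, so short
    (rejected) and long (accepted) reads are treated identically.
    Positions outside [0, genome_length) are ignored.
    """
    n_bins = (genome_length + bin_size - 1) // bin_size  # ceiling division
    counts = [0] * n_bins
    for start in read_starts:
        if 0 <= start < genome_length:
            counts[start // bin_size] += 1
    return counts


def estimate_copy_number(counts, baseline_ploidy=2):
    """Scale bin counts so that the median bin equals the assumed baseline ploidy.

    Assumes (hypothetically) that most bins are at the baseline copy number
    and that read counts are proportional to copy number.
    """
    med = median(counts)
    if med == 0:
        raise ValueError("median bin count is zero; use larger bins or more reads")
    return [round(baseline_ploidy * c / med) for c in counts]


# Toy example: 5 bins of 1,000 bp; bin 2 has twice as many reads as the others.
reads_per_bin = [10, 10, 20, 10, 10]
read_starts = [b * 1000 + i * 50 for b, n in enumerate(reads_per_bin) for i in range(n)]
counts = bin_read_counts(read_starts, genome_length=5000, bin_size=1000)
print(counts)                        # [10, 10, 20, 10, 10]
print(estimate_copy_number(counts))  # [2, 2, 4, 2, 2]
```

Normalising to the median rather than the mean keeps a few amplified or deleted bins from shifting the baseline.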
Here we aim to reduce this turnaround time by implementing real-time analysis on our minoTour platform (see Munro bioRxiv 2021a). We have recently demonstrated the integration of complex analysis pipelines within minoTour (see Munro bioRxiv 2021b), which allows adaptive sampling targets to be updated dynamically (manuscript in preparation). We will develop new adaptive algorithms that update the regions selected for sequencing based on real-time estimates of the likelihood of an SV or CNV within a given region of the genome (see De Maio bioRxiv 2020). Combined with our existing targeting strategies, we anticipate being able to provide clinical colleagues with a single report covering SNPs, SVs and CNVs within 72 hours of sample receipt (see Patel bioRxiv 2021).
To test and develop this approach, we will work closely with Mike Hubank (Scientific Director, NHS England North London Genomic Laboratory Hub). Beyond medical applications, this approach would also benefit the study of SVs in non-human populations. We will therefore develop methods to detect SVs and CNVs in populations of the model plant Arabidopsis arenosa, with Prof Levi Yant. A. arenosa has undergone a recent whole-genome duplication, and there is evidence of substantial structural variation within its populations. Prof Yant has a large collection of samples amenable to sequencing, and the pipelines proposed here could dramatically reduce the cost and increase the throughput of sequencing them. In addition, the smaller genome of A. arenosa simplifies pipeline development, as less data is required. This will allow us to exploit recent developments enabling adaptive sampling of barcoded samples (manuscript in preparation), and so to track SVs and CNVs in multiple samples on a single flow cell in real time.
Home and international students are welcome to apply for this opportunity. Funding is available for four years from late September 2023. The award covers tuition fees at the home rate (£4,596) plus an annual stipend, which was £17,668 for 2022; the stipend is set by the Research Councils. Please note that successful international candidates will be put forward for a University Fees Difference Scholarship to cover the difference between the home and international fees.
Apply online by noon on Tuesday 17 January 2023.