- Keith’s OmicsOmics roundup.
Gordon’s good morning wake-up
Since shifting to series 9 flowcells, the increase in yield has not always been available to customers, but ONT are working on making this achievable for everyone! There is no reason why it should not be possible to reach the theoretical maximum of 1000 bases per second (launched at 15bps, now at 450bps) – maybe dial the speed to match the biological questions.
Nanopore sequencing does not have the same limitations as PacBio. 100% accuracy is the aim; 99.99% is definitely achievable (and is simply not needed for many applications).
Squeezing out the MinION. Now focussing on prime time applications e.g. clinical. Flongle and MinION DX are going to change the face of diagnostics: sequence the whole HIV genome, not a 2kb region. Replacing PCR diagnostics with a nanopore test.
Plenary session 5
Nick Loman (University of Birmingham): Stopping outbreaks becoming epidemics. The answer is “No” (find out what this means at the end of this talk). MinION has been everywhere in the last year:
“I would sequence, on a boat, with a goat, on a plane, on a train, down some holes, at the poles, in outer space, at quite a pace.”
Ebola: 2015 London Calling presentations about Ebola sequencing in Guinea (Quick et al 2016). Loads of interesting epidemiological studies came out of, or followed on from, that work. Data sharing was evidently important in making this work – proper open access. Identify cross-border transmission, identify links between cases, identify flare-ups, demonstrate delayed transmission. See Nature 2017. But even with all this effort there was a significant delay in the response to this outbreak: “Outbreaks are inevitable, epidemics are not”.
Zika: Said he’d sequence 750 genomes on MinION – on a bus (Faria, Pybus, Loman in Genome Medicine). But almost impossibly low viral levels make diagnostics “challenging”. Two main problems: 1) lots of boring DNA e.g. host, E. coli; 2) MinION requires large numbers of molecules i.e. 100ng DNA. PCR, WGA and capture all help but add time and complexity – PCR is probably still the easiest method. Josh spent a few months after the Zika trip working on Primal Scheme for PCR tiling. Can now get near-complete Zika genomes direct from clinical samples.
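For anyone who has not seen a tiling scheme, here is a minimal sketch of the coordinate side of the idea only. It is not Primal Scheme's algorithm (which also has to pick real primers); the function name `tile_amplicons`, the 400bp amplicon length, the 75bp overlap and the roughly 10.8kb genome length are all my own assumptions for illustration. Neighbouring amplicons alternate between two PCR pools so that overlapping products are never amplified in the same reaction.

```python
def tile_amplicons(genome_length, amplicon_length=400, overlap=75):
    """Lay overlapping amplicons across a genome.

    Returns a list of (start, end, pool) tuples using 0-based, half-open
    coordinates. All default values are illustrative assumptions.
    """
    if genome_length <= 0:
        raise ValueError("genome_length must be positive")
    if not 0 <= overlap < amplicon_length / 2:
        # Keeping the overlap below half an amplicon guarantees that
        # amplicons sharing a pool never overlap each other.
        raise ValueError("overlap must be >= 0 and < half the amplicon length")

    step = amplicon_length - overlap
    amplicons = []
    start = 0
    index = 0
    while True:
        # The final amplicon is truncated at the end of the genome.
        end = min(start + amplicon_length, genome_length)
        pool = 1 if index % 2 == 0 else 2  # alternate between two pools
        amplicons.append((start, end, pool))
        if end == genome_length:
            break
        start += step
        index += 1
    return amplicons


# Hypothetical genome of about 10.8kb (assumed value, roughly Zika-sized)
scheme = tile_amplicons(10_800)
print(len(scheme), "amplicons; first three:", scheme[:3])
```

A real design would shift each amplicon boundary to wherever a good primer pair can be found, so treat these evenly spaced coordinates as a starting grid rather than a finished scheme.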
Yellow fever: Large outbreak in Brazil caused 200 deaths and fear of transmission if it gets into Aedes mosquitoes. Josh ran Primal Scheme design, ordered primers, sent to Brazil for processing by local team. 5-6 genomes per flowcell with 1D native barcoding kit, R9.4, 1-4 million reads per flowcell of 500bp amplicons. Software pipelines packaged in Docker for easy deployment.
Long-reads: Segue to long-reads via antibiotic resistance (see picture below). Unfortunately not quite got to a 1Mb read just yet. Big improvement is moving from phenol:chloroform extraction to agarose plug extraction and prep. The whole E. coli genome in 7 reads. Human genome data got to a 950kb read; we’re getting close to a finished genome on a flowcell.
Wishlist: local basecalling, no more file format changes (please), lower cost low throughput flowcells, ultra low-input prep. Can we please do some sequencing in the NHS this year or next!
- Niranjan Nagarajan: exploring gut resistomes with nanopore sequencing. Antibiotic resistance is the “climate change” of the medical community. Large projects ongoing (GIS-TTSH collaboration, CaPES) asking what underlies protection against colonisation, evolution of the resistome, strain dynamics, etc. Using a mock community and running patient samples to understand how to unravel strain genomes. Fragmented Illumina plasmid assemblies resolved into two closed plasmids.
- Michael Boemo (University of Oxford): Detection of base analogues with MinION. Studying replication dynamics with thymidine analogues. A BrdU pulse followed by IdU generates base analogue regions for MinION detection to infer origins and termination sites for replication.
- Celine Bigot (CNRGH): Pathotrack project – identifying biological threats (air, water, etc) with a priori methods. Running Illumina, Ion Torrent and MinION NGS. The control samples being tested include 10 bacterial and 1 fungal species (still a work in progress). Action plan consists of testing and comparing the 3 NGS technologies to assess reproducibility, sensitivity and specificity of methods. WIMP analysis shows it is possible to ID all species in the control in just 5 hours.
- Scott Gigante (Walter and Eliza Hall Institute): in-house training of the nanonet local basecaller. Aiming to get methylation prediction from neural network, not HMM. Training is a black art – want to make predictions that help improve this process. Able to get 99% of CpGs called correctly.
- Ben Matern (Maastricht UMC): ID of human leukocyte antigen splice variants by MinION cDNA-Seq. Looking at HLA, which presents antigens to immune cells and does funny things like alternative splicing! Using GMAP to detect exon skipping and other splicing anomalies from MinION data. Calculating expression profiles based on the exons detected, cluster and quantify, and understand variation in “normal” splicing of HLA. Aiming to test with direct RNA-Seq.
- Benjamin Istace (Genoscope): Large eukaryotic genome assembly using long-reads. Studying the banana genome. Production is 100 million tons per year. Almost all edible species come from two ancestral genomes. Sequenced and assembled two accessions from long-reads via Pippin Prep size-selected DNA. Got 22Gb of data with 18kb average read length, giving a contig N50 of 1.85Mb.
- Matthew McCabe (Teagasc): Sequencing of BRD viruses using rapid sequencing kit: BRD is the leading cause of cattle morbidity and mortality in USA/Ireland, is caused by DNA and RNA viruses that cause lung damage, and is currently tested for by viral qPCR. Want to make a test that can be run on the farm. Presenting data from 3 samples combined for prep and sequencing. Almost no detectable nucleic acid but made rapid library and ran on Nanopore – 7000 reads! Detected expected culprits – including Nile crocodile pox virus!
- Libby Snell (ONT): Direct RNA-seq. Sequence RNA – “simples!”. Perhaps the most exciting development in RNA analysis (Here’s my post from late last year). Currently takes 2 hours, moving to a rapid prep in just 30 minutes by ligating the RNA motor protein onto the polyA tail, wash and load! Hoping that this will become the standard for RNA-Seq very soon. Showed data on ERCC and Lexogen SIRVs. Yeast sequencing looks good, with some room for improvement; will be VolTRAX compatible.
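Contig N50, as quoted for Benjamin Istace's banana assembly, is the contig length at which the contigs of that length or longer contain at least half of all assembled bases. A minimal sketch of the calculation, with made-up toy lengths rather than the banana data (the function name `n50` is my own):

```python
def n50(lengths):
    """Return the N50 of a collection of contig lengths (in bp)."""
    if not lengths:
        raise ValueError("need at least one contig length")
    ordered = sorted(lengths, reverse=True)  # longest contig first
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        # The first contig that takes the running total to at least
        # half of the assembly size defines the N50.
        if running >= half_total:
            return length


# Toy contig lengths in bp, invented purely for illustration
toy_contigs = [100, 200, 300, 400, 500]
print(n50(toy_contigs))
```

Because one very long contig can dominate the total, N50 is best read alongside the number of contigs and the total assembly size.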
Breakout – Applications: RNA and cDNA
I posted earlier today about whether RNA-Seq is close to an inflection point and we’ll all swap to cDNA or direct RNA-seq very soon. Can we stop using the “direct” in direct RNA-Seq and simply call it RNA-Seq instead!!!
- John Tyson, Transcriptomic analysis via direct RNA sequencing and targeted full-length cDNA analysis. Tested a rat cardiac model in direct RNA-Seq because long reads of RNA are important for splicing and base modifications. 18-year-old mRNA direct RNA-seq vs Illumina (no chance to make a strand-switching library), 15,000 reads over 5kb, Titin reads over 2kb (just 85kb to go), 21,000 transcripts in common (basically all the higher abundance stuff) so lots of low copy transcripts missing, but who’s really looking at these anyway! Phenomenal ability to look at splice isoforms with 3′ & 5′ ends. Poly-A tail analysis is neither completely resolved nor understood. Base modifications are interesting (how much of the RNA mod seen in @nanopore direct RNA-Seq is biochemical noise?). Targeted RNA-seq: required to get to low copy transcripts, PCR amplified from full-length cDNA. See a huge number of splice isoforms – 84 of 1284 possible transcripts identified in just 1200 reads. Analysis of +/-SSTR variants shows co-regulation of other variants, now upping accuracy with 1D^2.
- Rachael Workman (Timp Lab), Comparison of direct RNA and cDNA sequencing of C. elegans. (For me this is one of the standout talks of the show for its possible impact on the direction of my core lab.) Lots of advantages with direct RNA-Seq, but cDNA-seq at 10+ million reads is likely to be transformative for RNA-Seq analysis. Using C. elegans as they can get huge amounts of RNA – 20-30ug of RNA into the strand-switch protocol, prep then sequencing. 240k direct RNA-seq reads versus 2400k cDNA (10x more), 0.2 vs 3.23 Gb and 625 vs 1350bp average read length. Alignments slightly better in cDNA, accuracy near identical. Found a large proportion (~50%) of the curated gene transcripts for C. elegans.
- Chris Vollmers (UCSC), Identifying and quantifying transcript isoforms in single-cell nanopore RNA-seq data. A gold-rush (funded by Facebook) is happening. Populations are lots of individuals and individuals can matter (Gandhi versus Trump). Protocols based on SMART-Seq 2 sent off for Illumina RNA-Seq; while waiting, ran Nanopore sequencing with homebrew multiplexing. Added SIRVs from Lexogen as a control because knowing the exact composition allowed them to develop and test analysis models. Started with 10fg of RNA! Got good results!!! Can identify and quantify isoforms with high accuracy (r=0.97) without genome annotation. Analysed a few B cells and focused on highly expressed genes: CD19 and CD20, found highly variable isoform expression that varies between cells. CD37 shows a scary amount of splicing. See: BioRxiv paper.
- Andrew Smith, Direct nanopore sequencing of canonical and modified bases in 16S ribosomal RNA. Why sequence rRNA directly? No need for RT or amplification, get full-length reads, detect antibiotic resistance epigenetic marks (pretty good reasons). Trick was figuring out how to catch 16S rRNA (not using oligo-dT as in the Direct RNA-Seq kit), so customised the kit to include a new split adapter complementary to the 3′ end of 16S rRNA. Works because this 3′ end is highly conserved. Extract, ligate adapter, sequence to get 100,000s of 1.5kb reads. Expanded to work on other species and showed longer reads are much better able to classify species (600bp = 64% accuracy, 1000bp = 98%). A slide shows coverage with a slight bump at ~500bp resulting from an m7G base-miscall due to its impact on ionic current – clearly showing base-modification detection. These 16S modifications can confer antibiotic resistance. Use direct 16S rRNA seq as a diagnostic; tested by spiking samples with lower copies and see a nice linear response with results back in as little as 20 seconds! See: BioRxiv paper and my coverage of Andrew’s NYC talk.
- Panel discussion: poly-A tail length analysis (it would be great if Lexogen’s SIRVs had different tail lengths); small RNAs and other non-coding RNAs without polyA tails are important too.
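As a quick sanity check on the numbers from Rachael Workman's talk, yield should be roughly read count multiplied by average read length. The sketch below uses only the figures quoted in the notes above; the helper name `expected_yield_gb` is my own, and the rounded read counts mean the estimates will only approximately match the reported yields.

```python
def expected_yield_gb(n_reads, mean_read_length_bp):
    """Estimate sequencing yield in gigabases from read count and mean length."""
    return n_reads * mean_read_length_bp / 1e9


# (read count, average read length in bp, yield in Gb as reported in the talk)
runs = {
    "direct RNA": (240_000, 625, 0.2),
    "cDNA": (2_400_000, 1350, 3.23),
}

for name, (n_reads, mean_len, reported_gb) in runs.items():
    estimate = expected_yield_gb(n_reads, mean_len)
    print(f"{name}: estimated {estimate:.2f} Gb, reported {reported_gb} Gb")
```

The estimates come out at about 0.15 Gb and 3.24 Gb, so the cDNA figure agrees closely and the direct RNA figure is consistent with 0.2 Gb once rounding of the read count and yield is allowed for.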
Sorry but that’s it for me as I’ve got to go home. Enjoy the rest of the meeting.