De novo assembly of complex genomes is still harder than many would like.
For cancer genomes, de novo assembly could be the ultimate method for discovering all somatic events. But the analysis requires long reads or long-insert reads, and generating these generally eats up lots of DNA in complex methods. The Broad spoke at last year’s AGBT about the “perfect” mix of reads required for using ALLPATHS to assemble a genome (300bp and 3kb I think); and Oxford Nanopore are promising us 100kb reads, which would more or less solve the problem, although it may be some time before we can access the technology!
An alternative solution is to sequence a genome the old-fashioned way, clone-by-clone, but with NGS. At least one consortium is attempting a BAC tiling path of the Wheat genome, which is one of the most complex genomes out there. A limitation is the need to sequence some 150,000-200,000 BAC clones!!!
My “Moleculo” idea: At the end of 2011 I was researching alternative digital PCR methods and keeping up to date with genome-capture methods. While reading around these subjects I had the idea to mix the RainDance emulsion PCR with Illumina’s Nextera tagmentation.
Using a set of transposomes that insert unique barcode tags, it would be possible to dilute large fragments of DNA such that a single 100kb fragment would be mixed with a single Nextera tag. The resultant transposition library prep would create a set of sequences that all came from the same genomic locus. Et voilà: a genome ready for two-step assembly. First, a local de novo assembly would stitch together reads from single 100kb fragments; then the resulting long reads would be stitched together to create the entire genome. Amplifying the DNA first would help, and both Moleculo and Population Genetics Technologies (Genome Pooling) have developed methods to do this.
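To make the first step of that two-step idea a bit more concrete, here is a minimal Python sketch of the binning that would come before any local assembly: reads are grouped by the barcode they carry, so each group should (ideally) hold reads from one 100kb fragment. The function name `group_reads_by_barcode` and the `(barcode, sequence)` input format are my own assumptions for illustration; the actual assembly of each group would be handed to a real assembler and is not shown.

```python
from collections import defaultdict


def group_reads_by_barcode(reads):
    """Bin reads by barcode.

    `reads` is an iterable of (barcode, sequence) tuples; this input
    format is a hypothetical simplification, not a real file format.
    Returns a dict mapping each barcode to the list of its sequences,
    in the order they were seen.
    """
    groups = defaultdict(list)
    for barcode, sequence in reads:
        groups[barcode].append(sequence)
    return dict(groups)


if __name__ == "__main__":
    # Toy data: two barcodes, i.e. two (assumed) source fragments.
    toy_reads = [
        ("BC01", "ACGTAC"),
        ("BC02", "TTGACA"),
        ("BC01", "GTACGG"),
    ]
    for barcode, seqs in group_reads_by_barcode(toy_reads).items():
        # Each group would then go to a local de novo assembly step.
        print(barcode, len(seqs), "reads")
```

Each barcode group would be assembled on its own, and the resulting long contigs would then be the input to the second, genome-wide assembly.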
I had discussed using this on something like the Wheat genome with our Tech Transfer department but they thought it was not practical or protectable. For Wheat I’d ask RainDance to make a library of emulsion droplets from my 200k BAC clones (in the same way they make primer libraries), combine the amplified DNA with the multiplex Nextera droplets and in a single tagmentation get a pretty awesome Wheat genome. It would be possible to use something like Lucigen’s NxSeq fosmid prep kit to make a library for the human genome as well.
How does Moleculo work: I still don’t know the details, but I expect to find out more at AGBT this year, where there are several talks on the technology. It appears to combine miniaturised barcoded-genome library prep (in microtitre plates, microfluidics or emulsions) with standard NGS. How much DNA it requires as input, and how confident you can be that two reads came from the same fragment, will be the most significant issues for users.
The likelihood of generating reads from a single fragment is going to have something to do with the number of barcodes available and the number of individual reactions performed. The RainDance method I described would allow many 100,000s of tagmentation reactions to be made, so even low-coverage sequencing of each one should give a robust long-read assembly. There’s a whole lot of maths and Poisson distribution statistics that need to be thought through!
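As a starting point for that maths, here is a rough Python sketch of two back-of-envelope quantities. It assumes (my assumptions, not anything known about Moleculo or RainDance) that fragments land in droplets following a Poisson distribution with mean `lam` fragments per droplet, and that each reaction draws its barcode uniformly at random from a pool of `n_barcodes`. The function names are hypothetical.

```python
import math


def single_fragment_fraction(lam):
    """Among non-empty droplets, the fraction holding exactly one fragment.

    Assumes the number of fragments per droplet is Poisson(lam):
    P(1) / (1 - P(0)) = lam * exp(-lam) / (1 - exp(-lam)).
    """
    p_zero = math.exp(-lam)
    p_one = lam * math.exp(-lam)
    return p_one / (1.0 - p_zero)


def unique_barcode_fraction(n_reactions, n_barcodes):
    """Expected fraction of reactions whose barcode is shared with no other.

    Assumes each reaction picks a barcode uniformly and independently:
    a given reaction is unique if none of the other (n_reactions - 1)
    reactions picked the same barcode.
    """
    return (1.0 - 1.0 / n_barcodes) ** (n_reactions - 1)


if __name__ == "__main__":
    # Illustrative numbers only; real loading and barcode counts are unknown.
    for lam in (0.05, 0.1, 0.5, 1.0):
        print(f"lam={lam}: single-fragment fraction "
              f"{single_fragment_fraction(lam):.3f}")
    print(f"unique barcodes: {unique_barcode_fraction(100_000, 1_000_000):.3f}")
```

Under these assumptions, diluting harder (smaller `lam`) raises the share of occupied droplets that contain a single fragment but wastes more droplets, and the barcode pool needs to be much larger than the number of reactions if most barcodes are to be unique.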