This is the first part of a 2 part blog. Part 1 “everything you wanted to know about NovaSeq” covers the technical aspects of NovaSeq, Illumina’s latest sequencer launched on Jan 9th 2017. The instrument is very clearly the next proper step in Illumina technology, which has proceeded through the previous platforms, Genome Analyser (GAI, GAII, GAIIx) and HiSeq (HiSeq 2000, 2500, 4000 and X-Ten). This post primarily covers some of the technical specifications of the new instrument. In particular the differences between the two instruments and the updated 2-colour SBS. In part 2: “Should you buy a NovaSeq for your core lab”, I’ll finish up by discussing some of the challenges the instrument presents for Core Labs and their users, and the costs of running NovaSeq compared to other sequencers.
At the end of these two posts I talk about whether you should purchase a NovaSeq or not. Although Mick’s right; Illumina effectively killed HiSeq 4000 just 24 months after launching the $600,000 instrument…something that will not have gone down well in labs that migrated from Hiseq 2500.
In these posts I’ll refer back to Mick’s table below copied from his blog: Opiniomics as well as the other slides from Illumina’s customer facing deck – all in the image gallery.
- NovaSeq 6000 $985,000
- Very high throughput for large, or pooled, sequencing projects
- 2-colour SBS may still dissuade some users, public data will be coming soon
- NovaSeq has the same index mis-assignment issues as HiSeq 4000
NovaSeq slide gallery
NovaSeq uses patterned flowcells and a reformulated NextSeq 2-colour SBS chemistry (more on that in a sec). The system uses a the now familiar cartridge system from NextSeq for reagents and a single sample is loaded and clustered on each flowcell. Two flowcells can be run independently. Importantly, when compared to HiSeq X-Ten, NovaSeq is capable of running dual-indexes (and this is likely to be hugely important in fixing the index mis-assignment issues on Illumina patterned flowcells).
The instrument is available in two configurations. However the likelihood is that most laboratories like the Genomics Core would want to access the full range of flowcell capacities for complete project flexibility. With a price tag of $135,000 lower than the NovaSeq 6000 the 5000 appears to be a very expensive smaller capacity machine. It is also very likely that the “upgrade” to 6000 is merely an increase in the scan area to accommodate the larger S3 and S4 flowcells. This makes the 6000 a highly profitable box, if Illumina are able to sell the 5000 at 70% margins then the 6000 is even better.
NovaSeq 5000: $850,000 available mid-2017 launched with S1 flowcells. Only compatible with S1 and S2 flowcells. Upgradeable to 6000 with a $200k upgrade path (ouch).
NovaSeq 6000: $985,000 available now launched with S2 flowcells. Compatible with S1, S2, S3 and S4 flowcells.
NovaSeq flowcells: Four flowcells will be available in different kit configurations with only S2 available at launch. Only S1 and S2 available with “short-read” kits, it is not clear if Illumina will release a short-read high cluster number S4 kit. This would allow the most cost-effective RNA-Seq, ChIP-Seq, etc. It may also be particularly useful for cancer RNA-seq where groups may want to look for expressed variants to determine clonality. And the ability to run 500 RNA-seq samples on one @illumina S4 NovaSeq flowcell would bring sequencing down to just $5-10 per sample .
NovaSeq flowcells are big! They are 8-10X larger than HiSeq, and around double the size of a NextSeq flowcell. About the same as an iPhone 7!
Novaseq Speed: NovaSeq is fast, but for core lab users the reduction in run time compared to HiSeq 4000 is unlikely to have a dramatic impact on total turnaround times for a typical RNA-Seq or Exome project. Larger projects will finish more quickly especially with an S2 PE150 run time of just 40 hours. Rather inexplicably the NovaSeq 5000 appears to be about 15% slower than the 6000 according to the customer facing slide deck Illumina are using (although I suspect this is simply the marketing team rushing to get slides together).
|HiSeq 4000||2,800 million||0.84||eBay|
|NovaSeq S1||1,600 million||0.5||mid-2017|
|NovaSeq S2||3,200 million||1||now|
|NovaSeq S3||6,600 million||2||end of 2017|
|NovaSeq S4||10,000 million||3||beginning 2018|
2-colour SBS chemistry:
NovaSeq implements a modified version of the NextSeq 2-colour SBS chemistry on patterned flowcells. The colours have been reconfigured so the T=Green, C=Red, A=Green/Red, G=no signal, and signal to noise ratios have been increased. The result is what looks like excellent sequencing chemistry with very low degradation in read 2 when compared to read 1. But many users have shied away from the 2-colour chemistry. I’ve spoken to people who are horrified at the idea of low frequency allele detection in cancer being done on a colour instrument. I have not compared NextSeq to HiSeq so can’t comment personally.
Mick pointed to a SEQanswer’s thread discussing NextSeq data quality and it did not look great with the V1 NextSeq chemistry, but V2 recieved much better feedback. However Mick also pointed to a comment on his blog by Brian Bushnell (who is one of the main contributors to the SEQanswers thread) and this hammered the 2-colour chemistry “The NextSeq typically has a 5-10x higher error rate. When V2 chemistry first arrived, there were a couple runs with fairly good accuracy. But recent runs are low quality again, with inflated quality scores, presumably to pass Illumina’s specifications (since those are based on quality scores rather than error rates). And the base-frequency divergence issue for read 2 was never addressed. The error profiles also seem substantially more biased than HiSeq, making low-frequency variant analysis very difficult.”
The release of NovaSeq data on BaseSpace should help and there is a NovaSeq WGS TruSeq PCR-Free 450bp insert (6plex) run on BaseSpace now. Whether it’ll will allow users to confirm the quality or not remains to be seen. However you won’t be able to confirm if any sample indexes have been swapped as although it is a pool, these are 6 replicates of NA12878.
Brian’s comment also highlights Illumina’s (over)reliance on “specifications” when troubleshooting their technology. These are designed to make data review simple – but it plainly is not when so many factors can affect the usability of NGS data. On the MiSeq the long-read 600 cycle kits have had an issue with poor quality at the end of the reads for almost 2 years. The runs often pass specifications as a high percentage of bases are above Q30. The first 100-200bp of an Illumina read can be excellent but the last 50bp can have terrible quality, this results in a high %Q30 and the run passes spec. But the user who is relying on reading the middle of their 600bp amplicon does not care about average Q30, just those bases with a rather inconvenient 5-10% error rate.
Coming in Part 2
In the next instalment I will focus on the challenges core labs face in adopting this new instrument. Whilst I am sure Illumina are hoping that NovaSeq will drive up instrument sales and save a hammering from the stock market, core labs may not adopt this instrument as fast as they hope.
The main issues I highlight are the single loading port to deliver one super-pool of libraries to the flowcell, and the impact of index mis-assignment, which I have previously written about. I’ll also consider the likely cost reductions of running small and large projects.
Ultimately core labs will need to balance capital outlay against cost savings, and some labs this will mean an inevitable wind down of their high-output sequencing. I suspect they’ll focus on the boutique projects that might never run reliably on NovaSeq (e.g. ATAC-seq) and the sample prep side of things – places tey can always add real value to their customers.