This article was commissioned by Illumina Inc.

The most common NGS method we discuss in our weekly experimental design meeting is RNA-seq. Nearly all projects will use it at some point, either to delve deeply into hypothesis-driven questions or simply as a tool to go fishing for new biological insights. It is amazing how far a project can progress in just 30 minutes of discussion: methodology, replication, controls, analysis, and all sorts of bias get covered as we try to come up with an optimal design. However, many users don’t have the luxury of in-house Bioinformatics and/or Genomics core facilities, so they have to work out the right sort of experiment for themselves. Fortunately, people have been hard at work creating resources that can really help, and most recently Illumina released an RNA-seq “Buyer’s Guide” with lots of helpful information, including how to keep costs down.

Illumina’s “Buyer’s Guide”: the guide offers advice on common RNA-seq methods and should help new users evaluate the many options available for next-generation sequencing of RNA. Anyone considering a differential gene expression analysis experiment should have RNA-seq as their platform of choice, and the guide presents three simple steps that walk users through the different aspects of their experiments.

1) First of all, make sure you understand what your scientific question is! This sounds simple, but all too often people want to get too much out of one experiment and end up in a bit of a mess. Better to answer one question well than two questions badly. Once you’ve thought about this it should be clear whether you want to analyse mRNAs for a simple differential gene expression experiment or are after something else, e.g. splicing, and also whether you’ll need to look at more than just poly-adenylated mRNAs. If possible, try to determine ahead of time whether the genes you’re interested in studying are highly expressed or very rare.

2) Once you’ve thought about this you can consider what sort of samples you have: are they low quality and/or low quantity? You should also consider who’s going to do the work in the lab and who’s going to analyse the sequence data.

3) Now you can really think about the final experimental design: what type of library preparation kit to use, replicate numbers, proper controls, depth of sequencing, etc. Illumina’s RNA-seq Buyer’s Guide describes some of the things you’ll need to consider in choosing the read depth and run type, and also includes some tips for keeping the costs of your experiment down.

What do people mean when they say “RNA-seq”: When people say “RNA-seq” most of them are talking about differential gene expression (DGE) by sequence analysis of reverse-transcribed poly-adenylated mRNAs, but by changing the depth or type of sequencing, and/or choosing a different library prep kit, you can investigate so much more. The guide includes three different scenarios for RNA-seq experiments: basic differential gene expression; DGE and allele-specific expression plus isoforms, SNVs and fusions; and finally whole transcriptome analysis. These show the breadth of experiments you can consider once you’ve mastered this method.

The first two scenarios showcase the power of RNA-seq and demonstrate how using a single library prep method, but varying the sequencing, allows very different questions to be asked of your samples. The guide recommends Illumina’s TruSeq Stranded mRNA-seq kits (these are the ones we use most in my lab, and we have done so ever since beta-testing the original RNA-seq kit many years ago). Scenario #1 is a simple DGE experiment, and Illumina recommends you generate ≥ 10 million reads per sample using single-end 50 bp reads (SE50). Scenario #2 allows a full mRNA analysis by simply changing read depth to ≥ 25 million reads per sample and using paired-end 75 bp reads (PE75).

If you are interested in more than poly-adenylated mRNAs, then changing the RNA-seq library prep kit to Illumina’s TruSeq Stranded Total RNA gets rid of ribosomal RNAs, letting you analyse both coding and non-coding RNA. Much greater read depth is needed, and Illumina recommends ≥ 50 million PE75 reads per sample. Completing the RNA-seq line-up are the TruSeq Small RNA kits, which allow you to analyse microRNAs and other smaller transcripts; this usually requires only ≥ 1-2 million SE50 reads per sample.
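To make the read-depth numbers above easier to plan with, here is a minimal Python sketch that stores the guide's per-sample recommendations in a lookup table and multiplies them up for a whole experiment. The scenario keys, the `Scenario` class and the `total_reads_millions` function are names made up for this illustration, and the small RNA entry uses the upper end (2 million) of the quoted 1-2 million range as an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Scenario:
    kit: str                    # library prep kit named in the guide
    min_reads_millions: float   # recommended minimum reads per sample
    read_type: str              # SE50 = single-end 50 bp, PE75 = paired-end 75 bp


# Per-sample recommendations as quoted in the text above.
# The dictionary keys are hypothetical labels chosen for this sketch.
GUIDE = {
    "dge": Scenario("TruSeq Stranded mRNA", 10, "SE50"),
    "full_mrna": Scenario("TruSeq Stranded mRNA", 25, "PE75"),
    "total_rna": Scenario("TruSeq Stranded Total RNA", 50, "PE75"),
    # Assumption: take the upper end of the quoted 1-2 million range.
    "small_rna": Scenario("TruSeq Small RNA", 2, "SE50"),
}


def total_reads_millions(scenario: str, groups: int, replicates: int) -> float:
    """Minimum total reads (in millions) for groups x replicates samples."""
    if groups < 1 or replicates < 1:
        raise ValueError("groups and replicates must both be at least 1")
    return GUIDE[scenario].min_reads_millions * groups * replicates


if __name__ == "__main__":
    # Example: two conditions with four biological replicates each.
    for name in GUIDE:
        s = GUIDE[name]
        total = total_reads_millions(name, groups=2, replicates=4)
        print(f"{name}: {s.kit}, {s.read_type}, at least {total:g}M reads in total")
```

These are minimums per the guide, so treat the totals as a floor rather than a target when sizing a run.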

How do Illumina’s recommendations stack up: The guide is pretty good in the suggestions it makes for common RNA-seq methods. I’d aim a bit higher for DGE and suggest ≥ 20 million reads per sample to allow profiling of high, medium and lowly expressed genes. I’m really not keen on the suggestion that MiSeq or NextSeq mid-output are good tools for RNA-seq since, in my experience, most experiments with sufficient replication will be too large to fit into a single sequencing run. I’d argue that the cheapest way to get your RNA-seq data is going to be on HiSeq 4000, until of course we can run RNA-seq on X Ten. Of course, not everyone should buy a HiSeq, and a MiniSeq, MiSeq or NextSeq may be a good fit for your own laboratory; but I’d encourage you to consider the benefits of using your local core lab first, especially if you are planning on doing experiments bigger than 12-24 samples. I’m not sure I’d argue quite as strongly for paired-end data, and I would prefer splicing, ASE and fusion detection to come from higher-depth sequencing instead (50M SE50 reads cost about the same as 25M PE75 reads).
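The point about experiments outgrowing a single run is easy to check with a little arithmetic. The sketch below is only an illustration: the function names are made up, and the run capacities in the example (25M and 300M reads) are hypothetical placeholders rather than instrument specifications, so substitute the real output of whichever run type you are considering.

```python
import math


def samples_per_run(run_capacity_millions: float, reads_per_sample_millions: float) -> int:
    """How many samples fit in one run at the requested depth (whole samples only)."""
    return int(run_capacity_millions // reads_per_sample_millions)


def runs_needed(n_samples: int, run_capacity_millions: float,
                reads_per_sample_millions: float) -> int:
    """Number of sequencing runs needed to give every sample its target depth."""
    per_run = samples_per_run(run_capacity_millions, reads_per_sample_millions)
    if per_run == 0:
        raise ValueError("a single sample at this depth does not fit in one run")
    return math.ceil(n_samples / per_run)


if __name__ == "__main__":
    n_samples = 24   # e.g. 4 groups x 6 biological replicates
    depth = 20       # millions of reads per sample, as suggested for DGE above

    # Hypothetical capacities in millions of reads per run; not vendor specs.
    for label, capacity in [("small run", 25), ("large run", 300)]:
        runs = runs_needed(n_samples, capacity, depth)
        print(f"{label} ({capacity}M reads): {runs} run(s) for {n_samples} samples")
```

The calculation ignores reads lost to index imbalance, PhiX spike-in or poor-quality clusters, so real experiments will need some headroom on top of these numbers.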

Why does my lab focus on mRNA-seq DGE: My own choices for RNA-seq are primarily informed by the questions people say they want to answer in experimental design discussions, and nearly all of these are differential gene expression questions. As such, my lab runs lots and lots of Illumina’s stranded mRNA-seq kits. We only run some form of ribosomal reduction when the experiment warrants it, as these methods generally require deeper sequencing for the same differential gene expression analysis power. We have very few users who need to run FFPE RNA, so although we tested the RNA Access kit, we’ve yet to really use it in a significant project. This is partly because the research groups coming to my lab understand the limitations of FFPE samples and work hard to procure fresh frozen material wherever possible.

A brief bit about informatics: This article is focussed on the wetlab, but without a good analysis pipeline you’ll be stuck with some big but unusable FASTQ files. The analysis requirements are heavily influenced by the biological questions being asked, by the samples available, and by the library preparation and sequencing performed. I’d always recommend that users make sure they know what analysis is likely to be performed before generating data.

Many others have weighed in on how to use and design RNA-seq experiments (see the list of my favourite references at the bottom of this post). Nearly everyone agrees that replication is key, with most people suggesting 4-6 biological replicates. Most papers agree on read depth being kept to under 20M reads per sample. The ENCODE RNA-seq guidelines are very different, recommending just two biological replicates and 30M paired-end reads per sample. I’ve never agreed with this, even when it was published in 2011, and have steered people to other resources. The blogosphere also offers lots of help; a 2013 post by GKNO (Marth lab, U. Utah) and the RNA-seqlopedia (U. Oregon) are two great reads for people who want to know more.

All Illumina products listed are for research use only. Not for use in diagnostic procedures (except as specifically noted).

Further reading: