Whilst these guidelines are a reasonable start and outline many of the issues RNA-Seq users need to consider, they fall a long way short of being truly useful to someone considering where to start with an RNA-Seq experiment.
Methods: The RNA-Seq methods mentioned include transcript quantification, differential gene expression, discovery and splicing analysis. They don’t mention allele-specific expression. Many types of input can be used in these methods: total RNA (including miRNA of course), single-cell RNA, small RNA, polyA+ RNA, polysomal RNA, etc. The authors do state how immature RNA-Seq is and that the applications are evolving incredibly rapidly in almost every part of an experiment: sample prep, sequencing and analysis. They say they don’t aim to cover every possible application but instead focus on the major ones. They also provide recommendations for providing metadata, something too many scientists still don’t collect before and during an experiment, let alone submit with the data for analysis.
Metadata: Recommendations include the usual suspects. For cell lines: accession number, passage number, culture conditions, and STR and Mycoplasma test results. For tissue: the source (and genotype if this is an animal), sample collection and processing methods, and cellularity scores. And for the final RNA: the method used for extraction and the QC results (Bioanalyzer database, anyone?).
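One way to make sure this metadata actually gets collected is to record it in a fixed structure and check for gaps before the samples go anywhere near a sequencer. The sketch below is only an illustration: the class names, field names and the `missing_fields` helper are my own invention, not anything defined by ENCODE, and the example values are made up.

```python
from dataclasses import dataclass, fields
from typing import List, Optional


@dataclass
class CellLineMetadata:
    # Field names are hypothetical; they simply mirror the items listed above.
    accession_number: Optional[str] = None
    passage_number: Optional[int] = None
    culture_conditions: Optional[str] = None
    str_test_result: Optional[str] = None
    mycoplasma_test_result: Optional[str] = None


@dataclass
class RnaMetadata:
    # Metadata for the final RNA sample (again, hypothetical field names).
    extraction_method: Optional[str] = None
    qc_results: Optional[str] = None


def missing_fields(record) -> List[str]:
    """Return the names of all fields that have not been filled in yet."""
    return [f.name for f in fields(record) if getattr(record, f.name) is None]


# Made-up example: only two of the five cell-line items were recorded.
sample = CellLineMetadata(accession_number="EXAMPLE-0001", passage_number=12)
print(missing_fields(sample))
```

Anything the helper reports is something to chase up before the experiment, rather than at submission time.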
Replication: They say that RNA-Seq experiments should be replicated (biological rather than technical), but ENCODE recommend a minimum of only two replicates, which is very low. I defy anyone to find a statistician involved in microarray experiments who would settle for anything less than three, and probably four, replicates today. However, they do give a get-out clause for those who can’t replicate by stating “unless there is a compelling reason why this is impractical or wasteful”. An interesting point is that these guidelines suggest an RPKM correlation of at least 0.92 is required; otherwise an experiment should be repeated or explained. I would have thought anyone publishing their experiments would already be explaining this and that reviewers would pick up on such poor correlations.
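For anyone who wants to check their own replicates against that 0.92 figure, here is a minimal sketch. It rests on several assumptions of mine: I use a plain Pearson correlation on untransformed RPKM values (the type of correlation is not specified above), I take the sum of the gene counts as the number of mapped reads, and the function names and toy numbers are invented for illustration.

```python
import math


def rpkm(counts, lengths_bp):
    """Reads per kilobase of transcript per million mapped reads.

    Assumption: total mapped reads is taken as the sum of the gene counts.
    """
    total = sum(counts)
    return [c * 1e9 / (total * length) for c, length in zip(counts, lengths_bp)]


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def replicates_agree(counts_a, counts_b, lengths_bp, threshold=0.92):
    """True if the RPKM correlation between two replicates meets the threshold."""
    r = pearson(rpkm(counts_a, lengths_bp), rpkm(counts_b, lengths_bp))
    return r >= threshold


# Toy data: four genes, two replicates (made-up numbers).
lengths = [1000, 2000, 1500, 3000]
rep1 = [100, 200, 300, 400]
rep2 = [110, 190, 310, 390]
print(replicates_agree(rep1, rep2, lengths))
```

A real analysis would work with thousands of genes, and would probably restrict to genes detected in both samples and consider a log transform first. The point here is only that the check itself is cheap to run.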
Read-depth: This is one of the hottest topics for RNA-Seq. It makes a massive difference to the final cost of the experiment and is a major determinant in the “microarrays vs sequencing” thought process. ENCODE suggest around 30M paired-end reads for differential gene expression; however, Illumina are suggesting you can use as few as 2M reads per sample today if you want the same sensitivity as Affy arrays. That’s a 15-fold difference, and I suspect this will be revised in the next version of the guidelines. They do say that other methods will require more reads, up to 200M.
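To see why read depth drives cost so strongly, it helps to work out how many samples fit into a fixed amount of sequencing at each depth. The sketch below is back-of-the-envelope only: the 300M-read run yield is a hypothetical figure I picked for illustration, not a specification for any instrument, and the function names are my own.

```python
def fold_difference(reads_high, reads_low):
    """Ratio between two per-sample read depths."""
    return reads_high / reads_low


def samples_per_run(run_yield_reads, reads_per_sample):
    """Whole number of samples that fit in a run at a given depth."""
    return run_yield_reads // reads_per_sample


ENCODE_DEPTH = 30_000_000   # ~30M reads for differential expression
ARRAY_LIKE_DEPTH = 2_000_000  # ~2M reads for array-like sensitivity
HYPOTHETICAL_RUN_YIELD = 300_000_000  # assumption, for illustration only

print(fold_difference(ENCODE_DEPTH, ARRAY_LIKE_DEPTH))
print(samples_per_run(HYPOTHETICAL_RUN_YIELD, ENCODE_DEPTH))
print(samples_per_run(HYPOTHETICAL_RUN_YIELD, ARRAY_LIKE_DEPTH))
```

Whatever the real yield of your instrument, the ratio is the same: the lower depth lets you multiplex 15 times as many samples into the same run, which is where the cost argument against arrays is won or lost.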
ENCODE aim to update this document annually, and I am sure many will be encouraged by this as a useful endeavor. What about going a step further, with an open-access genomics journal that only covers annual reviews of methods, compares the variations and makes recommendations for a consensus protocol?