Confession: this post may not actually make your 10X Genomics experiments ten times cheaper…but implement Demuxlet in your workflow and with the right sort of experiment you might just get there!
Single-cell RNA-Seq might just have got a whole lot cheaper, thanks to a method for identifying which individual a cell came from in a multiplex pool. If the methods, demonstrated on 10X Genomics, are broadly applicable to all scRNA-Seq technologies then users are likely to process more cells more often and run larger experiments than have been done so far.
In “Multiplexing droplet-based single cell RNA-sequencing using natural genetic barcodes”, Jimmie Ye’s group at UCSF describe their Demuxlet method, which should make single-cell experimental design easier, reduce batch effects and reduce costs per cell for scRNA-Seq library prep.
Individual cells in a multiplexed 10X Genomics sc3’mRNA-seq prep are assigned to the samples in the pool based on their genotype. The method relies on a statistical model for predicting the probability of observing a genetic barcode for each cell. This barcode is generated from a set of SNPs detectable in the 3’mRNA-Seq reads. For a pool of 8 samples the authors present data showing that just 4 SNPs can assign cells to an individual and 20 SNPs distinguish samples with 98% probability. The same SNPs also make the majority of doublets easier to remove: almost 90% of doublets will contain two cells from two different individuals, so their discordant genotypes flag them for removal from single-cell analysis (see Fig 1 above).
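To make the idea of genotype-based assignment concrete, here is a toy sketch in Python. It is not the Demuxlet model from the paper: it ignores doublets, base qualities and genotype uncertainty, and the donor names, genotypes and 1% error rate are all made-up assumptions. It only shows the general principle of scoring a cell's SNP reads against each donor's known genotype and picking the most likely donor.

```python
import math

# Hypothetical genotypes: number of ALT alleles (0, 1 or 2) per donor at 4 SNPs.
GENOTYPES = {
    "donor_A": [0, 1, 2, 0],
    "donor_B": [2, 1, 0, 1],
    "donor_C": [1, 0, 0, 2],
}


def alt_read_prob(alt_allele_count, error_rate=0.01):
    """Probability that a single read shows the ALT allele for this genotype."""
    p = alt_allele_count / 2  # 0.0, 0.5 or 1.0
    return p * (1 - error_rate) + (1 - p) * error_rate


def log_likelihood(reads, genotype, error_rate=0.01):
    """Log-likelihood of one cell's reads given one donor's genotype.

    reads is a list of (snp_index, is_alt) observations from one cell barcode.
    """
    total = 0.0
    for snp_index, is_alt in reads:
        p_alt = alt_read_prob(genotype[snp_index], error_rate)
        total += math.log(p_alt if is_alt else 1 - p_alt)
    return total


def assign_cell(reads, genotypes=GENOTYPES):
    """Return the best-scoring donor and the per-donor log-likelihoods."""
    scores = {name: log_likelihood(reads, g) for name, g in genotypes.items()}
    best = max(scores, key=scores.get)
    return best, scores


# Five reads from one cell: REF at SNP 0, ALT at SNP 1, ALT twice at SNP 2, REF at SNP 3.
cell_reads = [(0, False), (1, True), (2, True), (2, True), (3, False)]
best_donor, donor_scores = assign_cell(cell_reads)
print(best_donor)
```

Even with a handful of reads the wrong donors are penalised heavily wherever a read contradicts a homozygous genotype, which is the intuition behind so few SNPs being needed.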
To test the methods 4 or 8 samples were mixed at equal cell loading (I’d like to know how they counted their cells) and 3500 or 4250 cells, for the 4-sample mixes, or 6200 cells, for the 8-sample mix, were captured and sequenced. Demuxlet identified the expected ratio of individual cells in the different mixes. 91% of doublets were detected in a simulation, and the expected number of doublets were detected and removed from the test data (Fig 2C). Their analysis also shows how sample multiplexing may reduce batch effects, which are a massive problem for the single-cell community (anyone reading this post should read Hicks et al).
As a biological proof of their method the authors performed an experiment looking at cell-type responses to IFN-b in PBMCs from lupus patients. 1 million cells were isolated from each patient, pooled and divided into two aliquots (why no replication of this experiment?), which were treated, or not, with recombinant IFN-b for six hours. After stimulation almost 15,000 cells were captured and sequenced from each pool. Demuxlet identified >80% singlets and of these 99.5% were assigned to an individual. Doublets formed distinct clusters in the t-SNE plots (Fig S5) (this may be important in understanding what doublets look like and removing them even from cells with the same genotype). Because 8 samples are included in each pool they become biological replicates and increase the power to detect differences in what can be highly variable gene expression data from single cells.
The impact of Demuxlet on scRNA-Seq
Demuxlet looks to be a highly applicable method and one that should be relatively simple to implement in scRNA-Seq pipelines. The two main impacts are improving quality and reducing cost. Whilst many will jump to implement this method to bring down the extortionate costs of scRNA-Seq prep, hopefully quality will be the biggest driver for adoption.
The tweets on the bioRxiv page included an insightful one from @mdziemann on the use of TCR genotype, although can enough information be collected from the 3′ end?
Very clever work! I guess this would also work with TCR diversity information rather than standard genotypes? https://t.co/syq0IGSO84
— mdziemann (@mdziemann) March 22, 2017
Improving the quality of scRNA-Seq experiments
A side effect of using the Demuxlet method is that the ability to remove doublets increases as the level of multiplexing increases – it is less likely a cell will form a doublet with another cell from the same donor as the number of individuals in the pool increases. This allows higher numbers of cells to be processed and if 1000 cells is all that is required then the number of biological replicates can increase significantly.
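The relationship between pool size and detectable doublets is simple arithmetic, sketched below in Python. It assumes cells pair up at random in proportion to each donor's share of the pool, which is my simplification and not a calculation taken from the paper; the function name is just for illustration.

```python
import math


def cross_donor_doublet_fraction(proportions):
    """Fraction of doublets whose two cells come from different donors.

    Assumes random pairing, so a same-donor doublet for donor i occurs
    with probability p_i squared.
    """
    if not math.isclose(sum(proportions), 1.0):
        raise ValueError("donor proportions must sum to 1")
    return 1 - sum(p * p for p in proportions)


def equal_pool(n_donors):
    """Donor proportions for a pool mixed at equal cell loading."""
    return [1 / n_donors] * n_donors


for n in (2, 4, 8, 16):
    fraction = cross_donor_doublet_fraction(equal_pool(n))
    print(f"{n} donors: {fraction:.1%} of doublets are cross-donor")
```

For an equal mix of 8 donors this gives 87.5%, in line with the "almost 90%" quoted earlier, and the fraction keeps climbing as more donors are added. Unequal mixes do worse, since a dominant donor produces more same-donor doublets.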
Replication is fundamental in biological research so anything that makes users more likely to include replicates is great. We are likely to perform more robust experiments and also to be able to detect more subtle differences that impact biology. As costs per sample drop we’re also likely to run more samples meaning that drop-out when a prep fails is likely to have less impact on studies – right now this can be a major pain point in scRNA-Seq experiments.
Reducing the costs of scRNA-Seq experiments
By multiplexing several samples in a single-cell RNA-Seq prep the costs of sample preparation are shared between all samples in the pool. At just over £1000 per prep on the 10X Genomics system, and probably only two-fold less in real costs for the cheapest scRNA-Seq methods available (Drop-Seq), the methods are expensive. However, single-cell analysis is powerful and new methods are always expensive, until someone works out how to do them cheaper!
We’re starting to run larger 10X Genomics single-cell experiments; two are just being prepared for sequencing with 24 samples each. Each of these has cost around £30,000 to prep, and while one could conceivably make use of Demuxlet to reduce costs to more like £3600 (almost ten-fold cheaper), the other would not be possible due to the logistics of the samples. I suspect, though, that researchers will find ways to get single-cell suspensions to us in larger batches if they can save so much money.
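The saving is easy to sanity-check. The Python sketch below assumes pools of 8 samples (the pool size used in the paper, not necessarily what we would run) and derives a per-prep price from the £30,000 for 24 samples quoted above; both are assumptions for illustration, not a quote.

```python
import math


def prep_cost(n_samples, cost_per_prep, samples_per_pool=1):
    """Total library prep cost when samples are multiplexed into pools."""
    n_preps = math.ceil(n_samples / samples_per_pool)
    return n_preps * cost_per_prep


N_SAMPLES = 24
COST_PER_PREP = 30_000 / N_SAMPLES  # assumed: £1250 per prep, back-calculated

unpooled = prep_cost(N_SAMPLES, COST_PER_PREP)
pooled = prep_cost(N_SAMPLES, COST_PER_PREP, samples_per_pool=8)

print(f"one sample per prep: £{unpooled:,.0f}")
print(f"eight samples per prep: £{pooled:,.0f}")
print(f"fold saving: {unpooled / pooled:.1f}x")
```

Under these assumptions 24 samples need only 3 preps, so the cost drops from £30,000 to £3,750, an eight-fold saving and close to the rough figure above. Any genotyping needed before pooling would have to be added back on.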
The paper does not describe how the group genotyped their samples but using a large pool of 0.5 MAF coding SNPs should allow identification of individuals in a pool in most cases – if genotyping is required before pooling then this would still be a cost effective method, and for projects following up on samples already sequenced