Andrew Adey’s group at the Department of Molecular & Medical Genetics in Oregon Health & Science University published SCI-seq in Nature Methods earlier this week (it has been on the BioRxiv since July). SCI-Seq is a method for single-cell copy number analysis that is likely to be rapidly adopted. In the paper they describe the development of their combinatorial indexed sequencing for whole genome CNV applications.
Replication of single-cell experiments
In the abstract the authors highlight the high costs of current single-cell library-prep technologies. And while they say that this restricts the number of cells that users can afford to sequence, the paper does not mention the limitation in the number of samples that a user might process. Almost all single-cell work is unreplicated and this leads to horrible confounding effects. For single-cell methods to have real impact users are going to need to process many more samples. I think it is worth distinguishing between the number of cells sequenced and the number of samples processed, and that this should be an important consideration of readers and reviewers of single-cell papers. Millions of cells is very cool, but not if they come from one sample!
Get naked: Single-cell suspensions of frozen brain were created by douncing tissue and straining to remove larger clumps of cells. Nuclei were isolated with Nuclei Isolation Buffer from Roche. An excellent blog post on BiteSizeBio by Michele Ta describes nuclear isolation.
The most interesting part of this paper for me was in the first section of the results. The authors described the problems of getting naked DNA for general whole genome methods. In their earlier work on sciATAC-Seq the need to work on native chromatin meant DNA accessibility was not a problem (it was fundamentally part of the method). However for whole genome methods to work in single-cells DNA needs to be released from the nucleosomes, without disrupting the nucleus.
They did this by using lithium diiodosalicylate (LAND), which disrupts DNA–protein interactions, to release DNA from histones. Or by using xSDS; a combination of cross-linking followed by SDS treatment to denature histone proteins, the cross-linking was required to stabilise nuclear integrity.
Make libraries: To make single-cell copy number libraries the combinatorial indexed library preparation works by first sorting pools of several hundred or low thousands of cells into plates. 96 uniquely tagged tranpososmes add Nextera indexes to each DNA fragment. Nuclei are then repooled and sorted into a second plate for dual-index PCR amplification creating uniquely indexed libraries. The random assignment of transposome index, and dual-PCR index creates the high diversity combinatorial library.
96 transposon indexes x 96 dual-PCR indexes = 9,216 libraries
96 transposon indexes x 384 dual-PCR indexes = 3,6864 libraries
Comparison of sequencing data from LAND, xSDS or untreated nuclei for single-cell copy number showed that LAND or SDS treated nuclei had 5x less enrichment for ATAC signals (i.e. lower enrichment for open chromatin). The number of unique reads was also significantly higher in the treated nuclei. These comparisons demonstrate the free accessibility of the nuclear DNA.
The authors ran a mix of Human and Mouse cells to confirm the doublet rate in their experiments. The second sorting is where a unique cell index is really created and the doublet rate is controlled by varying the number of nuclei pooled at this stage on the flow cytometer. Empirical measurements showed 0-23% doublet rates and the group settled on sorting of 22 nuclei per well to get a 10% doublet rate. However the tests were pretty small scale and this kind of validation would need to be completed by users of the method if doublets would be a significant issue in their experiments.
The authors state in their discussion that the uniformity of nucleosome depletion by xSDS may be improved by the use of other cross-linking agents. But the resolution of around 250 kbp is likely to be.
CNV analysis with SCI-Seq: To determine single-cell copy number the genome was separated into dynamic windows to reduce the bias from repeats, centromeres, and “other complex regions”. A GC correction was applied to each of the individual windows based on its GC composition, with the assumption that larger windows need less GC correction. The ENCODE blacklisted regions were excluded from analysis. After this reads were normalised according to the average number of reads per window. CNV calling was performed with HMMcopy (Sorab Shah’s lab) or DNAcopy and Ginko. Finally tumour breakpoint analysis was performed.
I think SCI-Seq is a great method for single-cell copy number analysis. The paper highlights perhaps the biggest two challenges 1) getting naked DNA for library prep, and 2) reducing the bias in what DNA actually makes it into a library. This second bias is less of an issue with CNV as reads wil be normalised across windows. But as we move into single-cell genotyping experiments allelic drop-out is a massive problem…and one where unbiased library prep is likely to have a big impact.