Three recent preprints from Sam Aparicio (@sajraparicio) and Sohrab Shah (@SohrabShah) are well worth a read on bioRxiv. In June, Kieran Campbell posted a paper describing computational methods to integrate single-cell DNA-seq and RNA-seq data. In September, Camila de Souza and Emma Laks posted reports describing, respectively, Epiclomal, a new method to detect “epiclones” in tumour genomes, and the generation of a shallow single-cell whole genome sequencing data set from 40,000 cells.

The single-cell genomes were made possible by direct library preparation (DLP), a transposition-based single-cell DNA library preparation method (Zahn et al., 2017). In the bioRxiv report they describe how the method has been modified (DLP+) to nanodispense into Takara SmartChips. This allows the Nextera library prep to be performed as an additive workflow, yielding many hundreds of single-cell whole genomes per day. They applied this to a range of tissues and cells, including cell lines, human breast cancer PDXs and tumour samples. Importantly, they address the economic/experimental tradeoff in scWGS: sequence fewer cells at high genome coverage, or many cells at low coverage. They also show how merging cells that come from a single clone allows robust analysis of small-scale events. Companies are also developing single-cell DNA methods, e.g. the 10x Genomics single-cell CNV method, but DLP+ perhaps offers deeper coverage of the genome. Ideally we’ll see back-to-back comparisons on bioRxiv soon!
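To make the cells-versus-coverage tradeoff concrete, here is a rough back-of-the-envelope sketch in Python. It is not taken from the preprint: the genome size, the sequencing budget and the function names are all illustrative assumptions, and the simple Poisson model ignores duplicates, coverage bias and ploidy. It shows how a fixed budget splits across cells, how little of the genome a single shallow cell covers, and how pooling cells from one clone recovers depth.

```python
import math

# Assumption: approximate human genome size in base pairs (illustrative only).
GENOME_SIZE_BP = 3.1e9


def per_cell_depth(total_bases, n_cells, genome_size=GENOME_SIZE_BP):
    """Mean depth per cell when a fixed sequencing budget is split evenly."""
    return total_bases / (n_cells * genome_size)


def breadth(depth):
    """Expected fraction of the genome covered at least once.

    Uses a simple Poisson model of read placement, which is an idealisation.
    """
    return 1.0 - math.exp(-depth)


def merged_depth(depth_per_cell, cells_in_clone):
    """Depth of the pseudo-bulk genome obtained by pooling cells of one clone."""
    return depth_per_cell * cells_in_clone


if __name__ == "__main__":
    budget = 3.1e11  # hypothetical budget: about 100x of one human genome
    for n_cells in (10, 1000):
        depth = per_cell_depth(budget, n_cells)
        print(f"{n_cells} cells: {depth:.2f}x per cell, "
              f"breadth {breadth(depth):.1%}")

    # Pooling 200 shallow cells assigned to the same clone.
    shallow = per_cell_depth(budget, 1000)
    pooled = merged_depth(shallow, 200)
    print(f"200 merged cells: {pooled:.1f}x, breadth {breadth(pooled):.1%}")
```

Under these assumptions a single shallow cell is only good for large-scale copy number, while the merged clone has enough depth to look at smaller events, which is the logic behind aggregating cells by clone.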

clonealign: statistical integration of independent single-cell RNA & DNA-seq from human cancers

Measuring gene expression of genomically defined tumour clones at single cell resolution would associate functional consequences to somatic alterations, as a prelude to elucidating pathways driving cell population growth, resistance and relapse. In the absence of scalable methods to simultaneously assay DNA and RNA from the same single cell, independent sampling of cell populations for parallel measurement of single cell DNA and single cell RNA must be computationally mapped for genome-transcriptome association. Here we present clonealign, a robust statistical framework to assign gene expression states to cancer clones using single-cell RNA-seq and DNA-seq independently sampled from an heterogeneous cancer cell population. We apply clonealign to triple-negative breast cancer patient derived xenografts and high-grade serous ovarian cancer cell lines and discover clone-specific dysregulated biological pathways not visible using either DNA-Seq or RNA-Seq alone.

Epiclomal: probabilistic clustering of sparse single-cell DNA methylation data

We present Epiclomal, a probabilistic clustering method arising from a hierarchical mixture model to simultaneously cluster sparse single-cell DNA methylation data and infer their corresponding hidden methylation profiles. Using synthetic and published single-cell CpG datasets we show that Epiclomal outperforms non-probabilistic methods and is able to handle the inherent missing data feature which dominates single-cell CpG genome sequences. Using a recently published single-cell 5mCpG sequencing method (PBAL), we show that Epiclomal discovers sub-clonal patterns of methylation in aneuploid tumour genomes, thus defining epiclones. We show that epiclones may transcend copy number determined clonal lineages, thus opening this important form of clonal analysis in cancer.

Resource: Scalable whole genome sequencing of 40,000 single cells identifies stochastic aneuploidies, genome replication states and clonal repertoires

Essential features of cancer tissue cellular heterogeneity such as negatively selected genome topologies, sub-clonal mutation patterns and genome replication states can only effectively be studied by sequencing single-cell genomes at scale and high fidelity. Using an amplification-free single-cell genome sequencing approach implemented on commodity hardware (DLP+) coupled with a cloud-based computational platform, we define a resource of 40,000 single-cell genomes characterized by their genome states, across a wide range of tissue types and conditions. We show that shallow sequencing across thousands of genomes permits reconstruction of clonal genomes to single nucleotide resolution through aggregation analysis of cells sharing higher order genome structure. From large-scale population analysis over thousands of cells, we identify rare cells exhibiting mitotic mis-segregation of whole chromosomes. We observe that tissue derived scWGS libraries exhibit lower rates of whole chromosome aneuploidy than cell lines, and loss of p53 results in a shift in event type, but not overall prevalence in breast epithelium. Finally, we demonstrate that the replication states of genomes can be identified, allowing the number and proportion of replicating cells, as well as the chromosomal pattern of replication to be unambiguously identified in single-cell genome sequencing experiments. The combined annotated resource and approach provide a re-implementable large scale platform for studying lineages and tissue heterogeneity.