A paper published one month ago in Nature Genetics: Stessman et al 2017, by Evan Eichler’s group at HHMI, reported finding 91 genes associated with neuro-developmental disorders. 25 of these were enriched in autism probands and the authors reported a “network associated with high-functioning autism (full-scale IQ >100)”. The paper presented smMIP sequence data from 11,730 patients across 208 risk genes. Their statistical analysis compared their sequence data against two control data sets, one of which was the ExAC database. Great stuff…

Yesterday Jeff Barrett, one of the authors of the ExAC paper (Lek, M. et al. Analysis of protein-coding genetic variation in 60,706 humans. Nature 536, 285–291 (2016)), was lead author on a BioRxiv “contradictory results” rebuttal of the statistical methods used. The preprint describes “critical statistical flaws”, and its reanalysis of the data does not support any of the novel findings from Stessman et al. Bummer!

My own interest in the original paper was the use of single-molecule molecular inversion probes (smMIP) for their targeted sequencing analysis (and I’ll get to that in a sec). My initial reaction is to ask how a paper published in such a high-impact journal can have such obvious statistical flaws that the results are debunked less than one month after publication (I apologise if my interpretation is too harsh). However, not being a statistician means my own ability to interpret the results from either paper is severely limited, and I have to choose between the two papers somehow, but how?!

The Nature Genetics authors recontacted 125 patients for more in-depth phenotypic follow-up. If Barrett et al’s assessment of the statistics is correct, then a question those patients may have is whether the follow-up was unnecessary.

Single-molecule MIP (smMIP)

smMIPs are a technology I’m interested in using for single-cell research, so I’ve recently been reading up about them. Single-molecule MIPs were first described in 2013 by Hiatt et al (Shendure group): Single molecule molecular inversion probes for targeted, high-accuracy detection of low-frequency variation. Genome Res. 23, 843–854 (2013). smMIPs were reported to have very low error rates: 8.4 × 10^-6 in cell lines and 2.6 × 10^-5 in clinical samples. The authors also reported detection of alleles at frequencies as low as 0.2% in cancer genes, e.g. BRAF and KRAS.

They use the same principle as standard molecular inversion probes but the addition of a unique molecular index (UMI) allows highly sensitive detection of low-frequency or subclonal variants in a population – and I’m hoping to apply these to single cells to swamp the DNA with probes such that we have very high probability of complete allelic capture.
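To make the UMI idea concrete, here is a minimal Python sketch of how reads sharing a probe and UMI can be collapsed into one consensus call per original molecule, so that a PCR or sequencing error seen in only one copy gets outvoted. This is my own illustration and not how MIPgen or the Hiatt et al pipeline does it; the function names, the `(probe_id, umi, base)` read format and the `min_family_size` threshold are all assumptions made for the example.

```python
from collections import Counter, defaultdict


def umi_consensus(reads, min_family_size=2):
    """Collapse reads into one consensus base per (probe, UMI) family.

    reads: iterable of (probe_id, umi, base) tuples, where base is the
    base observed at a single position of interest (hypothetical format).
    """
    families = defaultdict(list)
    for probe_id, umi, base in reads:
        families[(probe_id, umi)].append(base)

    consensus = {}
    for key, bases in families.items():
        # Singletons cannot be error-corrected, so skip small families.
        if len(bases) < min_family_size:
            continue
        base, count = Counter(bases).most_common(1)[0]
        # Keep the call only if a strict majority of copies agree.
        if count * 2 > len(bases):
            consensus[key] = base
    return consensus


def allele_fraction(consensus, alt):
    """Fraction of consensus molecules carrying the alt base."""
    calls = list(consensus.values())
    if not calls:
        return 0.0
    return sum(1 for b in calls if b == alt) / len(calls)


reads = [
    ("p1", "AAC", "G"), ("p1", "AAC", "G"), ("p1", "AAC", "T"),  # one error copy
    ("p1", "GGT", "T"), ("p1", "GGT", "T"),
    ("p1", "CCA", "G"),  # singleton family, dropped
]
calls = umi_consensus(reads)
print(calls)
print(allele_fraction(calls, "T"))
```

The point is that allele fractions are counted over molecules rather than raw reads, which is what lets low-frequency variants be separated from amplification noise.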

In the Nature Genetics study the authors designed 12,016 smMIPs to cover 208 genes, including 5 bp of intronic sequence to capture variation at splice-donor/acceptor sites. Around 4,000 smMIPs were pooled (the authors describe rebalancing the smMIPs to achieve uniform sequence coverage, but I could not find the data). 192 patient samples were sequenced for each pool in a single HiSeq 2000 lane*. Data analysis was performed with MIPgen, and individual genotypes were called at a sequencing depth (DP) of >8×.

smMIPs have recently been applied in FFPE panel sequencing and in BRCA testing. The ability to target both strands independently, with a low false-positive rate and the sensitivity to detect low-frequency alleles, means I think we’ll see a lot more of smMIPs. One of the papers below compared smMIP results to Sanger and Ampliseq PGM sequencing in their clinical diagnostic labs and showed that smMIPs outperformed both methods.


* This is about 333 lanes (16,000 / 192 × 4), or roughly 41 HiSeq 2000 flowcells, which is just 4 NovaSeq S4 flowcells if they had enough barcodes!
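
For anyone who wants to redo the footnote arithmetic with their own numbers, here is a small Python sketch. The 16,000 samples, 192 samples per lane and the multiplier of 4 are simply the figures I used above; the 8 lanes per HiSeq 2000 flowcell is the standard flowcell layout. The function name is made up, and I have left out the NovaSeq comparison because that depends on yield assumptions.

```python
def lane_estimate(samples, samples_per_lane, pools, lanes_per_flowcell=8):
    """Return (lanes, flowcells) needed to run every sample on every pool."""
    lanes = samples / samples_per_lane * pools
    flowcells = lanes / lanes_per_flowcell
    return lanes, flowcells


lanes, flowcells = lane_estimate(samples=16000, samples_per_lane=192, pools=4)
print(f"{lanes:.1f} lanes, {flowcells:.1f} flowcells")  # 333.3 lanes, 41.7 flowcells
```

The footnote rounds both numbers down; in practice you would round up to whole lanes and flowcells.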