Thinking about this made an idea pop into my head, not one I’m going to pursue but one I thought readers of this blog would like to hear about.
Welcome to Dr Evil’s Sequencing Centre:
Dr Evil sells human and mouse genomes, exomes and RNA-seq services at very competitive rates: almost too good to be true, but good enough that you’d think twice before sending your samples to a local facility. Dr Evil has a slick website and promises data delivery in record time; you can even have variants called or gene-expression matrices prepared free of charge. He’ll even take your order and invoice you immediately, while letting you send samples any time in the next 12 months. This is a great help in moving some of that grant funding that will disappear on January 1st 2014! Off go your 24 human samples for lung-cancer exome analysis, and 5 weeks later a big disc lands with FASTQ, BAM and VCF files. You can see many variants that seem to fit with your science, including mutations in common lung cancer genes, and there are many thousands more variants to consider. Now you sit down and spend six months analysing the data and trying to validate it in a larger cohort.
How did Dr Evil do your sequencing so quickly and so cheaply? Simple: he downloaded data from public repositories and faked it. A quick search finds lots of genomes, exomes and RNA-seq data as well as sample metadata. Reformatting that is pretty simple with a few scripts to rename the FASTQ headers, and perhaps add some common mutations from the PanCancer paper, COSMIC or somewhere else. The data looks real, and has the results you might expect to see in a lung cancer exome experiment. And as it could be made up from a random selection from a larger set of samples, it won’t exactly replicate an already published paper. It could be months before you notice something is up, and when you do, Dr Evil’s disappeared with a small percentage of NIH or MRC funding. For him, faking your £12,000 24-exome project might just be worth it?
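To show how little effort the header renaming would take, here is a minimal sketch. It assumes plain four-line FASTQ records with no wrapped sequence lines; the function name `rename_fastq_headers`, the prefix and the example read name are all made up for illustration.

```python
from typing import Iterable, Iterator


def rename_fastq_headers(lines: Iterable[str], prefix: str) -> Iterator[str]:
    """Yield FASTQ lines with each read header replaced by '@<prefix>.<n>'.

    Assumes strict 4-line records: header, sequence, '+', qualities.
    """
    for i, line in enumerate(lines):
        if i % 4 == 0:
            # First line of each record is the header: replace it entirely.
            yield f"@{prefix}.{i // 4 + 1}\n"
        else:
            # Sequence, separator and quality lines pass through unchanged.
            yield line


# Hypothetical single-read example (read name invented for illustration).
records = [
    "@SRR000001.1 original description\n",
    "ACGT\n",
    "+\n",
    "IIII\n",
]
renamed = list(rename_fastq_headers(records, "DrEvil_run1"))
```

Which is rather the point: a dozen lines is all that stands between a public dataset and a "new" one, at least as far as the read names go.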
Perhaps buy a coffee for your local sequencing core-facility head next time you’re planning an experiment 😉
You could bury an "approximately 1% sequence error rate" claim deep, deep in the disclaimer fine print. Surely that would be an iron-clad defense against the lawyers. 🙂