I built several TwitterBots last year to scrape papers on PubMed (thanks again to Casey). These have turned out to be really useful in alerting me to new work, but last night I got a very interesting Tweet from @MattiasAine.



I decided to take a look, so I downloaded both papers, Karlsson 2014 and Zhao 2015. Both describe experiments using Illumina 450K methylation arrays to investigate whether these can identify clinically relevant subgroups of lung cancer. Both report the detection of multiple subgroups, one neuroendocrine and four adenocarcinoma (epitypes), that were associated with molecular features and patient outcome. Both suggest that methylation profiling could lead to better patient classification.

I did not download the raw data for these papers, but I wonder whether it would be possible to predict which country the patients came from simply from the array data.

Even the most cursory look through these papers does seem to corroborate what @MattiasAine Tweeted: that Zhao 2015 is essentially a rip-off of Karlsson 2014. Both papers report on the same study setup, use the same analytical and computational methods, and end up with the same findings. If Zhao 2015 turned out to be a replication study, that would be great. However, there is no reference to Karlsson 2014 in the Chinese paper.
Patient data are almost identical, with the addition of 22 samples in Zhao 2015. If this is a reanalysis of the Karlsson 2014 data with additional samples, then I guess there is some scope for it being published as a new article. But without any reference to the earlier work, and with the addition of just 22 samples, I doubt this is the case.


Figure rotation: The figures in the Chinese paper are carbon copies of the Swedish paper. I looked as hard as I could at the first set of figures and can’t even see the extra 22 samples…



...coming soon to a Retraction Watch near you?