Genomics has recently been singled out as one of the largest data headaches we face. As we move to sequencing people multiple times, start newborn genome sequencing programs and increase our use of consumer genomics, the amount of data goes up and up. Our GA1 generated 1Gb of data in about 11 days. Today our HiSeq 2500 puts out 1Tb in 6 days.
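To put those two instruments side by side, here is a rough back-of-the-envelope sketch. It assumes both figures are bases of sequence (gigabases and terabases) and that the "6" means six days; the variable and function names are just for illustration.

```python
# Figures quoted above; units assumed to be bases of sequence.
GA1_BASES = 1e9       # ~1 gigabase per run
GA1_DAYS = 11         # run time in days
HISEQ_BASES = 1e12    # ~1 terabase per run
HISEQ_DAYS = 6        # run time in days (assumed)


def bases_per_day(bases, days):
    """Average output per day of run time."""
    return bases / days


ga1_rate = bases_per_day(GA1_BASES, GA1_DAYS)
hiseq_rate = bases_per_day(HISEQ_BASES, HISEQ_DAYS)
fold_increase = hiseq_rate / ga1_rate

print(f"GA1:        {ga1_rate:.2e} bases/day")
print(f"HiSeq 2500: {hiseq_rate:.2e} bases/day")
print(f"Fold increase in daily output: {fold_increase:.0f}x")
```

On those assumptions the daily output has gone up by a little over 1,800-fold.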

We’re currently storing our data on disc for up to six months. After this we either delete it or archive it onto tape (although I’ve no idea if we ever try to get it back off the tapes). A while back people used to talk about storage being more expensive than a rerun, and I wonder if we are getting even closer to that point, especially if you try to grab the data off a tape in a secure storage facility.
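The storage-versus-rerun question comes down to a simple comparison, sketched below. Everything here is hypothetical: the function name, the parameters and especially the example costs are placeholders I made up to show the shape of the sum, not real prices from our facility or anyone else's.

```python
def cheaper_option(storage_cost_per_tb_month, data_tb, months,
                   retrieval_cost, rerun_cost):
    """Compare keeping data on tape (plus one retrieval) with rerunning the library.

    All costs are in the same currency. Returns the cheaper option and the
    total cost of the storage route.
    """
    storage_total = storage_cost_per_tb_month * data_tb * months + retrieval_cost
    choice = "rerun" if rerun_cost < storage_total else "store"
    return choice, storage_total


# Placeholder numbers only: substitute your own quotes.
choice, storage_total = cheaper_option(
    storage_cost_per_tb_month=10.0,  # hypothetical archive price
    data_tb=0.5,                     # hypothetical size of one run's data
    months=60,                       # five years on tape
    retrieval_cost=200.0,            # hypothetical fee to pull the tape back
    rerun_cost=400.0,                # hypothetical cost to resequence the library
)
print(choice, storage_total)
```

With real numbers plugged in, the interesting part is how quickly the retrieval fee and the length of storage tip the balance towards a rerun.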

I’ve always liked the idea of storing libraries, and we have all 10,000 that we’ve run safely stored at -80°C. These tubes take up minimal space and most could be rerun today at a fraction of the cost from a few years ago. I am now wondering if we should go for an even greener solution and start long-term storage on Whatman cards (available from CamLab and others). A small number of cards could store almost everything we’ve ever run!

Is anyone doing this?