A great new resource was recently brought to my attention on Twitter, and there is a paper describing it on bioRxiv: DNAmod: the DNA modification database. Nearly all of the modified nucleotide sequencing we hear and read about concerns modifications to Cytosine, mostly 5-methylcytosine and 5-hydroxymethylcytosine; you may also have heard about 8-oxoG if you are interested in FFPE analysis. All sorts of modified nucleotides occur in nature. Some may be important in biological processes, where they can vary across the tissues of an organism; others may just be chemical noise. The modifications are most important when they change the properties of the DNA strand, how it is read, and what might or might not bind to it, e.g. mC.

The biology of base modification is very complex: DNA methyltransferases mark Cytosine with a 5-methyl group, TET family enzymes oxidise 5-methylcytosine to 5-hydroxymethylcytosine (and on to 5-formylcytosine and 5-carboxylcytosine), and thymine DNA glycosylase-mediated base excision repair returns the base to unmodified Cytosine. Many groups have worked on methods to sequence modified bases, with Shankar Balasubramanian’s research group here in Cambridge, and his CEGX spinout, most closely associated with 5hmC-seq.

DNAmod DB: The DNA modification database lists 38 modified bases, 7 of which have only been observed synthetically. It gives a brief description of each modified base, including the likely biological function, and, most importantly for readers of Core Genomics, it lists the methods that can be used to map the modifications in the genome.
Unfortunately it appears to miss the OxBS-seq method published by Booth et al. in 2012, but it does have the competing TAB-seq method published by Yu et al. in the same year.

Not all bases are modified to the same extent: There are a total of 128 modified nucleotides reported in the unverified list on DNAmod. I’d assumed the number of modifications would be about the same for each of the biological building blocks, but they vary quite significantly: Uracil has 45 mods (I’m guessing modifications in ribonucleotides need less careful control?), Adenine (39) has about twice as many modifications as Guanine (19), and Cytosine (13) and Thymine (12) have the fewest.
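If you want to check the arithmetic on those per-base numbers, here is a minimal Python sketch. It uses only the five counts quoted above, typed in by hand; it does not query DNAmod, and the variable and function names are my own for illustration.

```python
# Per-base counts as quoted in this post from DNAmod's unverified list.
# These are hand-entered, not pulled from the database itself.
unverified_counts = {
    "Uracil": 45,
    "Adenine": 39,
    "Guanine": 19,
    "Cytosine": 13,
    "Thymine": 12,
}

# Total across all five bases; should match the 128 quoted in the post.
total = sum(unverified_counts.values())


def share(base):
    """Fraction of the unverified list accounted for by one base."""
    return unverified_counts[base] / total


# Print the bases from most to fewest modifications, with their share.
for base, n in sorted(unverified_counts.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{base:<9}{n:>4}{share(base):>8.1%}")
print(f"{'Total':<9}{total:>4}")

# Ratio behind the Adenine versus Guanine comparison.
adenine_to_guanine = unverified_counts["Adenine"] / unverified_counts["Guanine"]
print(f"Adenine/Guanine ratio: {adenine_to_guanine:.2f}")
```

The five counts do add up to the 128 quoted, and Uracil alone accounts for a little over a third of the list.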
Citation: Sood AJ, Viner C, Hoffman MM. 2016. DNAmod: the DNA modification database. bioRxiv 071712.