I am not a Bioinformatician but I often want to do things Bioinformaticians say are easy. This is the first in a possible series of blogs about my experiences with Bioinformatics, text mining, etc. I’d be very happy to receive comments on the approaches I try or hints on how to do what I am trying to do more easily. For now I am not learning to program or run scripts from a command line. I am happy to try something in Galaxy.
Part of the reason for these posts is so I can remeber how to do this next time myself.
This is meant to be the non-Bioinformaticians way of doing things!
I used to just send an email to one of the Bioinformaticians at work but I couldn’t help but take the hint that they had work to do as well. So now I’ll usually try to find out if I can do what I want myself with publically available, usually web-based tools. I figure that even if I don’t succeed my trying is evidence I’m not just abusing their better experience. It also helps me better explain what I want so they don’t do some work for me and then I say how nice it is but it is not really what I thought I wanted after all.
UCSC Distributed Annotation Service (DAS):
The “DAS” server allows you to download data directly from UCSC. I suspect it is built to be queried from under the hood rather than through a URL. However it is an easy way to get the sequence from genomic coordinates so I like it.
There is a simple FAQ here. And a query for a sequnce looks like this http://genome.ucsc.edu/cgi-bin/das/hg18/dna?segment=chr2:51409549,51409749 which will return 200bp of sequence from chr2 in Hg19 build of the Human genome.
Here’s how I used DAS and other tools to design some primers from a SNP:
Designing PCR primers from an dbSNP rsID: I am hoping to design some PCR assays to look at regions around SNPs. I got a list of SNPs from a colleague wth lots of information about them position, alleles, MAF, etc. However I wanted to take this list and design PCR primers to all 24 SNPs as quickly as possible.
Getting SNP locations: I was already sent the locations along with rsID’s but I wanted to see if I could find them myself. I like UCSC and it is usually my first port of call when looking for sequence information. In this case I simply entered an rsID into the UCSC position/search box (Image:UCSC RsID search) and then clicked through the link (Image: UCSC rsID results). This gives me the all important location information, chr1:4181020-4181020 in the case of rs10799216.
Getting flanking sequence: I used Excel (not a bioinformaticians favourite tool) to create a UCSC DAS address (Image: Excel to create DAS address) to pull out the flanking regions 100bp either side of the SNP (Image: UCSC DAS results).
Designing suitable primers: I used the sequence results from the DAS query in Primer3 (image: Primer3 Input) wth the parameters Targets=100,2 (primers surronding the 2 bases at position 100), ProdcutSizeRange=30-75 (I want short products) and NumberToReturn=50 (50 sets of primers). Hey presto PCR primers (Image: Primer3 Output)
Now I can get on and order some primers to test in the lab, where I feel much more in my conmfort zone.
UCSC rsID search |
UCSC rsID results |
Excel to create DAS address |
UCSC DAS results |
Primer3 Input |
Primer3 Output |
SNAP from the Broad |
SNAP query |
SNAP results |
about the snp data and using excel and word to generate a DAS list and get data fast even though I’m not a bioinformaticiain!!!
Leave A Comment