[Bioperl-l] Getting sequences by base pair locations

Cook, Malcolm MEC at stowers-institute.org
Tue Aug 1 11:12:08 EDT 2006


Glad to help.  Since you are running BLAT at UCSC rather than locally,
you could try this approach:

Upload or paste your BLAT results (in BLAT's native output format, PSL)
as a custom track in the genome browser, named, say, myhumanhits
(i.e. just give the BLAT results a new first line like:
`track name="myhumanhits" description="myhumanhits from my favorite human genes" visibility=2`).
Then go to the table browser and configure it:
	group = 'custom tracks'
	track = 'myhumanhits'
	region = genome
	output format = sequence
	output file = myhumanhits.fasta

submit it

When prompted, save myhumanhits.fasta to your computer and take it
from there.
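
If you would rather script the "new first line" step than edit the
file by hand, something like the following might do. It is only a
sketch: the track line is the one quoted above, while the function
names and the file names in the usage note are made up for
illustration.

```python
from pathlib import Path

# Track line as given in the message above; it must stay on ONE line.
TRACK_LINE = (
    'track name="myhumanhits" '
    'description="myhumanhits from my favorite human genes" '
    'visibility=2'
)


def add_track_line(psl_text, track_line=TRACK_LINE):
    """Return psl_text with the track line prepended (only once)."""
    if psl_text.startswith("track "):
        return psl_text  # already looks like a custom track
    return track_line + "\n" + psl_text


def write_custom_track(src, dst):
    """Read BLAT output from src and write the custom-track file to dst."""
    Path(dst).write_text(add_track_line(Path(src).read_text()))
```

For example, write_custom_track("blat_results.psl", "myhumanhits.psl")
(both file names hypothetical) would produce a file you could upload on
the custom track page.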

I'm not sure how many hits this will work for, but I just did this on a
small track and it works just fine.  The only problem: the first word in
the fasta defline is the same for all sequences.  You'll probably have to
'uniqify' these names somehow (depending of course on your
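
One possible way to 'uniqify' the names is to append a running counter
to the first word of every defline. This is just a sketch of that idea,
not anything the table browser does for you; the function names, the
"seq" fallback and the example deflines are assumptions for
illustration.

```python
def uniqify_deflines(lines):
    """Yield FASTA lines, adding _1, _2, ... to the first word of each defline."""
    count = 0
    for line in lines:
        if line.startswith(">"):
            count += 1
            # Split the defline into its first word and the remainder.
            parts = line[1:].rstrip("\n").split(None, 1)
            first = parts[0] if parts else "seq"  # fallback for an empty defline
            rest = " " + parts[1] if len(parts) > 1 else ""
            yield f">{first}_{count}{rest}\n"
        else:
            yield line  # sequence lines pass through unchanged


def uniqify_fasta(src, dst):
    """Copy the FASTA file src to dst with unique defline names."""
    with open(src) as fin, open(dst, "w") as fout:
        fout.writelines(uniqify_deflines(fin))
```

With two records that both start with ">hit", the output deflines would
start with ">hit_1" and ">hit_2", and the rest of each defline is kept
as it was.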

Let us know, and good luck.  For good email support on the UCSC genome
browser, subscribe to

Malcolm Cook
Database Applications Manager, Bioinformatics
Stowers Institute for Medical Research 

>-----Original Message-----
>From: bioperl-l-bounces at lists.open-bio.org 
>[mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Yuval Itan
>Sent: Tuesday, August 01, 2006 8:36 AM
>To: bioperl-l at lists.open-bio.org
>Subject: Re: [Bioperl-l] Getting sequences by base pair locations
>Thank you all for all the helpful answers!
>Malcolm- I've used the UCSC server to do the BLAT search (because I
>couldn't run it locally due to memory problems)- so I could not get the
>chimp sequences in a convenient way. I have the results also in a
>normal Blat output including all usual fields: chromosome number etc.
>Wade- thanks a lot for your offer, that would be great. The chimp 
>genome is just one large fasta format file.
>On 28 Jul 2006, at 14:30, Sean Davis wrote:
>> Yuval Itan wrote:
>>> Hello all,
>>> I was BLATing a few hundred human genes against the chimp genome, and
>>> kept the best chimp hits for every human gene.
>>> I have the base pair start and end location for every chimp hit, and
>>> I need to get the sequence for each of these chimp hits. Here is an
>>> example for a few chimp hits bp locations:
>>> Start End
>>> 142854 144504
>>> 154479 155198
>>> 153066 167370
>>> 163146 163559
>>> I have one chimp genome file (about 3GB) including all chromosomes, 
>>> but I could also get one file per chromosome if that would make 
>>> things easier. Does anyone have a script or a link for an interface 
>>> that can do the job?