[Bioperl-l] Sequence of Blast hit

Jason Stajich jason at cgt.duhs.duke.edu
Mon Jun 7 09:34:17 EDT 2004

I'm a little confused about what you want at the end of the day.

For cDNA X, you want the genomic locus of the homologous region in species
Y?  So the locus is defined as the minimum and maximum span of the cDNA on
some contig?  If you can generate a table that looks like

  contig  start  end  strand

by parsing the alignment output (Bio::SearchIO), then you can retrieve
those subsequences with several tools, depending on whether you
are keeping the whole DNA local (Bio::DB::Fasta is the easiest here,
or you can use Bio::DB::Flat or Bio::Index::Fasta) or whether the contigs
are in public repositories (Bio::DB::GenBank or Bio::DB::EMBL).  There has
been some talk of supporting subsequence requests through
Entrez/Bio::DB::GenBank, but I don't know whether that has actually been
incorporated.  I personally find it simpler and more reliable
(you always know which version you are working with, network problems
don't bite you, etc.) to download the datasets and use local indexes to
solve the problem.
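To make the "contig start end strand" table concrete, here is a minimal Python sketch (not BioPerl; in BioPerl the HSP coordinates would come from Bio::SearchIO) that collapses HSPs into one row per contig and strand by taking the minimum start and maximum end, so the introns between exonic HSPs fall inside the span. The `Hsp` tuple and `loci_from_hsps` are hypothetical names, and coordinates are assumed to be 1-based, inclusive, with start <= end.

```python
from collections import namedtuple

# Hypothetical record for one HSP's position on the genomic (hit) side.
# Assumes 1-based inclusive coordinates with start <= end.
Hsp = namedtuple("Hsp", ["contig", "start", "end", "strand"])


def loci_from_hsps(hsps):
    """Collapse HSPs into (contig, start, end, strand) rows.

    The locus for each contig/strand pair is the minimum start and the
    maximum end over all of its HSPs.
    """
    spans = {}
    for h in hsps:
        key = (h.contig, h.strand)
        if key in spans:
            lo, hi = spans[key]
            spans[key] = (min(lo, h.start), max(hi, h.end))
        else:
            spans[key] = (h.start, h.end)
    # Sort by (contig, strand) so the table comes out in a stable order.
    return [(c, lo, hi, s) for (c, s), (lo, hi) in sorted(spans.items())]


if __name__ == "__main__":
    hsps = [
        Hsp("contig42", 1200, 1350, "+"),
        Hsp("contig42", 4100, 4220, "+"),
        Hsp("contig42", 2500, 2610, "+"),
        Hsp("contig7", 900, 980, "-"),
    ]
    for row in loci_from_hsps(hsps):
        print("\t".join(map(str, row)))
```

A plain min/max will also merge two unrelated hits on the same contig and strand (a nearby paralog, say), so it may be worth splitting on unusually large gaps before trusting the span.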

A word of caution: before using Bio::DB::Fasta, make sure all the sequences
are consistently formatted in terms of line width; mixed widths will cause
problems.  The simplest fix is to run the files through
sreformat/bp_sreformat.PLS and re-export them as FASTA.
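The reason line width matters is that an offset-based index can jump straight to a subsequence without reading the whole record: it remembers where each record's residues start and how wide its lines are, then computes the byte position of residue i as `seq_offset + (i-1) + (i-1)//width` (one extra byte per newline passed). The Python sketch below is a simplified stand-in for that idea, not Bio::DB::Fasta's actual implementation; `build_index` and `fetch` are hypothetical names, and it assumes newline-only line endings and a file opened in binary mode. It refuses to index a record whose full lines differ in width, because the arithmetic would silently return the wrong bases.

```python
import io


def build_index(handle):
    """Index a FASTA file opened in binary mode (newline-only line endings).

    Returns {id: (seq_offset, seq_length, line_width)}, where seq_offset is the
    byte position of the record's first residue. Raises ValueError when the
    record's lines differ in width (only the final line may be shorter).
    """
    index = {}
    name = offset = widths = None

    def finish():
        if name is None:
            return
        if any(w != widths[0] for w in widths[:-1]) or (widths and widths[-1] > widths[0]):
            raise ValueError(f"{name}: mixed line widths {sorted(set(widths))}")
        index[name] = (offset, sum(widths), widths[0] if widths else 0)

    pos = 0
    for line in handle:
        if line.startswith(b">"):
            finish()
            name = line[1:].split()[0].decode()
            offset = pos + len(line)
            widths = []
        elif name is not None:
            width = len(line.rstrip(b"\n"))
            if width:  # skip empty lines
                widths.append(width)
        pos += len(line)
    finish()
    return index


def fetch(handle, index, name, start, end, strand="+"):
    """Return residues start..end (1-based, inclusive); reverse-complement on '-'."""
    seq_offset, length, width = index[name]
    if not 1 <= start <= end <= length:
        raise ValueError("coordinates out of range")

    def byte_pos(i):
        # Valid only because every full line in the record has the same width.
        return seq_offset + (i - 1) + (i - 1) // width

    first, last = byte_pos(start), byte_pos(end)
    handle.seek(first)
    seq = handle.read(last - first + 1).replace(b"\n", b"").decode()
    if strand == "-":
        seq = seq[::-1].translate(str.maketrans("ACGTNacgtn", "TGCANtgcan"))
    return seq


if __name__ == "__main__":
    handle = io.BytesIO(b">ctg1 example\nACGTACGTAC\nGGGGCCCCAA\nTTT\n")
    idx = build_index(handle)
    print(fetch(handle, idx, "ctg1", 9, 12))       # ACGG
    print(fetch(handle, idx, "ctg1", 9, 12, "-"))  # CCGT
```

Feeding the rows from a locus table into a lookup like this is the local-index approach described above; in BioPerl itself, Bio::DB::Fasta builds and queries the index for you.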

On Mon, 7 Jun 2004, Jonathan Manning wrote:

> Hi All,
> I have been using BLAST to locate nucleotide sequences to the genome. I
> have then been using the Ensembl API to extract information based on the
> results. I'm doing something now where I really only need the matching
> genomic sequence, and from organisms other than those represented in
> Ensembl. The trouble is that when I BLAST with a cDNA, the resulting
> HSPs clearly only match exonic regions, so I need to get the sequence
> information from somewhere else. I don't want to retrieve the entire
> contig file.
> Is there an easy way to download a subsequence of contig/chromosome
> sequence without the whole file?
> Thanks,
> Jon
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at portal.open-bio.org
> http://portal.open-bio.org/mailman/listinfo/bioperl-l

Jason Stajich
Duke University
jason at cgt.mc.duke.edu
