[Bioperl-l] Categorization of EST's by species/taxonomy/lineage

Paulo Almeida paulo.david at netvisao.pt
Thu Apr 29 18:08:36 EDT 2004

Perhaps a little far-fetched, but what if you write a script that goes like:

Read sequence from flat file.
If it's from a new species,
    Blast against mRNA database AND species database
    Check hits until you get one with the full lineage
End if
Sort it based on the lineage
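
A rough Perl sketch of the "new species" check and the sorting step, in case it helps. It assumes the lineage arrives as a plain list of names (in any order), that the lookup itself is a callback you supply (hypothetical here, e.g. the blast step), and that the pile names are just examples:

```perl
use strict;
use warnings;

# Cache of organism name => lineage, so each species is looked up only once.
my %lineage_for;

# $lookup is a hypothetical callback (e.g. the blast step) that returns a
# list of lineage names for an organism, or an empty list if none was found.
sub lineage_for_organism {
    my ($organism, $lookup) = @_;
    $lineage_for{$organism} = [ $lookup->($organism) ]
        unless exists $lineage_for{$organism};
    return @{ $lineage_for{$organism} };
}

# Pick a pile from the lineage; the pile names are illustrative only.
sub pile_for_lineage {
    my @lineage = @_;
    return 'unknown' unless @lineage;
    my %seen = map { $_ => 1 } @lineage;
    return 'vertebrate'   if $seen{Vertebrata};
    return 'fungi'        if $seen{Fungi};
    return 'invertebrate' if $seen{Metazoa};
    return 'other';
}
```

The membership test does not depend on the order of the list, so it should not matter whether the lineage runs from species up to kingdom or the other way around.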

I'm only suggesting this based on what you said about mRNA records 
tending to have the full lineage; I have no idea if it would work. To 
blast against mRNA for the desired species, you would add something like:

$Bio::Tools::Run::RemoteBlast::HEADER{'ENTREZ_QUERY'} = "$species [ORGN] AND biomol_mrna [PROP]";
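
For context, here is a minimal sketch of how that header might fit into a RemoteBlast run. It is untested against the live service; the sub names are made up for illustration, and the program ('blastn'), database ('nt') and five-second polling interval are assumptions, not recommendations:

```perl
use strict;
use warnings;

# Build the Entrez filter by plain concatenation, so $species is expanded
# (inside single quotes it would stay literal).
sub entrez_mrna_query {
    my ($species) = @_;
    return $species . ' [ORGN] AND biomol_mrna [PROP]';
}

# Submit one sequence and return the accessions of its hits.
# Program, database and polling interval are illustrative choices.
sub mrna_hit_accessions {
    my ($seq, $species) = @_;
    require Bio::Tools::Run::RemoteBlast;    # loaded only when actually used

    my $factory = Bio::Tools::Run::RemoteBlast->new(
        -prog       => 'blastn',
        -data       => 'nt',
        -readmethod => 'SearchIO',
    );
    {
        no warnings 'once';
        $Bio::Tools::Run::RemoteBlast::HEADER{'ENTREZ_QUERY'} =
            entrez_mrna_query($species);
    }
    $factory->submit_blast($seq);

    my @accessions;
    while ( my @rids = $factory->each_rid ) {
        for my $rid (@rids) {
            my $rc = $factory->retrieve_blast($rid);
            if ( !ref $rc ) {
                # Negative means the job failed; otherwise it is still running.
                $factory->remove_rid($rid) if $rc < 0;
                sleep 5;
                next;
            }
            $factory->remove_rid($rid);
            my $result = $rc->next_result or next;
            while ( my $hit = $result->next_hit ) {
                push @accessions, $hit->accession;
            }
        }
    }
    return @accessions;
}
```

You would still have to fetch each hit's record and check whether its classification is complete, which is the "check hits" step above.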

-Paulo Almeida

Mark Johnson wrote:

>     I've got a bunch of flat files containing EST sequences (GenBank
>format) from the NCBI ftp site.  I'd like to sort through them,
>categorize them, and build some blast databases.  It would be nice to
>be able to sort them into a few different piles, such as vertebrate,
>invertebrate, fungi, species1, species2, speciesN, etc.
>     To this end, having the full 'lineage' available would be handy. 
>However, EST records from the EST database only have the organism
>(unlike, say, mRNA records from the nucleotide database, which tend
>to have the full lineage (Eukaryota; Metazoa; Chordata; Craniata;
>Vertebrata; Euteleostomi; Mammalia; Eutheria; Primates; Catarrhini;
>Hominidae; Homo)).
>     With mRNA records from the nucleotide database, this is an easy job,
>just call $seq->species->classification(), and sort through the list.
> However, with these EST files from dbEST, that doesn't work: the
>resulting list is empty.
>     I initially had high hopes after discovering Bio::DB::Taxonomy, but
>there are some bugs in the 1.4 version, and even upgrading to the
>latest in CVS, I can't seem to find a way to get the full lineage:
>#Bio::DB::Taxonomy (Well, really Bio::DB::Taxonomy::entrez)
>my $db = new Bio::DB::Taxonomy(-source => 'entrez');
>my $taxaid = $db->get_taxonid('Homo sapiens');
>my $taxobj = $db->get_Taxonomy_Node(-taxonid => $taxaid);
>#@classification contains 'sapiens' and 'homo'.
>my @classification = $taxobj->classification();
>Looking at the code for the classification method, I came across this
>comment:  # okay this won't really work - need to do proper recursion
>So...is there a way to get to where I want to be without hacking on the
>module(s) in some terribly caveman like fashion?
