[Bioperl-l] Bio::DB::Taxonomy::entrez updated

Jason Stajich jason.stajich at duke.edu
Tue Aug 9 13:09:31 EDT 2005

I've updated Bio::DB::Taxonomy::entrez to fully parse the XML
returned by the EFetch E-utilities CGI script.  It can now return a
fully populated Bio::Taxonomy::Node object, most importantly with the
parent_id field filled in.  This lets the web-only implementation work
just as the flatfile implementation does, so you can walk up the
taxonomy hierarchy.  There is currently no way to walk down the
hierarchy unless one can construct an Entrez query that returns all
the nodes with a particular parent.  If someone knows how to do this,
please let me know.

I added a few fields to Bio::Taxonomy::Node to capture genetic_code,  
pub_date, update_date, create_date, mitochondrial_genetic_code from  
the database entry.

At this point I think we can consider retiring Bio::Species and
replacing it with Bio::Taxonomy::Node.  I would probably just make
Bio::Species delegate to Bio::Taxonomy::Node, or maybe someone can
think of something more clever.  There will be a bit of fiddling under
the hood to make this really work, but I think it can be done for the
1.6 release and still be transparent to the user (i.e. the API is
completely retained for Bio::Seq->species, Bio::Species, etc., while
new functionality also becomes available).

Here is how you can use the DB interface:

   use Bio::DB::Taxonomy;

   my $db = Bio::DB::Taxonomy->new(-source => 'entrez');

   my $taxonid = $db->get_taxonid('Homo sapiens');
   my $node   = $db->get_Taxonomy_Node(-taxonid => $taxonid);
   print $node->binomial, "\n";

I added a script in scripts/taxa/query_entrez_taxa.PLS which  
demonstrates how to use it as well.

Where I find this module useful is in parsing a search result report
and classifying hits by taxonomy.  Given the GI numbers in the search
result (BLAST, FASTA, SSEARCH hits), getting the taxon ID for a GI is
just one step away now.

I added a capability to the API in Bio::DB::Taxonomy::entrez for
retrieving taxonomy info based on a GI number.  You can pass the
-gi => $ginumber option to get_Taxonomy_Node.

Demonstration of use here:

   my $gi = 71836523;
   my $node = $db->get_Taxonomy_Node(-gi => $gi, -db => 'protein');
   print $node->binomial, "\n";
   my ($species,$genus,$family) =  $node->classification;
   print "family is $family\n";

   # Can also go up 4 levels
   my $p = $node;
   for ( 1..4 ) {
     $p = $db->get_Taxonomy_Node(-taxonid => $p->parent_id);
     print $p->rank, " ", ($p->classification)[0], "\n";
   }

   # could then classify a set of BLAST hits based on their GI numbers
   # into taxonomic categories.
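
As a rough sketch of that last idea (not something that is in the
module), the helper below tallies a list of GI numbers by the name of
their ancestor at a chosen rank.  It uses only the calls shown above
(get_Taxonomy_Node, parent_id, rank, classification).  The name
tally_by_rank and the $max_steps guard are my own assumptions for
illustration, and how you pull the GI numbers out of a report is left
open.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of BioPerl: count GI numbers by the name
# of their ancestor at rank $rank (e.g. 'family').
# $db is assumed to behave like the Bio::DB::Taxonomy object shown above.
sub tally_by_rank {
    my ($db, $rank, $max_steps, @gis) = @_;
    my %count;
    for my $gi (@gis) {
        my $node  = $db->get_Taxonomy_Node(-gi => $gi, -db => 'protein');
        my $steps = 0;

        # Walk up via parent_id until the wanted rank is reached, the
        # parent is missing, or the step limit is hit.
        while ($node && ($node->rank || '') ne $rank && $steps++ < $max_steps) {
            my $pid = $node->parent_id;
            $node = defined $pid
                  ? $db->get_Taxonomy_Node(-taxonid => $pid)
                  : undef;
        }

        # Use the first classification element as the node's name, as in
        # the loop above; anything that never reached $rank is lumped.
        my $label = ($node && ($node->rank || '') eq $rank)
                  ? ($node->classification)[0]
                  : 'unclassified';
        $count{$label}++;
    }
    return \%count;
}

# Possible usage (needs network access; @gis would come from your report):
#   my $counts = tally_by_rank($db, 'family', 20, @gis);
#   print "$_\t$counts->{$_}\n" for sort keys %$counts;
```

Each GI costs one web request plus one more per level walked, so for a
large report it is probably worth caching nodes by taxon ID.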

I have tried to put these examples in the SYNOPSIS, t/Taxonomy.t, and
the script scripts/taxa/query_entrez_taxa.PLS.  If there are mistakes
or typos, or something is unclear, please let us know and it can be
updated.  I hope to add a section describing how to use these in a
SearchIO context (parsing reports) when I have time.

Jason Stajich
jason.stajich at duke.edu
