[Bioperl-l] Bio::DB::Taxonomy:: mishandles species, subspecies/variant names

Nadeem Faruque faruque at ebi.ac.uk
Mon May 15 15:47:27 EDT 2006


>> My personal view is that having it as an annotation would serve no real
>> purpose. For me the whole point of any kind of species representation in
>> bioperl is to allow you to compare species in a biologically meaningful
>> way. If it's just some annotation then that means it's basically

I understand the need to find the species name of entries, especially
now that so many complete genomes have been given their own
strain-specific tax nodes. I also think it is a shame that the NCBI tax
dump does not give a rank to entries such as these: without ascending
the tree, they cannot easily be distinguished from unofficial ranks
higher in the tree.
Would it be useful for the species name to be included within EMBL
file headers, e.g. in a line called OB? (OB, from 'Organism Binomial',
is a terrible name, chosen only because OS is already in use.)
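
To make "ascending the tree" concrete, here is a minimal Python sketch.
It assumes a toy in-memory node table: the node ids are made up, the
ranks are assumptions chosen to mirror the situation described above
(a named node with no rank sitting below a species node), and the
function name species_of is hypothetical. It is not how any particular
library does the lookup.

```python
# Toy taxonomy table: node id -> (parent id, rank, name).
# Ids and ranks are invented for illustration; they are NOT real NCBI values.
NODES = {
    1: (None, "no rank", "root"),
    2: (1, "genus", "Capillovirus"),
    3: (2, "species", "Apple stem grooving virus"),
    4: (3, "no rank", "Citrus tatter leaf virus"),
}


def species_of(node_id, nodes=NODES):
    """Walk up from node_id until a node of rank 'species' is found.

    Returns the species name, or None if no ancestor has that rank
    (for example, an unranked node that sits above species level).
    """
    current = node_id
    while current is not None:
        parent, rank, name = nodes[current]
        if rank == "species":
            return name
        current = parent
    return None


if __name__ == "__main__":
    print(species_of(4))  # the unranked node resolves to its species ancestor
    print(species_of(2))  # a genus has no species ancestor
```

The point of the sketch is that the rank of the node itself says nothing
useful here; only the walk towards the root tells an unranked strain
node apart from an unranked node higher up.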

Here are two example entries for the species 'Apple stem grooving
virus', with the proposed OB line added. Without delving into the tax
tree or including an OB line, the second one would appear to be a
different species.

AC   D14995; S47260;
DE   Apple stem grooving virus genome, complete sequence.
OS   Apple stem grooving virus
OB   Apple stem grooving virus
OC   Viruses; ssRNA positive-strand viruses, no DNA stage; Flexiviridae;
OC   Capillovirus.

AC   AY646511;
DE   Citrus tatter leaf virus strain Kumquat 1, complete genome.
OS   Citrus tatter leaf virus
OB   Apple stem grooving virus
OC   Viruses; ssRNA positive-strand viruses, no DNA stage; Flexiviridae;
OC   Capillovirus.
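
As a rough illustration of how the two entries above could be compared,
the following Python sketch reads the header lines and checks OS and OB
separately. The OB line is only the proposal made in this message, not
an existing EMBL line type, and the parse_headers helper is a
hypothetical name; the parsing assumes the usual layout of a two-letter
line code followed by three spaces.

```python
# The two header excerpts from this message, including the proposed OB line.
RECORDS = """\
AC   D14995; S47260;
DE   Apple stem grooving virus genome, complete sequence.
OS   Apple stem grooving virus
OB   Apple stem grooving virus
OC   Viruses; ssRNA positive-strand viruses, no DNA stage; Flexiviridae;
OC   Capillovirus.

AC   AY646511;
DE   Citrus tatter leaf virus strain Kumquat 1, complete genome.
OS   Citrus tatter leaf virus
OB   Apple stem grooving virus
OC   Viruses; ssRNA positive-strand viruses, no DNA stage; Flexiviridae;
OC   Capillovirus.
"""


def parse_headers(text):
    """Split blank-line-separated entries into {line code: joined value} dicts."""
    records = []
    for block in text.strip().split("\n\n"):
        fields = {}
        for line in block.splitlines():
            code, value = line[:2], line[5:].strip()
            # Repeated codes (e.g. the two OC lines) are collected in order.
            fields.setdefault(code, []).append(value)
        records.append({code: " ".join(values) for code, values in fields.items()})
    return records


first, second = parse_headers(RECORDS)
same_by_os = first["OS"] == second["OS"]  # organism names differ
same_by_ob = first["OB"] == second["OB"]  # proposed species line matches

if __name__ == "__main__":
    print("same OS:", same_by_os)
    print("same OB:", same_by_ob)
```

Comparing on OS alone treats the two entries as different organisms,
while the proposed OB line groups them under one species without any
tree lookup.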



> My point is, a large number of users do NOT use, nor care about, taxonomic
> information to the degree they need to know the entire classification of the
> organism; many are just as happy about getting the scientific name only,
> which is in the GenBank/EMBL file itself.  To take one extreme, it is not
> productive to force every user to download the NCBI tax database and use
> lookups just to convert sequences from EMBL format to GenBank format.  It's
> not productive to allow users to spam the NCBI tax database remotely either,
> so hardcoding lookups is, IMHO, a big mistake.

I don't think you need to add any information to turn an EMBL-format
file into a GenBank flatfile, but maybe I'm missing something obvious.
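
A small Python sketch of that idea, under stated assumptions: it builds
a GenBank-style SOURCE/ORGANISM block purely from the text of the OS and
OC lines, with no taxonomy lookup. The function name
genbank_source_block is hypothetical, the column positions follow the
common GenBank flatfile layout (keywords padded to 12 columns), and a
real converter would have to handle cases this ignores.

```python
import textwrap


def genbank_source_block(os_value, oc_value):
    """Format a GenBank-style SOURCE/ORGANISM block from EMBL OS and OC text.

    Assumes the organism name is reused for both SOURCE and ORGANISM and
    that the OC lineage is copied through unchanged apart from re-wrapping.
    """
    indent = " " * 12
    lineage = textwrap.fill(
        oc_value,
        width=79,
        initial_indent=indent,
        subsequent_indent=indent,
        break_on_hyphens=False,  # keep terms like 'positive-strand' whole
    )
    return "\n".join([
        "SOURCE".ljust(12) + os_value,
        "  ORGANISM".ljust(12) + os_value,
        lineage,
    ])


if __name__ == "__main__":
    print(genbank_source_block(
        "Apple stem grooving virus",
        "Viruses; ssRNA positive-strand viruses, no DNA stage; "
        "Flexiviridae; Capillovirus.",
    ))
```

Everything the block needs is already present in the EMBL header, which
is the sense in which no extra information has to be added for this part
of the conversion.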

Nadeem


--
Dr S.M. Nadeem N. Faruque
9 Barley Court
Saffron Walden
Essex  CB11 3HG
01799 500 120


