[Bioperl-l] Bio::SeqIO::genbank, Bio::Species - can't get full species name

Matthew Betts Matthew.Betts at ii.uib.no
Thu May 13 09:35:54 EDT 2004


I am trying to reconcile gene trees with species trees, and to do this I
need the species names to be identical in both. The gene trees come
from a clustering of GenBank coding sequences, and the species trees come
from the NCBI taxonomy. When I use BioPerl to extract the species
info from GenBank entries, it only seems possible to get the first
three words of the ORGANISM line, which Bio::Species treats as genus,
species, and subspecies. In several cases, such as the example below,
the ORGANISM line contains more than three words. Does this mean
that the subspecies name uses more than one word, or that the entry
breaks the GenBank format? Either way, this is also how the names
appear in the NCBI taxonomy names.dmp file.

The problem seems to be in Bio::SeqIO::genbank->_read_GenBank_Species().
There is a special case there for viruses (the whole of the ORGANISM
name is put onto the classification array), but the examples I have are
chordates (other groups may be affected too).
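
One possible stopgap, sketched here outside BioPerl, is to read the ORGANISM line straight from the flat file so the full name can be matched against names.dmp. This is a minimal Python sketch, not BioPerl's parser: the function name full_organism_name is hypothetical, it assumes the organism name fits on a single ORGANISM line, and the three-word split only illustrates the truncation described above.

```python
import re


def full_organism_name(record_text):
    """Return the complete text of the ORGANISM line, or None if absent.

    Assumption: the organism name does not wrap onto a second line.
    """
    for line in record_text.splitlines():
        # The ORGANISM keyword is indented under SOURCE and followed by
        # the name; the taxonomic lineage starts on the next line.
        match = re.match(r"\s+ORGANISM\s+(.+?)\s*$", line)
        if match:
            return match.group(1)
    return None


# Excerpt of the AY211864 record quoted in this message.
record = """\
SOURCE      mitochondrion Tamias amoenus X Tamias ruficaudus
  ORGANISM  Tamias amoenus X Tamias ruficaudus
            Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
"""

name = full_organism_name(record)

# Keeping only the first three words loses the second parent of the hybrid.
first_three = name.split()[:3]

print(name)
print(first_three)
```

For this record the full name has five words, so a genus/species/subspecies split keeps "Tamias amoenus X" and drops "Tamias ruficaudus".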

I'd be really grateful for any comments on the best thing for me to do.



LOCUS       AY211864                 701 bp    DNA     linear   ROD 25-AUG-2003
DEFINITION  Tamias amoenus X Tamias ruficaudus RBCM19680 cytochrome b (cytb)
            gene, partial cds; mitochondrial gene for mitochondrial product.
VERSION     AY211864.1  GI:33385214
SOURCE      mitochondrion Tamias amoenus X Tamias ruficaudus
  ORGANISM  Tamias amoenus X Tamias ruficaudus
            Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
            Mammalia; Eutheria; Rodentia; Sciurognathi; Sciuridae; Sciurinae;
REFERENCE   1  (bases 1 to 701)
  AUTHORS   Good,J.M., Demboski,J.R., Nagorsen,D.W. and Sullivan,J.
  TITLE     Phylogeography and introgressive hybridization: chipmunks (genus
            Tamias) in the northern Rocky Mountains
  JOURNAL   Evolution 57 (8), 1900-1916 (2003)
REFERENCE   2  (bases 1 to 701)
  AUTHORS   Good,J.M., Demboski,J.R., Nagorsen,D.W. and Sullivan,J.
  TITLE     Direct Submission
  JOURNAL   Submitted (08-JAN-2003) Ecology and Evolutionary Biology,
            University of Arizona, 1041 E. Lowell Street, Tucson, AZ 85721, USA
FEATURES             Location/Qualifiers
     source          1..701
                     /organism="Tamias amoenus X Tamias ruficaudus"
                     /mol_type="genomic DNA"
                     /specimen_voucher="Royal British Columbia Museum
     gene            1..>701
     CDS             1..>701
                     /product="cytochrome b"
        1 atgacaaaca tccgcaaaac ccatcccctc attaaaatca ttaaccactc attcattgac
       61 ttacccgcac catccaacat ttctgcatga tgaaattttg gatccctctt aggtatttgc
      121 ctaattatcc aaattctcac tggactattc ctagcaatac actacacatc cgacacaatg
      181 acagctttct catctgtcac tcatatttgc cgagatgtaa actacggctg acttatccga
      241 tacatacacg ctaacggagc ctccatattt tttatctgcc tattccttca tgtaggccga
      301 ggactttact atggatcata tacctacttc gaaacatgaa acattggagt aattctttta
      361 ttcgccgtta tagccactgc atttataggt tacgttctcc catgaggaca gatatccttt
      421 tgaggtgcta ctgttattac aaatctccta tcagccatcc catatatcgg aacaacacta
      481 gtagaatgaa tctgaggagg cttctcagta gacaaagcca ctctaacacg attctttgca
      541 tttcatttta tcctcccatt cattattaca gcattagtta tagttcacct actcttcctt
      601 catgaaaccg gatccaataa tccttccgga ttaatctctg actctgataa aattccattc
      661 catccatatt acactattaa agatatccta ggcatcctcc t
