[Bioperl-l] GenBank/GenPept DB/SeqIO: more problems

Hilmar Lapp hilmarl@yahoo.com
Sat, 24 Feb 2001 23:56:47 -0800

The record with accession AAC35035 (GenPept; the corresponding
nucleotide entry has acc AF074119) revealed even more problems:

1) The SOURCE and ORGANISM lines contained HTML tags when retrieved
from GenBank. These tags are added by the retrieval client (qmap.cgi)
and screwed up parsing of genus/species. Fixed by stripping the
HTML tags from those lines.

2) The specification
SOURCE      Calopogon tuberosus.
  ORGANISM  Chloroplast Calopogon tuberosus

screwed up the genus/species recognition, because the parser did not
expect an organelle before the genus. Fixed for parsing as well as for
printing (the organelle now also gets printed).

This may also apply to the EMBL parser. (I haven't checked that yet;
I don't have a format reference at the moment.)
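
For illustration only, here is a rough Python sketch of the two GenBank
fixes described in 1) and 2). It is not the BioPerl code. The function
names, the tag-matching regex, and the list of organelle words are all
assumptions made for the example, and the HTML in the usage note is
invented, since the exact markup added by qmap.cgi is not shown here.

```python
import re

# Assumption: anything between '<' and '>' is an HTML tag added by the
# retrieval client and can be dropped.
TAG_RE = re.compile(r"<[^>]+>")

# Assumption: a small, incomplete list of organelle words that may
# precede the genus on an ORGANISM line.
ORGANELLES = ("Chloroplast", "Mitochondrion", "Plastid", "Kinetoplast")


def strip_html(line):
    """Remove anything that looks like an HTML tag from one line."""
    return TAG_RE.sub("", line)


def parse_organism(line):
    """Split an ORGANISM line into (organelle, genus, species).

    organelle is None when the line does not start with a known
    organelle word.
    """
    text = strip_html(line).strip()
    if text.startswith("ORGANISM"):
        text = text[len("ORGANISM"):].strip()
    words = text.split()
    organelle = None
    if words and words[0] in ORGANELLES:
        organelle = words.pop(0)
    genus = words[0] if words else None
    species = " ".join(words[1:]) or None
    return organelle, genus, species


def format_organism(organelle, genus, species):
    """Write the ORGANISM line back out, including the organelle."""
    parts = [p for p in (organelle, genus, species) if p]
    return "  ORGANISM  " + " ".join(parts)
```

With a made-up tagged line such as
'  ORGANISM  <a href="x">Chloroplast Calopogon tuberosus</a>',
parse_organism returns ("Chloroplast", "Calopogon", "tuberosus"), and
format_organism turns that back into the untagged ORGANISM line.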

Hilmar Lapp                              email: hlapp@gmx.net
GNF, San Diego, Ca. 92122                phone: +1 858 812 1757