[Bioperl-l] Problems parsing scientific name from a Genbank file
carze at som.umaryland.edu
Thu Jun 18 13:51:43 EDT 2009
I've searched the mailing list and bug tracker for any mention of what I
presume to be a bug I have been encountering when parsing certain GenBank
files with SeqIO::GenBank, but have yet to find anything. I apologize in
advance if this has already been reported.
When I parse these files and extract the scientific name, line breaks seem
to cause the lineage information in the ORGANISM section to be captured as
part of the scientific name. An example of this is accession
  ORGANISM  Bacillus anthracis str. Sterne
            Bacteria; Firmicutes; Bacillales; Bacillaceae; Bacillus;
The lineage has a line break inside "Bacillus cereus", which causes the
scientific name to capture "Bacteria; Firmicutes; Bacillales; Bacillaceae;
Bacillus; Bacillus", so the final scientific name ends up as "Bacillus
anthracis str. Sterne Bacteria; Firmicutes; Bacillales; Bacillaceae;
Bacillus; Bacillus".
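For illustration, here is a minimal sketch in Python (not BioPerl code, and not how SeqIO::GenBank works internally) of one way to separate the name from the lineage. It assumes the lineage begins at the first continuation line that contains a semicolon, and that every later line belongs to the lineage even when it does not end in ";", which is exactly the case when a taxon such as "Bacillus cereus" wraps. The helper name `split_organism_block` is hypothetical, and the wrapped remainder "cereus group." in the sample record is assumed for illustration. One limitation: a lineage with a single taxon and no semicolon would be misread as part of the name.

```python
def split_organism_block(lines):
    """Split a GenBank ORGANISM block into (scientific_name, lineage_list).

    Hypothetical helper. Assumption: the lineage starts at the first
    continuation line containing ';'; all later lines are lineage, even
    if they do not themselves end with ';' (i.e. a wrapped taxon name).
    """
    first = lines[0].strip()
    if not first.startswith("ORGANISM"):
        raise ValueError("block must start with an ORGANISM line")
    name_parts = [first[len("ORGANISM"):].strip()]
    lineage_parts = []
    in_lineage = False
    for line in lines[1:]:
        text = line.strip()
        # Once a ';' is seen, stay in lineage mode for the rest of the block.
        if not in_lineage and ";" in text:
            in_lineage = True
        if in_lineage:
            lineage_parts.append(text)
        else:
            name_parts.append(text)
    name = " ".join(p for p in name_parts if p)
    # Re-join wrapped lineage lines, drop the trailing period, split on ';'.
    lineage_text = " ".join(lineage_parts).rstrip(".")
    lineage = [t.strip() for t in lineage_text.split(";") if t.strip()]
    return name, lineage


# Sample block modeled on the record above; "cereus group." is assumed.
record = [
    "  ORGANISM  Bacillus anthracis str. Sterne",
    "            Bacteria; Firmicutes; Bacillales; Bacillaceae; Bacillus; Bacillus",
    "            cereus group.",
]
name, lineage = split_organism_block(record)
print(name)
print(lineage)
```

With this rule, the wrapped "Bacillus" line stays in the lineage instead of being appended to the scientific name.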
I'm not sure whether anyone else has run into this problem, but I would very
much appreciate any help or direction.