Bioperl: Bug in EMBL parser: expects FH/FT lines

Peter van Heusden
Thu, 15 Jun 2000 10:10:02 +0200 (SAST)

Hi All

There is a bug in the way the Bio::SeqIO::embl module parses EMBL entries,
and a related one in the way it writes them.

As I read section 3.3 ("Structure of an Entry") of the EMBL manual,
there is a defined order for the line types, but no requirement that
every line type be present. In particular, there may be zero or more
FH/FT lines.

Unfortunately, the next_seq() function explicitly expects ID,
then various optional things, then FH, then FT, then SQ. This means that
an EMBL entry without FT lines is not only ignored, but causes
next_seq() to read to the end of the file, discarding all lines along the
way. A rather major bug, in my opinion.
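
To make the expected behaviour concrete, here is a minimal sketch of a
reader that treats FH/FT lines as optional. It is written in Python for
brevity and is not the Bio::SeqIO::embl code; the function name
read_entries and the dict layout are made up for illustration. It assumes
only that an entry starts at an ID line, ends at a "//" line, and that
sequence data lines begin with blanks.

```python
def read_entries(lines):
    """Yield one dict per EMBL entry; FH/FT lines may be absent."""
    entry = None
    for line in lines:
        code = line[:2]  # two-letter line code (blank for sequence data)
        if code == "ID":
            # Start a new entry; the feature list starts out empty.
            entry = {"ID": line[5:].strip(), "FT": [], "seq": ""}
        elif entry is None:
            continue  # ignore anything before the first ID line
        elif code == "FT":
            entry["FT"].append(line[5:].rstrip())
        elif code == "  ":
            # Sequence data line: keep letters, drop blanks and the base count.
            entry["seq"] += "".join(ch for ch in line if ch.isalpha())
        elif code == "//":
            # Terminator: emit the entry whether or not any FT lines were seen.
            yield entry
            entry = None
        # All other line codes (XX, FH, SQ, DE, ...) are skipped in this sketch.
```

The point is that the end of an entry is decided by the "//" line alone,
so an entry with no feature table is returned like any other instead of
swallowing the rest of the file.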

Secondly, and the reason I discovered this, the
write_seq() function always writes the FH lines, and only writes FT lines
if there are features present. Surely it should check to see if there are
features before writing FH lines?
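
A sketch of the check I have in mind, again in Python rather than Perl
and not taken from the module itself. The helper name
format_feature_table and the (key, location) tuples are hypothetical;
the column layout follows the usual EMBL feature table header.

```python
def format_feature_table(features):
    """Return FH/FT lines for an entry, or an empty list if it has no features."""
    if not features:
        # No features: write neither the FH header nor any FT lines.
        return []
    lines = ["FH   %-15s %s" % ("Key", "Location/Qualifiers"), "FH"]
    for key, location in features:
        lines.append("FT   %-15s %s" % (key, location))
    return lines
```

With this, an entry converted from Fasta (which has no features) gets no
FH lines at all, and the output stays consistent with what the reader
should accept.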

In some of our work here, we converted a set of Fasta entries to EMBL,
using Bio::SeqIO, and then (some time later) tried to convert the
resulting EMBL entries to Fasta, at which point the above behaviour was
discovered.

If everyone agrees with my understanding of how things should work, I
can submit patches...

Peter van Heusden
Electric Genetics
