[Bioperl-l] Bio/SeqIO/swiss.pm parsing error

Erik er at xs4all.nl
Fri Nov 3 14:59:47 EST 2006

Hi all,

I noticed that parsing is broken with the newest Swiss-Prot files:
  UniProt Knowledgebase Release 9 consists of:
  UniProtKB/Swiss-Prot Release 51.0 of 31-Oct-2006
  UniProtKB/TrEMBL Release 34.0 of 31-Oct-2006

I edited my local copy of Bio/SeqIO/swiss.pm to parse the ID lines
in swissprot/trembl according to the new specification (see

Basically, the change is as follows:
  ID   EntryName DataClass; MoleculeType; SequenceLength.
is changed to:
  ID   EntryName DataClass; SequenceLength.

The only change I made was to the regex that matches the ID line,
in method next_seq (Bio/SeqIO/swiss.pm):


  unless(  m/
                  ID              \s+     #
                  (\S+)           \s+     #  $1  entryname
                  ([^\s;]+);      \s+     #  $2  DataClass
                  [0-9]+[ ]AA     \.      #      Sequencelength (capture?)
            /ox ) {
    $self->throw("swissprot stream with no ID. Not swissprot in my book");
  }


I tested this (i.e. the entry parses and the SeqIO object is created)
against several hundred Swiss-Prot and TrEMBL entries.

Of course, files in the older format no longer parse with this change.
It may be better to support both the old and the new format, and try
each in turn (newest first).
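
A rough sketch of the try-both idea, newest format first. This is not
what is in swiss.pm: parse_id_line is a made-up helper name, the capture
of the sequence length is my own addition, and the sample ID lines in
the usage note are only illustrative.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of Bio::SeqIO::swiss.
# Returns (entry name, data class, sequence length) for an ID line,
# or an empty list if the line matches neither format.
sub parse_id_line {
    my ($line) = @_;

    # New format: ID   EntryName DataClass; SequenceLength.
    if ( $line =~ m/^ID \s+ (\S+) \s+ ([^\s;]+); \s+ ([0-9]+)[ ]AA \./x ) {
        return ( $1, $2, $3 );
    }

    # Old format: ID   EntryName DataClass; MoleculeType; SequenceLength.
    # The molecule type field is matched but not captured.
    if ( $line =~ m/^ID \s+ (\S+) \s+ ([^\s;]+); \s+ [^\s;]+; \s+ ([0-9]+)[ ]AA \./x ) {
        return ( $1, $2, $3 );
    }

    return;    # neither format matched; the caller decides whether to throw
}
```

Usage would be something like this, with next_seq throwing only when the
returned list is empty:

```perl
my @new = parse_id_line('ID   CYC_BOVIN               Reviewed;         104 AA.');
my @old = parse_id_line('ID   CYC_BOVIN      STANDARD;      PRT;   104 AA.');
```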


