[Bioperl-l] Bio/SeqIO/swiss.pm parsing error
er at xs4all.nl
Fri Nov 3 14:59:47 EST 2006
I noticed that parsing is borked with the newest Swiss-Prot files:
UniProt Knowledgebase Release 9 consists of:
UniProtKB/Swiss-Prot Release 51.0 of 31-Oct-2006
UniProtKB/TrEMBL Release 34.0 of 31-Oct-2006
I edited my local copy of Bio/SeqIO/swiss.pm to parse the ID lines
in swissprot/trembl according to the new specification (see
Basically, the change is as follows:
ID EntryName DataClass; MoleculeType; SequenceLength.
is changed to:
ID EntryName DataClass; SequenceLength.
The change I made was only in the regex capturing the entry name:
In method next_seq (Bio/SeqIO/swiss.pm):
    # $line is assumed to hold the ID line just read from the stream
    $line =~ m{^
        ID              \s+    #
        (\S+)           \s+    # capture 1: entryname
        ([^\s;]+);      \s+    # capture 2: DataClass
        [0-9]+[ ]AA     \.     # Sequencelength (capture?)
    }x
        or $self->throw("swissprot stream with no ID. Not swissprot in my book");
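To make the new pattern easy to try outside of BioPerl, here is a minimal standalone sketch. The helper name parse_new_id and the sample ID line are made up for illustration, and the sequence length is captured here even though the edit above leaves that open.

```perl
use strict;
use warnings;

# New-format ID regex from the post, anchored at line start. The length is
# captured (group 3) purely for illustration.
my $new_id_re = qr{^
    ID             \s+   #
    (\S+)          \s+   # group 1: entry name
    ([^\s;]+);     \s+   # group 2: data class
    ([0-9]+)[ ]AA  \.    # group 3: sequence length
}x;

# Hypothetical helper: in list context returns (name, class, length),
# or an empty list when the line does not match.
sub parse_new_id {
    my ($line) = @_;
    return $line =~ $new_id_re;
}

# Made-up sample line in the new layout.
my @fields = parse_new_id("ID   XMPL_HUMAN              Reviewed;         105 AA.");
print join("|", @fields), "\n" if @fields;
```

Comments inside an /x pattern are still subject to variable interpolation, which is why the sketch writes "group 1" rather than a dollar-sign capture name.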
I tested this (i.e. the entry is parsable and the SeqIO is created) against
several hundred Swiss-Prot and TrEMBL entries.
Of course, files in the older format no longer parse with this change. It may
be better to keep both the old and the new pattern and try each in turn
(newest first).
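A rough sketch of that "try both, newest first" idea follows. It is not the actual swiss.pm code: the function name parse_id_line, the hash keys, and the old-format pattern are my own assumptions, based only on the two ID layouts quoted above.

```perl
use strict;
use warnings;

# New layout: ID EntryName DataClass; SequenceLength.
my $new_id_re = qr{^
    ID \s+ (\S+) \s+        # entry name
    ([^\s;]+); \s+          # data class
    ([0-9]+)[ ]AA \.        # sequence length
}x;

# Old layout: ID EntryName DataClass; MoleculeType; SequenceLength.
my $old_id_re = qr{^
    ID \s+ (\S+) \s+        # entry name
    ([^\s;]+); \s+          # data class
    ([^\s;]+); \s+          # molecule type
    ([0-9]+)[ ]AA \.        # sequence length
}x;

# Hypothetical helper: try the newest format first, then fall back.
sub parse_id_line {
    my ($line) = @_;
    if ( $line =~ $new_id_re ) {
        return { entry_name => $1, data_class => $2, length => $3 };
    }
    if ( $line =~ $old_id_re ) {
        return { entry_name => $1, data_class => $2,
                 molecule_type => $3, length => $4 };
    }
    die "swissprot stream with no ID. Not swissprot in my book\n";
}

# Made-up sample lines, one per layout.
for my $line ( "ID   XMPL_HUMAN              Reviewed;         105 AA.",
               "ID   XMPL_HUMAN     STANDARD;      PRT;   105 AA." ) {
    my $id = parse_id_line($line);
    print join( ", ", map { "$_=$id->{$_}" } sort keys %$id ), "\n";
}
```

The two patterns do not overlap: the new one requires the number directly after the data class, and the old one requires a second semicolon-terminated field, so the order of the attempts only affects speed, not the result.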