[Bioperl-l] genpept/swiss

Kris Boulez krbou@pgsgent.be
Mon, 4 Sep 2000 13:23:45 +0200

Quoting Hilmar Lapp (hlapp@gmx.net):
> Andrew Dalke wrote:
> > 
> > Hilmar Lapp <hlapp@gmx.net>:
> > >Some of you may object to this
> > 
> > I'm one of those objectors.  If the format isn't right in one place,
> > how certain are you that the recovery is correct and didn't skip
> > important information?
> > 
> I'm not at all certain, and that's why it is still reported as something
> you can turn into an exception programmatically. The point here is that
> the SeqIO parsers are not meant as format validators: if you've got your
> GenBank format writer wrong, taking the BioPerl parser as the judge for
> this is probably not the right way. The main objective is, I think, to be
> able to read sequence entries produced by someone you believe 'does it
> right'.
> Most people will not worry about possibly missing important information
> for one of 1e5 sequences because that one has a misformatted tag. They
> just want to go through these sequences without having to add a single
> line of code to the parser that is specific to their current release of
> whatever database and would have to be re-tuned for the next one.
This describes exactly my situation: I have to read in data in
all sorts of different formats (and people's interpretations of these
formats). The problem now is that BioPerl throws a warning and exits
if a sequence does not comply 100% with the standards, whereas I want
to be able to tell it to ignore the warning if (e.g.) it has read the
sequence correctly.
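To make the "report it, but let the caller decide" idea concrete, here is a minimal sketch in Python rather than Perl. It is not BioPerl's API: `FormatWarning` and `parse_entry` are hypothetical names, and the input is assumed to be EMBL-like lines of the form "TAG   value" ending in "//". Malformed lines are skipped with a warning, and Python's standard `warnings` filters let the same warning be ignored, collected, or promoted to an exception.

```python
import warnings


class FormatWarning(UserWarning):
    """Hypothetical warning category for malformed-but-recoverable input."""


def parse_entry(lines):
    """Parse EMBL-like 'TAG   value' lines into a dict of tag -> values.

    Lines that do not match the expected layout are skipped with a
    FormatWarning instead of aborting the whole parse.
    """
    entry = {}
    for lineno, raw in enumerate(lines, start=1):
        line = raw.rstrip("\n")
        if line == "//":  # end-of-entry marker
            break
        tag, value = line[:2], line[5:]
        well_formed = (
            len(line) >= 5
            and tag.isalpha()
            and tag.isupper()
            and line[2:5] == "   "
        )
        if not well_formed:
            warnings.warn(
                f"line {lineno}: malformed line {line!r}, skipped", FormatWarning
            )
            continue
        entry.setdefault(tag, []).append(value)
    return entry


record = ["ID   TEST1", "De  oops", "SQ   ACGT", "//"]

# Lenient: the bad line is reported and skipped, the rest is kept.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always", FormatWarning)
    entry = parse_entry(record)
print(entry)        # {'ID': ['TEST1'], 'SQ': ['ACGT']}
print(len(caught))  # 1

# Strict: the same warning is promoted to an exception.
with warnings.catch_warnings():
    warnings.simplefilter("error", FormatWarning)
    try:
        parse_entry(record)
    except FormatWarning as exc:
        print("rejected:", exc)
```

The parser itself never decides whether a misformatted tag is fatal; that policy lives with the caller, which is the split both sides of this thread seem to want.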

Something that would be really nice to have is a more modular approach
in which it would be easy to say: 'this data is in a format which is
EMBL, with the following quirks, additional fields, ...'.
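One way such a modular approach might look, sketched in Python under stated assumptions (this is not an existing BioPerl feature): a hypothetical `Dialect` object describes a base format, and `derive()` layers a source's extra fields and line fix-ups on top of it. The tag set `EMBL_TAGS` is an illustrative subset, not the full EMBL specification. Lines with unrecognised tags are handed back to the caller rather than silently dropped, which partly addresses the concern about skipping important information.

```python
from dataclasses import dataclass

# Illustrative subset of EMBL line tags (hypothetical, not a complete list).
EMBL_TAGS = frozenset({"ID", "AC", "DE", "KW", "OS", "FT", "SQ"})


@dataclass(frozen=True)
class Dialect:
    """A base format plus the quirks of one particular data source."""
    name: str
    known_tags: frozenset = EMBL_TAGS
    extra_tags: frozenset = frozenset()
    line_fixups: tuple = ()  # functions applied to each raw line first

    def derive(self, name, extra_tags=(), line_fixups=()):
        """Return a new dialect adding fields and quirks to this one."""
        return Dialect(
            name=name,
            known_tags=self.known_tags,
            extra_tags=self.extra_tags | frozenset(extra_tags),
            line_fixups=self.line_fixups + tuple(line_fixups),
        )


def parse(lines, dialect):
    """Parse 'TAG   value' lines; return (entry, unrecognised_lines)."""
    entry, unknown = {}, []
    allowed = dialect.known_tags | dialect.extra_tags
    for raw in lines:
        line = raw.rstrip("\n")
        for fix in dialect.line_fixups:
            line = fix(line)
        if line == "//":  # end-of-entry marker
            break
        tag, value = line[:2], line[5:]
        if tag in allowed:
            entry.setdefault(tag, []).append(value)
        else:
            unknown.append(line)  # kept for the caller to inspect
    return entry, unknown


# "EMBL, with the following quirks, additional fields, ...":
# a hypothetical source that adds an 'XR' tag and writes tags in lower case.
embl = Dialect("EMBL")
in_house = embl.derive(
    "in-house EMBL",
    extra_tags={"XR"},
    line_fixups=[lambda s: s[:2].upper() + s[2:]],
)
entry, unknown = parse(["id   SEQ1", "XR   db:123", "ZZ   ???", "//"], in_house)
print(entry)    # {'ID': ['SEQ1'], 'XR': ['db:123']}
print(unknown)  # ['ZZ   ???']
```

Because each dialect is derived from a base rather than edited into the parser, the quirks for one database release stay in one small description that can be re-tuned for the next release without touching the parsing code.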