Mon, 4 Sep 2000 17:18:10 +0100
This describes exactly my situation in which I have to read in data in
all sorts of different formats (and people's interpretations of these
The problem now is that BioPerl throws a warning if a sequence does not
comply 100% with the standards and exits. While at that moment I want to
You mean it throws an exception. (Issuing a warning shouldn't cause an
be able to say that he can ignore the warning if (e.g.) he has read the
Does this sound like a call for a callback a client program can
provide? The question then is what should be passed to the callback
routine? The sequence object as it has been constructed so far? Sounds
fragile, and may be useless in many cases. The complete offending
source record? Would discard the parse done so far (for the callback),
and would require a partial rewrite of the parsers because they read
line-by-line (at least most if not all of the rich format parsers).
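
To make the callback question concrete, here is a minimal sketch, in Python for brevity and not BioPerl code. It assumes a toy format of 'KEY value' lines, and every name in it (parse_record, on_problem, ParseProblem, RecordParseError) is made up for illustration. Since the parsers read line by line, the callback gets only the offending line plus the fields parsed so far, and its return value decides whether to skip the line or abort.

```python
from dataclasses import dataclass


@dataclass
class ParseProblem:
    """What the callback is told about (hypothetical structure)."""
    line_number: int
    line: str
    message: str


class RecordParseError(Exception):
    """Raised when a problem is not waved through by the client."""


def parse_record(lines, on_problem=None):
    """Parse 'KEY value' lines into a dict of lists.

    on_problem(problem, fields_so_far) is an optional client callback.
    Returning True means "skip this line and carry on"; returning False,
    or supplying no callback at all, turns the problem into an exception.
    """
    fields = {}
    for number, line in enumerate(lines, start=1):
        key, _, value = line.partition(" ")
        if not key.isupper() or not value.strip():
            problem = ParseProblem(number, line, "malformed line")
            if on_problem is None or not on_problem(problem, fields):
                raise RecordParseError(f"line {number}: {problem.message}")
            continue  # the client chose to ignore this line
        fields.setdefault(key, []).append(value.strip())
    return fields


# A client that has read the documentation and decides to ignore quirks:
record = parse_record(["ID X1", "bogus", "DE test"],
                      on_problem=lambda problem, so_far: True)
```

Passing the partial fields is optional; a client that distrusts the half-built record can simply ignore the second argument.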
Something that would be really nice to have is a more modular approach
in which it would be easy to say: 'this data is in a format which is
EMBL, with the following quirks, additional fields, ... '.
Yes. But this needs a careful design of how you can split up the parse
of a sequence record into subtasks that are a) fairly independent (and
can thus be overridden by your QuirkyEMBL parser), and b) common to
all (rich) formats. Has anyone done any work in this direction so far?
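
One possible shape for such a split, again only as a sketch in Python and not a proposal for the actual BioPerl class layout: the base parser dispatches each tagged line to a small handler method, and a quirky variant overrides just the handlers that differ or adds new ones. The class names, the handle_* convention, and the 'ZZ' line type are all hypothetical, and the line format is a toy loosely modelled on two-letter line codes.

```python
class RichFormatParser:
    """Base parser: one overridable handler method per line type."""

    def parse(self, lines):
        record = {}
        for line in lines:
            tag, _, value = line.partition(" ")
            # Look up handle_<TAG>; fall back to handle_unknown.
            handler = getattr(self, "handle_" + tag, self.handle_unknown)
            handler(record, tag, value.strip())
        return record

    def handle_ID(self, record, tag, value):
        record["id"] = value

    def handle_DE(self, record, tag, value):
        record["description"] = value

    def handle_unknown(self, record, tag, value):
        raise ValueError(f"unexpected line type {tag!r}")


class QuirkyEMBLParser(RichFormatParser):
    """'EMBL, with the following quirks': only the differences live here."""

    def handle_ID(self, record, tag, value):
        # Quirk: this source appends a stray ';' to the identifier.
        record["id"] = value.rstrip(";")

    def handle_ZZ(self, record, tag, value):
        # Additional field that the standard parser does not know about.
        record.setdefault("extra", []).append(value)


quirky = QuirkyEMBLParser().parse(["ID   X1;", "DE   test", "ZZ   note"])
```

The hard part raised above remains: deciding which subtasks are truly common to all rich formats, so that the handler boundaries fall in useful places.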