Bioperl: Bio::Tools::Blast vs. Bio::GSC::Tool::Blast
Steve A. Chervitz
Mon, 8 Jun 1998 18:26:38 -0700 (PDT)
In general I agree with Ian's comparison. The following are some
Ian is correct in his assessment of my Blast modules having more depth and
less breadth than his GSC modules. I have tried to be more comprehensive
in my support of different versions of Blast (1, 2, WashU, gapped,
ungapped), whereas Ian's Blast module only parses WU-BLAST output,
but he supports a family of different sequence markup tools (RepeatMasker,
Blast, Genscan, Dust, Seg, Xnu, etc.).
There are some interesting parallels: We both have a base class for the
Blast module (Bio::Tools::SeqAnal.pm in my package and Bio::GSC::Tool.pm
in Ian's), and we both have a sequence class. However, the underlying
approaches have significant differences.
One of the biggest differences is one of perspective. I would distinguish
these perspectives as "algorithm-based" (mine) or "sequence-based"
(Ian's). At the core of this difference is the sequence object. Ian's
sequence object incorporates the notion of a markup or annotation by a
Bio::GSC::Metric or a Bio::GSC::Tool containing any number of
Bio::GSC::Feature objects. In this mindset, a Blast object always exists within
the context of a query sequence which it is marking up, so to speak. This
model is useful for processing sequences with a set of programs, where
the output of one program becomes the input to another.
My Blast object exists independently from its query sequence. It is
designed for generating and processing sets of Blast reports and
producing parsed output that covers a whole set of query sequences.
The sequence object I use, Bio::PreSeq.pm, does not incorporate the
idea of a markup. Its model is similar to Ian's Bio::GSC::Sequence.pm but
focuses more on support for different sequence file formats and common
Docs for Bio::PreSeq are available at:
(this version has been incorporated into my Bio:: framework.)
The ability to add arbitrary markups to sequences is a nice idea and I
think Ian is on the right track. In my Blast distribution, I have a demo
for generating a set of Blast reports given a set of sequences (see
eg/blast/blast_seq.pl), but I have no way of directly linking the Blast
report with the sequence object using Bio::PreSeq.pm.
Using a recursive Feature field as Ian does is a nice solution that could
be applied to things besides sequences (3D structures, for example). I
think it would be useful to have a more generic feature object from which
you could derive a SeqFeature object, one that would include semantics such
as Ian has added. (One observation: to me it seems more natural to put
Feature at the top level of Sequence and have Tool be a field within
Feature, instead of Feature being a field within Tool as Ian has done.
Perhaps Ian could comment.)
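
To make the two nesting orders concrete, here is a minimal sketch using plain
Perl hashes. It does not use the actual Bio::GSC or Bio::Tools classes; the
field names (tools, features, start, end, tool) and both helper subs are
hypothetical and chosen only for illustration.

```perl
use strict;
use warnings;

# Layout A (Tool at the top level, as described for Ian's design):
# Sequence -> Tool -> Features. All field names here are hypothetical.
my %seq_tool_first = (
    id    => 'query1',
    tools => [
        { name => 'Blast', features => [ { start => 10, end => 50 } ] },
        { name => 'Dust',  features => [ { start => 30, end => 40 } ] },
    ],
);

# Layout B (Feature at the top level, the alternative suggested above):
# Sequence -> Features, each feature naming the Tool that produced it.
# The nested "features" list stands in for a recursive Feature field.
my %seq_feature_first = (
    id       => 'query1',
    features => [
        { start => 10, end => 50, tool => 'Blast', features => [] },
        { start => 30, end => 40, tool => 'Dust',  features => [] },
    ],
);

# Which tools marked up a given position? Layout A needs two nested loops.
sub tools_at_tool_first {
    my ($seq, $pos) = @_;
    my @names;
    for my $tool (@{ $seq->{tools} }) {
        for my $f (@{ $tool->{features} }) {
            push @names, $tool->{name}
                if $f->{start} <= $pos && $pos <= $f->{end};
        }
    }
    return @names;
}

# Layout B answers the same question with a single pass over the features.
sub tools_at_feature_first {
    my ($seq, $pos) = @_;
    return map  { $_->{tool} }
           grep { $_->{start} <= $pos && $pos <= $_->{end} }
           @{ $seq->{features} };
}

print join(',', tools_at_tool_first(\%seq_tool_first, 35)), "\n";
print join(',', tools_at_feature_first(\%seq_feature_first, 35)), "\n";
```

Both layouts hold the same information. The difference is which question is
cheap to ask: "what did this tool find?" in layout A, or "what is annotated
at this position?" in layout B.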
This discussion seems to have zeroed in on the sequence object, which is
really central to the Bioperl effort. Coming up with a single, robust
sequence object that could be used by any Bioperl application is a noble
goal and worth more consideration. Consider this an action item for ISMB.
> Programming Style
> Steve passes named parameters in an anonymous hash, whereas I have a strict
> order of arguments. Is there any reason to prefer one over the other? Should
> all Bioperl modules have a consistent syntax? I make liberal use of constants.pm
> for parameters. This has the downside of polluting the namespace of an importing
> package, but it gives compile-time name-checking. Any thoughts on this?
Using named parameters is a common Perl idiom and makes the code easier to
understand & maintain. It is not as efficient as ordered arguments and it
raises the oft-debated issue of how to name your parameters. I'm not sure
about the use of constants. It does seem like an extra layer of
complexity that may not be worth the compile-time checking bonus.
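
For comparison, here is a small sketch of the two calling styles side by
side, together with the standard "use constant" pragma for defaults. The sub
names, parameter names (-program, -database, -expect) and default values are
made up for this example and are not taken from either Blast module.

```perl
use strict;
use warnings;

# Hypothetical defaults. A misspelled constant name is caught at compile
# time under "use strict", which is the name-checking benefit in question.
use constant DEFAULT_PROGRAM => 'blastp';
use constant DEFAULT_EXPECT  => 10;

# Ordered arguments: compact, but the caller must remember the order.
sub describe_ordered {
    my ($program, $database, $expect) = @_;
    $program = DEFAULT_PROGRAM unless defined $program;
    $expect  = DEFAULT_EXPECT  unless defined $expect;
    return sprintf '%s against %s (expect <= %s)',
        $program, $database, $expect;
}

# Named parameters in an anonymous hash: self-documenting at the call
# site, any order, and defaults are easy to supply. A misspelled key is
# only noticed at run time, if at all.
sub describe_named {
    my ($param) = @_;
    my %arg = (
        -program => DEFAULT_PROGRAM,
        -expect  => DEFAULT_EXPECT,
        %{ $param || {} },
    );
    return sprintf '%s against %s (expect <= %s)',
        $arg{-program}, $arg{-database}, $arg{-expect};
}

print describe_ordered('blastn', 'nr', 5), "\n";
print describe_named({ -database => 'nr', -expect => 5,
                       -program  => 'blastn' }), "\n";
print describe_named({ -database => 'nr' }), "\n";
```

The named form costs a hash copy per call, which is the efficiency
trade-off mentioned above.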