Bioperl: Bio::Tools::Blast vs. Bio::GSC::Tool::Blast

Steve A. Chervitz
Mon, 8 Jun 1998 18:26:38 -0700 (PDT)

In general I agree with Ian's comparison. The following are some 
additional comments.

Ian is correct in his assessment of my Blast modules having more depth and 
less breadth than his GSC modules. I have tried to be more comprehensive 
in my support of different versions of Blast (1, 2, WashU, gapped, 
ungapped), whereas Ian's Blast module parses only WU-BLAST output,
but he supports a family of different sequence markup tools (RepeatMasker, 
Blast, Genscan, Dust, Seg, Xnu, etc.).

There are some interesting parallels: We both have a base class for the 
Blast module ( in my package and 
in Ian's), and we both have a sequence class. However, the underlying 
approaches have significant differences. 

One of the biggest differences is one of perspective. I would distinguish 
these perspectives as "algorithm-based" (mine) or "sequence-based" 
(Ian's). At the core of this difference is the sequence object. Ian's 
sequence object incorporates the notion of a markup or annotation by a 
Bio::GSC::Metric or a Bio::GSC::Tool containing any number of 
Bio::GSC::Feature objects. In this mindset, a Blast object always exists within 
the context of a query sequence which it is marking up, so to speak. This 
model is useful for processing sequences with a set of programs, where 
the output of one program becomes the input to another.

My Blast object exists independently from its query sequence. It is 
designed for generating and processing sets of Blast reports and 
producing parsed output that covers a whole set of query sequences. 
The sequence object I use does not incorporate the 
idea of a markup. Its model is similar to Ian's but 
focuses more on support for different sequence file formats and common
sequence manipulations. 

Docs for Bio::PreSeq are available at:
(this version has been incorporated into my Bio:: framework.)

The ability to add arbitrary markups to sequences is a nice idea and I 
think Ian is on the right track. In my Blast distribution, I have a demo 
for generating a set of Blast reports given a set of sequences (see 
eg/blast/), but I have no way of directly linking the Blast 
report with the sequence object using 

Using a recursive Feature field as Ian does is a nice solution that could 
be applied to things besides sequences (3D structures, for example). I 
think it would be useful to have a more generic feature object from which 
you could derive a SeqFeature object that would include semantics such 
as Ian has added. (One observation: to me it seems more natural to put 
Feature at the top level of Sequence and have Tool be a field within 
Feature, instead of Feature being a field within Tool as Ian has done. 
Perhaps Ian could comment.)
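
To make the two nestings concrete, here is a rough sketch using plain Perl 
data structures. This is not code from either package: the field names 
(id, tools, features, start, end, tool) and the helper routines are 
invented for illustration only.

```perl
use strict;
use warnings;

# Layout A (hypothetical): features are a field within each tool.
my %seq_tool_first = (
    id    => 'seq1',
    tools => [
        { name => 'Blast',        features => [ { start => 10, end => 80 } ] },
        { name => 'RepeatMasker', features => [ { start => 40, end => 60 } ] },
    ],
);

# Layout B (hypothetical): features sit at the top level of the sequence,
# and each feature records the tool that produced it.
my %seq_feature_first = (
    id       => 'seq1',
    features => [
        { start => 10, end => 80, tool => 'Blast' },
        { start => 40, end => 60, tool => 'RepeatMasker' },
    ],
);

# Which tools marked up a given position? Layout A needs a nested loop.
sub tools_covering_a {
    my ($seq, $pos) = @_;
    my @hits;
    for my $tool (@{ $seq->{tools} }) {
        for my $feat (@{ $tool->{features} }) {
            push @hits, $tool->{name}
                if $feat->{start} <= $pos && $pos <= $feat->{end};
        }
    }
    my @sorted = sort @hits;
    return @sorted;
}

# Layout B answers the same question with a single pass over the features.
sub tools_covering_b {
    my ($seq, $pos) = @_;
    my @hits = map  { $_->{tool} }
               grep { $_->{start} <= $pos && $pos <= $_->{end} }
               @{ $seq->{features} };
    my @sorted = sort @hits;
    return @sorted;
}

print join(', ', tools_covering_a(\%seq_tool_first,    50)), "\n";
print join(', ', tools_covering_b(\%seq_feature_first, 50)), "\n";
```

Both layouts hold the same information. The difference is which question 
is cheap to ask: "what did this tool find?" in the first, "what is known 
about this region?" in the second.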

This discussion seems to have zeroed in on the sequence object, which is 
really central to the Bioperl effort. Coming up with a single, robust 
sequence object that could be used by any Bioperl application is a noble 
goal and worth more consideration. Consider this an action item for ISMB.

> Programming Style
> -----------------
> Steve passes named parameters in an anonymous hash, whereas I have a strict
> order of arguments. Is there any reason to prefer one over the other? Should
> all Bioperl modules have a consistent syntax? I make liberal use of
> for parameters. This has the downside of polluting the namespace of an importing
> package, but it gives compile-time name-checking. Any thoughts on this?

Named parameters are a common Perl idiom and make the code easier to 
understand and maintain. They are not as efficient as ordered arguments, 
and they raise the oft-debated issue of how to name your parameters. I'm 
not sure about the use of constants. It does seem like an extra layer of 
complexity that may not be worth the compile-time checking bonus. 
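
For comparison, a minimal sketch of the two calling styles. The routine 
names, parameter names, and default values below are made up for this 
example and are not taken from either Blast module.

```perl
use strict;
use warnings;

# Strict order of arguments: compact, but the caller must remember the order.
sub run_blast_positional {
    my ($program, $database, $expect) = @_;
    return "$program against $database (expect <= $expect)";
}

# Named parameters in an anonymous hash: order-independent, and omitted
# parameters can fall back to defaults (hypothetical defaults shown here).
sub run_blast_named {
    my ($args) = @_;
    my $program  = $args->{program}  || 'blastp';
    my $database = $args->{database} || 'nr';
    my $expect   = defined $args->{expect} ? $args->{expect} : 10;
    return "$program against $database (expect <= $expect)";
}

my $by_position = run_blast_positional('blastn', 'est', 0.001);
my $by_name     = run_blast_named({
    expect   => 0.001,
    database => 'est',
    program  => 'blastn',
});

print "$by_position\n";
print "$by_name\n";
```

Note that a misspelled key in the named style (say, databse) is silently 
ignored and the default is used instead, which is the kind of mistake that 
compile-time name-checking is meant to catch.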

Steve Chervitz
