Bioperl: article for Dr. Dobb's Journal

Steve Chervitz (Steve A. Chervitz)
Mon, 12 Oct 1998 13:47:16 -0700 (PDT)


Again, great job on the article. A few additional comments:

> DNA can do just two things: It can replicate, and it can be
> transcribed into RNA...

Perhaps this is just semantics but I would describe DNA as a
relatively inert molecule specialized for information storage. It
can't actually "do" anything but various things can interact with it
to read, write, and modify the information it contains. Extending your
computer program analogy, DNA is a storage medium for the data, as
well as the code that processes it. If you don't want to go into
this, I would at least add "mutate" to your list of things that DNA
can do.

> Protein sequences use the letters
> A,C,D,E,F,G,H,I,K,L,M,N,P,Q,R,S,T,U,V,W, and Y. 

Remove U from this list. It represents selenocysteine which is not
part of the standard code. This would give you Gustavo's 20.
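
To make the 20-letter alphabet concrete, here is a minimal sketch of a
check against it. The helper name is_standard_protein is hypothetical
and is not taken from the article or from any Bioperl module.

```perl
use strict;
use warnings;

# The 20 standard amino acid one-letter codes (no U, no ambiguity codes).
my $STANDARD_AA = 'ACDEFGHIKLMNPQRSTVWY';

# Hypothetical helper: returns 1 if $seq contains only standard residues.
sub is_standard_protein {
    my ($seq) = @_;
    return 0 unless defined $seq && length $seq;
    return $seq =~ /^[$STANDARD_AA]+$/i ? 1 : 0;
}

print is_standard_protein('MKTAYIAKQR') ? "standard\n" : "non-standard\n";
print is_standard_protein('MKUAY')      ? "standard\n" : "non-standard\n";
```

A real class would probably also want to accept ambiguity codes such as
X; that is left out here to keep the sketch short.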

> Sequence::Base

I would recommend changing this to Sequence::Generic since it is
possible to confuse this with a nucleotide base (AGCTU). You refer to
it later on as a generic sequence class. (Incidentally, one may want a
class to represent a nucleotide base, especially when working
with structure.)

> Unlike C++ or Java, Perl's object system allows for multiple
> inheritance, ...

C++ has multiple inheritance.

> The next method we define is seq(), which overrides the stub routine
> in the Sequence::Base superclass.  It allows the sequence to be both
> read and set. 

Word choice: I would say "It is an accessor function that allows the
sequence to be both retrieved and set".
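
For readers unfamiliar with the term, a combined get/set accessor in
Perl might look roughly like the sketch below. The package name
My::Sequence is made up for illustration, and the article's actual
seq() may differ (for example, it may validate its argument).

```perl
use strict;
use warnings;

package My::Sequence;   # hypothetical class name

sub new {
    my ($class, %args) = @_;
    my $self = bless {}, $class;
    $self->seq($args{seq}) if defined $args{seq};
    return $self;
}

# Combined accessor: sets the sequence when given an argument,
# and always returns the current value.
sub seq {
    my $self = shift;
    $self->{seq} = shift if @_;
    return $self->{seq};
}

package main;

my $s = My::Sequence->new(seq => 'gattaca');
print $s->seq, "\n";      # get
$s->seq('ttgaca');        # set
print $s->seq, "\n";      # get again
```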

Now for some comments on Georg's comments:

Georg wrote:
 > I believe ref() returns the empty string for non-objects.


 > >>>>>>
 >      croak "Doesn't look like sequence data"
 >        unless $sequence=~/^[gactnu ]+$/i;
 > <<<<<<
 > At least in this case, I'd prefer carp() to croak(). I'm always very 
 > unhappy if some Perl module stops a long script although it could just
 > as well have proceeded along after issuing a warning. Perl itself 
 > usually tries to issue warnings instead of dying.

Here we go again! This brings up the issue of exception handling that
you don't mention in the article but will likely be of interest to
your typical DDJ reader. You might state that you can use Perl's
rudimentary built-in mechanism (which is evolving) or you can roll
your own.
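
As a rough sketch of the built-in mechanism: a module can croak(), and
a caller who would rather keep going can trap that with eval and
inspect $@. The sub names check_strict and check_lenient are
hypothetical; only the regular expression comes from the quoted code.

```perl
use strict;
use warnings;
use Carp qw(carp croak);

# Strict variant: throws an exception on bad input.
sub check_strict {
    my ($sequence) = @_;
    croak "Doesn't look like sequence data"
        unless $sequence =~ /^[gactnu ]+$/i;
    return 1;
}

# Lenient variant: warns and returns false instead of dying.
sub check_lenient {
    my ($sequence) = @_;
    unless ($sequence =~ /^[gactnu ]+$/i) {
        carp "Doesn't look like sequence data";
        return 0;
    }
    return 1;
}

# The caller decides: trap the exception with eval and carry on.
my $ok = eval { check_strict('gattaca-xyz'); 1 };
print "trapped: $@" unless $ok;
print "script continues\n";
```

With the strict variant the policy stays with the caller: a long
script wraps the call in eval, while a short one simply lets it die.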

(Aside: We've had this discussion in the bioperl mailing list
many times and one thing is clear: different people will have
different views about how to handle exceptions. Perl's TMTOWTDI(*)
philosophy allows for this. BTW, has anyone been playing with the
experimental feature in 5.005 of calling die() with a reference
value?  (*)"There's more than one way to do it.")
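
For anyone who has not tried it, here is a small sketch of what dying
with a reference can look like. The class My::SeqError and the sub
parse_seq are invented for this example; Scalar::Util is assumed to be
available, which is the case on later Perls but not necessarily on a
stock 5.005 install.

```perl
use strict;
use warnings;
use Scalar::Util qw(blessed);

package My::SeqError;   # hypothetical exception class

sub new {
    my ($class, %args) = @_;
    my $self = { %args };
    return bless $self, $class;
}
sub message { $_[0]{message} }
sub data    { $_[0]{data} }

package main;

# Hypothetical parser that throws an exception object on bad input.
sub parse_seq {
    my ($sequence) = @_;
    die My::SeqError->new(message => 'bad characters', data => $sequence)
        unless $sequence =~ /^[gactnu ]+$/i;
    return lc $sequence;
}

my $result = eval { parse_seq('gatt4ca') };
if (blessed($@) && $@->isa('My::SeqError')) {
    printf "caught %s: %s in '%s'\n", ref($@), $@->message, $@->data;
}
```

The advantage over a plain string is that the handler can test the
class of the exception and read structured data from it rather than
pattern-matching on the message text.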

 > >>>>>>>
 > well as Perl classes for proteins, genes, genetic maps, phylogenetic
 > trees and 3D protein structures.
 > <<<<<<<
 > I've not yet released my phylogenetics code; only multiple alignment
 > is available.

The genes, proteins, and 3D structure code has also not been
officially released, but pre-alpha quality code can be accessed if you
dig deep enough. You could say that an effort to develop code for
these things is underway.

 > As an aside, I'm currently waiting for PDL 2.0 to come out
 > and stabilize; my plan is to use PDL for -- I hope to
 > be very fast in this way, without touching XS at all.

I think PDL would be worth a mention in the discussion about XS.

Steve A. Chervitz   
Neomorphic Software 
2612B 8th Street              +1 (510) 704-1030 Tel
Berkeley, CA 94710            +1 (510) 704-1013 Fax
