Bioperl-guts: Possible Seq.pm reimplementation

Osborne, Brian brian.osborne@cadus.com
Mon, 2 Aug 1999 10:05:07 -0400


To the group,

I've been reading this group for some time, almost since its inception.
It's only natural that Seq.pm gets a rewrite given the work that's been
done to add and separate functionality. By "natural", though, I don't mean
that the result is necessarily the best for all involved. Those involved
are the users. Who are the users? Do we know? I maintain that the project
overall has evolved to favor those who create or maintain large or
multi-functional systems, but has somewhat lost sight of the scripter,
the beginner, and the molecular biologist (I get to say this since I _am_
a molecular biologist ;-).

From its origins as a single module we now have many. It seems to me
that something like a standalone Seq::Simple would be useful, and I know
this approach is seen occasionally throughout CPAN. Seq::Simple would have
essentially the functionality of Chris' first release, including
translation. That's because, in the real world, the act of translation
almost always immediately follows the act of "loading" the nucleotide
sequence. The same goes for doing the reverse-complement, and so on.
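
To make the "load, then immediately translate or reverse-complement"
workflow concrete, here is a minimal sketch of what a standalone simple
sequence class might look like. It is written in Python purely for
illustration; the class name SimpleSeq, its method names, and the use of
"X" for unrecognized codons are assumptions, not part of any Bioperl
release. The codon table is the standard genetic code.

```python
from itertools import product

# Standard genetic code, codons enumerated with bases in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

COMPLEMENT = str.maketrans("ACGT", "TGCA")


class SimpleSeq:
    """Hypothetical all-in-one sequence object for quick scripting."""

    def __init__(self, seq, name=""):
        self.seq = seq.upper()
        self.name = name

    def revcom(self):
        # Complement every base, then reverse the string.
        return SimpleSeq(self.seq.translate(COMPLEMENT)[::-1], self.name)

    def translate(self):
        # Read whole codons from the first base; unknown codons become "X".
        codons = (self.seq[i:i + 3] for i in range(0, len(self.seq) - 2, 3))
        return "".join(CODON_TABLE.get(codon, "X") for codon in codons)


seq = SimpleSeq("atggcctaa", name="example")
print(seq.translate())
print(seq.revcom().seq)
```

The point of the sketch is the design trade-off rather than the code
itself: everything a beginner needs lives in one small class, at the cost
of the extensibility that a multi-module design provides.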

This, it seems, is the perennial developmental dialectic in OOP: being
"objectively" correct versus being easy. We've gotten into an analogous
situation frequently with Oracle design: the performance of a single
table versus the more extensible, more abstracted, multi-table design.
I'm wandering. You get my point. Most probably I'd write my own
Seq::Simple for my own use, and fun, if I thought that Bio::Seq had
gotten a bit bulky for the easy stuff.

Congratulations for having conducted an excellent collaboration,

Brian O.

Cadus Pharmaceutical Corp.
777 Old Saw Mill River Rd.
Tarrytown NY 10591
brian.osborne@cadus.com
TEL 914 467 6291
FAX 914 345 3565



-----Original Message-----
From:	James Gilbert [mailto:jgrg@sanger.ac.uk]
Sent:	Monday, August 02, 1999 9:32 AM
To:	vsns-bcd-perl-guts@lists.uni-bielefeld.de
Subject:	Re: Bioperl-guts: Possible Seq.pm reimplementation



It seems unanimous that Bio::Seq needs a rewrite!

Here are some random thoughts on the subject:


I support renaming the old module to Bio::OldSeq


I'd like a name/id field to be kept.  Most
external programs bioperlers will use need a name
of some kind to mark up the results they return.  
This could be provided by the AnnSeq object, but I
think we should keep the core Seq object useable
in its own right.  I can't think of when I'd
really need the desc() method though.  Shouldn't
we take care of this with the Bio::AnnSeq::Comment
class?


The Bio::Seq object will be one of the first
modules that novice users look at, so we shouldn't
try to be too "clever" with it, but keep it as
simple and understandable as possible.  I don't
want to see it inheriting from a large number of
different classes, which the user has to hunt
through for methods and documentation.


I agree that we should remove the counter-
intuitive capitalization of the strings (Dna, Rna,
Amino), returned from the type() method.  I think
it would be nicer to have the different types of
sequences optionally blessable into sub-classes of
Bio::Seq, such as Bio::Seq::Amino (which would
then throw an exception if the revcom() method was
called).
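
The sub-classing idea above can be sketched as follows. This is an
illustration in Python rather than Perl, and the class names (Seq,
DnaSeq, AminoSeq), the lowercase type strings, and the choice of
TypeError are all assumptions made for the example, not an existing
Bioperl interface.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


class Seq:
    """Hypothetical core sequence object: just a string and a name."""

    def __init__(self, seq, name=""):
        self.seq = seq.upper()
        self.name = name

    def type(self):
        return "unknown"


class DnaSeq(Seq):
    def type(self):
        return "dna"

    def revcom(self):
        # Complement every base, then reverse; keep the same subclass.
        return self.__class__(self.seq.translate(COMPLEMENT)[::-1], self.name)


class AminoSeq(Seq):
    def type(self):
        return "amino"

    def revcom(self):
        # A protein has no reverse complement, so fail loudly.
        raise TypeError("revcom() is not defined for amino acid sequences")


print(DnaSeq("ATGC").revcom().seq)
try:
    AminoSeq("MKV").revcom()
except TypeError as err:
    print("error:", err)
```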


I think we should have methods for checking the
integrity of the sequence string against standard
alphabets.  Should this be in the Bio::Seq object?
Maybe Bio::Seq should have a simple method that
checks that there aren't non-printable characters
in the string, which Bio::Seq::DNA would replace
with a check for non-nucleotide characters.
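
One way to picture this layered check is a base-class method that only
rejects non-printable characters, overridden in a DNA subclass by a
stricter alphabet test. The sketch below is in Python for illustration;
the method name is_valid and the decision to accept the IUPAC nucleotide
ambiguity codes are assumptions.

```python
class Seq:
    """Hypothetical core object with a deliberately loose integrity check."""

    def __init__(self, seq):
        self.seq = seq

    def is_valid(self):
        # Generic check: every character must be printable.
        return all(ch.isprintable() for ch in self.seq)


class DnaSeq(Seq):
    # IUPAC nucleotide codes, including ambiguity symbols (an assumption
    # about which alphabet the stricter check should accept).
    ALPHABET = frozenset("ACGTRYKMSWBDHVN")

    def is_valid(self):
        # Stricter check that replaces the generic one for DNA.
        return all(ch in self.ALPHABET for ch in self.seq.upper())


print(Seq("ACGT-xyz").is_valid())
print(Seq("AC\x00GT").is_valid())
print(DnaSeq("acgtn").is_valid())
print(DnaSeq("ACGTZ").is_valid())
```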


I agree that the translate() method shouldn't be
in the core object.  I don't see the need for the
start() method if this is true.  The module which
implements translation should know about the
common translation tables.  There's a list at:

http://www3.ncbi.nlm.nih.gov/htbin-post/Taxonomy/wprintgc?mode=c

Inspired by Ian Korf's Eppendorf object, how about
a Ribosome object?
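
As a rough illustration of a separate translation object that "knows
about" more than one table, the sketch below builds the standard code
(table 1) and derives the vertebrate mitochondrial code (table 2) from
it by its four differing codons. It is written in Python, and the class
layout, the constructor argument table_id, and the "X" placeholder for
unknown codons are assumptions for the example only.

```python
from itertools import product

BASES = "TCAG"
STANDARD = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

# Codons that differ from the standard code, keyed by table number.
OVERRIDES = {
    1: {},                                                # standard
    2: {"AGA": "*", "AGG": "*", "ATA": "M", "TGA": "W"},  # vertebrate mito
}


class Ribosome:
    """Hypothetical translator kept outside the core sequence object."""

    def __init__(self, table_id=1):
        if table_id not in OVERRIDES:
            raise ValueError(f"unknown translation table: {table_id}")
        codons = ("".join(c) for c in product(BASES, repeat=3))
        self.table = dict(zip(codons, STANDARD))
        self.table.update(OVERRIDES[table_id])

    def translate(self, dna):
        # Translate whole codons from the first base; unknown codons -> "X".
        dna = dna.upper()
        return "".join(
            self.table.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3)
        )


print(Ribosome(1).translate("ATGTGAAGA"))
print(Ribosome(2).translate("ATGTGAAGA"))
```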


And a minor niggle.  Why revcom() and not
revcomp(), which I imagine most people would more
naturally type?  I guess we're stuck with
revcom().


James G.R. Gilbert
The Sanger Centre
Wellcome Trust Genome Campus
Hinxton
Cambridge                        Tel: 01223 494906
CB10 1SA                         Fax: 01223 494919








=========== Bioperl Project Mailing List Message Footer =======
Project URL: http://bio.perl.org
For info about how to (un)subscribe, where messages are archived, etc:
http://www.techfak.uni-bielefeld.de/bcd/Perl/Bio/vsns-bcd-perl-guts.html
====================================================================