Bioperl-guts: Re: Bio::Seq feedback

Jeffrey Chang jchang@SMI.Stanford.EDU
Mon, 2 Aug 1999 23:18:20 -0700 (PDT)

As long as we're talking about this, I've always felt a little bit
uncomfortable with the way the sequence numbering works.  With the start
and end attributes of the Seq object, the class allows for biological
numbering systems that differ from a strict 1- or 0-based one.  In
addition, the end attribute doesn't even need to be equal to start +

This seems reasonable, but then, the getseq method takes $start and $end
parameters which are based on the $seq->{'start'} index.  This could lead
to correct code that looks incorrect:
$fragment = $seq->getseq(1, 215);

or incorrect code that looks correct:
$wholeseq = $seq->getseq($seq->start(), $seq->end());

If it's important to keep track of alternate numbering systems, perhaps we
should add functions that allow users to register arbitrary ones.  In
order to do this, I would propose replacing $seq->end with a method that
specifies a complete alternate numbering for the sequence and adding an
optional parameter to getseq that specifies which system to use. 

For example:
$seq = Bio::Seq->new(-seq=>'ILVM');
$seq->altnumber("PDB serial", ("4", "9", "16", "22"));
$seq->altnumber("PDB resnum", ("1", "2", "14", "15"));
$seq->getseq(1, 3);                     # returns 'ILV'
$seq->getseq("4", "16", "PDB serial");  # returns 'ILV'
$seq->getseq("1", "14", "PDB resnum");  # returns 'ILV'

This would make working with PDB files a little bit cleaner, because you
would be able to refer to residues by the actual residue number given in
the record (whose numbering is often quite messed up) rather than by
offsets from $seq->{'start'}.
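
To make the idea concrete, here is a minimal sketch of how such an
interface might behave.  It uses a hypothetical stand-in class,
AltNumberedSeq, rather than the real Bio::Seq, and it assumes that each
numbering system supplies exactly one unique label per residue and that
getseq without a third argument uses plain 1-based positions.  The
altnumber method and the third getseq parameter are the proposal above,
not existing API.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the proposal; NOT the real Bio::Seq class.
package AltNumberedSeq;

sub new {
    my ($class, %args) = @_;
    my $self = { seq => $args{-seq}, alt => {} };
    return bless $self, $class;
}

# Register a numbering system: one label per residue, in sequence order.
# Assumption: labels are unique within a system.
sub altnumber {
    my ($self, $name, @labels) = @_;
    die "need one label per residue\n"
        unless @labels == length $self->{seq};
    my %index;
    @index{@labels} = (1 .. scalar @labels);   # label -> 1-based position
    $self->{alt}{$name} = \%index;
    return;
}

# Return the inclusive range $start..$end.  With no $system the arguments
# are 1-based positions; otherwise they are labels in the named system.
sub getseq {
    my ($self, $start, $end, $system) = @_;
    if (defined $system) {
        my $index = $self->{alt}{$system}
            or die "unknown numbering system '$system'\n";
        for my $label ($start, $end) {
            die "no residue '$label' in '$system'\n"
                unless exists $index->{$label};
        }
        ($start, $end) = @{$index}{$start, $end};
    }
    return substr $self->{seq}, $start - 1, $end - $start + 1;
}

package main;

my $seq = AltNumberedSeq->new(-seq => 'ILVM');
$seq->altnumber("PDB serial", "4", "9", "16", "22");
$seq->altnumber("PDB resnum", "1", "2", "14", "15");
print $seq->getseq(1, 3), "\n";                     # ILV
print $seq->getseq("4", "16", "PDB serial"), "\n";  # ILV
print $seq->getseq("1", "14", "PDB resnum"), "\n";  # ILV
```

Keeping a label-to-position hash per system means a lookup never has to
assume that the labels are contiguous or even numeric, which is the
usual situation with PDB records.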

