[Bioperl-l] Bug in Contig.pm? How to compare two sequence objects?

Robson de Souza robfsouza at gmail.com
Sun Jul 25 10:42:35 EDT 2010


Hi Dan,

It has been a long time since I last looked at this but, if I remember
correctly, the point is that Bio::

On Sun, Jul 25, 2010 at 9:23 AM, Dan Bolser <dan.bolser at gmail.com> wrote:
> The following bug report boils down to this question:
> How should two sequence objects be compared for identity? Does the
> object override 'eq' or implement an 'identical' method?

I think an 'identical' or 'equal' method would be the best alternative
since having a full method call would allow passing arguments like
'-mode => "complete"' to check all sequence features and annotations
if they exist and '-mode => "basic"' to check id() and seq() values.
Bio::Assembly::Contig depends mostly on the last one, although only
id() is tracked most of the time (because of the internal hashes).
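
A minimal sketch of what the "basic" mode of such a method could look
like. The helper name seqs_identical and the My::Seq stand-in class are
hypothetical and are not part of BioPerl; the only assumption about the
sequence objects is that they provide id() and seq(). The "complete"
mode is left unimplemented here.

```perl
use strict;
use warnings;

# Hypothetical stand-in for a sequence object; only id() and seq() are assumed.
package My::Seq;
sub new { my ($class, %args) = @_; return bless {%args}, $class; }
sub id  { return $_[0]{id} }
sub seq { return $_[0]{seq} }

package main;

# Hypothetical helper (not a BioPerl method): compare two sequence objects.
# -mode => 'basic' checks only the id() and seq() values.
sub seqs_identical {
    my ($x, $y, %opt) = @_;
    my $mode = exists $opt{-mode} ? $opt{-mode} : 'basic';

    return 0 unless defined $x && defined $y;

    if ($mode eq 'basic') {
        return ($x->id eq $y->id && $x->seq eq $y->seq) ? 1 : 0;
    }

    # A 'complete' mode would also walk features and annotations.
    die "mode '$mode' is not implemented in this sketch\n";
}

my $s1 = My::Seq->new(id => 'r1', seq => 'ACCG-T');
my $s2 = My::Seq->new(id => 'r1', seq => 'ACCG-T');
my $s3 = My::Seq->new(id => 'r2', seq => 'ACCG-T');

print seqs_identical($s1, $s2, -mode => 'basic') ? "same\n" : "different\n";  # same
print seqs_identical($s1, $s3, -mode => 'basic') ? "same\n" : "different\n";  # different
```

Passing the mode as a named argument keeps the call site readable and
leaves room for stricter checks later without changing the signature.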

> I found the following apparent bug in Contig.pm while executing the
> documented 'SYNOPSIS' code:
[snip]
> It seems to be a bug in the documented behaviour of set_seq_coord:
>        "If the sequence was previously added using add_seq, its
> coordinates are changed/set.  Otherwise, add_seq is called and the
> sequence is added to the contig."

In fact, it should not print warnings all the time....

> The offending line in that function seems to be:
>  if( ... &&
>      ($seq ne $self->{'_elem'}{$seqID}{'_seq'}) ) {
>          ... <spew warnings>
>  }
>  $self->add_seq($seq);
> which compares the *passed* sequence object to the sequence string for
> the *stored* sequence object of the same name. This comparison
> always fails, if I understood correctly, so set_seq_coord always
> spews warnings if called after add_seq.

Not the sequence string, but the objects themselves, i.e. the string
Perl uses to represent Bio::LocatableSeq objects... it is a
memory-based version of identical() :)
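
To illustrate the point, here is a small sketch of how 'ne' behaves on
plain Perl objects, assuming the class does not overload string
comparison. My::Seq is a hypothetical stand-in, not a BioPerl class.
Two objects with the same content still compare as different, because
what is compared is the stringified reference (class name plus memory
address).

```perl
use strict;
use warnings;
use Scalar::Util qw(refaddr);

# Hypothetical stand-in class with no operator overloading.
package My::Seq;
sub new { my ($class, %args) = @_; return bless {%args}, $class; }

package main;

my $stored = My::Seq->new(id => 'r1', seq => 'ACCG-T');
my $copy   = My::Seq->new(id => 'r1', seq => 'ACCG-T');  # same content, new object
my $alias  = $stored;                                    # same object

# In string context an object looks like "My::Seq=HASH(0x...)".
print "stored stringifies as: $stored\n";

# 'ne' compares those strings, so it only tells whether the two
# references point at the same object in memory.
my $copy_differs  = ($copy  ne $stored) ? 1 : 0;   # 1: distinct objects
my $alias_differs = ($alias ne $stored) ? 1 : 0;   # 0: same object

# refaddr() states the same intent explicitly.
my $same_object = (refaddr($alias) == refaddr($stored)) ? 1 : 0;

print "copy differs:  $copy_differs\n";
print "alias differs: $alias_differs\n";
print "same object:   $same_object\n";
```

So a sequence object rebuilt with the same id and sequence would still
trigger the warning, while passing back the stored object would not.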

> Out of curiosity, how come I can't just say:
> my $ls = Bio::LocatableSeq->
>  new( -seq      => 'ACCG-T',
>       -id       => 'r1',
>       -alphabet => 'dna',
>       -start    => 3,
>       -end      => 8,
>       -strand   => 1
>     );
> $c->add_seq( $ls );

Oh, I don't remember, but it was either a bad design decision I made 8
years ago to accommodate the Bio::Align::AlignI interface or a problem
with Bio::SeqFeature::Collection at that time. Whatever the case, it
would be nice to change it... you just need to create a
Bio::SeqFeature::Generic when add_seq is called. I just won't have time
to do it myself, so feel free to act...

Best,
Robson

> I hope the above report can be of some use.
>
> Sincerely,
> Dan.