[Bioperl-l] Bio::LocatableSeq end checking inconsistency

Bernd Web bernd.web at gmail.com
Fri Aug 20 11:17:05 EDT 2010


Hi Yin,

I am not quite sure if the following is also related to your gapped
length issue but I found I had to adapt the calculation of
ungapped_len in   Bio::LocatableSeq. If my slices did not contain any
letters or a new gap char I used, SimpleAlign could not find the
sequences when outputting the alignment. This was due to a difference
in length calculation:

SimpleAlign: uses \W:  $slice_seq =~ s/\W//g;
Bio::LocatableSeq::ungapped_len uses  "$string =~ s/[\.\-]+//g;"

I had to include '~' (for my local sequences) in the ungapped_len;
otherwise i would run into the end issues with SimpleAlign.


Kind regards,
Bernd


On Fri, Aug 13, 2010 at 3:36 PM, Jun Yin <jun.yin at ucd.ie> wrote:
> Hi, all,
>
>
>
> I am the google summer of code student working on Bio::Align subsystem
> refactoring. The code (Bio::SimpleAlign) I re-implemented now has passed
> nearly all the test, except a few tests on seq/start-end testing. But here
> comes a problem. This may be an old issue, that the Bio::LocatableSeq end
> assignment and checking are inconsistent.
>
>
>
> The current end checking method is based on:
>
> $end=$seq->_ungapped_len+$seq->start-1
>
> However, this checking may not fit the real world case.
>
>
>
> The inconsistency usually happens when a few columns of the sequence are
> removed.
>
>
>
> For example:
>
> my $a = Bio::LocatableSeq->new(
>
>    -id    => 'a',
>
>    -strand => 1,
>
>    -seq   => '-tcgatc-atcgatcg',
>
>    -start => 30,
>
>    -end   => 43
>
> );
>
>
>
> If we remove the 1st, 8th and the last columns
>
>
>
> $a->seq() will be 'tcgatcatcgatc'
>
> $a->_ungapped_len==12
>
>
>
> Actually, in the real world, the first residue will still be 30 (the old
> $seq->start), and the last residue is the residue before the 43 (the old
> $seq->end), thus 42.
>
>
>
> But if you call a validation, the calculation is
> $a->_ungapped_len+$a->start-1=12+30-1=41
>
> So the reassignment of the $seq->end will not pass the validation.
>
>
>
> So unless you save the information to a new sequence object, the original
> position information will be lost anyway. But in some cases, we have to
> change the sequence in its original sequence object ..
>
>
>
> What is your suggestion on this issue?
>
> A. pass the test and lose the information      #convenient in coding but the
> start-end annotation is not right any more
>
> B. keep the information and forget the test   #the object will still
> remember where the last residue was in the original sequence. But is it
> really meaningful at all? Because all the other residues may come from
> nowhere
>
> C. Neither of above #any other suggestions?
>
>
>
> Cheers,
>
> Jun Yin
>
> Ph.D. student in U.C.D.
>
>
>
> Bioinformatics Laboratory
>
> Conway Institute
>
> University College Dublin
>
>
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>



More information about the Bioperl-l mailing list