[Bioperl-l] Bio::Coordinate::Collection could DoWhatIMean better (w/ patch)

Heikki Lehvaslaiho heikki at sanbi.ac.za
Tue Sep 9 03:50:11 EDT 2008


George,

This is an error from my side. Great that you have a fix already.

My only worry is the number of external dependencies in BioPerl. To limit 
these we have recoded number of functionalities into BioPerl-specific modules. 
Before you commit the fix, could you see if Bio::RangeI could be used or easily 
extended to be used instead of Set::IntSpan?

Thanks,

   -Heikki

On Friday 05 September 2008 21:01:35 George Hartzell wrote:
> Hi all,
>
> Bio::Coordinate::Collection surprised me a bit.  At first I thought
> there was a bug, but it's clearly doing what it's supposed to.  Now
> I'm wondering if what it's supposed to be doing makes sense in some
> context, or if what I expected would be better functionality.
>
> t/CoordinateMapper.t sets up the following scenario:
>
> #
> # Collection
> #
> #         1   5     6   10
> #         |---|     |---|
> #-----|-----------------------
> #     1   5   9     15  19
> #         pair1     pair2
>
> Then goes on to do the following query:
>
>   # match more than two
>   $pos = Bio::Location::Simple->new (-start => 5, -end => 19 );
>   ok $res = $transcribe->map($pos);
>   is $res->each_gap, 2;
>   is $res->each_match, 2;
>
> I was surprised to see that there were two gaps, one gene:10-19 and
> one from gene:5-14.  Looking at the code, what's really happening is
> that, for the exon1 mapper there's match with gene:5-9 and a gap with
> gene:10-19 and for the exon2 mapper there's a gap with gene:5-14 and a
> match with gene:15-19.  All four Result's just get tossed into the
> return value.
>
> The result my intuition wants is that there are two matches (gene:5-9
> with exon1 and gene:15-19 with exon2) and a gap (gene:10-14).
>
> Yes, I guess that I could just synthesize these myself from the result
> in my app.
>
> It still seems that the current result is a bug though, since there's
> no way of knowing when you're walking through $res->each_Location that
> the first "gap" is with respect to the exon1 mapper and that the
> second "gap" is with respect to the exon2 mapper.  The gaps are
> meaningless.
>
> I "fixed" it to work the way I think it should (two matches, one
> gap).  I actually extended the test case a bit so that there's a
> multi-base gap, a match, another multibase-gap, another match, then a
> single base gap (just to make sure I got that right...).  I had to
> touch up the test file a bit to account for my new test.
>
> The gaps that I return have a strand of 'undef', which seems to be The
> Right Thing.  There's also a bit of funny business where I hang onto
> the seq_id of the gapped sequence.  It assumes that the "in" sequence
> is the same for all of the mappers.  This seems safe since otherwise
> the entire query is kind of weird....
>
> There's a patch to todays svn head at:
>
>   http://shrimp.alerce.com/bioperl/collection-diffs.txt
>
> The patch changes Build.PL to include a dependency on Set::IntSpan,
> CoordinateMapper.t to update the tests, and
> Bio/CoordinateMapper/Collection.pm for the new code.
>
> Who's code would this break.
>
> If anyone's relying on the current behaviour re: gaps, what's the
> situation in which you find it useful?
>
> Thanks!
>
> g.
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

-- 
______ _/      _/_____________________________________________________
      _/      _/
     _/  _/  _/  Heikki Lehvaslaiho    heikki at_sanbi _ac _za
    _/_/_/_/_/  Senior Scientist    skype: heikki_lehvaslaiho
   _/  _/  _/  SANBI, South African National Bioinformatics Institute
  _/  _/  _/  University of Western Cape, South Africa
     _/      Phone: +27 21 959 2096   FAX: +27 21 959 2512
___ _/_/_/_/_/________________________________________________________



More information about the Bioperl-l mailing list