[Bioperl-l] Bio::Coordinate::Collection could DoWhatIMean better (w/ patch)
hartzell at alerce.com
Thu Sep 25 01:17:54 EDT 2008
Heikki Lehvaslaiho writes:
> This is an error from my side. Great that you have a fix already.
> My only worry is the number of external dependencies in BioPerl. To limit
> these we have recoded number of functionalities into BioPerl-specific modules.
> Before you commit the fix, could you see if Bio::RangeI could be used or easily
> extended to be used instead of Set::IntSpan?
> On Friday 05 September 2008 21:01:35 George Hartzell wrote:
> > Hi all,
> > Bio::Coordinate::Collection surprised me a bit. At first I thought
> > there was a bug, but it's clearly doing what it's supposed to. Now
> > I'm wondering if what it's supposed to be doing makes sense in some
> > context, or if what I expected would be better functionality.
> > t/CoordinateMapper.t sets up the following scenario:
> > #
> > # Collection
> > #
> > # 1 5 6 10
> > # |---| |---|
> > #-----|-----------------------
> > # 1 5 9 15 19
> > # pair1 pair2
> > Then goes on to do the following query:
> > # match more than two
> > $pos = Bio::Location::Simple->new (-start => 5, -end => 19 );
> > ok $res = $transcribe->map($pos);
> > is $res->each_gap, 2;
> > is $res->each_match, 2;
> > I was surprised to see that there were two gaps, one gene:10-19 and
> > one from gene:5-14. Looking at the code, what's really happening is
> > that, for the exon1 mapper there's match with gene:5-9 and a gap with
> > gene:10-19 and for the exon2 mapper there's a gap with gene:5-14 and a
> > match with gene:15-19. All four Result's just get tossed into the
> > return value.
> > The result my intuition wants is that there are two matches (gene:5-9
> > with exon1 and gene:15-19 with exon2) and a gap (gene:10-14).
> > Yes, I guess that I could just synthesize these myself from the result
> > in my app.
> > It still seems that the current result is a bug though, since there's
> > no way of knowing when you're walking through $res->each_Location that
> > the first "gap" is with respect to the exon1 mapper and that the
> > second "gap" is with respect to the exon2 mapper. The gaps are
> > meaningless.
> > I "fixed" it to work the way I think it should (two matches, one
> > gap). I actually extended the test case a bit so that there's a
> > multi-base gap, a match, another multibase-gap, another match, then a
> > single base gap (just to make sure I got that right...). I had to
> > touch up the test file a bit to account for my new test.
> > The gaps that I return have a strand of 'undef', which seems to be The
> > Right Thing. There's also a bit of funny business where I hang onto
> > the seq_id of the gapped sequence. It assumes that the "in" sequence
> > is the same for all of the mappers. This seems safe since otherwise
> > the entire query is kind of weird....
> > There's a patch to todays svn head at:
> > http://shrimp.alerce.com/bioperl/collection-diffs.txt
> > The patch changes Build.PL to include a dependency on Set::IntSpan,
> > CoordinateMapper.t to update the tests, and
> > Bio/CoordinateMapper/Collection.pm for the new code.
> > Who's code would this break.
> > If anyone's relying on the current behaviour re: gaps, what's the
> > situation in which you find it useful?
> > Thanks!
> > g.
So, I'm out of round 'tuit's trying to think of an _easy_ way to do
this using Range/RangeI. I might just not be thinking clearly, and
I've definitely trained myself to solve these kinds of problems with
Set::IntSpan problems so I may just be waving my favorite hammer
There may also be a BioPerl native way to do it using some fancier
object than a RangeI (split locations?).
It's easy to get the set of gaps for each pair, but is there already
something that takes the intersection of a set of non-intersecting
More information about the Bioperl-l