[Bioperl-l] Bio::Coordinate::Collection could DoWhatIMean better (w/ patch)
heikki at sanbi.ac.za
Tue Sep 9 03:50:11 EDT 2008
This is an error from my side. Great that you have a fix already.
My only worry is the number of external dependencies in BioPerl. To limit
these we have recoded number of functionalities into BioPerl-specific modules.
Before you commit the fix, could you see if Bio::RangeI could be used or easily
extended to be used instead of Set::IntSpan?
On Friday 05 September 2008 21:01:35 George Hartzell wrote:
> Hi all,
> Bio::Coordinate::Collection surprised me a bit. At first I thought
> there was a bug, but it's clearly doing what it's supposed to. Now
> I'm wondering if what it's supposed to be doing makes sense in some
> context, or if what I expected would be better functionality.
> t/CoordinateMapper.t sets up the following scenario:
> # Collection
> # 1 5 6 10
> # |---| |---|
> # 1 5 9 15 19
> # pair1 pair2
> Then goes on to do the following query:
> # match more than two
> $pos = Bio::Location::Simple->new (-start => 5, -end => 19 );
> ok $res = $transcribe->map($pos);
> is $res->each_gap, 2;
> is $res->each_match, 2;
> I was surprised to see that there were two gaps, one gene:10-19 and
> one from gene:5-14. Looking at the code, what's really happening is
> that, for the exon1 mapper there's match with gene:5-9 and a gap with
> gene:10-19 and for the exon2 mapper there's a gap with gene:5-14 and a
> match with gene:15-19. All four Result's just get tossed into the
> return value.
> The result my intuition wants is that there are two matches (gene:5-9
> with exon1 and gene:15-19 with exon2) and a gap (gene:10-14).
> Yes, I guess that I could just synthesize these myself from the result
> in my app.
> It still seems that the current result is a bug though, since there's
> no way of knowing when you're walking through $res->each_Location that
> the first "gap" is with respect to the exon1 mapper and that the
> second "gap" is with respect to the exon2 mapper. The gaps are
> I "fixed" it to work the way I think it should (two matches, one
> gap). I actually extended the test case a bit so that there's a
> multi-base gap, a match, another multibase-gap, another match, then a
> single base gap (just to make sure I got that right...). I had to
> touch up the test file a bit to account for my new test.
> The gaps that I return have a strand of 'undef', which seems to be The
> Right Thing. There's also a bit of funny business where I hang onto
> the seq_id of the gapped sequence. It assumes that the "in" sequence
> is the same for all of the mappers. This seems safe since otherwise
> the entire query is kind of weird....
> There's a patch to todays svn head at:
> The patch changes Build.PL to include a dependency on Set::IntSpan,
> CoordinateMapper.t to update the tests, and
> Bio/CoordinateMapper/Collection.pm for the new code.
> Who's code would this break.
> If anyone's relying on the current behaviour re: gaps, what's the
> situation in which you find it useful?
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
______ _/ _/_____________________________________________________
_/ _/ _/ Heikki Lehvaslaiho heikki at_sanbi _ac _za
_/_/_/_/_/ Senior Scientist skype: heikki_lehvaslaiho
_/ _/ _/ SANBI, South African National Bioinformatics Institute
_/ _/ _/ University of Western Cape, South Africa
_/ Phone: +27 21 959 2096 FAX: +27 21 959 2512
More information about the Bioperl-l