[Bioperl-l] Bio::Location::Fuzzy, Bio::Location::Split

Hilmar Lapp lapp@gnf.org
Thu, 25 Jan 2001 13:18:44 -0800

Bringing this (in part) back to the list, too.

Matthew Pocock wrote:
> Jason Stajich wrote:
> > Questions:
> > 1. Do we need to override the famous Pocock RangeI contains/overlaps
> >    methods for a Split location to take into account where the pieces
> >    of the contained LocationI are?
> >    Or do we take the easy route and just use min_start/max_end?  I think
> >    that right now start/end return 0 for a split location since they are
> >    not explicitly set; should they default to delegating to
> >    min_start/max_end?  I think so.
> >
> Originally the BioJava Locations just used min/max for all location
> operators - this turned out to be a *very bad thing* under most
> conditions. You are better off having operators that use split locations
> return split locations - also, the union of two ranges that don't
> overlap is the split location containing both ranges. It is more work to
> set up, but it pays off & if you don't do it you get confusing bugs later.
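
To make the difference concrete, here is a minimal sketch in Python (not BioPerl or BioJava code). The names Range, overlaps, union, split_contains and minmax_contains are hypothetical, and ranges are assumed to be inclusive (start, end) pairs. It shows a union that yields two pieces when the inputs do not overlap, and how a min/max test wrongly reports a position in the gap as contained.

```python
from typing import List, Tuple

# Hypothetical stand-in for a simple location: inclusive (start, end).
Range = Tuple[int, int]


def overlaps(a: Range, b: Range) -> bool:
    """True if two inclusive ranges share at least one position."""
    return a[0] <= b[1] and b[0] <= a[1]


def union(a: Range, b: Range) -> List[Range]:
    """Return the union as a list of pieces.

    One piece if the ranges overlap, two pieces (a "split location")
    otherwise. Merely adjacent ranges are left as two pieces here.
    """
    if overlaps(a, b):
        return [(min(a[0], b[0]), max(a[1], b[1]))]
    return sorted([a, b])


def split_contains(pieces: List[Range], pos: int) -> bool:
    """Membership test that inspects every piece."""
    return any(start <= pos <= end for start, end in pieces)


def minmax_contains(pieces: List[Range], pos: int) -> bool:
    """The 'easy route': only look at the overall min and max."""
    lo = min(start for start, _ in pieces)
    hi = max(end for _, end in pieces)
    return lo <= pos <= hi
```

With the pieces 1..10 and 50..60, position 30 lies in the gap: split_contains says no, while minmax_contains says yes, which is the kind of confusing result described above.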
> >    What about in Fuzzy, do we want to throw exceptions or do we just use
> >    the best information we have and do some logic and coordinate
> >    gymnastics to try and return a reasonable value or else throw an
> >    exception?
> My gut says to return the inner-most coordinate that is known but
> provide API to get the full fuzzy coordinates out - so
> full loc           -> start..end : minStart..maxEnd
> <50..100>          -> 50..100    : -INF..+INF
> (78.90)..(100.107) -> 90..100    : 78..107
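
The table above can be sketched as follows. This is a rough illustration in Python, not the Bio::Location::Fuzzy API: the class FuzzyLocation and its inner/outer methods are hypothetical, the four field names are borrowed from the min_start/max_end naming used in this thread, and an unbounded side is assumed to be represented by an infinite float.

```python
from dataclasses import dataclass
from typing import Tuple

INF = float("inf")


@dataclass(frozen=True)
class FuzzyLocation:
    """Hypothetical fuzzy location: each endpoint is itself a range."""
    min_start: float
    max_start: float
    min_end: float
    max_end: float

    def inner(self) -> Tuple[float, float]:
        """Inner-most known coordinates (the start..end column)."""
        return (self.max_start, self.min_end)

    def outer(self) -> Tuple[float, float]:
        """Outer-most coordinates (the minStart..maxEnd column)."""
        return (self.min_start, self.max_end)


# <50..100> : both sides open-ended (assumed encoding with infinities)
open_ended = FuzzyLocation(-INF, 50, 100, INF)

# (78.90)..(100.107) : each endpoint lies somewhere within a range
within = FuzzyLocation(78, 90, 100, 107)
```

Which of inner() or outer() should back the plain start/end accessors is exactly the policy question under discussion; the sketch only shows that both can be derived from the same four numbers.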

I think I am much more in favor of returning the outer-most
coordinates as the default policy. David, Mark? I'm also not sure
whether INF or NaN are good return values in Perl (i.e., can you test
for INF or NaN by numeric comparison? I figured that e.g. you can't
obtain NaN by sqrt(-1), as would be the result in C).
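
For reference, this is how the comparison question plays out under IEEE 754 floating-point rules, sketched in Python rather than Perl. Whether a given Perl build behaves the same way is platform-dependent and is not something this sketch establishes. The helper names is_unbounded, is_nan and safe_sqrt are hypothetical.

```python
import math
from typing import Optional

INF = float("inf")
NAN = float("nan")


def is_unbounded(x: float) -> bool:
    """Infinity can be detected by ordinary numeric comparison."""
    return x == INF or x == -INF


def is_nan(x: float) -> bool:
    """NaN compares unequal to everything, including itself."""
    return x != x


def safe_sqrt(x: float) -> Optional[float]:
    """math.sqrt raises ValueError for negative input instead of
    returning NaN, so map that case to None here."""
    try:
        return math.sqrt(x)
    except ValueError:
        return None
```

So infinity is testable with ==, while NaN is only testable through the x != x idiom (or a library helper such as math.isnan), which makes it the more awkward of the two as a return value.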


> >
> > As for deep SplitLocation (i.e., a SplitLocation containing Location
> > objects that are SplitLocations), this will work in a very gross way just
> > like Perl flattens arrays, except I don't plan to simplify the
> > join(...join()) code into a single join() unless you guys think it's worth
> > it.  It wouldn't be hard, just let Perl collapse the arrays...
> Should work - there is the pathological case where an index is included
> via two paths. CompoundLocation in BioJava does all the collapsing at
> constructor time. All our Location objects are immutable, so once
> constructed, you can't change their contained indexes in any way. The
> hierarchy of Location containment is never exposed to the user - we may
> have to expose it if we provide a full fuzzy-location editor, though.
> Now I come to think of it, I have seen EMBL CDS entries with internal
> exons that have < or > operators on them. Pants.
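
A minimal sketch of collapsing at construction time, in Python, under stated assumptions: a simple location is an inclusive (start, end) tuple, a nested split location is a list that may contain tuples or further lists, and the order of the pieces does not matter. The function names flatten and collapse are hypothetical; this is not how BioJava's CompoundLocation is actually implemented.

```python
from typing import Iterator, List, Tuple

Range = Tuple[int, int]  # inclusive (start, end)


def flatten(loc) -> Iterator[Range]:
    """Yield the simple ranges of an arbitrarily nested location."""
    if isinstance(loc, tuple):
        yield loc
    else:
        for sub in loc:
            yield from flatten(sub)


def collapse(loc) -> Tuple[Range, ...]:
    """Flatten, sort, and merge overlapping pieces.

    Merging handles the case where the same index is reachable via
    two paths. The result is a tuple, so it cannot be modified later.
    """
    merged: List[Range] = []
    for start, end in sorted(flatten(loc)):
        if merged and start <= merged[-1][1]:
            prev_start, prev_end = merged[-1]
            merged[-1] = (prev_start, max(prev_end, end))
        else:
            merged.append((start, end))
    return tuple(merged)


nested = [(1, 10), [(5, 20), [(30, 40)]], (30, 40)]
flat = collapse(nested)
```

Sorting and merging discards the original join order and any fuzziness on internal pieces, so a real implementation would need more care for cases like the CDS entries mentioned above.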

Hilmar Lapp                            email: lapp@gnf.org
GNF, San Diego, Ca. 92121              phone: +1-858-812-1757