[Bioperl-l] Re: LocationI

Hilmar Lapp hlapp@gmx.net
Thu, 18 Jan 2001 11:11:57 -0800

Jason Stajich wrote:
> Interfaces:
> Bio::LocationI -> ISA RangeI
>  Purpose: capture location information - such as in an EMBL/GenBank
>           feature
>         /source 1..345
>  Methods: RangeI methods, and ...? [start/end/strand]
>  Questions:  How is a LocationI object going to be different from the
>              vanilla SeqFeatureI, or should we migrate some methods from
>              SeqFeature (start/end/strand) to LocationI and make
>              SeqFeatureI more about tags (primary/source/has_tag/each_tag)
>              and GFF stuff?

In principle I think yes. SeqFeatureI could still keep
start/end/strand and map these to calls into the location object.
Or SeqFeatureI drops them (i.e., they are no longer mandatory), but
for simplicity SeqFeature::Generic keeps them.
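A minimal Python sketch of the first option, in which the feature keeps start/end/strand but each accessor simply forwards to the location object. This is an illustration only: `SimpleLocation` and `SeqFeature` here are hypothetical Python stand-ins, not the Bioperl (Perl) API.

```python
from dataclasses import dataclass


@dataclass
class SimpleLocation:
    """Hypothetical stand-in for a LocationI implementation (1-based, inclusive)."""
    start: int
    end: int
    strand: int = 1


class SeqFeature:
    """Hypothetical feature: start/end/strand are kept but forwarded to the location."""

    def __init__(self, primary_tag, location):
        self.primary_tag = primary_tag
        self.location = location

    # The coordinate accessors store nothing themselves; each one is a
    # thin wrapper that asks the location object.
    @property
    def start(self):
        return self.location.start

    @property
    def end(self):
        return self.location.end

    @property
    def strand(self):
        return self.location.strand


feature = SeqFeature("source", SimpleLocation(1, 345))
print(feature.start, feature.end, feature.strand)  # 1 345 1
```

Because the feature holds no coordinates of its own, replacing `feature.location` is enough to change what `start`, `end` and `strand` report.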

> Bio::ComplexLocationI -> ISA Bio::LocationI
>  Purpose: capture location information for features that are not linear
>          as in an EMBL/GenBank join
>          CDS             join(544..589,688..1032)
>  Methods:
>         - sub_Locations() -> a list of LocationI objects that indicate
>           start/stop boundaries for this object; must override overlap,
>           contains, etc. from RangeI since the coordinates are not
>           contiguous
> Objects:
>  Bio::SeqFeature::Generic -> ISA Bio::SeqFeatureI, Bio::LocationI
>         add a location() method to this object; the LocationI object
>         returned will be a reference to $self.
> Bio::SeqFeature::Complex -> ISA Bio::SeqFeatureI, Bio::ComplexLocationI
>  Purpose: implementation to handle those join() statements

This is the outline you pretty much follow in the proposal on
the Wiki. The point I'm not so happy with is that purely
location-specific issues change the class (type) of a SeqFeature.
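One way to avoid that, sketched here in Python under the assumption that a feature holds its location by composition: the feature class stays the same and only the location object differs. The names `SimpleLocation`, `SplitLocation`, `SeqFeature` and `sub_locations()` are hypothetical stand-ins loosely modeled on the sub_Locations() idea quoted above, not Bioperl code. The split location answers `contains` and `overlaps` from its pieces, because its overall start..end span also covers the gap between them.

```python
from dataclasses import dataclass


def _pieces_overlap(x, y):
    """True if any contiguous piece of x overlaps any contiguous piece of y."""
    return any(a.start <= b.end and b.start <= a.end
               for a in x.sub_locations() for b in y.sub_locations())


@dataclass
class SimpleLocation:
    """Hypothetical contiguous range, 1-based and inclusive (e.g. 544..589)."""
    start: int
    end: int

    def sub_locations(self):
        return [self]

    def contains(self, pos):
        return self.start <= pos <= self.end

    def overlaps(self, other):
        return _pieces_overlap(self, other)


@dataclass
class SplitLocation:
    """Hypothetical join(...): overall span from the pieces, tests per piece."""
    parts: list

    @property
    def start(self):
        return min(p.start for p in self.parts)

    @property
    def end(self):
        return max(p.end for p in self.parts)

    def sub_locations(self):
        return list(self.parts)

    def contains(self, pos):
        # A position in a gap lies within start..end but not in the feature.
        return any(p.contains(pos) for p in self.parts)

    def overlaps(self, other):
        return _pieces_overlap(self, other)


class SeqFeature:
    """Hypothetical feature: one class, whatever kind of location it holds."""

    def __init__(self, primary_tag, location):
        self.primary_tag = primary_tag
        self.location = location


cds = SeqFeature("CDS", SplitLocation([SimpleLocation(544, 589),
                                       SimpleLocation(688, 1032)]))
gene = SeqFeature("gene", SimpleLocation(544, 1032))
print(type(cds) is type(gene))                                   # True
print(cds.location.contains(600), gene.location.contains(600))   # False True
print(cds.location.overlaps(SimpleLocation(590, 687)))           # False
```

In this design, going from `544..1032` to `join(544..589,688..1032)` only swaps the location object; the type of the feature does not change.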

> I'm still not clear on what a fuzzy location is supposed to represent,
> i.e., does that mean we know that the feature is located somewhere
> in the range, but we don't know the exact start/stop?

Exactly. At least to my understanding.

> Why can't we treat
> it like a real start/stop, since we don't have any more information?  Or
> would union/intersection calculations need to behave differently?

Well, biologically you can't, because annotating a sequence with
such a feature without indicating the uncertainty of start and end
is deceptive. For cDNA entries this is sometimes crucial: <1..100
as CDS location means that the entry doesn't even contain the
start of the CDS, and it's totally unclear where that is.
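To make the point concrete, here is a minimal Python sketch, assuming a hypothetical `FuzzyLocation` class and `parse_range()` helper (neither is Bioperl code), that keeps the `<` and `>` markers as flags next to the coordinates instead of discarding them, so that writing the location back out preserves the uncertainty. It only handles the simple `a..b`, `<a..b` and `a..>b` forms.

```python
import re
from dataclasses import dataclass

# Only the simple "<a..b", "a..>b" and "a..b" forms; other feature-table
# syntax (join, complement, "(a.b)", "a^b", ...) is out of scope here.
_RANGE = re.compile(r"^(<)?(\d+)\.\.(>)?(\d+)$")


@dataclass(frozen=True)
class FuzzyLocation:
    """Hypothetical location that remembers whether each boundary is exact."""
    start: int
    end: int
    start_is_fuzzy: bool = False  # '<': true start lies somewhere before `start`
    end_is_fuzzy: bool = False    # '>': true end lies somewhere after `end`

    def is_exact(self):
        return not (self.start_is_fuzzy or self.end_is_fuzzy)

    def __str__(self):
        lt = "<" if self.start_is_fuzzy else ""
        gt = ">" if self.end_is_fuzzy else ""
        return f"{lt}{self.start}..{gt}{self.end}"


def parse_range(text):
    """Parse a simple range, keeping the fuzziness markers as flags."""
    m = _RANGE.match(text.strip())
    if m is None:
        raise ValueError(f"unsupported location: {text!r}")
    lt, start, gt, end = m.groups()
    return FuzzyLocation(int(start), int(end), lt is not None, gt is not None)


cds = parse_range("<1..100")
print(cds.start, cds.start_is_fuzzy, cds.is_exact())  # 1 True False
print(cds)                                            # <1..100
```

Code that needs a plain number can still use `start`, but the flag stays available to anything (a writer, a CDS translator) that must not pretend the boundary is known.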


Hilmar Lapp                                email: hlapp@gmx.net
GNF, San Diego, Ca. 92122                  phone: +1 858 812 1757