[Bioperl-l] Fuzzy Locations and GenBank

Steve Chervitz sac at open-bio.org
Sun Aug 20 23:55:51 EDT 2006

Ah, one of the banes of bioinformatics data modeling is finally being  
laid to rest. Those who have struggled with it (myself included)  
should not let this occasion pass without notice. Here are some  

Check out the captions under photos #2 and #3 here:

Isn't it fitting, now that the open-bio.org toolkits have systems in
place to deal with fuzzy locations, the NCBI says, "well, they're not
really used all that much, and so are not worth the trouble". This is
perhaps something we all knew in our hearts, but nevertheless felt a
compulsion to tackle anyway, right?

The fuzzy location-related cycles the open-bio community has
collectively burned over the years perhaps weren't for naught:
There will still be legacy data to deal with, and perhaps other
feature annotation data models still use them. EMBLxml does. I know
DAS/2 does not and has no plans to, and it looks like GAME XML also
does not. Anyone else?

I imagine EMBL and DDBJ will follow suit in banishing fuzzy locations  
as well. Anyone know?
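
For anyone left handling the legacy data, here is a minimal Python
sketch of how the X.Y syntax described in the release notes quoted
below might be read. It is not how Bio::Location::Fuzzy does it; the
function names parse_span and parse_location are made up for
illustration, and only plain X..Y spans, X.Y endpoints, and a flat
join() are handled.

```python
import re

# One endpoint: an exact position "20", or a range "1.10" meaning
# "somewhere between 1 and 10, inclusive".
_POINT = r"(\d+)(?:\.(\d+))?"
# A span is two endpoints separated by "..", e.g. "1.10..20".
_SPAN = re.compile(rf"^{_POINT}\.\.{_POINT}$")


def parse_span(text):
    """Return ((start_min, start_max), (end_min, end_max)) for one span."""
    m = _SPAN.match(text.strip())
    if m is None:
        raise ValueError(f"unsupported location: {text!r}")
    s_lo, s_hi, e_lo, e_hi = m.groups()
    # An exact endpoint has no second number, so min and max coincide.
    start = (int(s_lo), int(s_hi or s_lo))
    end = (int(e_lo), int(e_hi or e_lo))
    return start, end


def parse_location(text):
    """Parse a bare span or a flat join(...) of spans into a list of spans."""
    text = text.strip()
    if text.startswith("join(") and text.endswith(")"):
        parts = text[len("join("):-1].split(",")
    else:
        parts = [text]
    return [parse_span(p) for p in parts]
```

Each span comes back as a (min, max) pair for the start and another
for the end, so an exact position simply has min equal to max.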


On Aug 18, 2006, at 9:08 PM, Hilmar Lapp wrote:

> Great, the fewer fuzzy locations the better. -hilmar
> On Aug 19, 2006, at 12:03 AM, Chris Fields wrote:
>> Don't know how much this will affect Bio::Location::Fuzzy, but I
>> thought it might be worth a heads-up in case something pops up:
>>  From the latest GenBank release (154.0):
>> ...
>> 1.4.6 Feature location syntax X.Y to be discontinued
>>    The Feature Table currently supports feature locations of the
>> format X.Y, to represent a base position which is greater or
>> equal to X, and less than or equal to Y. For example:
>> 	misc_feature    1.10..20
>> 	misc_feature    join(100..150,200.210..250)
>>    In the first example, the misc_feature starts somewhere between
>> bases 1 and 10 (inclusive), and ends at basepair 20. In the second,
>> the 51 bases from 100..150 are joined together with a second basepair
>> interval, which could be anywhere from 200..250 to 210..250 .
>>    Although this syntax seems like a reasonable way to capture an
>> uncertain interval, it is used for features on a vanishingly small
>> number of sequence records, most database submission mechanisms
>> don't support it, and the meaning of its use in a join() context
>> is not entirely clear.
>>    As of October 2006, this type of location will no longer be
>> supported. Those records with features which utilize X.Y locations
>> will be reviewed and converted to a non-uncertain format prior to
>> that date.
>> Christopher Fields
>> Postdoctoral Researcher
>> Lab of Dr. Robert Switzer
>> Dept of Biochemistry
>> University of Illinois Urbana-Champaign
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
> -- 
> ===========================================================
> : Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
> ===========================================================