[Bioperl-l] Fuzzy Locations and GenBank

Hilmar Lapp hlapp at gmx.net
Mon Aug 21 16:04:45 EDT 2006


I'm not sure. It sounded more like it was the rarest variant.

Aside from GenBank, Swiss-Prot used to use locations of the <1..30 sort
a lot for feature annotation when they only had a partial peptide,
i.e., primarily in TrEMBL. I'm not sure whether that's changed in
UniProt.

Also note that the location for insertion features uses (or used to
use?) the 10.11 notation (for an insertion between bases 10 and
11). In Bioperl, that's a fuzzy location too. In this case, though,
I guess you can blame Bioperl for using the wrong coordinate
system, as in reality there's nothing fuzzy about where the insertion
is.
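
To make the distinction concrete, here is a minimal sketch (in Python, only for illustration; it is not Bioperl code) of how the location endpoints mentioned in this thread could be classified. The function names and the kind labels are hypothetical. The sketch also accepts the caret form 10^11, which the INSDC feature table uses for a site between two adjoining bases. Treating only "<N", ">N" and "X.Y" as fuzzy is an assumption made here to reflect the point above that a between-bases site is not really uncertain.

```python
import re

# Hypothetical labels for this sketch; they are not Bioperl's API.
FUZZY_KINDS = {"BEFORE", "AFTER", "WITHIN"}


def classify_position(text):
    """Classify one endpoint of a feature location as (kind, min, max)."""
    if m := re.fullmatch(r"<(\d+)", text):
        return ("BEFORE", None, int(m[1]))          # starts before base N
    if m := re.fullmatch(r">(\d+)", text):
        return ("AFTER", int(m[1]), None)           # ends after base N
    if m := re.fullmatch(r"(\d+)\.(\d+)", text):
        return ("WITHIN", int(m[1]), int(m[2]))     # one base somewhere in X..Y
    if m := re.fullmatch(r"(\d+)\^(\d+)", text):
        return ("IN-BETWEEN", int(m[1]), int(m[2]))  # site between two bases
    if re.fullmatch(r"\d+", text):
        return ("EXACT", int(text), int(text))
    raise ValueError(f"unrecognised position: {text!r}")


def parse_span(text):
    """Split a simple span such as '1.10..20' into (start, end) endpoints.

    Operators like join() or complement() are deliberately not handled.
    """
    parts = text.split("..")
    if len(parts) == 1:
        point = classify_position(parts[0])
        return (point, point)
    if len(parts) == 2:
        return (classify_position(parts[0]), classify_position(parts[1]))
    raise ValueError(f"unrecognised span: {text!r}")


def is_fuzzy(text):
    """True if either endpoint of the span is genuinely uncertain."""
    return any(kind in FUZZY_KINDS for kind, _, _ in parse_span(text))
```

With this, is_fuzzy("1.10..20") and is_fuzzy("<1..30") are true, while is_fuzzy("10^11") is false, because the between-bases site is kept as its own exact kind instead of being lumped in with the uncertain ones.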

	-hilmar

On Aug 21, 2006, at 3:18 PM, Lincoln Stein wrote:

> This was the most common variant, right?
>
> Lincoln
>
> On 8/21/06, Hilmar Lapp <hlapp at gmx.net> wrote:
> Well, they're actually not dead yet. Just one variant died. I'm
> hoping though that this is just a step on the road that indeed ends
> in their death.
>
>         -hilmar
>
> On Aug 21, 2006, at 1:34 PM, Lincoln Stein wrote:
>
> > I am tempted to start dancing around my office singing "Ding dong
> > the fuzzy feature is dead!" Break out the champagne!!
> >
> > Lincoln
> >
> > On 8/21/06, Chris Fields < cjfields at uiuc.edu> wrote:
> >>
> >> Steve
> >>
> >> There is this in the EMBL Release 87 notes:
> >>
> >>
> >> http://www.ebi.ac.uk/embl/Documentation/Release_notes/current/relnotes.html
> >>
> >> ..
> >> 2 CHANGES IN THIS RELEASE
> >>
> >> 2.1 Changes to the Feature Table Document: Chapter 3.5 "Location"
> >>
> >> The use of range (.) descriptor within location spans is no longer legal.
> >> ..
> >>
> >> So, yes, looks like EMBL is doing the same thing.  I am guessing
> >> DDBJ is also.
> >>
> >> I didn't see anything in the recent revision for the INSDSeqXML
> >> DTD, but I don't think a change in the DTD would be needed to
> >> accommodate the removal of 'fuzzy' locations of X.Y type, unless the
> >> DTD has specific rules on how to format fuzzy location data.  Same
> >> for the other formats (EMBLXML, etc.), as the change is rather small
> >> (but very significant).
> >>
> >> I'm guessing changes to other formats (GAME, etc.) that rely on
> >> GenBank/EMBL will occur if they specifically deal with these in
> >> some way.
> >>
> >> It is nice to know that BioPerl won't be seriously affected by this.
> >> As you noted, we'll have to keep X.Y fuzzy functionality around to
> >> accommodate legacy data, but should we add warnings for this?
> >>
> >> Chris
> >>
> >>
> >>> -----Original Message-----
> >>> From: Steve Chervitz [mailto:sac at open-bio.org]
> >>> Sent: Sunday, August 20, 2006 10:56 PM
> >>> To: Hilmar Lapp
> >>> Cc: Chris Fields; Bioperl List
> >>> Subject: Re: [Bioperl-l] Fuzzy Locations and GenBank
> >>>
> >>> Ah, one of the banes of bioinformatics data modeling is finally being
> >>> laid to rest. Those who have struggled with it (myself included)
> >>> should not let this occasion pass without notice. Here are some
> >>> reflections.
> >>>
> >>> Check out the captions under photos #2 and 3 here:
> >>> http://gallery.open-bio.org/gallery2/v/hackathon2002/dagphotos/?g2_page=7
> >>>
> >>> Isn't it fitting: now that the open-bio.org toolkits have systems in
> >>> place to deal with fuzzy locations, the NCBI says, "well, they're not
> >>> really used all that much, and so are not worth the trouble". This is
> >>> perhaps something we all knew in our hearts, but nevertheless felt a
> >>> compulsion to tackle anyway, right?
> >>>
> >>> The amount of fuzzy location-related cycles the open-bio community
> >>> has collectively burned over the years perhaps isn't for naught:
> >>> There will still be legacy data to deal with, and perhaps other
> >>> feature annotation data models still use them. EMBLxml does. I know
> >>> DAS/2 does not and has no plans to, and it looks like GAME XML also
> >>> does not. Anyone else?
> >>>
> >>> I imagine EMBL and DDBJ will follow suit in banishing fuzzy locations
> >>> as well. Anyone know?
> >>>
> >>> Steve
> >>>
> >>> On Aug 18, 2006, at 9:08 PM, Hilmar Lapp wrote:
> >>>
> >>>> Great, the fewer fuzzy locations the better. -hilmar
> >>>>
> >>>> On Aug 19, 2006, at 12:03 AM, Chris Fields wrote:
> >>>>
> >>>>> Don't know how much this will affect Bio::Location::Fuzzy, but I
> >>>>> thought it might be worth a heads-up in case something pops up:
> >>>>>
> >>>>>  From the latest GenBank release (154.0):
> >>>>>
> >>>>> ...
> >>>>>
> >>>>> 1.4.6 Feature location syntax X.Y to be discontinued
> >>>>>
> >>>>>    The Feature Table currently supports feature locations of the
> >>>>> format X.Y, to represent a base position which is greater or
> >>>>> equal to X, and less than or equal to Y. For example:
> >>>>>
> >>>>>    misc_feature    1.10..20
> >>>>>    misc_feature    join(100..150, 200.210..250)
> >>>>>
> >>>>>    In the first example, the misc_feature starts somewhere between
> >>>>> bases 1 and 10 (inclusive), and ends at basepair 20. In the second,
> >>>>> the 51 bases from 100..150 are joined together with a second basepair
> >>>>> interval, which could be anywhere from 200..250 to 210..250 .
> >>>>>
> >>>>>    Although this syntax seems like a reasonable way to capture an
> >>>>> uncertain interval, it is used for features on a vanishingly small
> >>>>> number of sequence records, most database submission mechanisms
> >>>>> don't support it, and the meaning of its use in a join() context
> >>>>> is not entirely clear.
> >>>>>
> >>>>>    As of October 2006, this type of location will no longer be
> >>>>> supported. Those records with features which utilize X.Y locations
> >>>>> will be reviewed and converted to a non-uncertain format prior to
> >>>>> that date.
> >>>>>
> >>>>>
> >>>>> Christopher Fields
> >>>>> Postdoctoral Researcher
> >>>>> Lab of Dr. Robert Switzer
> >>>>> Dept of Biochemistry
> >>>>> University of Illinois Urbana-Champaign
> >>>>>
> >>>>>
> >>>>>
> >>>>> _______________________________________________
> >>>>> Bioperl-l mailing list
> >>>>> Bioperl-l at lists.open-bio.org
> >>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
> >>>>>
> >>>>
> >>>> --
> >>>> ===========================================================
> >>>> : Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
> >>>> ===========================================================
> >>>>
> >>
> >>
> >
> >
> >
> > --
> > Lincoln D. Stein
> > Cold Spring Harbor Laboratory
> > 1 Bungtown Road
> > Cold Spring Harbor, NY 11724
> > (516) 367-8380 (voice)
> > (516) 367-8389 (fax)
> > FOR URGENT MESSAGES & SCHEDULING,
> > PLEASE CONTACT MY ASSISTANT,
> > SANDRA MICHELSEN, AT michelse at cshl.edu
> >
>
>
>
>

-- 
===========================================================
: Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
===========================================================