[Bioperl-l] [Gmod-schema] Circular genomes in Chado/BioPerl
cjfields at illinois.edu
Tue Sep 9 14:24:49 EDT 2008
Is there any particular reason we don't treat this similarly to the
way BioPerl does, which is to simply treat the origin-overlapping
feature as a split location? GenBank treats this similarly. For a
faux example, the bug I just fixed in Bugzilla has one:
An actual GenBank case is the Sulfolobus solfataricus genome
(NC_002754), and I'm sure Jim could come up with more. The only
caveat is whether we should represent this
As for multiple revolutions, I'm not sure the hand-wringing about
specifics is worth it unless we have explicit, workable examples to
test against (preferably ones likely to turn up in real data), but
Lincoln's proposal sounds fine.
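A minimal sketch of the split-location approach. It assumes 1-based inclusive coordinates and an "unwrapped" end that may exceed the sequence length. The helper names `split_at_origin` and `to_join` are hypothetical and are not BioPerl API. The feature is cut at the origin into segments that each lie within the sequence, then written as a GenBank-style `join(...)`. In BioPerl itself the corresponding object would presumably be a Bio::Location::Split.

```python
def split_at_origin(start, end, seq_len):
    """Split a feature on a circular sequence into segments inside 1..seq_len.

    Coordinates are 1-based and inclusive; `end` is "unwrapped", i.e. it may
    exceed seq_len when the feature runs across the origin.
    """
    if not 1 <= start <= seq_len:
        raise ValueError("start must lie within the sequence")
    if end < start:
        raise ValueError("expected start <= end in unwrapped coordinates")
    segments = []
    seg_start = start
    # Each pass peels off one segment that ends on the last base of the sequence.
    while end > seq_len:
        segments.append((seg_start, seq_len))
        end -= seq_len
        seg_start = 1
    segments.append((seg_start, end))
    return segments


def to_join(segments):
    """Render segments as a GenBank-style location string."""
    parts = [f"{s}..{e}" for s, e in segments]
    return parts[0] if len(parts) == 1 else "join(" + ",".join(parts) + ")"


print(to_join(split_at_origin(1000, 1250, 1200)))  # join(1000..1200,1..50)
```

The same loop also handles a feature that wraps more than once, emitting one full-length segment per extra revolution.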
On Sep 9, 2008, at 11:05 AM, Jim Hu wrote:
> Hi Aaron,
> I was thinking this would be handled by making end = parent feature
> length x 2 + end coord (for a feature that crosses twice); end / parent
> length then gives the number of times the feature crosses the origin.
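Jim's encoding can be sketched as follows. The sketch assumes 1-based inclusive coordinates, and the function names are hypothetical. With 1-based ends, a feature that ends exactly on the last base has end == parent length without crossing the origin. For that reason the sketch counts crossings as (end - 1) // length rather than end // length.

```python
def encode_end(real_end, crossings, seq_len):
    """Unwrapped end = crossings * seq_len + real_end (1-based, inclusive)."""
    if not 1 <= real_end <= seq_len:
        raise ValueError("real_end must lie within the sequence")
    return crossings * seq_len + real_end


def origin_crossings(end, seq_len):
    """Number of times the feature crosses the origin.

    Uses (end - 1) rather than end, so that a feature ending on the last
    base (end == seq_len) counts as zero crossings.
    """
    return (end - 1) // seq_len


def real_end(end, seq_len):
    """Position of the unwrapped end on the parent sequence (1..seq_len)."""
    return (end - 1) % seq_len + 1


end = encode_end(50, 2, 1200)  # "parent length x 2 + end coord"
print(end, origin_crossings(end, 1200), real_end(end, 1200))  # 2450 2 50
```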
> On Sep 8, 2008, at 2:57 PM, Aaron Mackey wrote:
>> How can you handle features that may cross the origin more than once?
>> The modulus, though simple, seems to be only half the solution. It
>> also makes it difficult to place features in the genome "by eye"
>> (having to do the modulus subtraction in my head), or in
>> sorting/filtering operations.
>> I have an alternative that I wondered if you considered: allow the
>> start/end to have an additional "circular revolution" prefix:
>> a typical range tuple like: 100 200 -
>> is thus shorthand for: 0:100 0:200 -
>> (i.e. both the 100 and 200 are in the same "revolution" around the
>> genome) and is then distinguishable from an "around the genome + 100"
>> feature of:
>> 1:100 0:200 -
>> Just an alternative to consider (if you haven't already). I'm not
>> wedded to the syntax, but I wouldn't want to see new columns in GFF
>> just for this. Essentially, what you want is some form of compound
>> polar coordinates, it seems.
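Here is a rough sketch of how Aaron's "revolution:position" prefix might be parsed. It makes two assumptions. First, a bare position means revolution 0. Second, when a single number is needed, a coordinate can be flattened to revolution * sequence length + position. The helper names are hypothetical. The position is kept as written, so it stays readable "by eye". The (revolution, position) tuples also sort by revolution first.

```python
def parse_coord(token):
    """Parse 'rev:pos' into (rev, pos); a bare 'pos' means revolution 0."""
    rev, sep, pos = token.rpartition(":")
    return (int(rev) if sep else 0, int(pos))


def parse_range(text):
    """Parse a 'start end strand' tuple such as '100 200 -'."""
    start_tok, end_tok, strand = text.split()
    return parse_coord(start_tok), parse_coord(end_tok), strand


def to_linear(coord, seq_len):
    """Flatten (rev, pos) to one unwrapped coordinate (an assumed mapping)."""
    rev, pos = coord
    return rev * seq_len + pos


# The shorthand and the explicit form parse identically.
assert parse_range("100 200 -") == parse_range("0:100 0:200 -")
print(parse_range("1:100 0:200 -"))  # ((1, 100), (0, 200), '-')
print(to_linear((1, 100), 1200))     # 1300
```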
>> On Mon, Sep 8, 2008 at 2:44 PM, Jim Hu <jimhu at tamu.edu> wrote:
>>> In discussions with GMOD about GBrowse, we've come up with a proposal
>>> for handling circular genomes and features that cross the origin in
>>> such genomes. This applies to lots of prokaryotic and viral genomes,
>>> and might be valuable for some ways of representing terminally redundant
>>> 1) Keep the requirement that start < end
>>> 2) Allow end > parent feature length
>>> 3) Give the parent feature an is_circular boolean
>>> 4) Use modular arithmetic to calculate the real position of end on the
>>> parent feature.
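The four rules might be checked along these lines. This is a sketch under stated assumptions. It uses Chado-style interbase fmin/fmax, so rule 1's strict start < end still admits one-base features. The class and method names are hypothetical. For end positions, an interbase fmax and a 1-based inclusive end are the same number, so the wrap formula for rule 4 is identical under either convention.

```python
from dataclasses import dataclass


@dataclass
class Parent:
    length: int
    is_circular: bool = False  # rule 3: boolean flag on the parent feature


@dataclass
class Feature:
    fmin: int  # interbase (0-based) start
    fmax: int  # interbase end; may exceed parent.length on a circular parent
    parent: Parent

    def validate(self) -> None:
        if not self.fmin < self.fmax:  # rule 1: start < end
            raise ValueError("fmin must be less than fmax")
        if not 0 <= self.fmin < self.parent.length:
            raise ValueError("fmin must fall within the parent")
        # rule 2: end may run past the parent's length, but only if it is circular
        if self.fmax > self.parent.length and not self.parent.is_circular:
            raise ValueError("fmax exceeds the length of a linear parent")

    def crosses_origin(self) -> bool:
        return self.fmax > self.parent.length

    def real_fmax(self) -> int:
        # rule 4: modular arithmetic maps the end back onto 1..length
        return (self.fmax - 1) % self.parent.length + 1


chrom = Parent(length=1200, is_circular=True)
f = Feature(fmin=999, fmax=1250, parent=chrom)
f.validate()
print(f.crosses_origin(), f.real_fmax())  # True 50
```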
>>> We'd like to do this in a way that is as consistent as possible with
>>> the Chado and BioPerl representation of features (realizing that there
>>> is the usual interbase-or-not coordinate issue). What do people think?
>>> Lincoln is on board with modifying the GFF3 spec.
>>> Jim Hu
>>> Associate Professor
>>> Dept. of Biochemistry and Biophysics
>>> 2128 TAMU
>>> Texas A&M Univ.
>>> College Station, TX 77843-2128
Lab of Dr. Marie-Claude Hofmann
College of Veterinary Medicine
University of Illinois Urbana-Champaign