[Bioperl-l] [Gmod-schema] Circular genomes in Chado/BioPerl

Aaron Mackey ajmackey at gmail.com
Tue Sep 9 14:48:12 EDT 2008


Right, the modulus calculation continues to work, but for instance,
what'll happen when I now ask Gbrowse (or Ensembl) to show me
positions 50..260?  Will it show me 50..60, 1:100, or "unroll" the
genome twice from 50..260 (that'd be a pretty cute trick, by the way!)?
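
For illustration, a minimal Python sketch of the "unrolled" reading of
50..260, assuming 1-based inclusive coordinates and the 100 bp toy genome
from the example quoted below. The helper name `unroll` and its return
shape are made up here; this is not GBrowse, Ensembl, or BioPerl code.

```python
def unroll(start, end, genome_len):
    """Split a 1-based, inclusive range whose end may exceed the genome
    length into the linear pieces it covers on a circular genome.
    Hypothetical helper, for illustration only."""
    if not (1 <= start <= genome_len) or end < start:
        raise ValueError("expected 1 <= start <= genome_len and start <= end")
    segments = []
    pos = start
    remaining = end - start + 1          # total feature length
    while remaining > 0:
        # take bases up to the end of the genome, or fewer if the feature stops first
        step = min(remaining, genome_len - pos + 1)
        segments.append((pos, pos + step - 1))
        remaining -= step
        pos = 1                          # wrap around to the origin
    return segments

segments = unroll(50, 260, 100)
origin_crossings = len(segments) - 1
```

Under these assumptions 50..260 comes out as three pieces, (50, 100),
(1, 100) and (1, 60), with two origin crossings. Which of those pieces a
browser should actually draw is exactly the open question.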

You're (re)using simple arithmetic to compress a compound coordinate
into a single-valued coordinate (which I realize can be trivially
packed and unpacked by software), but I worry about the downstream
consequences of software having to always remember that the
coordinates given may have to be unpacked or not, and not being able
to immediately identify whether "260" is a real or compound
coordinate.
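
A rough Python sketch of that concern, assuming 1-based coordinates (the
function and its parameters are hypothetical): the value 260 cannot be
interpreted on its own, because the answer depends on two facts stored
elsewhere, the parent's length and whether it is flagged circular.

```python
def real_position(coord, parent_len, is_circular):
    """Map a possibly 'packed' 1-based coordinate back onto the parent.
    Hypothetical helper; returns (revolutions, position)."""
    if coord < 1:
        raise ValueError("coordinates are assumed to be 1-based")
    if coord <= parent_len:
        return 0, coord                  # an ordinary coordinate
    if not is_circular:
        raise ValueError("coordinate runs past the end of a linear parent")
    # 1-based modulus: shift to 0-based, wrap, shift back
    return (coord - 1) // parent_len, (coord - 1) % parent_len + 1
```

With a 100 bp circular parent, 260 unpacks to revolution 2, position 60;
with a 1000 bp parent the same 260 is just position 260. Any tool that
reads the number without consulting the parent gets no warning either way.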

To say it another way, I'm happy (that is, don't care much) whether
Chado or any other underlying data storage uses such compound
coordinates, because only Chado-reliant tools will need to care; but I
do worry about GFF3 as a (relatively) simple exchange format having
that kind of silent bug-causing complexity.  I'd much rather see GFF
be syntactically explicit, and not quite so cleverly implicit.
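
As a sketch of what "syntactically explicit" could mean for a parser, here
is some Python for the revolution-prefix idea quoted further down, where a
bare 100 is shorthand for 0:100. The function name is invented, and the
syntax is only a proposal from this thread, not part of the GFF3 spec.

```python
def parse_coord(field):
    """Parse 'rev:pos' or a bare 'pos' into (revolution, position).
    A bare position is shorthand for revolution 0. Hypothetical parser."""
    rev, sep, pos = field.rpartition(":")
    revolution = int(rev) if sep else 0   # no colon means revolution 0
    position = int(pos)
    if revolution < 0 or position < 1:
        raise ValueError(f"bad coordinate: {field!r}")
    return revolution, position
```

The point of the design is that the revolution count travels with the
coordinate itself, so a reader never has to guess whether a number has
been packed.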

Just one GFF user's two cents, thanks for listening,

-Aaron

On Tue, Sep 9, 2008 at 1:52 PM, Lincoln Stein <lincoln.stein at gmail.com> wrote:
> It seems to me that the proposed modulus syntax handles multiple
> revolutions. Consider a 100 bp genome (to make it simple) and a feature that
> starts at 50, goes around twice, and ends at position 60:
>
>   start = 50
>   end  = 260
>
> length = end - start + 1
> revolutions = int (length/genome)
> stop position = (end - 1) % genome + 1
>
> Lincoln
>
> On Mon, Sep 8, 2008 at 3:57 PM, Aaron Mackey <ajmackey at gmail.com> wrote:
>>
>> How can you handle features that may cross the origin more than once?
>> The modulus, though simple, seems to be only half the solution.  It
>> also makes it difficult to place features in the genome "by eye"
>> (having to do the modulus subtraction in my head), or in
>> sorting/filtering operations.
>>
>> I have an alternative that I wondered if you considered: allow the
>> start/end to have an additional "circular revolution" prefix:
>>
>> a typical range tuple like: 100 200 -
>> is thus shorthand for: 0:100 0:200 -
>> (i.e. both the 100 and 200 are in the same "revolution" around the genome)
>>
>> and is then distinguishable from an "around the genome + 100" feature of:
>> 1:100 0:200 -
>>
>> Just an alternative to consider (if you haven't already).  I'm not
>> wedded to the syntax, but I wouldn't want to see new columns in GFF
>> just for this.  Essentially, what you want is some form of compound
>> polar coordinates, it seems.
>>
>> -Aaron
>>
>> On Mon, Sep 8, 2008 at 2:44 PM, Jim Hu <jimhu at tamu.edu> wrote:
>> > In discussions with GMOD about Gbrowse, we've come up with a proposal
>> > for handling circular genomes and features that cross the origin in
>> > such genomes.  This applies to lots of prokaryotic and viral genomes,
>> > and might be valuable for some ways of representing terminally
>> > redundant linear genomes.
>> > 1) Keep the requirement that start < end
>> > 2) allow end > parent feature length
>> > 3) parent feature gets an is_circular boolean
>> > 4) use modular arithmetic to calculate the real position of end on
>> > the parent feature.
>> > We'd like to do this in a way that will be consistent with Chado and
>> > BioPerl representation of features as much as possible (realizing that
>> > there is the usual interbase or not coordinate issue).  What do people
>> > think?  Lincoln is on board for modifying the GFF3 spec.
>> > Thanks!
>> > Jim Hu
>> >
>> > =====================================
>> > Jim Hu
>> > Associate Professor
>> > Dept. of Biochemistry and Biophysics
>> > 2128 TAMU
>> > Texas A&M Univ.
>> > College Station, TX 77843-2128
>> > 979-862-4054
>> >
>
>
>
> --
> Lincoln D. Stein
>
> Ontario Institute for Cancer Research
> 101 College St., Suite 800
> Toronto, ON, Canada M5G0A3
> 416 673-8514
> Assistant: Stacey Quinn <Stacey.Quinn at oicr.on.ca>
>
> Cold Spring Harbor Laboratory
> 1 Bungtown Road
> Cold Spring Harbor, NY 11724 USA
> (516) 367-8380
> Assistant: Sandra Michelsen <michelse at cshl.edu>
>

