[Bioperl-l] Bio::Location::Split question

Hilmar Lapp hlapp at gmx.net
Mon Sep 18 15:39:05 EDT 2006


I'm not sure what you're suggesting.

Are you suggesting that the examples do not yield identical DNA
sequences? In my book they do, because the order of segments is
reversed in the second example.

Or are you suggesting that there is a bug in how BioPerl resolves  
split locations?

Or both?
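
The first reading can be checked with a toy model of the two operators.
This is a minimal sketch in Python, not BioPerl code: the helper names
(span, complement, join), the sequence, and the short coordinates are
all made up for illustration, and 'complement' is taken to mean the
other strand read 5'-to-3' (the reverse complement), as in the feature
table definition.

```python
# Toy model of the feature-table operators; hypothetical helpers, not BioPerl.
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")

def span(seq, start, end):
    """Return the 1-based, inclusive region start..end of seq."""
    return seq[start - 1:end]

def complement(fragment):
    """Return the reverse complement (the other strand read 5'->3')."""
    return fragment.translate(COMPLEMENT)[::-1]

def join(*fragments):
    """Place the fragments end-to-end in the order given."""
    return "".join(fragments)

seq = "AACCGGTTAGGTTGCA"  # made-up sequence, 16 bases

# complement(join(1..4,9..12))
a = complement(join(span(seq, 1, 4), span(seq, 9, 12)))

# join(complement(9..12),complement(1..4)): same segments, order reversed
b = join(complement(span(seq, 9, 12)), complement(span(seq, 1, 4)))

# join(complement(1..4),complement(9..12)): same segments, order kept
c = join(complement(span(seq, 1, 4)), complement(span(seq, 9, 12)))

print(a == b)  # True
print(a == c)  # False
```

Under these assumptions the two forms agree only when the segment order
is reversed along with the complement; keeping the original order gives
a different sequence.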

	-hilmar

On Sep 18, 2006, at 11:02 AM, Chris Fields wrote:

> This is a general question about how locations are described for split
> locations, so whoever has an opinion, please chip in.  This is
> particularly pertinent to GenBank/EMBL/swiss formats.  Okay, stick with
> me here...
>
> A pretty interesting question was raised while I was working on a bug
> (bug 1953), which deals with split location data with the following
> formats:
>
> join(complement(1..100),complement(201..300),complement(401..500))
>
> complement(join(1..100,201..300,401..500))
>
> GenBank acc #AL137247 has examples of both, if you want a real example.
>
> According to BioPerl these are syntactically the same (look at the last
> few tests in LocationFactory.t).  However, according to GenBank (and the
> rationale outlined in bug 1953), these are actually quite different.
>
> According to the GenBank/EMBL/DDBJ feature table definition, the
> operator 'join' entails that the segments in the following parentheses
> are joined in the order presented ('placed end-to-end'), whereas
> 'complement' takes the complementary strand of the segment in
> parentheses.  So, the operator tells one how to treat the sequence data
> using the locations shown.
>
> Here are examples from the definition:
>
> ...
>
> complement(join(2691..4571,4918..5163))
>                           Joins regions 2691 to 4571 and 4918 to 5163,
>                           then complements the joined segments (the
>                           feature is on the strand complementary to
>                           the presented strand)
>
> join(complement(4918..5163),complement(2691..4571))
>                           Complements regions 4918 to 5163 and 2691 to
>                           4571, then joins the complemented segments
>                           (the feature is on the strand complementary
>                           to the presented strand)
> ...
>
> Using this rationale, substituting in letters for clarity and lower
> case to indicate the complement strand:
>
> Location #1 : join(complement(A..B),complement(C..D),complement(E..F))
>
> would be:
>
> join(b..a,d..c,f..e)
>
> and the following:
>
> Location #2 : complement(join(A..B,C..D,E..F))
>
> would be:
>
> join(f..e,d..c,b..a)
>
> The current behavior of Bio::Location::Split propagates the strand
> information (flips) to the sublocations w/o re-sorting them.  We could
> sort them, but wouldn't it be much simpler to not propagate strand
> changes at all?  Seems we're making it more complicated than it
> actually is.
>
> Thoughts?
>
> Christopher Fields
> Postdoctoral Researcher - Switzer Lab
> Dept. of Biochemistry
> University of Illinois Urbana-Champaign
>
>
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>

-- 
===========================================================
: Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
===========================================================






