[Bioperl-l] split location problems

Chris Fields cjfields at uiuc.edu
Mon Oct 16 21:07:55 EDT 2006


On Oct 16, 2006, at 5:45 PM, Jason Stajich wrote:

> The whole point of split locations is to represent genes with
> introns, so that is not the "rare" case.
>
> I'm confused where the problem is.  The locations that I get out  
> with to_FTstring on the location object are exactly the same as  
> those input.

The problem is with a subset of the split locations described in the
bug report.  The following works:

complement(join(2691..4571,4918..5163))

whereas this:

join(complement(4918..5163),complement(2691..4571))

gives this:

complement(join(4918..5163,2691..4571))

which is not the same location, since the segments are joined in the reverse order.  It should be:

complement(join(2691..4571,4918..5163))

since 'join' implies that the order of the segments to be joined is  
important ('order' and 'bond' do not, I guess).
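Here is a minimal sketch of the ordering rule. It is written in Python rather than Perl and is not taken from BioPerl's Split::to_FTstring(). It assumes a split location is held as a list of (start, end, strand) segments in 5' to 3' transcript order. The helper names parse_simple_join() and to_ft_string() are hypothetical, and the parser only understands the two location forms quoted above.

```python
import re
from typing import List, Tuple

# Hypothetical representation: (start, end, strand), 1-based inclusive,
# start <= end, strand +1 or -1, listed in 5' to 3' transcript order.
Segment = Tuple[int, int, int]

def parse_simple_join(loc: str) -> List[Segment]:
    """Parse join(...) of plain or complement()ed ranges, or
    complement(join(...)) of plain ranges. Nothing else is supported."""
    m = re.fullmatch(r"complement\(join\((.*)\)\)", loc)
    if m:
        ranges = [tuple(map(int, r.split(".."))) for r in m.group(1).split(",")]
        # complement(join(A,B)) is read B (reverse-complemented) before A.
        return [(s, e, -1) for s, e in reversed(ranges)]
    m = re.fullmatch(r"join\((.*)\)", loc)
    if not m:
        raise ValueError(f"unsupported location: {loc}")
    segs: List[Segment] = []
    for item in m.group(1).split(","):
        c = re.fullmatch(r"complement\((\d+)\.\.(\d+)\)", item)
        if c:
            segs.append((int(c.group(1)), int(c.group(2)), -1))
        else:
            s, e = item.split("..")
            segs.append((int(s), int(e), 1))
    return segs

def to_ft_string(segments: List[Segment]) -> str:
    """Write segments back out as a feature-table location string."""
    strands = {strand for _, _, strand in segments}
    if strands == {-1}:
        # All on the minus strand: the first segment in transcript order has
        # the highest coordinates, so reverse to ascending order and wrap the
        # whole join in complement().
        inner = ",".join(f"{s}..{e}" for s, e, _ in reversed(segments))
        return f"complement(join({inner}))"
    # Plus-strand or mixed-strand joins: keep the given order and complement
    # only the minus-strand segments.
    parts = [
        f"complement({s}..{e})" if strand == -1 else f"{s}..{e}"
        for s, e, strand in segments
    ]
    return f"join({','.join(parts)})"
```

All segments of the first form are on the minus strand, so the writer reverses them before wrapping the join in complement(). Both inputs therefore come out as complement(join(2691..4571,4918..5163)). Mixed-strand joins are left as join(...) with a per-segment complement().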

> I have processed the genbank fungal genomes into GFF3 and have had  
> no problems so I'm confused where you are breaking down.  If I  
> write them out as embl I also get the correct thing.  This is using  
> the CVS version of bioperl from the HEAD.
>
> I've added code to test this to bug 2101 including a C. glabrata
> chromosome downloaded from GenBank.  Perhaps the problem is on the
> EMBL parsing side, I didn't test that.
>
> On the technical side, I still am not sure I fully know where the  
> strand information should be stored - the top level container or  
> the sub-features.  I'll try and stay up on the discussion if  
> anything has been decided that I should know about.
>
> -jason

Split::strand() sets the strand of the sublocations as well, which seems
to confuse the situation further, but it is consistent with LocationI, as
Hilmar points out.  I'm looking into a few solutions now, including a fix
in Split::to_FTstring().
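Jason asks whether strand should live on the top-level container or on the sub-features. One way to picture that trade-off is a container that stores no strand of its own. The following is a hypothetical Python model, not BioPerl's Bio::Location::Split. Its setter copies the value to every sublocation, as described for Split::strand(). Its getter derives the strand from the sublocations and returns 0 when they disagree; that getter policy is an assumption of this sketch.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class SimpleLocation:
    start: int
    end: int
    strand: int = 1  # +1, -1, or 0 for unknown

@dataclass
class SplitLocation:
    sublocations: List[SimpleLocation] = field(default_factory=list)

    @property
    def strand(self) -> int:
        # Assumed policy: report the shared strand, or 0 if the
        # sublocations disagree (or there are none).
        strands = {loc.strand for loc in self.sublocations}
        return strands.pop() if len(strands) == 1 else 0

    @strand.setter
    def strand(self, value: int) -> None:
        # Mirrors the described Split::strand() behaviour: setting the
        # container's strand overwrites every sublocation's strand.
        for loc in self.sublocations:
            loc.strand = value
```

Because the strand is kept only on the sublocations in this model, a writer such as to_FTstring() can choose between complement(join(...)) and join(complement(...),...) from the sublocations alone. The container and its parts cannot drift out of sync.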

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign




