[Bioperl-guts-l] [Bug 2101] New: Bio::Location mishandling of join(complement(..), complement(..))

bugzilla-daemon at newportal.open-bio.org
Tue Sep 19 11:20:29 EDT 2006


           Summary: Bio::Location mishandling of join(complement(..), complement(..))
           Product: Bioperl
           Version: main-trunk
          Platform: All
        OS/Version: All
            Status: NEW
          Severity: normal
          Priority: P3
         Component: Core Components
        AssignedTo: bioperl-guts-l at bioperl.org
        ReportedBy: cjfields at uiuc.edu

There are two bugs related to how sublocations are treated.  As a caveat, I
recently refactored Bio::Factory::FTLocationFactory, but all of the bugs below
are reproducible with the older code as well.  These seem to be issues in
Bio::Location::Split.

1) The following two locations describe the same sequence.  The first version
is the one most commonly seen:

complement(join(2691..4571,4918..5163))
join(complement(4918..5163),complement(2691..4571))

Passing them through Bio::Factory::FTLocationFactory (older or newer recursive
factory algorithms) and using $loc_obj->to_FTstring():

Before : complement(join(2691..4571,4918..5163))
After  : complement(join(2691..4571,4918..5163))

Before : join(complement(4918..5163),complement(2691..4571))
After  : complement(join(4918..5163,2691..4571))

The returned locations ('After') are not equivalent: join() implies joining
the subsequences one by one, from left to right, so these two variants would
join them in different orders.

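2) Again, the two locations in each of the following sets describe the same
sequence; both sets contain the same remote sublocation, in different
positions:

first set:

complement(join(2691..4571,ABC1234.5:4918..5163))
join(complement(ABC1234.5:4918..5163),complement(2691..4571))

second set:

complement(join(ABC1234.5:4918..5163,2691..4571))
join(complement(2691..4571),complement(ABC1234.5:4918..5163))
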
Passing them through FTLocationFactory one by one gets these:

first set:

Before : complement(join(2691..4571,ABC1234.5:4918..5163))
After  : join(complement(2691..4571),ABC1234.5:4918..5163)

Before : join(complement(ABC1234.5:4918..5163),complement(2691..4571))
After  : join(complement(ABC1234.5:4918..5163),complement(2691..4571))

second set:

Before : complement(join(ABC1234.5:4918..5163,2691..4571))
After  : join(ABC1234.5:4918..5163,complement(2691..4571))

Before : join(complement(2691..4571),complement(ABC1234.5:4918..5163))
After  : join(complement(2691..4571),complement(ABC1234.5:4918..5163))

Note that the first example in each set does not complement the remote
location and is thus completely incorrect.  The second example in each set is
returned correctly, but its format differs from what is returned when remote
sublocations are not present, i.e. to_FTstring returns
'join(complement(C..D),complement(A..B))' vs 'complement(join(A..B,C..D))'.

Both bugs indicate two things: remote locations are treated differently in
joins (reason unknown), and 'flipping' the strand for sublocations is not
accompanied by subsequent reversal of the order of sublocations.  
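
The ordering point can be checked without BioPerl at all.  The following is a
minimal sketch in plain Perl, assuming a made-up toy sequence and two made-up
ranges (none of these values come from the report); revcom() and range() are
hypothetical helpers, not BioPerl calls.  It illustrates that pushing the
complement inside a join only preserves the sequence if the sublocation order
is reversed as well:

```perl
use strict;
use warnings;

# Reverse-complement a DNA string (hypothetical helper, not a BioPerl call).
sub revcom {
    my ($dna) = @_;
    my $rc = scalar reverse $dna;
    $rc =~ tr/ACGTacgt/TGCAtgca/;
    return $rc;
}

# Extract a 1-based, inclusive range from a string (hypothetical helper).
sub range {
    my ($seq, $start, $end) = @_;
    return substr($seq, $start - 1, $end - $start + 1);
}

my $seq = 'AACCGGTTAC';               # toy sequence, assumption for illustration
my $a   = range($seq, 1, 3);          # stands in for A..B
my $b   = range($seq, 6, 9);          # stands in for C..D

# complement(join(A..B,C..D)): join left to right, then reverse-complement.
my $outer = revcom($a . $b);

# join(complement(C..D),complement(A..B)): strand flipped AND order reversed.
my $inner = revcom($b) . revcom($a);

# complement(join(C..D,A..B)): order reversed inside the join only.
my $unreversed = revcom($b . $a);

print "complement(join(A,B))             : $outer\n";
print "join(complement(B),complement(A)) : $inner\n";
print "complement(join(B,A))             : $unreversed\n";
print "first two agree\n"  if $outer eq $inner;
print "third one differs\n" if $outer ne $unreversed;
```

The first two forms give the same string, while the third, which has the same
shape as the 'After' value in the second example of bug 1, does not.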

Finally (notably), the strand feature cannot be set to 1 for any of the
non-complement split locations above.  This occurs post-FTLocationFactory as
explicitly setting the strand in FTLocationFactory doesn't work.  Is this a

It is not known how widely these bugs matter in practice, as locations of this
kind are not very common.  The bug is also initially silent: retrieving the
sequence via Bio::SeqFeatureI::spliced_seq() returns the correct sequence, but
only because the sublocations are presorted, which shouldn't be necessary.
Sequences with these location variants could still be written out incorrectly
when passed through SeqIO, and that would not be picked up without round-trip
tests.
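
A round-trip check along these lines could look roughly like the sketch below.
It assumes BioPerl is installed and uses only the factory's from_string() and
the location's to_FTstring(); round_trip() is a hypothetical helper, and the
strings are the ones quoted in this report.  Plain string comparison is a
deliberately crude test: a 'DIFF' line only flags a change of form, which still
has to be judged by hand against the expected ordering and strand.

```perl
use strict;
use warnings;

# The location strings quoted in this report.
my @locations = (
    'complement(join(2691..4571,4918..5163))',
    'join(complement(4918..5163),complement(2691..4571))',
    'complement(join(2691..4571,ABC1234.5:4918..5163))',
    'join(complement(ABC1234.5:4918..5163),complement(2691..4571))',
    'complement(join(ABC1234.5:4918..5163,2691..4571))',
    'join(complement(2691..4571),complement(ABC1234.5:4918..5163))',
);

# Parse a feature-table location string and write it back out
# (hypothetical helper wrapping the two BioPerl calls).
sub round_trip {
    my ($factory, $string) = @_;
    return $factory->from_string($string)->to_FTstring();
}

# Only run the check when BioPerl can actually be loaded.
if (eval { require Bio::Factory::FTLocationFactory; 1 }) {
    my $factory = Bio::Factory::FTLocationFactory->new();
    for my $before (@locations) {
        my $after = round_trip($factory, $before);
        printf "%-6s %s\n", ($before eq $after ? 'same' : 'DIFF'), $before;
        print  "       $after\n" if $before ne $after;
    }
}
else {
    warn "BioPerl is not installed; skipping the round-trip check\n";
}
```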

I plan on checking sublocations using subseq(), as well as checking the
retrieval of remote locations.  These checks may not be ready for the
impending 1.5.2 release.
