[Bioperl-l] Sim4

Hilmar Lapp hlapp@gmx.net
Tue, 01 Aug 2000 01:18:16 +0200

Chris Mungall wrote:
> Your rewrite of the bioperl sim4 parser looks a lot cleaner. However, I
> notice that you still seem to be ignoring the -> <- == -- symbols at the
> end of each exon. I would have thought that you would want to start a new
> exonset if the direction changed halfway through, or record the exon
> direction so that some post-parsing filtering could be done? Maybe I'm
> misunderstanding your code, or sim4; I'm sure George will correct me if
> I'm speaking rubbish.


You're right: the direction of a particular exon is determined only once for
all exons within one set (alignment). However, as far as I understand the
algorithm, it tries to align the EST sequence as a whole, so if that is correct
I cannot imagine how the strand could change within the alignment. If the
'exons' resulted from individual alignments of segments (sort of like HSPs),
then the direction could obviously change. I'm not sure which it does; maybe I
should go through the paper again :)
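
A minimal sketch of the alternative Chris suggests (keep each exon's direction
marker and start a new exon set whenever it flips) might look like the code
below. It is written in Python purely for illustration, not as BioPerl code,
and it assumes a hypothetical exon-line layout such as
"1-106  (4001-4106)   100% ->", with the marker possibly missing on the last
exon. Treating only -> and <- as directional, and == / -- / no marker as
neutral, is likewise an assumption, not something established in this thread.
The function names parse_exons and split_by_direction are made up.

```python
import re

# ASSUMED sim4 exon-line layout: "est_start-est_end  (gen_start-gen_end)  pct% [marker]"
EXON_RE = re.compile(
    r"^\s*(\d+)-(\d+)\s+\((\d+)-(\d+)\)\s+(\d+)%\s*(->|<-|==|--)?\s*$"
)


def parse_exons(lines):
    """Return exon dicts with EST/genomic coordinates, identity and marker."""
    exons = []
    for line in lines:
        m = EXON_RE.match(line)
        if not m:
            continue  # skip headers, blank lines and alignment text
        est_s, est_e, gen_s, gen_e, pct, marker = m.groups()
        exons.append({
            "est": (int(est_s), int(est_e)),
            "genomic": (int(gen_s), int(gen_e)),
            "identity": int(pct),
            "marker": marker,  # None when the line carries no marker
        })
    return exons


def split_by_direction(exons):
    """Start a new exon set whenever the marker flips between -> and <-.

    ASSUMPTION: ==, -- or a missing marker do not imply any direction,
    so such exons simply stay in the current set.
    """
    sets = []
    current = []
    current_dir = None
    for exon in exons:
        d = exon["marker"] if exon["marker"] in ("->", "<-") else None
        if d is not None and current_dir is not None and d != current_dir:
            sets.append(current)  # direction flipped: close the current set
            current = []
        current.append(exon)
        if d is not None:
            current_dir = d
    if current:
        sets.append(current)
    return sets


if __name__ == "__main__":
    sample = [
        "1-106  (4001-4106)   100% ->",
        "107-233  (5001-5127)   98% ->",
        "234-300  (6001-6067)   95% <-",
        "301-400  (7001-7100)   100%",
    ]
    print([len(s) for s in split_by_direction(parse_exons(sample))])  # [2, 2]
```

Keeping the marker on every exon, rather than a single direction per set,
leaves the choice between splitting and post-parsing filtering to the caller.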

Well, a few tests show that if I align a sequence to a genomic sequence it
doesn't match, I get very short 'exons' (10-20 bp) in varying directions
(judging by the <- and -> markers). If I construct a synthetic sequence whose
first half is forward and whose second half is reverse-complemented, and align
it to the genomic region the original sequence matched, then only roughly half
of the sequence aligns, and the direction of the exons does not change.
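
The synthetic test sequence described above (first half forward, second half
reverse-complemented) can be built in a few lines. This is a sketch in Python
that assumes a plain DNA string containing only A, C, G, T and N; the function
names reverse_complement and half_flipped are hypothetical, and no particular
way of splitting an odd-length sequence is implied by the test above.

```python
# Complement table for plain DNA (ASSUMED alphabet: A/C/G/T/N, either case)
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq: str) -> str:
    """Reverse-complement a DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def half_flipped(seq: str) -> str:
    """Keep the first half as-is and reverse-complement the second half.

    For odd lengths the extra base goes to the second (flipped) half.
    """
    mid = len(seq) // 2
    return seq[:mid] + reverse_complement(seq[mid:])
```

For example, half_flipped("AACCGGTT") keeps "AACC" and turns "GGTT" into
"AACC", giving "AACCAACC".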

Still not a definite answer of course. :-)


Hilmar Lapp                                      email: hlapp@gmx.net
NFI Vienna, IFD/Bioinformatics                   phone: +43 1 86634 631
A-1235 Vienna                                      fax: +43 1 86634 727