Bioperl: A greedy consensus builder

Ian Holmes
Wed, 18 Aug 1999 07:38:58 -0600 (MDT)

On Wed, 4 Aug 1999, Mike Cariaso wrote:

> To scratch an itch I've built a small object which ISA Bio::UnivAln.
> It has a single method 'greedy' which constucts a consensus sequence using a
> somewhat different method. The approach does ignores alignment gaps that are
> before (or after) the first (last) non-gap of a row. An example may explain it
> more clearly.
> Sample alignment:
> row6  :-------------------------------------------CGCTCGCCTCGCTCCTC---CCTCGCTC
> So for the above alignment the current technique will call the first 20 or so
> bases as gaps since that is the most common char. The greedy approach assumes
> that this area is outside the known region of those rows, and ignores the gaps
> there. This seems useful when working with small partial fragments.
> If there is interest I'll be happy (honored, actually) to contribute it to
> bio.perl. 
> The interface has the threshold param as well as another optional one to specify
> the minimum number of rows necessary to do a base call. And if you noticed any
> errors in the alignment, its totally bogus data, so the mistake is mine.

This is not really a comment on your algorithm since I'm sure there are
many heuristics out there appropriate for different situations - which is
why I'd like to propose a move towards a common format.

I'm afraid I don't really know how (if at all) consensus sequence is
handled in bioperl at the minute, but from a theorists viewpoint I'd argue
for HMMER-style HMMs as the ideal representation for this, the main reason
being that HMMs handle gaps more consistently than treating them as a 21st
amino acid.

Ungapped profiles can be mapped onto HMMER profiles with nearly no effort
but there is an argument for having a separate data structure for these,
since they can be fed directly into many sequence analysis algorithms
without change, and are convenient for representing e.g. ambiguous

=========== Bioperl Project Mailing List Message Footer =======
Project URL:
For info about how to (un)subscribe, where messages are archived, etc: