[Bioperl-l] RE: SeqWithQuality and biosql

Richard HOLLAND hollandr at gis.a-star.edu.sg
Tue Jul 5 23:38:51 EDT 2005

Good point.

Representing compound alphabets correctly and consistently would
require extra tables in BioSQL (version 1.1?): some kind of alphabet
table with a name, and a related table holding alphabet ids and ranks
from which to construct cross products etc.
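
To make the two-table idea concrete, here is a rough sketch in Python
using an in-memory SQLite database. The table and column names
(alphabet, alphabet_component, compound_id, component_id, rank) are
assumptions made up for illustration; none of them exist in any
released BioSQL schema.

```python
import sqlite3

# Hypothetical schema: table and column names are illustrative only and
# are not part of any released BioSQL version.
SCHEMA = """
CREATE TABLE alphabet (
    alphabet_id INTEGER PRIMARY KEY,
    name        TEXT NOT NULL UNIQUE
);
CREATE TABLE alphabet_component (
    compound_id  INTEGER NOT NULL REFERENCES alphabet(alphabet_id),
    component_id INTEGER NOT NULL REFERENCES alphabet(alphabet_id),
    rank         INTEGER NOT NULL,
    PRIMARY KEY (compound_id, rank)
);
"""


def components(conn, compound_name):
    """Return the component alphabet names of a compound alphabet, in rank order."""
    rows = conn.execute(
        """
        SELECT part.name
        FROM alphabet AS whole
        JOIN alphabet_component AS ac ON ac.compound_id = whole.alphabet_id
        JOIN alphabet AS part ON part.alphabet_id = ac.component_id
        WHERE whole.name = ?
        ORDER BY ac.rank
        """,
        (compound_name,),
    ).fetchall()
    return [name for (name,) in rows]


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)

# Two simple alphabets and two compound alphabets built from them.
conn.executemany(
    "INSERT INTO alphabet (alphabet_id, name) VALUES (?, ?)",
    [(1, "DNA"), (2, "INTEGER"), (3, "DNA x DNA x DNA"), (4, "DNA x INTEGER")],
)
# Each row says: compound alphabet X has component Y at position `rank`.
conn.executemany(
    "INSERT INTO alphabet_component (compound_id, component_id, rank) "
    "VALUES (?, ?, ?)",
    [(3, 1, 1), (3, 1, 2), (3, 1, 3), (4, 1, 1), (4, 2, 2)],
)

codon_parts = components(conn, "DNA x DNA x DNA")
quality_parts = components(conn, "DNA x INTEGER")
```

The rank column is what fixes the order of the cross product, so a
codon alphabet and a quality alphabet differ only in the rows of the
component table.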

Why not store the delimiter as an attribute of the alphabet in this
table? That way we can use whatever delimiters we like. I don't think
grouping is necessary - after all, we know from the alphabet definition
that there are a fixed number of tokens per symbol and what order they
come in, so we just read the first N tokens (three for a codon
alphabet) to build the first symbol, and so on.
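
A minimal sketch of that reading strategy in Python, assuming the
delimiter and the number of tokens per symbol have already been looked
up from the alphabet definition. The function name parse_symbols and
its parameters are hypothetical, not an existing BioPerl or BioJava
call.

```python
def parse_symbols(text, delimiter, tokens_per_symbol):
    """Split a delimited token string into fixed-width compound symbols.

    `delimiter` must be a non-empty string; `tokens_per_symbol` is the
    number of component alphabets in the compound alphabet.
    """
    tokens = text.split(delimiter) if text else []
    if len(tokens) % tokens_per_symbol != 0:
        raise ValueError(
            "expected a multiple of %d tokens, got %d"
            % (tokens_per_symbol, len(tokens))
        )
    # No grouping characters needed: every run of N tokens is one symbol.
    return [
        tuple(tokens[i:i + tokens_per_symbol])
        for i in range(0, len(tokens), tokens_per_symbol)
    ]


# Codon alphabet: three tokens per symbol, comma as the stored delimiter.
codons = parse_symbols("a,c,a,g,t,c", ",", 3)

# Quality alphabet: two tokens per symbol (base, score), space-delimited.
qualities = parse_symbols("g 17 t 40", " ", 2)
```

Because the tokens are delimited, multi-character tokens such as the
quality score 17 need no brackets around them, and higher-dimensional
cross products only change the tokens_per_symbol value.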

Richard Holland
Bioinformatics Specialist
GIS extension 8199

> -----Original Message-----
> From: Hilmar Lapp [mailto:hlapp at gnf.org] 
> Sent: Wednesday, July 06, 2005 11:30 AM
> To: mark.schreiber at novartis.com
> Cc: Bioperl; Richard HOLLAND
> Subject: Re: SeqWithQuality and biosql
> On Jul 5, 2005, at 7:55 PM, mark.schreiber at novartis.com wrote:
> > I would propose the
> > following for compound alphabets...
> >
> > (aca)(gtc) for codon alphabets.
> > (g17)(t40) for quality type alphabets.
> In your convention wouldn't this need to be
> (g(17))(t(40))
> Otherwise you'd have trouble representing higher-dimensional 
> cross-products unless you alternate chars and digits which would be a 
> useless restriction.
> 	-hilmar
> -- 
> -------------------------------------------------------------
> Hilmar Lapp                            email: lapp at gnf.org
> GNF, San Diego, Ca. 92121              phone: +1-858-812-1757
> -------------------------------------------------------------
