[Bioperl-l] RNA folding

Caroline Johnston johnston at biochem.ucl.ac.uk
Wed Feb 7 13:56:52 EST 2007

Thanks Chris.

Storing the interaction data as a hash according to an ontology and using
an extended bracket notation as the string representation seems to make
sense, but I'm still unsure how this is supposed to be
attached to the Seq objects. You reckon it should be an AnnotationI?

I'm not sure I understand the distinction between annotations and
features. From the docs I got the impression that Features were like
annotation on bits of sequences and had a reference to the sequence to
which they belong, whereas annotations don't. If that's the case though,
why would RNA structure be an annotation rather than a feature? If not,
what is the distinction between them? Are the positional Annotation
subclasses you're developing intended to replace features? Have I got the
wrong end of the stick entirely?


On Tue, 6 Feb 2007, Chris Fields wrote:

> Actually, the only RNA tool wrappers I have made are ones for ERPIN,
> RNAMotif, and Infernal (the only one in bioperl-run CVS at this time
> is RNAMotif).  I am planning on writing up wrappers for Vienna,
> UNAFold, and a few others but haven't really started in.  Here's
> where I'm at right now...
> I am writing up a new set of AnnotationI classes which positionally
> describe data (Meta) which I hope will help deal with this stuff.
> These would be similar in nature to Heikki's Bio::Seq::Meta classes:
> http://bioperl.org/pipermail/bioperl-l/2006-December/024414.html
> I would use a regular Bio::SeqI and store the structural data and
> anything else (such as energy calculations, etc) as Annotation
> objects in an AnnotationCollection, and then write up a series of
> SeqIO modules to get data into/out of the designated structure
> formats, like UNAFold ct, RNAML, and so on.  Each sequence would then
> be capable of holding more than one structural Annotation (i.e. could
> represent different folding pathways, alternative RNA folds, and so on).
> At this point I represent the data as an array of hashes where
> $array[0] is nt 1 and the hash keys indicate the type of interaction,
> the base interacted with, etc.  The text representation would be simple
> Eddy WUSS (Rfam-like) format by default, which is capable of
> representing some complex data (pseudoknots, for instance), is
> compact, and is documented (via the Infernal manual).  Tags will
> probably switch to more ontologically relevant terms (probably from
> RNAML or RNA Ontology), but in general it is something like this:
> [
>   {'interaction' => 'WC',
>     'base'  => 24},
>   {'interaction' => 'WC',
>     'base'  => 23},
>   {'interaction' => 'SS'},
> ...
> ]
> In this implementation every seq position would have some kind of
> interaction designation, though that's open for debate as it could
> just be simple text or undef for single-stranded regions.
> This is also scalable based on complexity of the data: if one wanted
> to add tertiary/quaternary interactions, location, base modifications,
> remote sequence interactions, etc., extra key/value pairs could be
> used.  Conversely, if one only wanted secondary structure (for drawing
> RNA structures, for example), then only that data would be parsed.
> If you (or anyone listening) have any suggestions I would greatly
> appreciate them.
> chris
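
As a rough illustration of the array-of-hashes representation quoted
above, here is a minimal Perl sketch that renders such a structure as
plain dot-bracket notation. It is not Chris's implementation: the helper
name structure_to_brackets is hypothetical, 'base' is assumed to hold the
1-based position of the pairing partner, and only nested pairs are
handled (no pseudoknots and none of the WUSS-specific symbols).

```perl
use strict;
use warnings;

# Hypothetical helper: render an array of per-nucleotide hashes as plain
# dot-bracket notation. Assumes 'base' is the 1-based partner position
# and that all pairs are nested (no pseudoknots).
sub structure_to_brackets {
    my ($structure) = @_;
    my $string = '';
    for my $i (0 .. $#{$structure}) {
        my $pos     = $i + 1;                    # element 0 is nt 1
        my $partner = $structure->[$i]{base};    # undef when unpaired
        if    (!defined $partner) { $string .= '.' }
        elsif ($partner > $pos)   { $string .= '(' }
        else                      { $string .= ')' }
    }
    return $string;
}

# Toy 6-nt hairpin: nt 1 pairs with nt 6, nt 2 with nt 5, nt 3-4 form the loop.
my @structure = (
    { interaction => 'WC', base => 6 },
    { interaction => 'WC', base => 5 },
    { interaction => 'SS' },
    { interaction => 'SS' },
    { interaction => 'WC', base => 2 },
    { interaction => 'WC', base => 1 },
);

print structure_to_brackets(\@structure), "\n";    # prints ((..))
```

Extra keys in each hash (modifications, tertiary contacts and so on) are
simply ignored by this helper, which matches the idea that a consumer
interested only in secondary structure reads only the keys it needs.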