[Bioperl-l] RNA folding

Chris Fields cjfields at uiuc.edu
Wed Feb 7 17:15:44 EST 2007

On Feb 7, 2007, at 12:56 PM, Caroline Johnston wrote:

> Thanks Chris.
> Storing the interaction data as a hash according to an ontology and
> using an extended bracket notation as the string representation seems
> to make sense, but I'm still unsure how this is supposed to be
> attached to the Seq objects. You reckon it should be an AnnotationI?

As long as it describes everything in the object and there is a  
reasonable way of representing the data as text, I think you can  
attach anything as annotation.  A recent example is the addition of  
trees as annotation.  Also, Annotation can be used to describe  
alignments (such as the structure consensus string in Rfam  
alignments), or added to SeqFeatures.  The class being annotated just  
needs to implement AnnotatableI.

> I'm not sure I understand the distinction between annotations and
> features. From the docs I got the impression that Features were like
> annotation on bits of sequences and had a reference to the sequence to
> which they belong, whereas annotations don't. If that's the case
> though, why would RNA structure be an annotation rather than a
> feature? If not, what is the distinction between them? Are the
> positional Annotation subclasses you're developing intended to replace
> features? Have I got the wrong end of the stick entirely?
> Cheers,
> Cass

The key distinction between seqfeatures and annotations is that  
annotations are normally associated with the entire sequence record,  
while seqfeatures normally describe a part of the sequence (and thus  
have a location on the sequence).  There are a few exceptions, but in  
general that's the case.  The HOWTO gives a bit more background:


Using annotations or seqfeatures in a case like this may be  
completely dependent on one's point of view.  For instance, one  
implementation I had considered was adding an interface to Bio::Seq  
which would allow Seq objects to also have Bio::Structure objects,  
since my view is that any sequence could (optionally) have a  
structure associated with it.  However, I reasoned that a sequence  
could actually have multiple structures (RNA, ssDNA, and protein can  
have several alternative folds or different folding pathways, for  
instance).   Instead of splitting up each structure into individual  
seqfeatures (each of which would have to be tagged with the  
relevant structure and score info), I could have one class encompass  
all of that data in a reasonable way.  Hence I used Annotation.
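
To make the reasoning concrete, here is a rough, language-agnostic  
sketch (in Python, not BioPerl) of one annotation object carrying  
several alternative folds for a single record.  The names Fold,  
StructureAnnotation, Record and add_annotation are made up for  
illustration and are not part of any BioPerl API; the score is assumed  
to be something like a folding energy.

```python
from dataclasses import dataclass, field


@dataclass
class Fold:
    """One candidate structure (hypothetical class, for illustration)."""
    brackets: str  # dot-bracket string, same length as the sequence
    score: float   # assumed to be a folding energy or similar


@dataclass
class StructureAnnotation:
    """A single annotation that holds every alternative fold."""
    folds: list = field(default_factory=list)

    def as_text(self):
        # Textual representation: one "brackets (score)" line per fold.
        return "\n".join(f"{f.brackets} ({f.score:.1f})" for f in self.folds)


@dataclass
class Record:
    """Stand-in for a sequence record with record-level annotations."""
    seq: str
    annotations: dict = field(default_factory=dict)

    def add_annotation(self, tag, value):
        # Annotations are keyed by tag and apply to the whole record.
        self.annotations.setdefault(tag, []).append(value)


rec = Record("GGGAAACCC")
ann = StructureAnnotation([Fold("(((...)))", -3.2), Fold("((.....))", -1.1)])
rec.add_annotation("structure", ann)
print(ann.as_text())
```

The point of the sketch is only that the structures and their scores  
travel together in one object, instead of being spread over several  
tagged seqfeatures.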

BTW, this isn't meant to replace features in any way.  It would be  
primarily used to describe (1) a sequence as a whole, such as a tRNA  
sequence, (2) a seqfeature, such as a tRNA, rRNA, riboswitch, etc. in  
a genome sequence, or (3) a conserved structure in an alignment, such  
as Rfam stockholm output.

I'll add that the option of splitting the data into seqfeatures isn't  
ruled out.  It would be a matter of using a helper method, maybe in  
SeqUtils or directly in Annotation::Meta or whatever I end up calling  
it.  I plan on adding something along those lines at some point.
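
As a rough idea of what such a helper might do, here is a sketch (in  
Python, not BioPerl) that turns each dot-bracket fold into one located  
feature spanning its base-paired region.  The function names  
base_pairs and split_into_features, the dict layout, and the choice of  
one feature per fold are all assumptions for illustration; nothing  
like this exists yet.

```python
def base_pairs(brackets):
    """Return sorted (open, close) 1-based pairs from a dot-bracket string."""
    stack, pairs = [], []
    for pos, ch in enumerate(brackets, start=1):
        if ch == "(":
            stack.append(pos)
        elif ch == ")":
            if not stack:
                raise ValueError(f"unmatched ')' at position {pos}")
            pairs.append((stack.pop(), pos))
    if stack:
        raise ValueError(f"unmatched '(' at position {stack[-1]}")
    return sorted(pairs)


def split_into_features(folds):
    """Make one feature dict per fold; folds is a list of (brackets, score)."""
    features = []
    for brackets, score in folds:
        pairs = base_pairs(brackets)
        if not pairs:
            continue  # a fully unpaired fold has no region to locate
        start = min(i for i, _ in pairs)
        end = max(j for _, j in pairs)
        features.append({
            "start": start,
            "end": end,
            "primary_tag": "structure",
            # Each feature has to carry its own structure and score tags.
            "tags": {"structure": brackets[start - 1:end], "score": score},
        })
    return features


features = split_into_features([("..((...))", -2.0), ("(((...)))", -3.2)])
for feat in features:
    print(feat["start"], feat["end"], feat["tags"]["structure"])
```

A real helper would probably want finer-grained features (one per  
stem, say), but the shape of the conversion would be similar.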

