[Bioperl-l] Proposal for Meta data

Chris Fields cjfields at uiuc.edu
Mon Dec 18 17:33:34 EST 2006

On Dec 18, 2006, at 7:51 AM, Heikki Lehvaslaiho wrote:

> Reading the discussion, I think it is time to draw some guidelines.
> 1. Base the Meta implementation on real use cases.
>    MSA is a good example.

AlignIO::stockholm is where I'll initially test it out.

> 2. Allow generalisations
>    If you can see another implementation of the same idea that can be
>    merged with the first, do it, but do not hurt yourself if you cannot.

I agree.

> The most difficult question is how to separate case-specific attributes
> that are best implemented by subclassing with additional methods from
> truly widely variable meta data that is best done as a parallel track
> meta information holding class.

I would probably start with a general Bio::Annotation::MetaI abstract
class that supplements AnnotationI with general meta-specific methods
(meta, meta_text, named_meta, etc.).  Each implementation could then
store its data in whatever way suits it (RNA structure as strings,
quality data as arrays, etc.), as long as it stays within the
constraints of the interface description.
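
As a rough illustration of the idea (not existing BioPerl code), the
sketch below shows one abstract interface with two storage strategies
behind it. It is written in Python only to keep it short. The class
names, the DEFAULT track name, and the named_meta_text method are
assumptions made for the example; only meta, meta_text and named_meta
come from the proposal itself.

```python
from abc import ABC, abstractmethod

DEFAULT = "DEFAULT"  # hypothetical name for the unnamed track


class MetaI(ABC):
    """Hypothetical interface: a set of named meta data tracks."""

    @abstractmethod
    def named_meta(self, name, value=None):
        """Return track `name`, storing `value` first if one is given."""

    @abstractmethod
    def named_meta_text(self, name):
        """Return track `name` rendered as a string."""

    def meta(self, value=None):
        # Get/set the default track.
        return self.named_meta(DEFAULT, value)

    def meta_text(self):
        return self.named_meta_text(DEFAULT)


class StringMeta(MetaI):
    """Stores each track as a string (e.g. RNA structure)."""

    def __init__(self):
        self._tracks = {}

    def named_meta(self, name, value=None):
        if value is not None:
            self._tracks[name] = str(value)
        return self._tracks.get(name, "")

    def named_meta_text(self, name):
        return self.named_meta(name)


class ArrayMeta(MetaI):
    """Stores each track as a list (e.g. quality values)."""

    def __init__(self):
        self._tracks = {}

    def named_meta(self, name, value=None):
        if value is not None:
            self._tracks[name] = list(value)
        return self._tracks.get(name, [])

    def named_meta_text(self, name):
        return " ".join(str(v) for v in self.named_meta(name))


structure = StringMeta()
structure.named_meta("SS_cons", "<<<..>>>")

quality = ArrayMeta()
quality.meta([30, 32, 35])
```

Callers only see the interface, so string-backed and array-backed
meta objects can be handled the same way.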

Multiple meta objects, potentially of mixed data types, could be  
added in an AnnotationCollection along with other Bio::Annotation  
data, or stored in a nested meta-specific AnnotationCollection object  
(I favor the former as it's simpler).  So you could have an  
alignment, sequence, seqfeature (anything that is AnnotatableI) with  
a regular AnnotationCollection also containing possibly multiple meta  
objects, each meta object also containing possibly more than one set  
of meta data.
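
The flat arrangement described above might look roughly like the
following. This is a Python stand-in, not the real
Bio::Annotation::Collection API: the AnnotationCollection, Meta and
Alignment classes, their method names, and the "meta" key are all
hypothetical and exist only to show several meta objects sitting next
to ordinary annotations in one collection.

```python
from collections import defaultdict


class AnnotationCollection:
    """Hypothetical stand-in: annotations grouped under string keys."""

    def __init__(self):
        self._by_key = defaultdict(list)

    def add_annotation(self, key, annotation):
        self._by_key[key].append(annotation)

    def get_annotations(self, key):
        # Return a copy so callers cannot alter the stored list.
        return list(self._by_key.get(key, []))


class Meta:
    """Hypothetical meta object holding one or more named data sets."""

    def __init__(self, **tracks):
        self.tracks = dict(tracks)


class Alignment:
    """Anything annotatable: here it just owns a collection."""

    def __init__(self):
        self.annotation = AnnotationCollection()


aln = Alignment()
# Ordinary annotation and meta objects share the same collection.
aln.annotation.add_annotation("comment", "seed alignment")
aln.annotation.add_annotation("meta", Meta(SS_cons="<<<..>>>", RF="xxx..xxx"))
aln.annotation.add_annotation("meta", Meta(quality=[30, 32, 35]))

metas = aln.annotation.get_annotations("meta")
```

The nested alternative would store a second AnnotationCollection
under the "meta" key instead of the meta objects themselves, at the
cost of one more level of lookup.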

The key issue I have is whether or not to constrain these to
describing positional data, as Bio::Seq::Meta does, by ensuring that
the data is_flush(), etc.  My current inclination is 'no': the
positional methods would be described in a separate abstract class
and implemented separately.
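
One way to picture that split, again as a Python sketch under stated
assumptions rather than BioPerl code: PositionalI, FreeMeta and
PositionalMeta are hypothetical names, and is_flush here simply
checks that every track has the given length, which is a simplified
reading of the Bio::Seq::Meta behaviour.

```python
from abc import ABC, abstractmethod


class PositionalI(ABC):
    """Hypothetical interface for meta data tied to positions."""

    @abstractmethod
    def track_lengths(self):
        """Return a dict mapping each track name to its length."""

    def is_flush(self, length):
        # True only if every track covers exactly `length` positions.
        return all(n == length for n in self.track_lengths().values())


class FreeMeta:
    """Unconstrained meta data: no positional methods at all."""

    def __init__(self, **tracks):
        self.tracks = dict(tracks)


class PositionalMeta(FreeMeta, PositionalI):
    """Meta data that opts in to the positional interface."""

    def track_lengths(self):
        return {name: len(value) for name, value in self.tracks.items()}


note = FreeMeta(author="curated by hand")
structure = PositionalMeta(SS_cons="<<<..>>>")
ragged = PositionalMeta(SS_cons="<<<..>>>", quality=[30, 32])
```

Keeping the positional checks in their own interface means free-form
meta data never has to answer a question such as is_flush that does
not apply to it.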

> The problem I see with undefined, totally open meta annotation is that
> if you can put anything in there, it is also totally confusing to a
> user. If you can put anything in, how do you know what to get out, and
> how do you know that it is there?
> That leads to the third guideline:
> 3. Use separate meta classes only when there are several different ways
> of encoding data that is present in large numbers *and* when you are
> expecting to be assessing the data computationally rather than just
> checking if an attribute is there.
> 	-Heikki

The initial use case for this would be simple data strings for  
alignment data.  I already have a partial implementation in place for  
stockholm using Bio::Seq::Meta (which led me to this proposal!).  I  
like Chris M.'s idea of ensuring that meta implementations use some  
sort of formalized ontology, but I'll probably start out very simple  
and work up from there.

