[Bioperl-l] Proposal for Meta data
cjfields at uiuc.edu
Mon Dec 18 17:33:34 EST 2006
On Dec 18, 2006, at 7:51 AM, Heikki Lehvaslaiho wrote:
> Reading the discussion, I think it is time to draw some guidelines.
> 1. Base the Meta implementation on real use cases.
> MSA is a good example.
AlignIO::stockholm is where I'll initially test it out.
> 2. Allow generalisations
> If you can see another implementation of the same idea that can be
> merged with the first, do it, but do not hurt yourself if you cannot.
> The most difficult question is how to separate case-specific
> attributes that are best implemented by subclassing with additional
> methods from truly widely variable meta data that is best done as a
> parallel track meta holding class.
I would probably start with a general Bio::Annotation::MetaI abstract
class, which supplements AnnotationI with general meta-specific
methods (meta, meta_text, named_meta, etc.). This could then be
implemented in whatever way one wanted (RNA structure as strings,
quality data as arrays, etc.) under the constraints of the interface
description.
Multiple meta objects, potentially of mixed data types, could be
added in an AnnotationCollection along with other Bio::Annotation
data, or stored in a nested meta-specific AnnotationCollection object
(I favor the former as it's simpler). So you could have an
alignment, sequence, seqfeature (anything that is AnnotatableI) with
a regular AnnotationCollection also containing possibly multiple meta
objects, each meta object also containing possibly more than one set
of meta data.
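
To make the layout above concrete, here is a minimal, language-neutral
sketch written in Python rather than Perl. It is not BioPerl code: the
class names MetaI, StringMeta, ArrayMeta and AnnotationCollection are
hypothetical stand-ins, and the only names taken from the post are the
method names meta, meta_text and named_meta (named_meta_text and the
"DEFAULT" track name are assumptions added for symmetry).

```python
from abc import ABC, abstractmethod


class MetaI(ABC):
    """Hypothetical analogue of the proposed Bio::Annotation::MetaI."""

    @abstractmethod
    def named_meta(self, name, value=None):
        """Get the data stored under `name`; set it first if `value` is given."""

    @abstractmethod
    def named_meta_text(self, name):
        """Return the data stored under `name` as a plain string."""

    def meta(self, value=None):
        # Unnamed access goes to an assumed default track.
        return self.named_meta("DEFAULT", value)

    def meta_text(self):
        return self.named_meta_text("DEFAULT")


class StringMeta(MetaI):
    """Meta data kept as strings, e.g. an RNA structure line."""

    def __init__(self):
        self._data = {}

    def named_meta(self, name, value=None):
        if value is not None:
            self._data[name] = str(value)
        return self._data.get(name, "")

    def named_meta_text(self, name):
        return self.named_meta(name)


class ArrayMeta(MetaI):
    """Meta data kept as arrays, e.g. per-position quality values."""

    def __init__(self):
        self._data = {}

    def named_meta(self, name, value=None):
        if value is not None:
            self._data[name] = list(value)
        return self._data.get(name, [])

    def named_meta_text(self, name):
        return " ".join(str(v) for v in self.named_meta(name))


class AnnotationCollection:
    """Toy stand-in: holds meta objects next to ordinary annotations."""

    def __init__(self):
        self._by_key = {}

    def add_annotation(self, key, annotation):
        self._by_key.setdefault(key, []).append(annotation)

    def get_annotations(self, key):
        return list(self._by_key.get(key, []))


# One collection, mixed data types, stored the "simple" (non-nested) way.
structure = StringMeta()
structure.named_meta("SS_cons", "<<<..>>>")
quality = ArrayMeta()
quality.meta([40, 38, 35])

collection = AnnotationCollection()
collection.add_annotation("comment", "an ordinary, non-meta annotation")
collection.add_annotation("meta", structure)
collection.add_annotation("meta", quality)
```

Both meta objects sit under a single key in the same collection as the
plain annotation, which is the simpler of the two storage options; the
nested alternative would replace the "meta" entries with a second
collection object.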
The key issue I have is whether or not to constrain these to
describing positional data, similar to Bio::Seq::Meta, by ensuring
that the data is_flush(), etc. My current inclination is 'no', and
to have a separate abstract class which describes these methods,
implementing those separately.
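
As a rough illustration of keeping the positional checks in their own
abstract class, here is a small Python sketch. Only the method name
is_flush comes from the discussion; PositionalMetaI,
PositionalStringMeta, meta_lengths and the is_flush signature used here
are assumptions made for the example, not the Bio::Seq::Meta API.

```python
from abc import ABC, abstractmethod


class PositionalMetaI(ABC):
    """Hypothetical separate interface for position-bound meta data."""

    @abstractmethod
    def meta_lengths(self):
        """Return a {track name: length} mapping for every meta track."""

    def is_flush(self, target_length):
        # True when every track is exactly as long as the sequence or
        # alignment it describes (trivially True when there are no tracks).
        return all(n == target_length for n in self.meta_lengths().values())


class PositionalStringMeta(PositionalMetaI):
    """String tracks that are expected to line up with a sequence."""

    def __init__(self, tracks):
        self._tracks = dict(tracks)

    def meta_lengths(self):
        return {name: len(text) for name, text in self._tracks.items()}


seq = "ACGUACGU"
flush = PositionalStringMeta({"SS_cons": "<<<..>>>"})
ragged = PositionalStringMeta({"SS_cons": "<<<..>>"})

flush_ok = flush.is_flush(len(seq))    # track and sequence are both 8 long
ragged_ok = ragged.is_flush(len(seq))  # track is one character short
```

A meta class that has no positional meaning simply would not implement
this second interface, so nothing forces it to be flush.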
> The problem I see with undefined, totally open meta annotation is
> that if you can put anything in there, it is also totally confusing
> to a user. If you can put anything in, how do you know what to get
> out and know that it is
> That leads to the third guideline:
> 3. Use separate meta classes only when there are several different
> ways of encoding data that is present in large numbers *and* when you
> will be assessing the data computationally rather than just checking
> if an attribute is there.
The initial use case for this would be simple data strings for
alignment data. I already have a partial implementation in place for
stockholm using Bio::Seq::Meta (which led me to this proposal!). I
like Chris M.'s idea of ensuring that meta implementations use some
sort of formalized ontology, but I'll probably start out very simple
and work up from there.