[Bioperl-l] Proposal for Meta data
jason at bioperl.org
Fri Dec 15 09:28:13 EST 2006
On Dec 14, 2006, at 9:21 PM, Chris Fields wrote:
> On Dec 14, 2006, at 7:45 PM, David Messina wrote:
>> Hey Chris,
>> My thoughts below.
>>> This could be used to annotate any
>>> PrimarySeq, LocatableSeq, SimpleAlign, SeqFeature, or what-have-you,
>>> maybe in a collection (similar to AnnotationCollection). I thought
>>> something like this may be of general use for any PrimarySeq
>>> (quality, structure), alignments like NEXUS and Stockholm,
>>> SeqFeatures where structure could be stored (tRNA or riboswitches),
>>> However, this also seems to fall into the category of sequence
>>> annotation. So, would it be better to have a set of Bio::Annotation
>>> classes used for this purpose?
>> To me, all meta data is equal. That is, your classic Genbank feature
>> annotation and a user's arbitrary meta-tag like "Bob thinks this is a
>> kinase domain" aren't different in kind even if they are different in
>> As resequencing projects multiply, the ability to create arbitrary
>> meta tags, attach them to different types of objects, and use those
>> tags to link them together will become desirable, if not essential.
>> Keeping a common interface to all of these meta data types would be
>> advantageous, plus new users won't have to determine whether they
>> need to use Bio::Meta objects or Bio::Annotation objects.
>> So I would argue for all of the meta data types to live "under one
>> roof". Which roof isn't as important. Bio::Annotation, since it
>> already exists for today's meta data, seems like a reasonable choice.
>> (assuming Annotation objects are flexible enough to be extended as
>> you propose)
>> There, and no flames or jibes even. :)
> I guess what I want to know is whether there should be a
> distinction between 'normal' sequence annotation (comments,
> references, and so on) and annotation that could be best described as
> position-specific (like RNA or protein structural annotation). The
> current meta implementation is for sequence data only; I felt it
> would be nice to have a generic implementation that would be
> applicable to any object data.
my stream-of-consciousness for right now:
I was thinking Bio::Annotation is where this should go; nothing about
that system makes it explicitly sequence-related. What we're trying to
hammer out here on the Alignment side, which fits with your RNA
example, is to have features (basically SeqFeatures) associated with
alignments so that columns can be annotated, covering things like
character sets and partitions for phylogenetic analyses. As for data
which annotates non-contiguous things like RNA stems, we may have to
be more creative about that or model it with
So currently we've added code so that an Alignment is-a
Bio::AnnotatableI and is-a Bio::FeatureHolderI to move towards this
end, with the goal of being able to capture more of the data that can
be represented in a NEXUS file.
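
A minimal, language-neutral sketch of the idea above: an alignment that
carries both free-form annotations and features over column ranges, such
as a NEXUS-style character set. It is written in Python purely for
illustration and is not BioPerl code. The class and method names
(ColumnFeature, Alignment, add_annotation, add_feature, slice_feature)
are hypothetical and do not correspond to any existing API. It only
handles contiguous column ranges.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass(frozen=True)
class ColumnFeature:
    """Hypothetical feature over a contiguous, 1-based inclusive column range."""
    start: int
    end: int
    primary_tag: str  # e.g. "charset" or "partition"
    name: str

    def __post_init__(self):
        # Reject ranges that start before column 1 or run backwards.
        if self.start < 1 or self.end < self.start:
            raise ValueError(f"bad column range {self.start}..{self.end}")


@dataclass
class Alignment:
    """Toy alignment: named rows plus annotations and column features."""
    rows: Dict[str, str]
    annotations: Dict[str, List[str]] = field(default_factory=dict)
    features: List[ColumnFeature] = field(default_factory=list)

    def length(self) -> int:
        # Number of columns, taken from the first row (0 if there are no rows).
        return len(next(iter(self.rows.values()))) if self.rows else 0

    def add_annotation(self, key: str, value: str) -> None:
        # Free-form annotation: several values may share one key.
        self.annotations.setdefault(key, []).append(value)

    def add_feature(self, feat: ColumnFeature) -> None:
        # Position-specific annotation: must fit inside the alignment.
        if feat.end > self.length():
            raise ValueError("feature extends past the last column")
        self.features.append(feat)

    def slice_feature(self, feat: ColumnFeature) -> Dict[str, str]:
        # Return the columns covered by the feature, for every row.
        return {name: seq[feat.start - 1:feat.end]
                for name, seq in self.rows.items()}


aln = Alignment(rows={"seq1": "ACGTACGT", "seq2": "ACGAACGA"})
aln.add_annotation("comment", "toy example")
charset = ColumnFeature(start=1, end=4, primary_tag="charset", name="gene1")
aln.add_feature(charset)
print(aln.slice_feature(charset))
```

The point of the sketch is only that whole-object annotation and
position-specific features can hang off the same alignment object
through two small, separate interfaces.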
It feels more like a hack than an elegant Meta-data solution, but I
am not totally sure what data you are thinking about at this point;
perhaps I need to spend more time thinking about it.
Or are you worried that the semantic mapping of the data into
features or annotations is confusing to users?