[Bioperl-l] reading and writing GFF3

Robert Buels rmb32 at cornell.edu
Tue Jun 20 14:09:38 EDT 2006

Getting to know this code a little better, I notice a couple of little things:

1.) my patch attached to bug 2026 draws unnecessary distinctions between 
feature types that use tags, and those that use annotations, since all 
features are now Bio::AnnotatableI's and the *_tags_* methods are 
implemented in AnnotatableI in terms of annotation objects now.  You 
guys should probably just ignore it, since from the sound of it you're 
going to be changing all of this around anyway.  Wish I could be there 
to help and learn more.

2.) the %tag2text hash in Bio::AnnotatableI stores a list of scalar 
accessors to use when translating Bio::Annotation::* objects to and from 
scalar tags.  Seems to me, this would be much better accomplished by 
using polymorphism of some sort, probably adding a multipurpose as_tag() 
accessor in Bio::AnnotationI and the objects that implement it, then 
using that in Bio::AnnotatableI instead of %tag2text.  Does this make 
sense, or am I misinterpreting something here?  Reason I've noticed this 
is because I've been wrestling with how to translate  
Bio::Annotation::Target objects to and from scalar tag values, since a 
Target is being represented as an ordered list of 3 or 4 scalar tags in 
old things that were designed to interoperate with gff2, and I can't 
figure out a nice way to do it using the rather inflexible %tag2text hash.
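
A minimal sketch of the polymorphic approach suggested in point 2, to make
the idea concrete. Everything here is an assumption for illustration: the
Sketch::* packages are hypothetical stand-ins rather than real BioPerl
classes, and as_tag() is only the proposed accessor, not an existing method
of Bio::AnnotationI. The Target field order (id, start, end, optional
strand) follows the GFF3 Target attribute.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical stand-in for a simple scalar-valued annotation.
package Sketch::Annotation::SimpleValue;

sub new {
    my ($class, %args) = @_;
    my $self = { value => $args{value} };
    return bless $self, $class;
}

# A simple value flattens to exactly one scalar tag value.
sub as_tag {
    my ($self) = @_;
    return ($self->{value});
}

# Hypothetical stand-in for a Target-like annotation.
package Sketch::Annotation::Target;

sub new {
    my ($class, %args) = @_;
    my $self = { %args };
    return bless $self, $class;
}

# A target flattens to an ordered list of 3 or 4 scalars:
# target_id, start, end, and strand only when it is defined.
sub as_tag {
    my ($self) = @_;
    my @fields = @{$self}{qw(target_id start end)};
    push @fields, $self->{strand} if defined $self->{strand};
    return @fields;
}

# Hypothetical stand-in for an annotatable feature.
package Sketch::Annotatable;

sub new {
    my ($class) = @_;
    return bless { annotations => {} }, $class;
}

sub add_annotation {
    my ($self, $tag, $annotation) = @_;
    push @{ $self->{annotations}{$tag} }, $annotation;
    return;
}

# Each annotation object decides how to flatten itself, so no
# per-class lookup table such as %tag2text is needed here.
sub get_tag_values {
    my ($self, $tag) = @_;
    return map { $_->as_tag } @{ $self->{annotations}{$tag} || [] };
}

package main;

my $feature = Sketch::Annotatable->new;
$feature->add_annotation(
    Note => Sketch::Annotation::SimpleValue->new(value => 'repeat region'));
$feature->add_annotation(
    Target => Sketch::Annotation::Target->new(
        target_id => 'EST23', start => 1, end => 21, strand => '+'));

print join(' ', $feature->get_tag_values('Target')), "\n";
print join(' ', $feature->get_tag_values('Note')),   "\n";
```

The sketch only covers the annotation-to-tag direction. Going from tags back
to annotation objects would presumably need a matching class-level
constructor on each annotation class, which is not shown here.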

Sorry to be a pain, just wanted to get that in there before you guys 
start your jam session in Durham.


Scott Cain wrote:
> Hi Hilmar,
> Of course you are right--I was under the influence of a perl module that
> I work with that does something similar, but both of your solutions are
> better.
> I wasn't familiar with Bio::SeqFeature::TypedSeqFeatureI; I'll take a
> look this week.
> As for next week, I plan on spending the day at NESCent on Wednesday
> (though I haven't told Todd or Jeff that I am arriving early yet) just
> to make sure all the details are in place.  I imagine I'll have a fair
> amount of free time to hash this stuff out.  Anyone else who is in town
> (that is, in Durham, NC, USA) is welcome to come draw on a white board
> too. :-)
> Scott
> On Sat, 2006-06-17 at 12:20 -0400, Hilmar Lapp wrote:
>> You don't need a new method for this. Instead, support a -feature  
>> argument.
>> 	my $bsfa = Bio::SeqFeature::Annotated->new(-feature => $feature);
>> This should work for any instance of Bio::SeqFeatureI. If it is a  
>> B::SF::Annotated already it is obviously just a deep copy (if copy is  
>> desired - could be another parameter). Otherwise more will be involved.
>> Alternatively, and possibly better, is to write a specialized  
>> SeqFeatureI factory (that would implement  
>> Bio::Factory::ObjectFactoryI) and then delegate this job to it:
>> 	my $feat_factory = Bio::SeqFeature::TypedFeatureFactory->new(
>> 		-type_ontology => $sequence_ontology,
>> 		-source_ontology => $feature_source_ontology,
>> 		-unflatten => 1);
>> 	my $bsfa = $feat_factory->create_object({-feature => $feature});
>> This is preferable because it separates business logic that isn't  
>> necessarily related into defined units. I.e., the logic necessary to  
>> convert an ordinary feature into a strongly typed one is different  
>> from how to represent a strongly typed feature. IMHO anyway ...
>> Also, don't dismiss the Bio::SeqFeature::TypedSeqFeatureI that Ewan  
>> started as the result of a discussion thread earlier this (or last?)  
>> year. Bio::SeqFeature::Annotated as such may as well be obsoleted,  
>> though not in concept.
>> Maybe we need to get together again and thrash out a strategy; or a  
>> BOF at the GMOD meeting? I feel this does need a core group of people  
>> who care, hash out a strategy that will also solve the backwards  
>> compatibility problem with the current Bio::SeqFeatureI
>> state-of-limbo, and allow us to implement the decisions with a few people in a
>> concentrated effort. This will then also remove the only real large  
>> stumbling block towards a 1.6 release.
>> Maybe we should think about a little pre-GMOD hackathon to clear up  
>> this mess? Scott, you'll be there a day early? I'll be already back  
>> and Jason I believe will still be in town, although he may have other  
>> commitments already. Nonetheless, it shouldn't really take that much  
>> but rather dedicated time, a whiteboard, and a few people who care  
>> thrashing this out and then do it.
>> Thoughts?
>> 	-hilmar
>> On Jun 16, 2006, at 11:56 PM, Scott Cain wrote:
>>> Rob,
>>> I came to the same conclusion as well; I wrote my response as I was
>>> heading out the door and while I was running errands, I realized the
>>> right thing to do is to write a Bio::SeqFeature::Annotated method called
>>> new_from_object, whose usage would be:
>>>   my $my_BSFA = Bio::SeqFeature::Annotated->new_from_object($my_BSFI, %args);
>>> where you would give it a Bio::SeqFeatureI compliant object and try to
>>> create a BSFA like you suggested below.  You could allow passing in args
>>> to control how different things are handled, like mapping non-SO types
>>> to SO types.  I'll think about this over the weekend and let you know if
>>> brilliance strikes me.
>>> Scott
>>> On Fri, 2006-06-16 at 13:31 -0700, Robert Buels wrote:
>>>> Rather than cobble together some ad-hoc solution, I would be interested
>>>> in working on a good solution to this problem, because it seems like
>>>> it's just going to get more common as more people start wanting to write
>>>> GFF3.  What about some code in whatever customarily makes these objects
>>>> (probably BSF::Annotated's new() method?) that could take another type
>>>> of Feature object and attempt to shoehorn its data into a new
>>>> BSF::Annotated?  If it failed (because the type isn't in SO or
>>>> whatever), it could throw() some informative error message.
>>>> Then, people could write straightforward code something like:
>>>> while(my $oldstylefeature = $features_in->next_feature) {
>>>>     $oldstylefeature->primary_tag('something_that_is_in_so');
>>>>     $oldstylefeature->something_else('some other something that needs to be changed for compliance');
>>>>     my $newfeature = Bio::SeqFeature::Annotated->new($oldstylefeature);
>>>>     $gff3_out->write_feature($newfeature);
>>>> }
>>>> Does that sound like a good idea?  I'd be more than willing to implement
>>>> this, since I'm going to need to do this sort of thing with many more
>>>> things than just RepeatMasker.
>>>> Rob
>>>> Scott Cain wrote:
>>>>> Um, yeah, good question.  The reason I didn't answer you when you wrote
>>>>> before is that I was hoping for divine inspiration for an answer (or for
>>>>> somebody else to answer, which would have been really great). :-)
>>>>> The short answer (and easy one for me to type) is that you will probably
>>>>> need an ad hoc method to do it, which is the same thing I do when I need
>>>>> to convert gff2 to gff3, to make sure the things I need mapped get
>>>>> mapped the 'right' way (that is, the way I want them to go).  I don't
>>>>> have any sample code that does this, but if you want to start working up
>>>>> an ad hoc method, I will certainly try to help you as much as I can.
>>>>> Scott
>>>>> On Fri, 2006-06-16 at 12:34 -0700, Robert Buels wrote:
>>>>>> So about that converting ye olde feature objects into
>>>>>> Bio::SeqFeature::Annotated objects.  How do I do it?
>>>>>> Scott Cain wrote:
>>>>>>> That's OK--You added a few items that should be escaped that weren't,
>>>>>>> so I added those too.
>>>>>>> Thanks,
>>>>>>> Scott
>>>>>>> On Fri, 2006-06-16 at 12:30 -0700, Robert Buels wrote:
>>>>>>>> Whoops, I should have said something about that.  I submitted it
>>>>>>>> before I saw that Scott had already done the escaping in CVS.
>>>>>>>> Chris Fields wrote:
>>>>>>>>> Scott,
>>>>>>>>> Looks like Robert also submitted a bug report related to this as well
>>>>>>>>> ------------------------------------------------------------------------
>>>>>>>>> _______________________________________________
>>>>>>>>> Bioperl-l mailing list
>>>>>>>>> Bioperl-l at lists.open-bio.org
>>>>>>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>> -- 
>>> ------------------------------------------------------------------------
>>> Scott Cain, Ph. D.                                   cain at cshl.edu
>>> GMOD Coordinator (http://www.gmod.org/)                 216-392-3087
>>> Cold Spring Harbor Laboratory
>> - --
>> ===========================================================
>> : Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
>> ===========================================================

Robert Buels
SGN Bioinformatics Analyst
252A Emerson Hall, Cornell University
Ithaca, NY  14853
Tel: 503-889-8539
rmb32 at cornell.edu
