[Bioperl-l] reading and writing GFF3

Robert Buels rmb32 at cornell.edu
Fri Jun 16 15:34:16 EDT 2006


So, about converting ye olde feature objects into
Bio::SeqFeature::Annotated objects: how do I do it?


Scott Cain wrote:
> That's OK--You added a few items that should be escaped that weren't, so
> I added those too.
>
> Thanks,
> Scott
>
>
> On Fri, 2006-06-16 at 12:30 -0700, Robert Buels wrote:
>> Whoops, I should have said something about that.  I submitted it before
>> I saw that Scott had already done the escaping in CVS.
>>
>> Chris Fields wrote: 
>>     
>>> Scott, 
>>>
>>> Looks like Robert also submitted a bug report related to this as well.
>>> Could you check into it (pretty-please)?  I'm still GFF3-illiterate.
>>>
>>> http://bugzilla.open-bio.org/show_bug.cgi?id=2025
>>>
>>> Chris
>>>
>>>> -----Original Message-----
>>>> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
>>>> bounces at lists.open-bio.org] On Behalf Of Scott Cain
>>>> Sent: Friday, June 16, 2006 9:18 AM
>>>> To: Robert Buels
>>>> Cc: bioperl-l at bioperl.org
>>>> Subject: Re: [Bioperl-l] reading and writing GFF3
>>>>
>>>> Hi Rob,
>>>>
>>>> I sympathize with your pain--writing GFF3 isn't as easy as writing GFF2,
>>>> but that is actually a good thing.  The tighter constraints result in a
>>>> better, more consistent file format.
>>>>
>>>> The reason only BSF::Annotated features are writable is that there needs
>>>> to be tight control on the 'type' of the feature, to ensure that the
>>>> type is part of the Sequence Ontology.  It also makes it much easier to
>>>> properly write out the attributes in the ninth column, particularly the
>>>> ones that are 'reserved', like Parent, Dbxref, and Ontology_term.
>>>>
>>>> BTG is still usable, but the GFF3 it puts out is actually more
>>>> 'GFF3-like'; that is, it looks like GFF3, but because there are no
>>>> constraints on the type and the terms that are used in the ninth column,
>>>> you have to be very careful using it to produce GFF3, by making sure
>>>> that your feature objects conform to the standard before BTG tries to
>>>> write them out.  (Of course, one way to do that would be to convert your
>>>> feature objects to BSF::Annotated objects, but then you could use
>>>> BFIO::gff :-)
>>>>
>>>> [Long pause while scott goes and monkeys with Bio::Tools::GFF]
>>>>
>>>> OK, I just committed a fix to Bio::Tools::GFF which produces valid GFF3 for
>>>> your sample data.  Conveniently, since 'nucleotide_motif' is a SO term,
>>>> this is completely valid.  (I even fixed the escaping of the stray
>>>> '=' in 'hind_R=2046'.)  The output I get is this:
>>>>
>>>> ##gff-version 3
>>>> C08HBa0001K22.1 RepeatMasker    nucleotide_motif        2095    2556    918     -       .       Target=Contig151 325 832
>>>> C08HBa0001K22.1 RepeatMasker    nucleotide_motif        2590    2736    488     -       .       Target=Contig386 1 124
>>>> C08HBa0001K22.1 RepeatMasker    nucleotide_motif        2787    3105    1718    +       .       Target=Contig358 1 311
>>>> C08HBa0001K22.1 RepeatMasker    nucleotide_motif        3974    4036    312     -       .       Target=hind_R%3D2046 59 120
>>>>
>>>> Scott
>>>>
>>>>
>>>>
>>>> On Thu, 2006-06-15 at 18:37 -0700, Robert Buels wrote:
>>>>> There is stuff in bioperl for reading and writing GFF3.  There's
>>>>> Bio::Tools::GFF.  There's Bio::FeatureIO::gff.  Are there more?  Which
>>>>> is the 'best' one to use?
>>>>>
>>>>> Neither of these is working very well for me.
>>>>>
>>>>> My proximate use case is reading in a RepeatMasker report with
>>>>> Bio::Tools::RepeatMasker, which spits out FeaturePair objects, then
>>>>> writing those out to a GFF3 file.
>>>>>
>>>>> Bio::Tools::GFF will take these things and write out something that
>>>>> closely resembles GFF3, but with Target attributes that don't seem to
>>>>> comply with Lincoln's GFF3 spec, since its coordinates are join()ed with
>>>>> commas instead of spaces.  I'm attaching a little script that
>>>>> illustrates this.
>>>>>
>>>>> Bio::FeatureIO::gff refuses to take these FeaturePairs or either of the
>>>>> features contained in them, throwing 'only Bio::SeqFeature::Annotated
>>>>> objects are writeable'.  This seems a bit silly, since one of the whole
>>>>> points of Bioperl is using polymorphism to make it easy to connect
>>>>> things together.  I've attached a little script to illustrate this one
>>>>> too.
>>>>>
>>>>> So my questions are:  what _should_ I be doing here?  Is Bio::Tools::GFF
>>>>> deprecated?  Why does Bio::FeatureIO::gff only accept
>>>>> Bio::SeqFeature::Annotated objects?
>>>>>
>>>>> Thanks in advance.
>>>>>
>>>>> Rob
>>>>>
>>>>> _______________________________________________
>>>>> Bioperl-l mailing list
>>>>> Bioperl-l at lists.open-bio.org
>>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>>> --
>>>> ------------------------------------------------------------------------
>>>> Scott Cain, Ph. D.                                         cain at cshl.edu
>>>> GMOD Coordinator (http://www.gmod.org/)                     216-392-3087
>>>> Cold Spring Harbor Laboratory
>> -- 
>> Robert Buels
>> SGN Bioinformatics Analyst
>> 252A Emerson Hall, Cornell University
>> Ithaca, NY  14853
>> Tel: 503-889-8539
>> rmb32 at cornell.edu
>> http://www.sgn.cornell.edu
>>
>>     
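
For reference, the two fixes discussed in the quoted thread (Target
coordinates separated by spaces rather than commas, and percent-escaping of
reserved characters such as '=') can be sketched outside BioPerl.  This is
only a minimal Python sketch, not the code that was committed to
Bio::Tools::GFF.  The function names escape_attr, target_attr and gff3_line
are made up for illustration, and the set of reserved characters is an
assumption based on a reading of the GFF3 spec.

```python
import re

# Characters assumed to need percent-encoding inside a column-9 value:
# '%', ';', '=', '&', ',' and control characters (including tab).
_RESERVED = re.compile(r"[%;=&,\x00-\x1f\x7f]")
# Inside a Target ID, spaces are escaped too, because a space separates
# the ID from its coordinates.
_RESERVED_AND_SPACE = re.compile(r"[%;=&,\x00-\x1f\x7f ]")


def escape_attr(value, escape_space=False):
    """Percent-encode reserved characters in a GFF3 attribute value."""
    pattern = _RESERVED_AND_SPACE if escape_space else _RESERVED
    return pattern.sub(lambda m: "%{:02X}".format(ord(m.group())), value)


def target_attr(target_id, start, end, strand=None):
    """Build 'Target=<id> <start> <end> [strand]', joined with spaces."""
    parts = [escape_attr(target_id, escape_space=True), str(start), str(end)]
    if strand in ("+", "-"):
        parts.append(strand)
    return "Target=" + " ".join(parts)


def gff3_line(seqid, source, type_, start, end, score, strand, phase, attrs):
    """Join the nine GFF3 columns with tabs."""
    cols = [seqid, source, type_, start, end, score, strand, phase, attrs]
    return "\t".join(str(c) for c in cols)


if __name__ == "__main__":
    print("##gff-version 3")
    print(gff3_line("C08HBa0001K22.1", "RepeatMasker", "nucleotide_motif",
                    3974, 4036, 312, "-", ".",
                    target_attr("hind_R=2046", 59, 120)))
```

The sketch does not check that the type column is a Sequence Ontology term;
that is the part Bio::SeqFeature::Annotated is described as enforcing.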

-- 
Robert Buels
SGN Bioinformatics Analyst
252A Emerson Hall, Cornell University
Ithaca, NY  14853
Tel: 503-889-8539
rmb32 at cornell.edu
http://www.sgn.cornell.edu



