[Bioperl-l] reading and writing GFF3

Scott Cain cain at cshl.edu
Fri Jun 16 15:34:13 EDT 2006


That's OK--you added a few items that should have been escaped but
weren't, so I added those too.
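
For anyone reading the archive, here is a rough sketch of the kind of
escaping being discussed.  It is written in Python purely for
illustration and is not the code committed to Bio::Tools::GFF.  The
function names escape_gff3_value and format_target are hypothetical,
and the character set assumes the reserved characters listed in the
GFF3 spec for column 9 (';', '=', '&', ',', plus '%' and control
characters), with spaces in a Target ID escaped as %20.

```python
import re

# Assumption: characters reserved in GFF3 column 9 (; = & ,), plus '%'
# and ASCII control characters, are percent-encoded as uppercase hex.
_RESERVED = re.compile(r"[;=&,%\x00-\x1f\x7f]")


def escape_gff3_value(value: str) -> str:
    """Percent-encode reserved characters in an attribute value."""
    return _RESERVED.sub(lambda m: "%{:02X}".format(ord(m.group())), value)


def format_target(target_id: str, start: int, end: int) -> str:
    """Build a Target attribute: ID, start and end separated by spaces."""
    # Spaces separate the three fields, so a space inside the ID itself
    # has to be escaped as well.
    escaped_id = escape_gff3_value(target_id).replace(" ", "%20")
    return "Target={} {} {}".format(escaped_id, start, end)


print(format_target("hind_R=2046", 59, 120))
# Target=hind_R%3D2046 59 120
```

The printed line matches the last Target attribute in the output
quoted below, where the stray '=' becomes %3D and the coordinates are
separated by spaces rather than commas.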

Thanks,
Scott


On Fri, 2006-06-16 at 12:30 -0700, Robert Buels wrote:
> Woops, I should have said something about that.  I submitted it before
> I saw that Scott had already done the escaping in CVS.
> 
> Chris Fields wrote: 
> > Scott, 
> > 
> > Looks like Robert also submitted a bug report related to this.
> > Could you check into it (pretty-please)?  I'm still GFF3-illiterate.
> > 
> > http://bugzilla.open-bio.org/show_bug.cgi?id=2025
> > 
> > Chris
> > 
> >   
> > > -----Original Message-----
> > > From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> > > bounces at lists.open-bio.org] On Behalf Of Scott Cain
> > > Sent: Friday, June 16, 2006 9:18 AM
> > > To: Robert Buels
> > > Cc: bioperl-l at bioperl.org
> > > Subject: Re: [Bioperl-l] reading and writing GFF3
> > > 
> > > Hi Rob,
> > > 
> > > I sympathize with your pain--writing GFF3 isn't as easy as writing GFF2,
> > > but that is actually a good thing.  The tighter constraints result in a
> > > better, more consistent file format.
> > > 
> > > The reason only Bio::SeqFeature::Annotated (BSF::Annotated) features are
> > > writable is that there needs to be tight control on the 'type' of the
> > > feature, to ensure that the
> > > type is part of the Sequence Ontology.  It also makes it much easier to
> > > properly write out the attributes in the ninth column, particularly the
> > > ones that are 'reserved', like Parent, Dbxref, and Ontology_term.
> > > 
> > > Bio::Tools::GFF (BTG) is still usable, but the GFF3 it puts out is actually more
> > > 'GFF3-like'; that is, it looks like GFF3, but because there are no
> > > constraints on the type and the terms that are used in the ninth column,
> > > you have to be very careful using it to produce GFF3, by making sure
> > > that your feature objects conform to the standard before BTG tries to
> > > write them out.  (Of course, one way to do that would be to convert your
> > > feature objects to BSF::Annotated objects, but then you could use
> > > Bio::FeatureIO::gff :-)
> > > 
> > > [Long pause while Scott goes and monkeys with Bio::Tools::GFF]
> > > 
> > > OK, I just committed a fix to Bio::Tools::GFF which produces valid GFF3 for
> > > your sample data.  Conveniently, since 'nucleotide_motif' is a SO term,
> > > this is completely valid.  (I even fixed the escaping of the stray
> > > '=' in 'hind_R=2046'.)  The output I get is this:
> > > 
> > > ##gff-version 3
> > > C08HBa0001K22.1 RepeatMasker    nucleotide_motif        2095    2556    918     -       .       Target=Contig151 325 832
> > > C08HBa0001K22.1 RepeatMasker    nucleotide_motif        2590    2736    488     -       .       Target=Contig386 1 124
> > > C08HBa0001K22.1 RepeatMasker    nucleotide_motif        2787    3105    1718    +       .       Target=Contig358 1 311
> > > C08HBa0001K22.1 RepeatMasker    nucleotide_motif        3974    4036    312     -       .       Target=hind_R%3D2046 59 120
> > > 
> > > Scott
> > > 
> > > 
> > > 
> > > On Thu, 2006-06-15 at 18:37 -0700, Robert Buels wrote:
> > >     
> > > > There is stuff in bioperl for reading and writing GFF3.  There's
> > > > Bio::Tools::GFF.  There's Bio::FeatureIO::gff.  Are there more?  Which
> > > > is the 'best' one to use?
> > > > 
> > > > Neither of these is working very well for me.
> > > > 
> > > > My proximate use case is reading in a RepeatMasker report with
> > > > Bio::Tools::RepeatMasker, which spits out FeaturePair objects, then
> > > > writing those out to a GFF3 file.
> > > > 
> > > > Bio::Tools::GFF will take these things and write out something that
> > > > closely resembles GFF3, but with Target attributes that don't seem to
> > > > comply with Lincoln's GFF3 spec, since its coordinates are join()ed with
> > > > commas instead of spaces.  I'm attaching a little script that
> > > > illustrates this.
> > > > 
> > > > Bio::FeatureIO::gff refuses to take these FeaturePairs or either of the
> > > > features contained in them, throwing 'only Bio::SeqFeature::Annotated
> > > > objects are writeable'.  This seems a bit silly, since one of the whole
> > > > points of Bioperl is using polymorphism to make it easy to connect
> > > > things together.  I've attached a little script to illustrate this one
> > > > too.
> > > > 
> > > > So my questions are:  what _should_ I be doing here?  Is Bio::Tools::GFF
> > > > deprecated?  Why does Bio::FeatureIO::gff only accept
> > > > Bio::SeqFeature::Annotated objects?
> > > > 
> > > > Thanks in advance.
> > > > 
> > > > Rob
> > > > 
> > > > _______________________________________________
> > > > Bioperl-l mailing list
> > > > Bioperl-l at lists.open-bio.org
> > > > http://lists.open-bio.org/mailman/listinfo/bioperl-l
> > > >       
> > > --
> > > ------------------------------------------------------------------------
> > > Scott Cain, Ph. D.                                         cain at cshl.edu
> > > GMOD Coordinator (http://www.gmod.org/)                     216-392-3087
> > > Cold Spring Harbor Laboratory
> > >     
> > 
> 
> -- 
> Robert Buels
> SGN Bioinformatics Analyst
> 252A Emerson Hall, Cornell University
> Ithaca, NY  14853
> Tel: 503-889-8539
> rmb32 at cornell.edu
> http://www.sgn.cornell.edu
> 
-- 
------------------------------------------------------------------------
Scott Cain, Ph. D.                                         cain at cshl.edu
GMOD Coordinator (http://www.gmod.org/)                     216-392-3087
Cold Spring Harbor Laboratory