[Bioperl-l] reading and writing GFF3

Scott Cain cain at cshl.edu
Fri Jun 16 10:18:13 EDT 2006

Hi Rob,

I sympathize with your pain: writing GFF3 isn't as easy as writing GFF2,
but that is actually a good thing.  The tighter constraints result in a
better, more consistent file format.

The reason only Bio::SeqFeature::Annotated (BSF::Annotated) features are
writable is that there needs to be tight control on the 'type' of the
feature, to ensure that the type is part of the Sequence Ontology (SO).
It also makes it much easier to properly write out the attributes in the
ninth column, particularly the ones that are 'reserved', like Parent,
Dbxref, and Ontology_term.
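
To make the two constraints concrete, here is a minimal sketch in Python
(not BioPerl code, and not how Bio::FeatureIO::gff is implemented).  The
function name check_feature is made up for illustration, and
KNOWN_SO_TERMS is a tiny hand-picked subset standing in for the full
Sequence Ontology.  The reserved tag list and the rule that tags starting
with an uppercase letter are reserved come from the GFF3 specification.

```python
# Reserved attribute tags defined by the GFF3 specification (column nine).
RESERVED_TAGS = {
    "ID", "Name", "Alias", "Parent", "Target", "Gap",
    "Derives_from", "Note", "Dbxref", "Ontology_term", "Is_circular",
}

# Assumption: an illustrative subset only. A real check would load the
# complete Sequence Ontology instead of this hard-coded set.
KNOWN_SO_TERMS = {"gene", "mRNA", "exon", "CDS", "nucleotide_motif"}


def check_feature(feature_type, attributes):
    """Return a list of problems with a feature's type and attribute tags."""
    problems = []
    if feature_type not in KNOWN_SO_TERMS:
        problems.append(f"type {feature_type!r} is not a known SO term")
    for tag in attributes:
        # Tags beginning with an uppercase letter are reserved by the spec,
        # so any such tag outside the reserved list is flagged.
        if tag[:1].isupper() and tag not in RESERVED_TAGS:
            problems.append(f"tag {tag!r} is uppercase but not a reserved tag")
    return problems
```

Calling check_feature("nucleotide_motif", {"Target": "Contig151 325 832"})
returns an empty list, while an unknown type or a made-up capitalized tag
each add one entry.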

Bio::Tools::GFF (BTG) is still usable, but what it puts out is really
only 'GFF3-like'; that is, it looks like GFF3, but because there are no
constraints on the type and the terms that are used in the ninth column,
you have to be very careful using it to produce GFF3, by making sure
that your feature objects conform to the standard before BTG tries to
write them out.  (Of course, one way to do that would be to convert your
feature objects to Bio::SeqFeature::Annotated objects, but then you could
use Bio::FeatureIO::gff :-)

[Long pause while Scott goes and monkeys with Bio::Tools::GFF]

OK, I just committed a fix to Bio::Tools::GFF, which now produces valid
GFF3 for your sample data.  Conveniently, since 'nucleotide_motif' is a
SO term, this is completely valid.  (I even fixed the escaping of the
stray '=' in 'hind_R=2046'.)  The output I get is this:

##gff-version 3
C08HBa0001K22.1 RepeatMasker    nucleotide_motif        2095    2556    918     -       .       Target=Contig151 325 832
C08HBa0001K22.1 RepeatMasker    nucleotide_motif        2590    2736    488     -       .       Target=Contig386 1 124
C08HBa0001K22.1 RepeatMasker    nucleotide_motif        2787    3105    1718    +       .       Target=Contig358 1 311
C08HBa0001K22.1 RepeatMasker    nucleotide_motif        3974    4036    312     -       .       Target=hind_R%3D2046 59 120
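
For anyone who wants to check the Target formatting by hand, here is a
small Python sketch (again, not the Bio::Tools::GFF code itself) of the
two rules at work in the output above: the Target value is
'target_id start end' separated by spaces, and characters such as '=',
';', '&', ',' and '%' in the target id are percent-escaped.  The function
names are hypothetical.

```python
def escape_value(text):
    """Percent-escape characters that are not allowed raw in a Target id."""
    out = []
    for ch in text:
        # Escape the column-nine delimiters, '%', whitespace and control chars.
        if ch in ";=&,% \t\n\r" or ord(ch) < 0x20 or ord(ch) == 0x7F:
            out.append(f"%{ord(ch):02X}")
        else:
            out.append(ch)
    return "".join(out)


def format_target(target_id, start, end, strand=None):
    """Build a Target attribute: id, start and end joined by single spaces."""
    parts = [escape_value(target_id), str(start), str(end)]
    if strand is not None:
        parts.append(strand)
    return "Target=" + " ".join(parts)


def gff3_line(seqid, source, ftype, start, end, score, strand, phase, attrs):
    """Join the nine GFF3 columns with tabs."""
    columns = [seqid, source, ftype, start, end, score, strand, phase, attrs]
    return "\t".join(str(c) for c in columns)


print(gff3_line("C08HBa0001K22.1", "RepeatMasker", "nucleotide_motif",
                3974, 4036, 312, "-", ".",
                format_target("hind_R=2046", 59, 120)))
```

The printed line should match the last feature line above, with real tabs
between the columns.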


On Thu, 2006-06-15 at 18:37 -0700, Robert Buels wrote:
> There is stuff in bioperl for reading and writing GFF3.  There's 
> Bio::Tools::GFF.  There's Bio::FeatureIO::gff.  Are there more?  Which 
> is the 'best' one to use?
> Neither of these is working very well for me.
> My proximate use case is reading in a RepeatMasker report with 
> Bio::Tools::RepeatMasker, which spits out FeaturePair objects, then 
> writing those out to a GFF3 file.
> Bio::Tools::GFF will take these things and write out something that 
> closely resembles GFF3, but with Target attributes that don't seem to 
> comply with Lincoln's GFF3 spec, since its coordinates are join()ed with 
> commas instead of spaces.  I'm attaching a little script that 
> illustrates this.
> Bio::FeatureIO::gff refuses to take these FeaturePairs or either of the 
> features contained in them, throwing 'only Bio::SeqFeature::Annotated 
> objects are writeable'.  This seems a bit silly, since one of the whole 
> points of Bioperl is using polymorphism to make it easy to connect 
> things together.  I've attached a little script to illustrate this one too.
> So my questions are:  what _should_ I be doing here?  Is Bio::Tools::GFF 
> deprecated?  Why does Bio::FeatureIO::gff only accept 
> Bio::SeqFeature::Annotated objects?
> Thanks in advance.
> Rob
Scott Cain, Ph. D.                                         cain at cshl.edu
GMOD Coordinator (http://www.gmod.org/)                     216-392-3087
Cold Spring Harbor Laboratory