[Bioperl-l] reading and writing GFF3
rmb32 at cornell.edu
Fri Jun 16 15:30:23 EDT 2006
Whoops, I should have mentioned that. I submitted the bug report before I
saw that Scott had already fixed the escaping in CVS.
Chris Fields wrote:
> Looks like Robert also submitted a bug report related to this as well.
> Could you check into it (pretty-please)? I'm still GFF3-illiterate.
>> -----Original Message-----
>> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
>> bounces at lists.open-bio.org] On Behalf Of Scott Cain
>> Sent: Friday, June 16, 2006 9:18 AM
>> To: Robert Buels
>> Cc: bioperl-l at bioperl.org
>> Subject: Re: [Bioperl-l] reading and writing GFF3
>> Hi Rob,
>> I sympathize with your pain--writing GFF3 isn't as easy as writing GFF2,
>> but that is actually a good thing. The tighter constraints result in a
>> better, more consistent file format.
>> The reason only BSF::Annotated features are writable is that there needs
>> to be tight control on the 'type' of the feature, to ensure that the
>> type is part of the Sequence Ontology. It also makes it much easier to
>> properly write out the attributes in the ninth column, particularly the
>> ones that are 'reserved', like Parent, Dbxref, and Ontology_term.
>> BTG is still usable, but the GFF3 it puts out is actually more
>> 'GFF3-like'; that is, it looks like GFF3, but because there are no
>> constraints on the type and the terms that are used in the ninth column,
>> you have to be very careful using it to produce GFF3, by making sure
>> that your feature objects conform to the standard before BTG tries to
>> write them out. (Of course, one way to do that would be to convert your
>> feature objects to BSF::Annotated objects, but then you could use
>> BFIO::gff :-)
>> [Long pause while scott goes and monkeys with Bio::Tools::GFF]
>> OK, I just committed a fix to Bio::Tools::GFF which produces valid GFF3 for
>> your sample data. Conveniently, since 'nucleotide_motif' is a SO term,
>> this is completely valid. (I even fixed the escaping of the stray
>> '=' in 'hind_R=2046'.) The output I get is this:
>> ##gff-version 3
>> C08HBa0001K22.1	RepeatMasker	nucleotide_motif	2095	2556	918	-	.	Target=Contig151 325 832
>> C08HBa0001K22.1	RepeatMasker	nucleotide_motif	2590	2736	488	-	.	Target=Contig386 1 124
>> C08HBa0001K22.1	RepeatMasker	nucleotide_motif	2787	3105	1718	+	.	Target=Contig358 1 311
>> C08HBa0001K22.1	RepeatMasker	nucleotide_motif	3974	4036	312	-	.	Target=hind_R%3D2046 59 120
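For readers following along, here is a minimal Python sketch (not the Bio::Tools::GFF code itself, which is Perl) of the two formatting rules at issue in this thread: the Target value's fields are separated by spaces rather than commas, and reserved characters such as '=' are percent-escaped (so 'hind_R=2046' becomes 'hind_R%3D2046'). The helper names escape_value, format_target and gff3_line are hypothetical, the input is assumed to be ASCII, and the escaped character set reflects my reading of the GFF3 spec rather than any BioPerl implementation.

```python
def escape_value(value: str, extra: str = "") -> str:
    """Percent-escape characters reserved in GFF3 column 9 (assumes ASCII input).

    ';', '=', '&', ',' carry meaning in column 9, '%' introduces escapes,
    and control characters (tab, newline, CR, ...) must never appear raw.
    `extra` lets the caller escape additional characters, e.g. a space in
    a Target ID, where space is the field separator.
    """
    out = []
    for ch in value:
        if ch in ";=&,%" or ch in extra or ord(ch) < 0x20 or ord(ch) == 0x7F:
            out.append("%{:02X}".format(ord(ch)))
        else:
            out.append(ch)
    return "".join(out)


def format_target(target_id: str, start: int, end: int, strand: str = "") -> str:
    """Build a Target attribute: ID, start, end (and optional strand) joined by
    spaces, not commas as in the comma-joined output described in the thread."""
    fields = [escape_value(target_id, extra=" "), str(start), str(end)]
    if strand:
        fields.append(strand)
    return "Target=" + " ".join(fields)


def gff3_line(seqid, source, feature_type, start, end, score, strand, phase, attrs):
    """Join the nine tab-separated GFF3 columns; `attrs` holds preformatted
    'tag=value' strings that are joined with ';' into column 9."""
    cols = [seqid, source, feature_type, start, end, score, strand, phase, ";".join(attrs)]
    return "\t".join(str(c) for c in cols)


if __name__ == "__main__":
    # The last record from the sample output above.
    print(gff3_line("C08HBa0001K22.1", "RepeatMasker", "nucleotide_motif",
                    3974, 4036, 312, "-", ".",
                    [format_target("hind_R=2046", 59, 120)]))
```

Escaping only inside escape_value keeps the spaces between the Target fields intact while still protecting the target ID itself.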
>> On Thu, 2006-06-15 at 18:37 -0700, Robert Buels wrote:
>>> There is stuff in bioperl for reading and writing GFF3. There's
>>> Bio::Tools::GFF. There's Bio::FeatureIO::gff. Are there more? Which
>>> is the 'best' one to use?
>>> Neither of these is working very well for me.
>>> My proximate use case is reading in a RepeatMasker report with
>>> Bio::Tools::RepeatMasker, which spits out FeaturePair objects, then
>>> writing those out to a GFF3 file.
>>> Bio::Tools::GFF will take these things and write out something that
>>> closely resembles GFF3, but with Target attributes that don't seem to
>>> comply with Lincoln's GFF3 spec, since their coordinates are join()ed with
>>> commas instead of spaces. I'm attaching a little script that
>>> illustrates this.
>>> Bio::FeatureIO::gff refuses to take these FeaturePairs or either of the
>>> features contained in them, throwing 'only Bio::SeqFeature::Annotated
>>> objects are writeable'. This seems a bit silly, since one of the whole
>>> points of Bioperl is using polymorphism to make it easy to connect
>>> things together. I've attached a little script to illustrate this one.
>>> So my questions are: what _should_ I be doing here? Is Bio::Tools::GFF
>>> deprecated? Why does Bio::FeatureIO::gff only accept
>>> Bio::SeqFeature::Annotated objects?
>>> Thanks in advance.
>>> Bioperl-l mailing list
>>> Bioperl-l at lists.open-bio.org
>> Scott Cain, Ph. D. cain at cshl.edu
>> GMOD Coordinator (http://www.gmod.org/) 216-392-3087
>> Cold Spring Harbor Laboratory
SGN Bioinformatics Analyst
252A Emerson Hall, Cornell University
Ithaca, NY 14853
rmb32 at cornell.edu