[Bioperl-l] new GFF2 parsing/dumping routines committed...
Mon, 20 Nov 2000 17:03:01 -0600
The last two commits from me allow the import and export of (as far as I understand) properly formatted GFF2.
To create a generic feature from GFF2, make the call as follows:
$Feature = Bio::SeqFeature::Generic->new(-gff2_string => $string);
The most important differences between the original -gff_string and the new -gff2_string options are as follows:
(1) the fields *must be TAB-separated* (formerly the parser split on whitespace, but that would choke on the free text that is now allowed)
(2) there is no default "group" tag created. You must specify group=MyGroup in the attributes field
(3) tag/value units are semicolon separated
(4) tags can have more than one space-separated value
(5) free text is allowed as a value so long as it is double-quoted
(6) comments are allowed but are ignored (comments are at the end of the GFF line, preceded by a # symbol)
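To make rules (1) to (6) concrete, here is a rough standalone sketch of a parser that follows them. It is not the code I committed: parse_gff2_line and the layout of the hash it returns are made up for illustration, and the strand mapping ('+' to 1, '-' to -1, anything else to 0) is an assumption.

```perl
use strict;
use warnings;

# Hypothetical helper, NOT the committed BioPerl code: parse one GFF2 line
# into a plain hash, following rules (1)-(6) above.
sub parse_gff2_line {
    my ($line) = @_;
    chomp $line;

    # (1) fields are TAB-separated; the 9th field holds the attributes
    my ($seqname, $source, $primary, $start, $end,
        $score, $strand, $frame, $attribs) = split /\t/, $line, 9;
    die "expected at least 8 TAB-separated fields\n" unless defined $frame;
    $attribs = '' unless defined $attribs;

    my ($tag, %tags);
    # Tokens: a double-quoted string, '#', ';', or a bare word
    while ($attribs =~ /\G\s*(?:"([^"]*)"|(#)|(;)|([^\s;"#]+))/gc) {
        my ($quoted, $hash, $semi, $word) = ($1, $2, $3, $4);
        last if defined $hash;                      # (6) ignore the trailing comment
        if (defined $semi) { undef $tag; next }     # (3) start of the next tag/value unit
        if (!defined $tag) {
            # (2) first token of a unit is "tag=value", or "tag=" before a quoted value
            die "expected tag=value, got a quoted string\n" if defined $quoted;
            my ($name, $value) = split /=/, $word, 2;
            $tag = $name;
            $tags{$tag} ||= [];
            push @{ $tags{$tag} }, $value if defined $value && length $value;
            next;
        }
        # (4) further space-separated values, (5) quoted free text
        push @{ $tags{$tag} }, defined $quoted ? $quoted : $word;
    }

    return {
        seqname => $seqname, source => $source, primary => $primary,
        start   => $start,   end    => $end,    score   => $score,
        strand  => $strand eq '+' ? 1 : $strand eq '-' ? -1 : 0,  # assumed mapping
        frame   => $frame,   tags   => \%tags,
    };
}

# Usage: build a TAB-separated line and list the tags that were found.
my $line = join "\t", qw(mysequence GMHMM exon 100 200 45 . .),
    'group=MyFavGene;notes="the answer" "to LtUandE is" 42 # these are comments';
my $feature = parse_gff2_line($line);
printf "%s: %s\n", $_, join(', ', @{ $feature->{tags}{$_} })
    for sort keys %{ $feature->{tags} };
```

The tokenizer walks the attributes field left to right, so a semicolon or # inside double quotes is kept as part of the value rather than ending the unit.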
An example of a GFF2 string that the new -gff2_string option can parse would be:
mysequence	GMHMM	exon	100	200	45	.	.	group=MyFavGene;notes="the answer" "to LtUandE is" 42 # these are comments
This results in a feature with the following structure:
'_gsf_end' => 200
'_gsf_score' => 45
'_gsf_seqname' => 'abc'
'_gsf_start' => 100
'_gsf_strand' => 0
'_gsf_sub_array' => ARRAY(0x84507e8)
'_gsf_tag_hash' => HASH(0x845074c)
   'group' => ARRAY(0x845116c)
   'notes' => ARRAY(0x845122c)
      0 'the answer'
      1 'to LtUandE is'
'_parse_h' => HASH(0x8437dfc)
'_primary_tag' => 'exon'
'_record_err' => undef
'_source_tag' => 'GMHMM'
'_strict' => undef
'_verbose' => undef
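For the export direction, the same rules can be applied in reverse. Below is a minimal sketch of that idea, again not the committed code: gff2_string and the plain hash it takes (with the tag hash under a 'tags' key rather than the '_gsf_' internals shown above) are hypothetical, and writing a strand of 0 back out as '.' is an assumption.

```perl
use strict;
use warnings;

# Hypothetical helper, NOT the committed BioPerl code: write a feature
# (held in a plain hash) back out as one TAB-separated GFF2 line.
sub gff2_string {
    my ($f) = @_;
    my %strand_char = (1 => '+', -1 => '-', 0 => '.');   # assumed mapping

    my @units;
    for my $tag (sort keys %{ $f->{tags} }) {
        # Double-quote any value containing whitespace, ';' or '#',
        # so free text survives being parsed again
        my @vals = map { /[\s;#]/ ? qq("$_") : $_ } @{ $f->{tags}{$tag} };
        push @units, "$tag=" . join(' ', @vals);
    }

    return join "\t",
        @{$f}{qw(seqname source primary start end score)},
        $strand_char{ $f->{strand} }, $f->{frame},
        join(';', @units);       # tag/value units are semicolon-separated
}

# Usage: a feature shaped like the example above
my %example = (
    seqname => 'mysequence', source => 'GMHMM', primary => 'exon',
    start   => 100, end => 200, score => 45, strand => 0, frame => '.',
    tags    => { group => ['MyFavGene'],
                 notes => ['the answer', 'to LtUandE is'] },
);
print gff2_string(\%example), "\n";
```

Tags are written in sorted order only to keep the output stable; nothing in the format requires it.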
If you are so inclined please give this a thorough working over and let me know if you find errors. So far it seems to be okay... touch wood!
Dr. Mark Wilkinson
National Research Council of Canada
Plant Biotechnology Institute
110 Gymnasium Place