[Bioperl-l] new GFF2 parsing/dumping routines committed...

Mark Wilkinson mwilkinson@gene.pbi.nrc.ca
Mon, 20 Nov 2000 17:03:01 -0600

Hi Group!

My last two commits allow the import and export of (as far as I understand it) properly formatted GFF2.

To create a generic feature from a GFF2 string, make the call as follows:

$Feature = new Bio::SeqFeature::Generic (-gff2_string => $string);

The most important differences between the original -gff_string option and the new -gff2_string option are as follows:

(1) The fields *must be TAB-separated* (formerly the parser split on whitespace, but that would choke on the free text that is now allowed).
(2) No default "group" tag is created.  You must specify   group=MyGroup   in the attributes field.
(3) Tag/value units are semicolon-separated.
(4) Tags can have more than one space-separated value.
(5) Free text is allowed as a value so long as it is double-quoted.
(6) Comments are allowed but are ignored (a comment comes at the end of the GFF line, preceded by a # symbol).
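
To make the six rules concrete, here is a minimal sketch of a parser that applies them to a single line. It is a standalone illustration in plain Perl, not the code that was committed: parse_gff2_line and the hash keys it returns are made-up names, and it assumes the tag=value form shown in rule (2).

```perl
use strict;
use warnings;

# Hypothetical helper, not part of BioPerl: parse one GFF2 line using the
# six rules listed above and return a plain hash reference.
sub parse_gff2_line {
    my ($line) = @_;
    chomp $line;

    # Rule 1: fields are TAB-separated; the ninth field holds the attributes.
    my ($seqname, $source, $primary, $start, $end,
        $score, $strand, $frame, $attributes) = split /\t/, $line, 9;
    die "expected at least 8 TAB-separated fields\n" unless defined $frame;
    $attributes = '' unless defined $attributes;

    # Rule 6: drop a trailing comment, without cutting at a '#' that sits
    # inside a double-quoted value.
    $attributes =~ s/^((?:[^"#]|"[^"]*")*)#.*$/$1/;

    my %tags;
    # Rule 3: tag/value units are separated by semicolons (outside quotes).
    for my $unit ($attributes =~ /((?:[^";]|"[^"]*")+)/g) {
        # Assumption: the tag is followed by '=' as in the group=MyGroup form.
        my ($tag, $rest) = $unit =~ /^\s*([^\s=]+)\s*=?\s*(.*?)\s*$/s or next;

        # Rules 4 and 5: values are space-separated; free text is double-quoted.
        my @values;
        while ($rest =~ /"([^"]*)"|(\S+)/g) {
            push @values, defined $1 ? $1 : $2;
        }

        # Rule 2: no default "group" tag is invented; only tags that are
        # actually present get stored.
        push @{ $tags{$tag} }, @values;
    }

    return {
        seqname => $seqname, source => $source, primary => $primary,
        start   => $start,   end    => $end,    score   => $score,
        strand  => $strand,  frame  => $frame,  tags    => \%tags,
    };
}

# Usage: build a TAB-separated line and look at the values of one tag.
my $line = join "\t", 'mysequence', 'GMHMM', 'exon', 100, 200, 45, '.', '.',
    'group=MyFavGene;notes="the answer"   "to LtUandE is"   42   # these are comments';
my $feature = parse_gff2_line($line);
print join('|', @{ $feature->{tags}{notes} }), "\n";   # the answer|to LtUandE is|42
```

The sketch leaves strand and frame as the raw '.' strings and does not handle a double quote embedded inside a quoted value.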

An example of a GFF string that could be parsed by this routine would be (the gaps between columns are TABs):

mysequence    GMHMM    exon    100    200    45    .    .    group=MyFavGene;notes="the answer"   "to LtUandE is"   42   # these are comments

This results in a feature with the following structure:

0  Bio::SeqFeature::Generic=HASH(0x844db70)
   '_gsf_end' => 200
   '_gsf_score' => 45
   '_gsf_seqname' => 'mysequence'
   '_gsf_start' => 100
   '_gsf_strand' => 0
   '_gsf_sub_array' => ARRAY(0x84507e8)
        empty array
   '_gsf_tag_hash' => HASH(0x845074c)
      'group' => ARRAY(0x845116c)
         0  'MyFavGene'
      'notes' => ARRAY(0x845122c)
         0  'the answer'
         1  'to LtUandE is'
         2  '42'
   '_parse_h' => HASH(0x8437dfc)
        empty hash
   '_primary_tag' => 'exon'
   '_record_err' => undef
   '_source_tag' => 'GMHMM'
   '_strict' => undef
   '_verbose' => undef
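
For the export direction, here is a minimal sketch of writing such a structure back out as a GFF2 line. It is a standalone illustration in plain Perl, not the code that was committed: format_gff2_line and the hash keys are made-up names, and the feature is a plain hash rather than a Bio::SeqFeature::Generic object.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of BioPerl: turn a plain hash of feature
# data into a GFF2 line with TAB-separated columns and tag=value units.
sub format_gff2_line {
    my ($f) = @_;

    my @units;
    for my $tag (sort keys %{ $f->{tags} }) {
        my @values;
        for my $value (@{ $f->{tags}{$tag} }) {
            # Double-quote empty values and anything containing whitespace,
            # a semicolon or a '#'. Embedded double quotes are not handled.
            my $needs_quotes = $value eq '' || $value =~ /[\s;#]/;
            push @values, $needs_quotes ? qq("$value") : $value;
        }
        # Several values for one tag are separated by single spaces.
        push @units, "$tag=" . join(' ', @values);
    }

    # Eight fixed columns followed by the semicolon-separated attributes.
    return join "\t",
        @{$f}{qw(seqname source primary start end score strand frame)},
        join(';', @units);
}

# Usage with the values from the example feature.
my %feature = (
    seqname => 'mysequence', source => 'GMHMM', primary => 'exon',
    start   => 100,          end    => 200,     score   => 45,
    strand  => '.',          frame  => '.',
    tags    => {
        group => ['MyFavGene'],
        notes => ['the answer', 'to LtUandE is', '42'],
    },
);
print format_gff2_line(\%feature), "\n";
```

Tags are written in sorted order only to make the output reproducible; nothing in the format described here requires it.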

If you are so inclined please give this a thorough working over and let me know if you find errors.  So far it seems to be okay... touch wood!

Cheers all!


Dr. Mark Wilkinson
Bioinformatics Group
National Research Council of Canada
Plant Biotechnology Institute
110 Gymnasium Place
Saskatoon, SK