[Bioperl-l] Bio::Tools::Glimmer

Torsten Seemann torsten.seemann at infotech.monash.edu.au
Tue Sep 19 21:13:43 EDT 2006


I have added example output files for all 4 flavours of Glimmer to 
bioperl-live CVS as t/data/Glimmer*, described below:

>     The initial checkin comments (circa '03) for Bio::Tools::Glimmer
> describe it as a 'GlimmerM 3.0' parser.  The POD says '...a module for
> parsing Glimmer predictions (currently GlimmerM
> 3.0 is all that has been tested)...'.  However, the latest version of
> GlimmerM looks to be 2.5.1 (ftp://ftp.tigr.org/pub/software/GlimmerM),
> and there are multiple versions/flavors of Glimmer besides GlimmerM:
> Glimmer 2.X (bacteria, archaea, and viruses):
>     http://www.cbcb.umd.edu/software/glimmer/glimmer2.jun01.shtml

Glimmer 2.x produces a single two-part output file. The first part has
detailed information on all ORFs, while the second part lists the putative genes.


> Glimmer 3.X (bacteria, archaea, and viruses):
>     http://www.cbcb.umd.edu/software/glimmer/

Glimmer3 produces two separate files: XXX.detail and XXX.predict.
The Glimmer3 .detail file is similar to the first part of a Glimmer 2.x
output file. The Glimmer3 .predict file conveys the same information
as the second part of a Glimmer2 file, but in a totally different format!
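To make the .predict side concrete, here is a minimal Python sketch (not Bioperl code, and not part of any existing module). It assumes the usual Glimmer3 .predict layout: a '>' header line naming the sequence, followed by one whitespace-separated line per gene giving ORF id, start, end, signed reading frame and score, where reverse-strand genes have start > end and a negative frame. The class and function names are hypothetical, and the sample lines in the demo are made up for illustration.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass
class Prediction:
    seq_id: str
    orf_id: str
    start: int   # as written in the file; start > end on the reverse strand
    end: int
    frame: int   # signed reading frame, e.g. +2 or -1
    score: float

    @property
    def strand(self) -> int:
        return 1 if self.frame > 0 else -1


def parse_glimmer3_predict(lines: Iterable[str]) -> List[Prediction]:
    """Parse Glimmer3 .predict text, assuming the layout described above."""
    preds: List[Prediction] = []
    seq_id: Optional[str] = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Header line: keep the first word after '>' as the sequence name
            parts = line[1:].split()
            seq_id = parts[0] if parts else ""
            continue
        fields = line.split()
        if seq_id is None or len(fields) < 5:
            raise ValueError(f"unexpected .predict line: {raw!r}")
        orf_id, start, end, frame, score = fields[:5]
        preds.append(Prediction(seq_id, orf_id, int(start), int(end),
                                int(frame), float(score)))
    return preds


if __name__ == "__main__":
    # Made-up sample lines, for illustration only
    sample = """>contig1
orf00001      337     2799  +1    13.97
orf00005     6459     5680  -2     9.10
"""
    for p in parse_glimmer3_predict(sample.splitlines()):
        print(p.seq_id, p.orf_id, p.start, p.end, p.strand, p.score)
```

Coordinates are kept exactly as written; a real parser would probably also normalise reverse-strand genes (and genes wrapping round a circular genome) into start/end/strand form.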


> GlimmerM (eukaryotes):
>     http://www.cbcb.umd.edu/software/glimmerm/index.shtml
>     http://www.tigr.org/software/glimmerm/

I used GlimmerM 2.5.1. The output matches the original 
"t/data/glimmer.out" test file in CVS.


> GlimmerHMM (eukaryotes):
>     http://www.cbcb.umd.edu/software/GlimmerHMM/

This format is nearly identical to GlimmerM's; only the header on the
first line differs. I used version 2.2.0.


>     I suspect Bio::Tools::Glimmer only parses GlimmerM, *maybe*
> GlimmerHMM, but not Glimmer 2.X or Glimmer 3.X.

It doesn't currently work with my GlimmerHMM output, because the module
expects a version number, which my output does not have. I will fix
that in CVS today.

However, it won't work with Glimmer 2.x or 3.x, and it probably
shouldn't, since the eukaryote-specific handling isn't relevant to them.
New code has to be written. Most people only want the final gene
predictions, and 2.x and 3.x use different formats and files for those.

I'm not sure whether to

1. parse them all under the same module, perhaps with a 
-format=>'glimmerXXX' parameter

2. create a single new module covering both Glimmer2 and Glimmer3

3. create two new modules, one for Glimmer2 and one for Glimmer3, given 
they are different outputs both in syntax and number of output files
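For comparison, option 1 need not rule out option 3: one possible layout keeps a separate parser per flavour and puts a thin front end on top that picks one from a format name. The Python sketch below illustrates only that idea; parse_glimmer, register and the format keys are hypothetical names, not Bioperl API, and the Glimmer2 entry is a deliberate placeholder. It also shows how such a front end can accept a different number of input files per flavour (one combined file for Glimmer2, a .predict plus optional .detail for Glimmer3).

```python
from typing import Callable, Dict, Iterable, List, Optional

# Registry mapping a format name (like the -format=>'glimmerXXX' value)
# to the parser for that flavour.
_PARSERS: Dict[str, Callable[..., List[dict]]] = {}


def register(fmt: str):
    """Decorator that records a parser under a lower-case format name."""
    def deco(fn):
        _PARSERS[fmt.lower()] = fn
        return fn
    return deco


@register("glimmer3")
def _parse_glimmer3(predict: Iterable[str],
                    detail: Optional[Iterable[str]] = None) -> List[dict]:
    # Final gene calls come from the .predict file alone; the .detail
    # file is accepted but ignored in this sketch.
    genes: List[dict] = []
    seq_id: Optional[str] = None
    for raw in predict:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            parts = line[1:].split()
            seq_id = parts[0] if parts else None
            continue
        orf_id, start, end, frame, score = line.split()[:5]
        genes.append({"seq_id": seq_id, "id": orf_id, "start": int(start),
                      "end": int(end), "frame": int(frame),
                      "score": float(score)})
    return genes


@register("glimmer2")
def _parse_glimmer2(combined: Iterable[str]) -> List[dict]:
    # Placeholder: Glimmer 2.x writes one two-part file (ORF details,
    # then putative genes); parsing it is not modelled in this sketch.
    raise NotImplementedError("Glimmer 2.x parsing not sketched here")


def parse_glimmer(fmt: str, **sources) -> List[dict]:
    """Dispatch to the parser registered for fmt; sources are its inputs."""
    try:
        parser = _PARSERS[fmt.lower()]
    except KeyError:
        raise ValueError(f"unknown Glimmer format {fmt!r}; "
                         f"known: {sorted(_PARSERS)}") from None
    return parser(**sources)
```

Supporting GlimmerM or GlimmerHMM under the same front end would then mean registering one more function, while each flavour's parsing code stays as separate as it would under option 3.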

Any advice from Bioperl 'old timers' appreciated ;-)

Dr Torsten Seemann               http://www.vicbioinformatics.com
Victorian Bioinformatics Consortium, Monash University, Australia
