[Bioperl-l] Bio::Tools::Glimmer

Jason Stajich jason at bioperl.org
Tue Feb 6 19:33:11 EST 2007


I definitely vote for 1). Worst case, you end up with 4 separate methods
(if there is no good way to condense the parsing for each format) and
require the user to specify the format.

I have no problem with requiring the user to specify which program she
used. If we can be fancy and guess the format later (like the format
guessing in SeqIO), then that's icing.
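
To make option 1) concrete, here is a minimal sketch of a single front
end that requires the format up front and dispatches to one routine per
format. It is written in Python only to keep it short and is not Bioperl
code: the class name GlimmerReader, the PARSERS table and the stub
parsers are all hypothetical, and the stubs merely tag non-blank lines
rather than parse real Glimmer output.

```python
import io
from typing import Callable, Dict, Iterable, Iterator, Tuple

# (format, raw line): a stand-in for a real gene-prediction feature.
Record = Tuple[str, str]
Parser = Callable[[Iterable[str]], Iterator[Record]]


def _placeholder(fmt: str) -> Parser:
    """Build a stub parser; real per-format logic would replace it."""
    def parse(lines: Iterable[str]) -> Iterator[Record]:
        for line in lines:  # consume one line at a time, not the whole file
            line = line.strip()
            if line:
                yield (fmt, line)
    return parse


# Worst case: one separate routine per format, sharing nothing.
PARSERS: Dict[str, Parser] = {
    name: _placeholder(name)
    for name in ("glimmer2", "glimmer3", "glimmerm", "glimmerhmm")
}


class GlimmerReader:
    """Hypothetical front end: the caller must name the format."""

    def __init__(self, handle: Iterable[str], fmt: str) -> None:
        key = fmt.lower()
        if key not in PARSERS:
            raise ValueError(
                f"unknown format {fmt!r}; expected one of {sorted(PARSERS)}"
            )
        self._records = PARSERS[key](handle)

    def __iter__(self) -> "GlimmerReader":
        return self

    def __next__(self) -> Record:
        return next(self._records)


# Usage: records are produced lazily as the caller iterates.
reader = GlimmerReader(io.StringIO("first\n\nsecond\n"), fmt="GlimmerHMM")
for record in reader:
    print(record)
```

Guessing the format later would only mean making the format argument
optional and filling it in before the dispatch; the per-format routines
would not need to change.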

-jason

On Feb 6, 2007, at 3:53 PM, Mark Johnson wrote:

> Okay, I need to get something going for a project I'm working on.
> Options:
>
> 1) Stick it all in one module: This can get a bit ugly because Glimmer,
> as opposed to GlimmerM and GlimmerHMM, does not explicitly identify
> itself in the prediction report. You can pick up on some unique things
> in the output file, but you don't know what you've got until you're
> actually parsing it. If you require a format argument up front, though,
> you can split the parsing code up into different functions.
> 2) Two modules, one for GlimmerM/GlimmerHMM and one for
> Glimmer2/Glimmer3, with or without an abstract dispatch front end.
>
> I suppose at this point, after getting my hands dirty, I'd prefer 1),
> with an explicit -format => Glimmer2/3/M/HMM arg required in the
> constructor. Though I'm not opposed to 2) if that is what it takes to
> get it into Bioperl.
>
> If we can achieve some sort of consensus without too much bloodshed,
> I'll shoot y'all some patches and we can consider this issue checked
> off the list.
>
> On 9/20/06, Mark Johnson <johnsonm at gmail.com> wrote:
>>
>>     I think it's going to be at least two modules, one for the
>> prokaryotic stuff and one for the eukaryotic.  And really, the
>> prokaryotic stuff is different enough to warrant two modules. So
>> three different parsers.  Could do it in one, but it would be ugly
>> and nasty.  However, this does not preclude three parsers and one
>> abstract interface, which is your excellent suggestion.
>>     Oh, and excuse me, but I have a bit of a rant here, after dealing
>> with parsers and pipelines for the last few months.  Parsers should
>> not load the whole input file into RAM to parse it.  And pipelines
>> using the parsers (Ensembl / biopipe) should not stuff the whole
>> result set from the parser into a single array.  When you're trying
>> to annotate assemblies, it sucks to have to split up
>> contigs/supercontigs because the whole result set won't fit into RAM
>> on a 12 GB blade.  Sheesh.  Though this doesn't matter for bacterial
>> genomes, as they're tiny (by comparison to vertebrates).  There,
>> sorry, been saving up that frustration for a while.  No offense
>> meant, hope I didn't tick anybody off.  8)
>>     Torsten:  You sound like you know what you're doing with respect
>> to Bioperl more than I do, and I know I don't have CVS access, so
>> I'll defer to you.  I'd be happy to help out, though.
>>
>>
>> On 9/20/06, Hilmar Lapp <hlapp at gmx.net> wrote:
>>>
>>> On Sep 19, 2006, at 9:13 PM, Torsten Seemann wrote:
>>>
>>>> I'm not sure whether to
>>>>
>>>> 1. parse them all under the same module, perhaps with a
>>>> -format=>'glimmerXXX' parameter
>>>>
>>>> 2. create a single new module for Glimmer2 and Glimmer3
>>>>
>>>> 3. create two new modules, one for Glimmer2 and one for Glimmer3,
>>>> given they are different outputs both in syntax and number of
>>>> output files
>>>>
>>>> Any advice from Bioperl 'old timers' appreciated ;-)
>>>>
>>>
>>> If at all possible I'd favor 1), with e.g. Bio::Tools::GFF being an
>>> example of how this can work.
>>>
>>> If this would amount to basically 4 modules strung together into
>>> one file (because the parsing code can't share much if anything
>>> between the flavors), it'd still be advantageous to have a single
>>> frontend module that would then dispatch.
>>>
>>>         -hilmar
>>>
>>> --
>>> ===========================================================
>>> : Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
>>> ===========================================================

--
Jason Stajich
Miller Research Fellow
University of California, Berkeley
lab: 510.642.8441
http://pmb.berkeley.edu/~taylor/people/js.html
http://fungalgenomes.org/


