jason at bioperl.org
Tue Feb 6 19:33:11 EST 2007
I definitely vote for 1). Worst case, you have 4 separate methods (if
there is no good way to condense the parsing for each format) and
require the user to specify the format.
I have no problem with requiring the user to specify which program she
used. If we can be fancy and guess the format later (i.e., guess the
format as SeqIO does), then that's icing.
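A minimal sketch of option 1 as described here and in Hilmar's reply below: one frontend that refuses to run without a format argument, then dispatches to a separate parse routine per Glimmer flavor. It is written in Python only to show the structure and is not BioPerl's API. The class name GlimmerFrontend, the fmt parameter (standing in for -format), the format keys, and the whitespace-splitting placeholder parser are all hypothetical; real Glimmer reports would each need their own parsing logic.

```python
from functools import partial
from typing import Callable, Dict, Iterator, TextIO

# Hypothetical placeholder parser: real Glimmer2, Glimmer3, GlimmerM and
# GlimmerHMM reports have different layouts and would each get their own
# function. This one just splits non-blank lines into whitespace fields.
def _parse_placeholder(fmt: str, fh: TextIO) -> Iterator[dict]:
    for line in fh:
        line = line.strip()
        if line:
            yield {"format": fmt, "fields": line.split()}

# Registry mapping the required format name to its parse routine.
PARSERS: Dict[str, Callable[[TextIO], Iterator[dict]]] = {
    name: partial(_parse_placeholder, name)
    for name in ("glimmer2", "glimmer3", "glimmerm", "glimmerhmm")
}


class GlimmerFrontend:
    """Single entry point; the caller must state which program made the report."""

    def __init__(self, fh: TextIO, *, fmt: str) -> None:
        key = fmt.lower()
        if key not in PARSERS:
            raise ValueError(
                f"unknown format {fmt!r}; expected one of {sorted(PARSERS)}"
            )
        # Dispatch once, up front, instead of sniffing the file while parsing.
        self._records = PARSERS[key](fh)

    def __iter__(self) -> Iterator[dict]:
        return self._records
```

With this layout, adding format guessing later would only mean filling in fmt before the registry lookup, leaving the per-format parsers untouched.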
On Feb 6, 2007, at 3:53 PM, Mark Johnson wrote:
> Okay, I need to get something going for a project I'm working on.
> 1) Stick it all in one module: This can get a bit ugly, as Glimmer,
> as opposed to GlimmerM and GlimmerHMM, does not explicitly identify
> itself in the prediction report. You can pick up on some unique
> things in the output file, but you don't know what you've got until
> you're actually parsing it. If you require a format argument up
> front, though, you can split the parsing code up into different
> functions.
> 2) Two modules, one for GlimmerM/GlimmerHMM and one for Glimmer2/3,
> with or without an abstract dispatch front end.
> I suppose at this point, after getting my hands dirty, I'd prefer 1), with
> an explicit -format => Glimmer2/3/M/HMM arg required in the
> Though I'm not opposed to 2) if that is what it takes to get it into
> If we can achieve some sort of consensus without too much bloodshed,
> I'll shoot y'all some patches and we can consider this issue checked off
> On 9/20/06, Mark Johnson <johnsonm at gmail.com> wrote:
>> I think it's going to be at least two modules, one for the
>> prokaryotic stuff and one for the eukaryotic. And really, the
>> prokaryotic stuff is different enough to warrant two modules. So
>> three different parsers. Could do it in one, but it would be ugly and
>> nasty. However, this does not preclude three parsers and one
>> interface, which is your excellent suggestion.
>> Oh, and excuse me, but I have a bit of a rant here, after dealing
>> with parsers and pipelines for the last few months. Parsers should
>> not load the whole input file into RAM to parse it. And pipelines
>> using the parsers (Ensembl / biopipe) should not stuff the whole
>> result set from the parser into a single array. When you're trying
>> to annotate assemblies, it sucks to have to split up contigs/
>> because the whole result set won't fit into RAM on a 12 gig blade.
>> Sheesh. Though this doesn't matter for bacterial genomes, as they're
>> tiny (by comparison to vertebrates). There, sorry, been saving up
>> that frustration for a while. No offense meant, hope I didn't tick
>> anybody off. 8)
>> Torsten: You sound like you know what you're doing with respect
>> to Bioperl more than I do, and I know I don't have CVS access, so I
>> defer to you. I'd be happy to help out, though.
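The streaming point in the rant above can be sketched as a generator that hands back one prediction at a time, so neither the parser nor the code consuming it ever holds the whole result set. This is a Python illustration under an assumed input layout (predictions separated by blank lines), not any real Glimmer or Ensembl format; iter_predictions and count_predictions are hypothetical names.

```python
from typing import Iterator, List, TextIO


def iter_predictions(fh: TextIO) -> Iterator[List[str]]:
    """Yield one prediction (a list of lines) at a time.

    Assumption for illustration: predictions are separated by blank lines.
    Only the current block is held in memory, never the whole file.
    """
    block: List[str] = []
    for line in fh:  # file objects are read lazily, line by line
        line = line.rstrip("\n")
        if line.strip():
            block.append(line)
        elif block:
            yield block
            block = []
    if block:  # last prediction when the file lacks a trailing blank line
        yield block


def count_predictions(fh: TextIO) -> int:
    # The consumer handles each record as it arrives instead of first
    # collecting every result into one big list.
    n = 0
    for _ in iter_predictions(fh):
        n += 1
    return n
```

A pipeline built this way writes or stores each record as it arrives, so memory use scales with the largest single prediction rather than with the size of the assembly.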
>> On 9/20/06, Hilmar Lapp <hlapp at gmx.net> wrote:
>>> On Sep 19, 2006, at 9:13 PM, Torsten Seemann wrote:
>>>> I'm not sure whether to
>>>> 1. parse them all under the same module, perhaps with a
>>>> -format=>'glimmerXXX' parameter
>>>> 2. create a single new module for Glimmer2 and Glimmer3
>>>> 3. create two new modules, one for Glimmer2 and one for Glimmer3,
>>>> as they are different outputs, both in syntax and number of output
>>>> Any advice from Bioperl 'old timers' appreciated ;-)
>>> If at all possible I'd favor 1), with e.g. Bio::Tools::GFF being an
>>> example for how this can work.
>>> If this would amount to basically 4 modules strung together into
>>> one file (because the parsing code can't share much if anything
>>> between the flavors), it'd still be advantageous to have a single
>>> frontend module that would then dispatch.
>>> : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net :
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
Miller Research Fellow
University of California, Berkeley