[Bioperl-l] bioperl reorganization
cjfields at illinois.edu
Fri Jul 17 23:14:49 EDT 2009
On Jul 17, 2009, at 12:01 PM, Jason Stajich wrote:
> Will try to weigh in more, a little bit of stream of consciousness
> to let you know I'm thinking about it. Tough summer to focus much
> on this.
Yes, for me as well. That will change soon (approx two weeks) ;>
> It's too bad we are apparently the laughing stock of Perl gurus, but
> it would be great to see how to modernize aspects of the development.
> I'm curious how it will work if we have dozens of separate
> distros; won't we have a hard time keeping track of what directory
> things are in? Will there have to be a master list of what version
> and what modules are in what distro now?
I don't think we're a laughingstock as much as we haven't had the time
to dedicate towards this (and much of this occurred at a point early
on, with that whole 'Cathedral and Bazaar' esr-based thingy). BTW,
those same gurus shouldn't speak: perl core is just as bad and riddled
with worse bugs, though rgs and co. wouldn't admit it.
In fact, base.pm itself has a nasty one; I'm surprised no one in the
bioperl community has noticed it yet (it's listed as a bug on RT I
pyrimidine1:biomoose cjfields$ perl -MBio::SeqIO -e 'print
pyrimidine1:biomoose cjfields$ perl -MBio::SeqIO -e 'print
-1, set by base.pm
Imported modules do not have VERSION set correctly when it is
exported. This hasn't become an issue in bioperl yet (it's really an
edge case), but several devs have run into this. And really, why set
VERSION to a string like '-1, set by base.pm'?
Anyway, re: versioning, the way I think about it, if we have a small
very stable core with version X, and a focused very stable module
group with version Y, other distributions would have a separate
version and require subgroup version Y (which would in turn require
core version X). CPAN would take care of it. This isn't much
different than what occurs every day on CPAN anyway (Jay's Catalyst,
Moose and MooseX, and so on). In fact, several Moose-requiring
distributions don't require the latest Moose.
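
To make the version chain above concrete, here is a minimal sketch, not an actual BioPerl layout. The package names (My::Core, My::Align, My::PopGen), their version numbers, and the %requires table are all hypothetical stand-ins for separately released distributions. On CPAN the same minimums would normally be declared as prerequisites in each distribution's build file; the sketch only mimics that check at runtime using Perl's built-in VERSION method.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical stand-ins for three separately released distributions.
# In practice each would live in its own CPAN distribution.
{ package My::Core;   our $VERSION = '1.006'; }   # small stable core, "version X"
{ package My::Align;  our $VERSION = '0.02';  }   # focused module group, "version Y"
{ package My::PopGen; our $VERSION = '0.01';  }   # a dependent distribution

# Minimum prerequisite versions each hypothetical distribution declares.
my %requires = (
    'My::Align'  => { 'My::Core'  => '1.006' },
    'My::PopGen' => { 'My::Align' => '0.02'  },
);

# Walk the dependency chain and return a list of unmet requirements
# (an empty list means every declared minimum is satisfied).
sub unmet_requirements {
    my ($dist, $seen) = @_;
    $seen ||= {};
    return () if $seen->{$dist}++;    # visit each distribution only once
    my @problems;
    my $prereqs = $requires{$dist} || {};
    for my $prereq (sort keys %$prereqs) {
        my $min = $prereqs->{$prereq};
        # UNIVERSAL::VERSION dies if the loaded version is lower than $min.
        eval { $prereq->VERSION($min); 1 }
            or push @problems, "$dist needs $prereq $min";
        push @problems, unmet_requirements($prereq, $seen);
    }
    return @problems;
}

my @problems = unmet_requirements('My::PopGen');
print @problems ? join("\n", @problems) . "\n" : "all requirements met\n";
```

The point of the sketch is that My::PopGen only names My::Align, and My::Align in turn names My::Core, so no distribution has to track the whole tree itself.
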
> When I do an SVN (or git) checkout, do I need to check out each of
> these in its own directory? Or will there be a master packaging
> script that makes the necessary zip files for CPAN submission?
Not sure; that would be up to us I suppose. I think it would be
easier to maintain and release if they were separate or packaged up as
> If they are in separate directories are we organizing by conceptual
> topic (phylogenetics, alignment, database search) or by namespace of
> the modules?
By topic, retaining namespaces. We have a basic Bio::* directory
structure already in place for various generic terms (Tools, DB, etc),
so I see this crossing simple namespaces very easily. And as I
pointed out to Robert, several of those could possibly go together.
> Do all the 'database' modules live together? Probably not. So do
> we name them bioperl-db-remote, bioperl-db-local-index,
> bioperl-db-local-sql, etc.? Really, bioperl-db is somewhat focused on
> sequences and features, but what about things that integrate multiple
> data types, like biosql?
I don't see bioperl-db (BioSQL) being split up. I think it's too
intrinsically linked and cohesive (it's almost a separate core unto
itself), so it would be counterproductive to do so.
Maybe have bioperl-db become bioperl-biosql. Web-based = bioperl-remotedb.
Local = bioperl-localdb. OBDA = bioperl-obda.
> If they are in separate directories, what about all the test data
> that might be shared? Is this replicated among all the
> sub-directories? How do we do a good job keeping that up to date? Could
> we have a test-data distro instead, with symlinks within SVN?
We have to see how much is actually shared and proceed from there. I
would like to eventually resurrect the idea of a separate biodata repo
that we could just ftp the data from as needed. That would cut down
on the package size quite a bit, but I'm not sure how feasible that is
from the testing point of view (would we have to skip all tests if
there were no network access)?
> For some other obvious modules that can be split off and are
> self-contained, each of these could be a package. I would estimate more
> than 20 packages depending on how Bio::Tools are carved up.
> - I think Bio::DB::SeqFeature needs to be split off for sure; this is
> a nice logical peeling off. Could be another test case since it is
> a Gbrowse dependency
> - Bio::DB::GFF as well for the same reasons.
Completely agree (and I think Lincoln would like this as well).
> - Bio::PopGen - self contained for the most part, but depends on
> Bio::Tree and Bio::Align objects
Could list those as a required dependency.
> - Bio::Variation
> - Bio::Map and Bio::MapIO
> - Bio::Cluster and Bio::ClusterIO
> - Bio::Assembly
> - Bio::Coordinate
> My nightmare is that we're going to have to manage a lot of 'use XX
> 1.01' statements enforcing version requirements when dealing with the
> dependencies on the interface classes, and keep these all up to date.
> The version was implicit when they are all part of the same big
Right. But it also becomes a maintenance problem when serious bugs in
one module impede the needed release of others to CPAN.
> Also, the splits need not include only one namespace if need be, I
> guess, but we have generally grouped things by namespace.
> What do you want to do about bioperl-run? Do we make a set of
> parallel splits from all of these? I think at the outset we need to
> coordinate the applications supported here in some sort of loose
> ontology - the namespaces were not consistently applied so we have
> some alignment tools in different directories, etc. So the
> namespace sort of classifies them but it could be better. One of
> the challenges of multiple developers without a totally shared
> vision on how it should be done.
We could split bp-run and Tools, pairing the wrappers with the
relevant parser modules. Not sure if this can be done with SearchIO
as well but it could be tested to see how feasible that would be.
> I'm not convinced that the Bio::Graphics splitoff has been painless
> so we should take stock of how that is working.
Really? Lincoln has made several fixes lately on CPAN, so I thought
everything was going well. If anything I would think the lack of
additional 1.6.x bioperl releases has probably held Gbrowse 2.0 up
more due to Bio::DB::SeqFeature (my fault, but as you know life and
job take precedence sometimes).
> It seems like this split off would be a way to better streamline
> things in bioperl so that modern versions of bioperl might be able
> to better interface with things like Ensembl again too.
> How much of this effort is worth triaging on the current code versus
> the efforts we want to make on a cleaner, simpler bioperl system
> that appears to scare so many users (and potential developers) off.
I say triage away on a branch, but we need to indicate which ones to
whittle out first. The reason I believe we went for a larger split
initially (as indicated on the wiki page) was to push something
forward and not get too bogged down in the details. But we may as
well go full throttle and do this right away.
> Okay I rambled, hope that was helpful.
> Jason Stajich
> jason at bioperl.org
Very, very helpful. Now I need a beer.