[Bioperl-l] Splits again
cjfields at uiuc.edu
Thu Jun 28 00:18:03 EDT 2007
On Jun 27, 2007, at 5:43 PM, Sendu Bala wrote:
> Chris Fields wrote:
>> If a fix needed to be made in one set, make the fix, test against
>> bioperl 'base' as a whole, and release when possible. No need to
>> wait for a full-fledged 1.5.3 release.
> What advantage is there of these defined splits instead of
> individual modules? As I see it you lose some of the potential
> benefits of breaking Bioperl up completely, whilst also suffering
> the maintenance problems I outlined in my objection to Steve's post.
> Being able to work on all Bioperl from a single cvs (ne svn) check
> out/ archive, whilst distributing it as individual modules on CPAN
> seems like the best of both worlds to me. What am I missing?
Okay, forewarned, but here's my long-winded reasoning. The short and
sweet version: I (very) respectfully don't agree with you, at least
re: the idea we should commit all modules to CPAN independently. It
doesn't make any sense to me, but maybe you can elaborate more?
Maybe I'm misinterpreting what you mean?
Also, I agree with Steve C. that core is anything but a
representation of a 'core' set of modules, and some sections could
(should?) be split off into discrete, cohesive units. We may be
alone in that camp, though it doesn't seem so (it's popped up more
than a few times, in one form or another). If you want an in-depth
explanation for both opinions, read on (below my sig), or feel free
to bypass it. I'll understand.
Finally, all of this should wait until later. Much later, like after
a decent release, after svn, etc kind of 'later'. I think we can
agree on that.
Still here? Okay... each issue (skip as needed):
Individual CPAN modules:
CPAN is not our personal versioning system; it may be if a
distribution consists of only a few modules, but not when it's one of
the largest distros present. If someone wants to update an
individual bioperl module for a quick bug fix they are more than
welcome to download it via cvs, svn, or even using a web browser, and
replace the one they have. In most cases, it works w/o problems.
With Module::Build you have even made it easier if a full
installation is necessary.
I'm trying to reason how one could break up the individual SeqIO/
SearchIO/otherIO modules into single module distributions. They are
intrinsically tied together (SeqIO::genbank won't work w/o SeqIO,
which relies on the various interfaces, RootIO, and on down). How
would tests be run off CPAN when the modules are distributed
independently? Would they also be individually distributed? What
would you use to tie all the individual modules together? How would
you explain to the CPAN maintainers that you want to split bioperl
into 990 individual modules, all updated independently, but intend to
bundle them afterwards anyway?
I'm failing to see the advantages to this approach, but if you can
find an example where this was done successfully on CPAN or elsewhere
maybe I could see what you mean.
Splitting up core:
Here are the advantages of a defined split as Steve and I see them
(off the top of my head). Some of this probably reiterates
my previous points, as well as Steve's, so apologies in advance.
- A lean, mean, focused set of bioperl base modules (core) w/o or
with very few external deps, minimal installation issues, etc. The
very basic stuff to get up and running.
- BioPerl bundled modules (Nathan's 'cliques') with defined, focused
functionality, code, and tests, which add a bit more 'sugar' to the
base functionality of the core. If you only care about parsing BLAST
reports, get SearchIO, which requires core and optionally other
modules (XML::SAX). If you want additional DB functionality apart
from the very basic ones in core, install DB (with its additional
requirements, including core, DBI, and so on). Same with Graphics,
Tools, Tree/Phylo, etc. We just need to define and limit the number
- Easier to add additional bundled modules. For instance, I could
focus all of my RNA work into a discrete set of modules (say,
bioperl-rna) which I maintain, I ensure works with the latest core code, I
ensure also plays well with the other children =) , and I distribute
via CPAN. Same with EUtilities, which could go into a separated
DB-related set or stay in core.
- If we want a full-fledged 'install everything', the CPAN Bundle
system is available. I think it's easier to use a Bundle for 4-5,
even 10 groups of modules as opposed to over 900.
- A Bundle or a build file where discrete distributions are listed
(Bio::SearchIO, etc) wouldn't need to be updated every time a new
module is added to a distribution. I suppose this could be
automated, but why have the additional headache?
- A chance to cut out some cruft. We all know that particular areas
need work or a complete overhaul (Restriction, Structure, maybe a few
others). Smaller, concentrated sets of modules I believe would be
easier to maintain, and those that don't get used will eventually fall
out of favor and may be dropped or replaced by better-maintained
modules. Survival of the fittest.
- We have already had practice: bioperl-db, bioperl-run,
bioperl-network, and others. Those that have been routinely maintained and
enjoy wide use (db, run, network) have survived; others not so much
(corba-related stuff, microarray, ext, etc., though the code is still
available if someone else wants to take it up and revive it!).
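To make the 'install everything' point above a bit more concrete, here is a rough sketch of what a CPAN Bundle covering a handful of split distributions might look like. Everything specific in it is an assumption for illustration only: the bundle name Bundle::BioPerl::Everything is hypothetical, and so is the idea that each module listed under CONTENTS lives in its own distribution. The only thing it relies on is the standard Bundle layout, where CPAN.pm reads one module name per line from the CONTENTS section.

```perl
package Bundle::BioPerl::Everything;   # hypothetical bundle name

use strict;
use warnings;

our $VERSION = '0.01';

1;

__END__

=head1 NAME

Bundle::BioPerl::Everything - hypothetical bundle pulling in the split distributions

=head1 SYNOPSIS

  perl -MCPAN -e 'install Bundle::BioPerl::Everything'

=head1 CONTENTS

Bio::Root::Root - stands in for the lean core distribution (assumed split)

Bio::SearchIO - report parsing, needs core (assumed split)

Bio::Graphics::Panel - graphics, needs core (assumed split)

=cut
```

With a layout like this the CONTENTS section names one representative module per distribution, so it would only change when a whole group is added or removed, not each time a module lands in an existing group.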
Disadvantages of a defined split:
- The initial headache of identifying which groups go where,
coordinating with those who rely on bioperl (GMOD, etc) on how this
will be set up, so on...
- Separate groups of modules require testing together to ensure
functionality is consistent and maintained (something I think you
pointed out previously).
- I think there is an increased possibility of branching.
- Extra headaches for devs, who have to keep track of the various
critical distributions and make sure they work well together.
- Maybe others, but it's getting late here. Add more as needed; I'm
sure there are a number more.