[Bioperl-l] Bioperl partitioning (was Re: SVN and ...Re: Perltidy)

Steve Chervitz sac at bioperl.org
Tue Jun 19 14:54:39 EDT 2007

Valid points, Sendu. I wonder if there might be a best-of-both-worlds
approach here. I would not be advocating for a major slice and dice,
but just identifying a few large, reasonably well established and
encapsulated blocks of functionality that could be managed more
independently and segregating them away from the rest. For example:
DB, Graphics, Search+SearchIO, Tools.

Once per year, we could have a "whole caboodle" release where the core
and all sub parts are tested and released as a group, as we currently
do. Then, updates to the sub parts can occur as needed but without
necessarily involving updates to other sub parts or the core.

The onus would be on the pumpkin for the sub part release to make sure
it continues to work with the last whole caboodle release. This would
minimize the number of release clashes, since sub part updates would
only be sanctioned relative to the last caboodle release, and it would
ensure that the whole set continues to interoperate.

Perhaps it would be worth experimenting with such an approach so we
can judge it based on actual experience. We could identify one
functional sub part and segregate it out, do a release cycle or two,
along with a sub part release, and decide if this makes things easier
or harder, for devs as well as users. We could always bring it back
into the fold if it doesn't work out.

My fear is that as bioperl continues to grow, the monolithic approach
will become increasingly onerous for a single release pumpkin to
manage, and it will get harder to find someone who feels up to the task. It could
also discourage new developers from diving into the codebase if it
looks too deep. And they are our lifeblood.

A more functionally segregated bioperl codebase could lower the
activation energy needed to recruit release pumpkins and new devs,
leading to more release iterations, fewer bugs, more features, and
more sustainable growth.

When I first discovered Bioperl in 1996, it had three modules. Had it
been at ~900, I probably wouldn't have joined the ranks as a developer
(well, I probably would have, but it would have taken a while to digest
it and become a contributor).


On 6/19/07, Sendu Bala <bix at sendu.me.uk> wrote:
> Steve Chervitz wrote:
> > Might this be a good opportunity to investigate partitioning
> > bioperl-live into sub-repositories? There has been talk in the past of
> > defining a set of "core" modules separate from other functionally
> > related groups of modules that would be viewed as optional extensions.
> > The goal being to help manage growth and simplify releases. There are
> > currently 892 modules under Bio/.
> >
> > In addition to simplifying the migration to SVN, it would also have
> > other benefits. Say some new functionality or a slew of fixes were
> > added to Bio::Graphics. We could turn around a new Bio::Graphics
> > release quickly without having to work on getting various other parts
> > up to snuff that aren't related to graphics (Biblio, DB, PopGen,
> > Search etc.). Maintenance and releases of the various extensions would
> > be more parallelizable, orchestrated by separate ring leaders.
> >
> > Over time, as a set of functionality matures, it would see fewer
> > updates and there would be less of a need for users to
> > download/install/test it. This could make bioperl easier to customize,
> > extend, and grok in general.
> >
> > Long term, it should ease development and release cycles
>
> I actually take the opposite view. Breaking things up makes testing and
> releases more difficult.
>
> If one person acts as pumpkin for all the sub-parts, his work-load
> increases almost linearly with the number of sub-parts. If each sub-part
> gets its own pumpkin, where do all these pumpkins come from? It seems to
> me that frequently authors will write modules but inevitably their
> circumstances change and they can no longer devote the time to look
> after them. Having a single pumpkin and 'forcing' him to make sure
> everything works (regardless of his personal interest in the module)
> seems more reliable than hoping there will be a person interested enough
> in each sub-part to handle its release.
>
> Since all sub-parts will at the least interact with the 'true' core set
> of Bioperl modules, they need to be tested and potentially re-released
> every time the true core is updated. And since some sub-parts will
> interact with other sub-parts, there will need to be coordinated
> joint-testing and release of multiple sub-parts.
>
> What happens when users report problems? We ask them what version
> they're running. Right now '1.5.2' means a specific thing, and it's
> trivial for someone to confirm the same problem by installing 1.5.2.
> What happens when users have to list out all the versions of all the
> sub-parts they have? Who is going to consistently recreate a user's
> hodge-podge of versions in order to confirm a bug? Won't the advice
> instead be: "update all versions to the latest and get back to us"?
>
> So, as I see it, all sub-parts would best be tested and released with a
> single new version number every time one sub-part is updated
> (significantly). In which case, why have sub-parts at all? Keeping
> things the way they are now means ease of release for the pumpkin and
> ease of installation for end-users (only one install command to issue to
> CPAN). Having 'true' sub-parts (each with its own pumpkin), in my
> fatalistic view, is just going to lead to some useful sub-parts being
> abandoned and never updated, even where updates may be desirable.
>
> Each and every Bio:: module could have been released separately by its
> respective author. As I see it, one of the main values of 'Bioperl' is
> that it's one (reasonably) consistent collection of modules that lowers
> the barrier to entry for new Bioinformaticians, giving them extremely
> easy access to a whole host of functionality with a single install.
