[Bioperl-l] bioperl reorganization
jay at jays.net
Fri Jul 17 15:55:38 EDT 2009
Jason Stajich wrote:
> I'm curious how it will work that we'll have dozens of separate
> distros that we'll have a hard time keeping track of what directory
> things are in? Will there have to be a master list of what version and
> what modules are in what distro now?
> When I do a SVN (or git) checkout do I need to checkout each of these
> in its own directory? Or will there be a master packaging script that
> makes the necessary zip files for CPAN submission?
Perhaps my Catalyst experience would be a useful addition to this
discussion. Catalyst is a popular web framework composed of dozens of
Users install Catalyst (cpan Catalyst), which is everything a user needs
to build a basic website. The list of classes the user just installed is
Which lives in SVN here:
As each user finds additional shiny things relevant to them on CPAN
(Catalyst::* e.g. Catalyst::Plugin::FillInForm), they install those,
individually (cpan Catalyst::Plugin::FillInForm).
All Catalyst::* distributions live in the same SVN repository, as
entirely independent, ready-to-ship CPAN distributions:
So, as a new or veteran developer, when I find a bug in
Catalyst::Plugin::FillInForm I patch it in SVN
and then, like any other CPAN distribution, I prep and push that
distribution to PAUSE.
-make my code changes-
-vi lib/Catalyst/Plugin/FillInForm.pm, increment VERSION-
ftp Catalyst-Plugin-FillInForm-0.11.tar.gz to pause.cpan.org:/incoming
That's it. I just upgraded Catalyst::Plugin::FillInForm from 0.10 to 0.11.
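The "increment VERSION" step is normally a one-keystroke edit in vi. If you wanted to script it, a sketch might look like this. `bump_version` is a hypothetical helper, not part of any release tool, and it assumes the module declares a two-part decimal version such as `our $VERSION = '0.10';`.

```perl
use strict;
use warnings;

# Hypothetical helper: bump a two-part decimal version declaration, e.g.
#   our $VERSION = '0.10';   becomes   our $VERSION = '0.11';
# Keeps the number of fractional digits, so '0.99' rolls over to '1.00'.
sub bump_version {
    my ($source) = @_;
    $source =~ s{
        ( \$VERSION \s* = \s* ['"]? )   # everything before the number
        ( \d+ ) \. ( \d+ )              # integer part . fractional part
    }{
        my ($prefix, $int, $frac) = ($1, $2, $3);
        my $width = length $frac;
        my $next  = $frac + 1;
        if (length($next) > $width) { $int++; $next = 0 }
        $prefix . $int . '.' . sprintf("%0${width}d", $next);
    }ex or die "no \$VERSION declaration found\n";
    return $source;
}

my $pm = q{our $VERSION = '0.10';};
print bump_version($pm), "\n";   # our $VERSION = '0.11';
```

The closing quote and semicolon are never consumed by the match, so they survive the substitution untouched.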
There is no "master list of what version and what modules are in what
distro now". CPAN itself is that resource.
Bottom line, small parts of Catalyst are pushed out to CPAN *every day*.
Very cool. Shocking when compared to the BioPerl release history on CPAN.
(Catalyst::Plugin::FillInForm happens to use Module::Install. But
another author may prefer ExtUtils::MakeMaker, or Dist::Zilla, or
Module::Build, or whatever. Each* Catalyst:: is an independent
distribution that is free to shift slowly, or quickly, over time as
developer interest dictates.)
(* Each meaning "tiny, highly inter-relevant groups of classes.")
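For anyone who hasn't seen the Module::Install style, a minimal Makefile.PL for a distribution like this might look like the following. Only the `requires` line is the one FillInForm actually uses (quoted further down in this message). The `name` and `all_from` lines are assumptions for illustration, not the file that ships with the distribution, and it only runs with Module::Install installed.

```perl
# Makefile.PL: minimal Module::Install sketch (illustrative only)
use inc::Module::Install;

# Distribution name mirrors the package name, with '::' turned into '-'.
name     'Catalyst-Plugin-FillInForm';

# Pull version, abstract, author and license from the main module's POD/code.
all_from 'lib/Catalyst/Plugin/FillInForm.pm';

# Minimum Catalyst version; the CPAN client installs or upgrades it if needed.
requires 'Catalyst' => '5.7012';

WriteAll;
```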
Large, seismic shifts in Catalyst itself (Catalyst-Runtime) are a new
branch in SVN, that can take a few months. Like this year's total
reworking of Catalyst to use Moose internally (the move from the 5.70
branch to the 5.80 branch).
But "total reworkings" of Catalyst can and do continue to happen because
the "Catalyst" distribution (Catalyst-Runtime) is independent from the
dozens of other great Catalyst:: packages available on CPAN.
So Catalyst:: is a loose federation of cooperative modules on CPAN tied
together by namespace and the API of Catalyst-Runtime.
> If they are in separate directories are we organizing by conceptual
> topic (phylogenetics, alignment, database search) or by namespace of
> the modules? Do all the 'database' modules live together - probably
> not - so do we name bioperl-db-remote bioperl-db-local-index,
> bioperl-db-local-sql, etc? really bioperl-db is somewhat focused on
> sequences and features, but what about things that integrate multiple
> data types - like biosql?
In the Catalyst development model, the CPAN namespace (package name), the
SVN path, and the distribution name are all the same. (Hopefully namespaces
somewhat match conceptual topics. -grin-)
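Because those names line up, each one can be derived mechanically from the package name. A small sketch of the usual CPAN convention (the helper names are hypothetical): `::` becomes `-` in the distribution name (which, per the layout described above, is also the SVN directory), and `/` in the module's path under lib/.

```perl
use strict;
use warnings;

# Hypothetical helper: CPAN distribution name for a package,
# e.g. Catalyst::Plugin::FillInForm -> Catalyst-Plugin-FillInForm
sub dist_name_for {
    my ($package) = @_;
    (my $dist = $package) =~ s/::/-/g;
    return $dist;
}

# Hypothetical helper: path of the package's .pm file inside its distribution.
sub module_path_for {
    my ($package) = @_;
    (my $path = $package) =~ s{::}{/}g;
    return "lib/$path.pm";
}

for my $pkg ('Catalyst::Plugin::FillInForm', 'Bio::Graphics') {
    print dist_name_for($pkg), "  ", module_path_for($pkg), "\n";
}
```

For Bio::Graphics that gives Bio-Graphics and lib/Bio/Graphics.pm, so the distribution name alone tells you where the code lives.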
> If they are in separate directories, what about all the test data that
> might be shared, is this replicated among all the sub-directories -
> how do we do a good job keeping that up to date, could we have a
> test-data distro instead with symlinks within SVN?
I don't believe Catalyst packages ever share test data. Is there lots of
re-use of large amounts of test data by what should be separate
distributions in BioPerl? I'm not familiar with SVN symlinks. I don't
think Catalyst SVN has any.
14:33 <@t0m> jhannah: you mean svn:externals, and yes, it's used by a
load of the engines to steal the TestApp from -Runtime
14:34 <@t0m> I'd be more tempted to make the test data its own dist
if that's sane.
> My nightmare is that we're going to have to manage a lot of 'use XX
> 1.01' enforcing version requiring when dealing with the dependencies
> on the interface classes and having to keep these all up to date? The
> version was implicit when they are all part of the same big distro.
Catalyst::Plugin::FillInForm has this in its Makefile.PL:
requires 'Catalyst' => '5.7012';
The CPAN client then enforces that dependency and auto-installs it for
users; like the rest of CPAN, Catalyst leaves dependency handling to CPAN.
Doesn't that render most 'use XX 1.01' statements obsolete?
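To make the point concrete: the install-time check is just a version comparison. The sketch below uses the core version.pm module as a stand-in for whatever a particular CPAN client does internally; `satisfies_minimum` is a hypothetical helper name.

```perl
use strict;
use warnings;
use version;

# Hypothetical helper: does an installed version satisfy a minimum like
#   requires 'Catalyst' => '5.7012';
# version.pm normalizes decimal versions before comparing them.
sub satisfies_minimum {
    my ($installed, $minimum) = @_;
    return version->parse($installed) >= version->parse($minimum);
}

for my $installed ('5.7011', '5.7012', '5.80007') {
    printf "%-8s %s\n", $installed,
        satisfies_minimum($installed, '5.7012') ? 'ok' : 'needs upgrade';
}
```

At run time, `use Catalyst 5.7012;` performs the equivalent check by calling `Catalyst->VERSION(5.7012)`, which dies if the installed version is older. The Makefile.PL `requires` line moves that same check to install time, where the CPAN client can actually fix the problem.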
> Also the splits need not only include one namespace if need be I guess
> but we have generally grouped things by namespace.
I believe all Catalyst distributions are *very* cleanly split on
namespace. I imagine not doing so would be a nightmare.
> I'm not convinced that the Bio::Graphics splitoff has been painless so
> we should take stock of how that is working.
I'd like to hear about any pain so I could compare to Catalyst...
> How much of this effort is worth triaging on the current code versus
> the efforts we want to make on a cleaner, simpler bioperl system that
> appears to scare so many users (and potential developers) off.
One of the amazing things that happen in Catalyst and Moose frequently
is that random people wander into irc.perl.org #catalyst or #moose and
say "this is broke". If they seem to be clued in, then they get an SVN (or
git) commit bit (on the specific directory of that distribution) after
submitting a single patch, and then become CPAN co-maintainers of that
package after the second patch. Soon they're improving that part of CPAN
on their own.
The risk to the community is mitigated by the fact that even if jhannah
breaks Catalyst::Plugin::FillInForm 0.11, most Catalyst users don't use
that specific Plugin anyway. Also, CPAN has copies of the 4 previous
versions of C::P::F sitting around all over the world for people to fall
back on.
This leaves the hard-core Catalyst developers free to improve the
central engine, rather than forcing them to focus on POD patches to the
500 peripheral bits all the time.
Catalyst and Moose are amazingly delegated. It's hard not to end up with
commit bits and CPAN co-maint of their small distributions when you
express good ideas.
The small/rare bits get fixed by the few people that care. The wizards
focus on the big picture changes.
I know all this is already happening in BioPerl SVN. It'd just be great
if it happened on CPAN too.
Since I've been using bioperl-live directly for years, I really haven't
cared about CPAN release schedules. But if you're not a BioPerl
developer, you probably pull CPAN only.
> Okay I rambled, hope that was helpful.
Ditto!! Only worse rambling, and probably less helpful. :)