[Bioperl-l] Splits again

Chris Fields cjfields at uiuc.edu
Thu Jun 28 00:18:03 EDT 2007

On Jun 27, 2007, at 5:43 PM, Sendu Bala wrote:

> Chris Fields wrote:
> ...
>> If a fix needed to be made in one set, make the fix, test against   
>> bioperl 'base' as a whole, and release when possible.  No need to   
>> wait for a full-fledged 1.5.3 release.
> What advantage do these defined splits have over
> individual modules? As I see it you lose some of the potential
> benefits of breaking Bioperl up completely, whilst also suffering
> the maintenance problems I outlined in my objection to Steve's post.
> Being able to work on all Bioperl from a single cvs (ne svn) check  
> out/ archive, whilst distributing it as individual modules on CPAN  
> seems like the best of both worlds to me. What am I missing?

Okay, you've been forewarned; here's my long-winded reasoning. The
short and sweet version: I (very) respectfully disagree with you, at
least regarding the idea that we should commit all modules to CPAN
independently. It doesn't make sense to me, but maybe you can
elaborate? Perhaps I'm misinterpreting what you mean.

Also, I agree with Steve C. that core is anything but a  
representation of a 'core' set of modules, and some sections could  
(should?) be split off into discrete, cohesive units.  We may be  
alone in that camp, though it doesn't seem so (it's popped up more  
than a few times, in one form or another).  If you want an in-depth  
explanation for both opinions, read on (below my sig), or feel free  
to bypass it.  I'll understand.

Finally, all of this should wait until later. Much later, as in after
a decent release, after svn, etc., kind of 'later'. I think we can
agree on that.


Still here?  Okay... each issue (skip as needed):

Individual CPAN modules:

CPAN is not our personal versioning system. It might work as one for
a distribution of only a few modules, but not for one of the largest
distros present. If someone wants to update an individual bioperl
module for a quick bug fix, they are more than welcome to download it
via cvs, svn, or even a web browser, and replace the one they have.
In most cases this works w/o problems. And with Module::Build you've
made a full reinstallation even easier, if one is necessary.

I'm trying to work out how one could break up the individual
SeqIO/SearchIO/other IO modules into single-module distributions.
They are intrinsically tied together (SeqIO::genbank won't work w/o
SeqIO, which relies on the various interfaces, RootIO, and on down).
How would tests be run off CPAN when the modules are distributed
independently? Would the tests also be distributed individually? What
would you use to tie all the individual modules together? How would
you explain to the CPAN maintainers that you want to split bioperl
into 990 individual modules, all updated independently, but intend to
bundle them afterwards anyway?
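
To make the coupling concrete, here is a minimal sketch in Python, chosen only because it keeps the example short. The module names are real BioPerl namespaces, but the dependency edges are a hypothetical and heavily simplified subset, not BioPerl's actual prerequisite list. The sketch collects everything Bio::SeqIO::genbank needs and orders it with a topological sort. That is roughly the bookkeeping any tool tying ~990 independent distributions back together would have to do for every module.

```python
from graphlib import TopologicalSorter

# Hypothetical, heavily simplified map: module -> modules it depends on.
# Illustrative edges only; real BioPerl dependencies are far more numerous.
DEPENDS_ON = {
    "Bio::SeqIO::genbank": {"Bio::SeqIO"},
    "Bio::SeqIO": {"Bio::Root::IO", "Bio::SeqI"},
    "Bio::SeqI": {"Bio::Root::RootI"},
    "Bio::Root::IO": {"Bio::Root::Root"},
    "Bio::Root::Root": {"Bio::Root::RootI"},
    "Bio::Root::RootI": set(),
}

def install_order(target, graph):
    """Return every module `target` needs, prerequisites first."""
    # Gather the transitive closure of target's dependencies.
    needed, stack = set(), [target]
    while stack:
        mod = stack.pop()
        if mod not in needed:
            needed.add(mod)
            stack.extend(graph.get(mod, ()))
    # Topologically sort only that subgraph, so each module follows its deps.
    sub = {m: graph.get(m, set()) & needed for m in needed}
    return list(TopologicalSorter(sub).static_order())

order = install_order("Bio::SeqIO::genbank", DEPENDS_ON)
print(order)
```

Even in this toy map, installing or testing SeqIO::genbank on its own pulls in five other modules, and they have to go in a fixed order.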

I'm failing to see the advantages to this approach, but if you can  
find an example where this was done successfully on CPAN or elsewhere  
maybe I could see what you mean.

Splitting up core:

Here are the advantages of a defined split as Steve and I see it (off
the top of my head). Some of this probably reiterates my previous
points, as well as Steve's, so apologies in advance.

- A lean, mean, focused set of bioperl base modules (core) w/o or  
with very few external deps, minimal installation issues, etc.  The  
very basic stuff to get up and running.

- BioPerl bundled modules (Nathan's 'cliques') with defined, focused  
functionality, code, and tests, which add a bit more 'sugar' to the  
base functionality of the core.  If you only care about parsing BLAST  
reports, get SearchIO, which requires core and optionally other  
modules (XML::SAX). If you want DB functionality beyond the very
basic modules in core, install DB (with its additional requirements,
including core, DBI, and so on). Same with Graphics,
Tools, Tree/Phylo, etc.  We just need to define and limit the number  
of splits.

- Easier to add more bundled modules. For instance, I could focus all
of my RNA work into a discrete set of modules (say, bioperl-rna) that
I maintain, that I ensure works with the latest core code and plays
well with the other children =), and that I distribute via CPAN. The
same goes for EUtilities, which could go into a separate DB-related
set or stay in core.

- If we want a full-fledged 'install everything', the CPAN Bundle  
system is available. I think it's easier to use a Bundle for 4-5, or
even 10, groups of modules than for over 900.

- A Bundle or a build file listing discrete distributions
(Bio::SearchIO, etc.) wouldn't need to be updated every time a new
module is added to a distribution. I suppose this could be
automated, but why take on the additional headache?

- A chance to cut out some cruft. We all know that particular areas
need work or a complete overhaul (Restriction, Structure, maybe a few
others). I believe smaller, concentrated sets of modules would be
easier to maintain, and those that don't get used will eventually fall
out of favor and be dropped or replaced by better-maintained modules.
Survival of the fittest.

- We've already had practice: bioperl-db, bioperl-run,
bioperl-network, and others. Those that have been routinely maintained
and enjoy wide use (db, run, network) have survived; others not so
much (corba-related stuff, microarray, ext, etc., though the code is
still available if someone else wants to take it up and revive it!).
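
The SearchIO and DB examples in the list above can be modeled in a few lines. Here is a minimal Python sketch of how a defined split might resolve what a user actually installs. The split names and their requires/optional sets are assumptions loosely based on those examples (SearchIO optionally using XML::SAX, DB requiring DBI). They are not an agreed-upon layout, and anything not listed as a split is treated as an external CPAN module.

```python
# Hypothetical split layout; names and dependency sets are assumptions.
SPLITS = {
    "core":     {"requires": set(),           "optional": set()},
    "SearchIO": {"requires": {"core"},        "optional": {"XML::SAX"}},
    "DB":       {"requires": {"core", "DBI"}, "optional": set()},
}

def plan(split, with_optional=False, splits=SPLITS):
    """Return the set of splits/external modules needed for `split`."""
    to_install, stack = set(), [split]
    while stack:
        name = stack.pop()
        if name in to_install:
            continue
        to_install.add(name)
        spec = splits.get(name)
        if spec is None:
            continue  # external CPAN module (e.g. DBI): nothing more to resolve
        stack.extend(spec["requires"])
        if with_optional:
            stack.extend(spec["optional"])
    return to_install

print(sorted(plan("SearchIO")))
print(sorted(plan("SearchIO", with_optional=True)))
```

Someone who only parses BLAST reports resolves to SearchIO plus core. A handful of entries like these would stay manageable, whereas a per-module layout needs an entry for every module.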

Disadvantages of a defined split:

- The initial headache of identifying which groups go where,
coordinating with those who rely on bioperl (GMOD, etc.) on how this
will be set up, and so on.

- Separate groups of modules require testing together to ensure  
functionality is consistent and maintained (something I think you  
pointed out previously).

- I think branching becomes more likely.

- Extra headaches for devs, who have to keep track of the various  
critical distributions and make sure they work well together.

- Maybe others, but it's getting late here. Add more as needed; I'm
sure there are plenty more.

