[Bioperl-l] Splits again

Chris Fields cjfields at uiuc.edu
Thu Jun 28 01:17:01 EDT 2007

D'oh!  Just when I wanted to go to bed.  It's not fair, you're in  

On Jun 27, 2007, at 10:51 PM, Jason Stajich wrote:

> Hey guys - I'm wading in a bit late as I haven't had time to keep up
> with whole discussion.
> So you are suggesting 800+ individual CPAN modules?  I don't think
> that is a good idea.  Why would you split up Bio::Seq::RichSeq and
> Bio::Seq into two separate packages for example? I think if you
> really want to move away from the monolithic install it has to be
> more logical by function - but I am not that optimistic that this is
> going to actually be easier for people.  Maybe I'm misunderstanding.

Okay, so maybe it wasn't just me.

> What are the arguments for separating things -- to make it so people
> aren't scared by the number of modules so they'll code?  It seems
> like some people just want it to be installed and run scripts - does
> having them install dozens of modules work?  Do we need to consider
> how much this would suck for people who can't use CPAN or
> Module::Build to automate dependency tracking and installation?  How
> does it work when modules are deprecated?

What I envision for core is maybe not just one distribution, but a  
cluster of distributions:

base - Bio::Seq; Bio::SeqIO; Bio::AlignIO, some Bio::DB, associated  
modules.  Bare bones, with as few dependencies as possible.
aux - Any Bio::SeqIO, Bio::AlignIO, Bio::DB etc. that requires  
additional modules.
search - Bio::Search and SearchIO
tools - Bio::Tools, Bio::Restriction, maybe DB modules, GFF-related  
graphics - Bio::Graphics.  Maybe GMOD-related stuff here?

The last four would list bioperl-core as a dependency themselves  
along with any other modules necessary.  We could also have the core  
Build.PL ask the user if they want to install the other non-base  
distros, and maybe include bioperl-db, bioperl-network, and
bioperl-run in the loop if requested.
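
As a rough sketch of the prompting idea (not actual BioPerl code: the
distribution names, the version number, and the 'optional_distros' notes
key are all made up for illustration), a Build.PL for the base
distribution might look something like this, assuming Module::Build is
installed:

```perl
#!/usr/bin/perl
# Hypothetical Build.PL for the proposed bare-bones "base" distribution.
# All names and versions here are illustrative assumptions.
use strict;
use warnings;
use Module::Build;

my $build = Module::Build->new(
    dist_name     => 'bioperl-base',    # assumed distribution name
    dist_version  => '0.01',            # placeholder version
    dist_abstract => 'Bare-bones BioPerl: Bio::Seq, Bio::SeqIO, Bio::AlignIO',
    dist_author   => 'BioPerl developers (placeholder)',
    license       => 'perl',
);

# Optional sibling distributions. Each of these would list the base
# distribution in the 'requires' section of its own Build.PL.
my @optional = qw(
    bioperl-aux bioperl-search bioperl-tools bioperl-graphics
    bioperl-db  bioperl-network bioperl-run
);

# Ask about each one; the default answer 'n' is used when the
# installation is not interactive.
my @wanted;
foreach my $distro (@optional) {
    push @wanted, $distro
        if $build->y_n("Also install $distro? y/n", 'n');
}

# Record the choices so a later build step could act on them.
$build->notes(optional_distros => \@wanted);
$build->create_build_script;

print "Selected optional distributions: @wanted\n" if @wanted;
```

This only records what the user asked for; actually fetching and
installing the selected distributions (e.g. through CPAN) would still
need to be hooked in somewhere.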

All would be installed as a bundle similar to Bundle::BioPerl, but  
have regular CPAN point releases (1.x.x) independently from one  
another (i.e. for bug fixes), with a yearly/biyearly timed full release
(1.x) of the whole shebang.  Any point release for any 'core'  
distribution would have to be tested against the others prior to  

This is basically following Steve's train of thought, though more  


> I'm not sure I have made up my mind on what I'd like to see, but at
> some point I think we need to get a clearer idea of what audience we
> are trying to serve best.  If we want it to be easy to install maybe we
> should invest time into making OSX double-click installers, RPMs, and
> the Windows stuff easily installable.  Or do we want to serve the
> developers who aren't using SVN by pushing out releases of
> modules ASAP?  I just am not clear on the motivation for some of the
> proposed changes.

I think regular CPAN releases with updated PPMs hosted via portal  
work fine for the most part, but it would be nice to host RPMs.   
Others (Allen Day, for instance) have donated time to generate RPMs  
but they seem to lag behind a bit more.

The original idea for svn arose from an unrelated thread with Mark  
Johnson discussing something (Glimmer maybe?) and took off from  
there.  I was actually pretty surprised it took on a life of its
own.  As for the motivation to switch, I haven't specifically used it  
myself, but the large number of responses seem to indicate others  
have and seem happy with it.  Rutger Vos had also indicated he would  
move Bio::Phylo over to the repo if we used svn.  We def. should  
address the issues you bring up (why _WE_ need svn) more succinctly  
but that shouldn't be an issue.

> Also - the main point I wanted to make - Can I suggest we spend a
> little time discussing what it will take to get a stable release for
> the current code as it stands (bioperl-live and bioperl-run)?  It
> seems like we really need to do this first so that we have a stable
> release that can be followed by CVS -> SVN migration, then consider
> major changes to the repository structure and release packaging, and
> potential deprecation and incorporation of other modules.

Agreed.  We prob. need to schedule a good couple of days (or so) to  
squash bugs.

> I assume there is no chance that we'd have a 1.6 candidate by BOSC
> next month?

Um, not likely as nothing has been addressed Feature/Annotation-wise  
(overloads are still there, methods have not been deprecated, etc).   
There was an underlying assumption these would have an effect on
GMOD-related stuff (I remember reading a post from Scott Cain in the mail
archive mentioning something along these lines after the 1.5 release  

Maybe a quick 1.5.3 for BOSC, with a 1.6 for fall?

> Will it be productive to schedule a fair amount of time at BOSC
> discussing how to partition out the packages into separate sub-
> packages after we've done a successful release rather than trying to
> change things right now? I realize not everyone will be there but
> maybe it will be easier to interact on this then.

How many are going to be there?  I can't go this year except on my  
own dime (which I don't have many of, student loans and all, sorry),  
though I'll likely be in a new lab by spring which is likely more  
amenable to funding.  If there is a hackathon in the late fall
(post-Sept) I'll make it a point to go regardless.

> I think it will also be time to talk with Lincoln/Scott about how
> Gbrowse is structured and if that is working for them.  There is too
> much code in different places that I think we need to figure out how
> to structure it properly so those packages can be released.  It would
> probably mean moving Bio::Graphics, Bio::DB::GFF and
> Bio::DB::SeqFeature and gff tools for Gbrowse into separate packages
> so they could be released more regularly on par with Gbrowse
> schedules.   Also I think someone needs to figure out Bio::Tools::GFF
> vs Bio::FeatureIO -- what do we want to do?  I don't think we really
> fully support GFF3 that well -- the X2GFF scripts probably need some
> more good testing (where X is BLAST,FASTA,Sim4, GenBank, EMBL,
> etc... ) and/or migration to the proper GFF writing.
> -jason

Will Lincoln or Scott be at BOSC?

