[Bioperl-l] bioperl reorganization

Chris Fields cjfields at illinois.edu
Sat Jul 18 09:48:54 EDT 2009


I think keeping the two together is a good idea unless Bio::DB::GFF is  
essentially end-of-life and will no longer be maintained.  Then maybe  
it's a good idea to port all needed methods to Bio::DB::SeqFeature and  
release the code separately, then call it a day on Bio::DB::GFF  
maintenance-wise?  Just a thought.

Nice to hear my tardiness on 1.6.whatever has not held up Gbrowse2.   
Thanks!  Will be setting up my own local instance of Gbrowse2 here soon.


On Jul 18, 2009, at 7:23 AM, Scott Cain wrote:

> Hi All,
> I don't want to wade in too deeply, but I like the idea of splitting  
> things up.  I think the Bio::Graphics split has gone well and has  
> made life easier in GBrowse world.  I could see Bio::DB::SeqFeature  
> and Bio::DB::GFF being split and either being kept together or going  
> their separate ways (though I have a nagging suspicion that  
> SeqFeature code depends on GFF code in a few places, so it may make  
> sense to just keep them together).
> And Chris, if it makes you feel any better, I don't think anything  
> you've done or not done has held up GBrowse2.
> Scott
> On Jul 17, 2009, at 11:14 PM, Chris Fields wrote:
>> My 2c...
>> On Jul 17, 2009, at 12:01 PM, Jason Stajich wrote:
>>> Will try to weigh in more, a little bit of stream of consciousness  
>>> to let you know I'm thinking about it.  Tough summer to focus much  
>>> on this.
>> Yes, for me as well.  That will change soon (approx two weeks) ;>
>>> It's too bad we are apparently the laughing stock of Perl gurus,  
>>> but it would be great to see how to modernize aspects of the  
>>> development.
>>> I'm curious how it will work if we have dozens of separate  
>>> distros.  Won't we have a hard time keeping track of what  
>>> directory things are in?  Will there have to be a master list of  
>>> what version and what modules are in what distro now?
>> I don't think we're a laughingstock as much as we haven't had the  
>> time to dedicate towards this (and much of this occurred at a point  
>> early on, with that whole 'Cathedral and Bazaar' esr-based  
>> thingy).  BTW, those same gurus shouldn't speak: perl core is just  
>> as bad and riddled with worse bugs, though rgs and co. wouldn't  
>> admit it.
>> In fact, base.pm itself has a nasty one; I'm surprised no one in  
>> the bioperl community has noticed it yet (it's listed as a bug on  
>> RT I think):
>> pyrimidine1:biomoose cjfields$ perl -MBio::SeqIO -e 'print  
>> $Bio::SeqIO::VERSION."\n"'
>> 1.0069
>> pyrimidine1:biomoose cjfields$ perl -MBio::SeqIO -e 'print  
>> $Bio::Root::IO::VERSION."\n"'
>> -1, set by base.pm
>> Imported modules do not have VERSION set correctly when it is  
>> exported.  This hasn't become an issue in bioperl yet (it's really  
>> an edge case), but several devs have run into this. And really, why  
>> set VERSION to a string like '-1, set by base.pm'?
>> Anyway, re: versioning, the way I think about it, if we have a  
>> small very stable core with version X, and a focused very stable  
>> module group with version Y, other distributions would have a  
>> separate version and require subgroup version Y (which would in  
>> turn require core version X).  CPAN would take care of it.  This  
>> isn't much different than what occurs every day on CPAN anyway  
>> (Jay's Catalyst, Moose and MooseX, and so on).  In fact, several  
>> Moose-requiring distributions don't require the latest Moose.
>>> When I do a SVN (or git) checkout do I need to checkout each of  
>>> these in its own directory?  Or will there be a master packaging  
>>> script that makes the necessary zip files for CPAN submission?
>> Not sure; that would be up to us I suppose.  I think it would be  
>> easier to maintain and release if they were separate or packaged up  
>> as Jay suggests.
>>> If they are in separate directories are we organizing by  
>>> conceptual topic (phylogenetics, alignment, database search) or by  
>>> namespace of the modules?
>> By topic, retaining namespaces.  We have a basic Bio::* directory  
>> structure already in place for various generic terms (Tools, DB,  
>> etc), so I see this crossing simple namespaces very easily.  And as  
>> I pointed out to Robert, several of those could possibly go together.
>>> Do all the 'database' modules live together?  Probably not, so  
>>> do we name them bioperl-db-remote, bioperl-db-local-index,  
>>> bioperl-db-local-sql, etc.?  Really, bioperl-db is somewhat focused  
>>> on sequences and features, but what about things that integrate  
>>> multiple data types, like biosql?
>> I don't see bioperl-db (BioSQL) being split up.  I think it's too  
>> intrinsically linked and cohesive (it's almost a separate core unto  
>> itself), so it would be counterproductive to do so.
>> Maybe have bioperl-db become bioperl-biosql.  Web-based = bioperl- 
>> remotedb.  Local = bioperl-localdb. OBDA = bioperl-obda.
>>> If they are in separate directories, what about all the test data  
>>> that might be shared?  Is it replicated among all the  
>>> subdirectories?  How do we do a good job keeping that up to date?  
>>> Could we have a test-data distro instead, with symlinks within SVN?
>> We have to see how much is actually shared and proceed from there.   
>> I would like to eventually resurrect the idea of a separate biodata  
>> repo that we could just ftp the data from as needed.  That would  
>> cut down on the package size quite a bit, but I'm not sure how  
>> feasible that is from the testing point of view (would we have to  
>> skip all tests if there were no network access)?
>>> For some other obvious modules that can be split off and self- 
>>> contained, each of these could be a package.  I would estimate  
>>> more than 20 packages depending on how Bio::Tools are carved up.
>>> - I think Bio::DB::SeqFeature needs to be split off for sure; this  
>>> is a nice logical peeling off.  Could be another test case since  
>>> it is a Gbrowse dependency
>>> -  Bio::DB::GFF as well for the same reasons.
>> Completely agree (and I think Lincoln would like this as well).
>>> -  Bio::PopGen - self contained for the most part, but depends on  
>>> Bio::Tree and Bio::Align objects
>> Could list those as a required dependency.
>>> -  Bio::Variation
>>> -  Bio::Map and Bio::MapIO
>>> -  Bio::Cluster and Bio::ClusterIO
>>> -  Bio::Assembly
>>> - Bio::Coordinate
>>> My nightmare is that we're going to have to manage a lot of 'use  
>>> XX 1.01' version requirements when dealing with the  
>>> dependencies on the interface classes, and have to keep these all  
>>> up to date.  The version was implicit when they were all part of  
>>> the same big distro.
>> Right.  But it also becomes a maintenance problem when serious bugs  
>> in one module impede the needed release of others to CPAN.
>>> Also, I guess a split need not be limited to one namespace, but  
>>> we have generally grouped things by namespace.
>>> What do you want to do about bioperl-run?  Do we make a set of  
>>> parallel splits from all of these?  I think at the outset we need  
>>> to coordinate the applications supported here in some sort of  
>>> loose ontology - the namespaces were not consistently applied so  
>>> we have some alignment tools in different directories, etc.  So  
>>> the namespace sort of classifies them but it could be better.  One  
>>> of the challenges of multiple developers without a totally shared  
>>> vision on how it should be done.
>> We could split bp-run and Tools, pairing the wrappers with the  
>> relevant parser modules.  Not sure if this can be done with  
>> SearchIO as well but it could be tested to see how feasible that  
>> would be.
>>> I'm not convinced that the Bio::Graphics splitoff has been  
>>> painless so we should take stock of how that is working.
>> Really?  Lincoln has made several fixes lately on CPAN, so I  
>> thought everything was going well.  If anything I would think the  
>> lack of additional 1.6.x bioperl releases has probably held Gbrowse  
>> 2.0 up more due to Bio::DB::SeqFeature (my fault, but as you know  
>> life and job take precedence sometimes).
>>> It seems like this split off would be a way to better streamline  
>>> things in bioperl so that modern versions of bioperl might be able  
>>> to better interface with things like Ensembl again too.
>>> How much of this effort is worth spending on triaging the current  
>>> code versus the effort we want to put into a cleaner, simpler  
>>> replacement for the bioperl system that appears to scare off so  
>>> many users (and potential developers)?
>> I say triage away on a branch, but we need to indicate which ones  
>> to whittle out first.  The reason I believe we went for a larger  
>> split initially (as indicated on the wiki page) was to push  
>> something forward and not get too bogged down in the details.  But  
>> we may as well go full throttle and do this right away.
>>> Okay I rambled, hope that was helpful.
>>> -jason
>>> --
>>> Jason Stajich
>>> jason at bioperl.org
>> Very, very helpful.  Now I need a beer.
>> chris
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
> -----------------------------------------------------------------------
> Scott Cain, Ph. D. scott at scottcain dot net
> GMOD Coordinator (http://gmod.org/) 216-392-3087
> Ontario Institute for Cancer Research
