[Bioperl-l] Seeking comments, Bio::Species memleak fix

Chris Fields cjfields at illinois.edu
Sat Jul 3 19:36:15 EDT 2010

On Jul 3, 2010, at 3:55 PM, Hilmar Lapp wrote:

> On Jun 30, 2010, at 1:35 PM, Chris Fields wrote:
>> Bio::Species had previously inherited from Bio::Taxon, but also required
>> it to hold a Bio::Tree::Tree which contained a circular reference back
>> to the Bio::Species object, thus requiring Scalar::Util::weaken.  This
>> has caused several hard-to-diagnose problems with premature garbage
>> collection, including some issues with threads (bug 3017).
> Is there something we should learn for future cases in which circular references are being considered with the S::U::weaken remedy (such as the recent one re: SeqI<->SeqFeatureI)?
> 	-hilmar

In my experience, using weaken() is a bit of a 'code smell', and in general there are nice, sane ways to work around it.  The solutions I outlined for the SeqI<->SeqFeatureI suggestion I thought made sense, if you think of Bio::SeqI as really more like a Bio::SeqRecord (has-a PrimarySeqI, does hold features, annotations, etc).  There are no circular refs in this case:

SeqI has-a SeqFeature(s)I, via FeatureHolderI
SeqI has-a PrimarySeqI
SeqFeatureI has-a PrimarySeqI
Nothing refers back to SeqI

It's a nice design decision and a great delegation of duties: SeqFeatures shouldn't need the Seq(Record) as a whole but the just the sequence itself, present as the PrimarySeqI, therefore they are attached to that.  Any sequence-specific changes (methods, attributes) should therefore be in PrimarySeqI, and if needed could be delegated to for convenience from a relevant SeqI.  

In this particular instance (Bio::Species), there appeared to be a need to store a ref to the Tree (containing self) in self to implement the prior Bio::Species methods, thus requiring the use of weaken().  To me this was an indication that we really needed to rethink the design of the class.  Maybe the containing instance (eg: Bio::Seq in this case) needs to hold the classification (as a Tree) and cache the relevant species NodeI (a Bio::Taxon).  

As we're still at the stage where we are using a Bio::Species, I pushed that functionality for the time being into the Bio::Species class: it has-a Bio::NodeI and a Bio::Tree::Tree, but neither refers back to the Bio::Species.  We're planning on deprecating Bio::Species in future releases anyway, and reimplementing so that Bio::Seq is storing the Tree and the Bio::Taxon species node should be easy when the time comes.

Actually, this probably suggests abstracting the simplest parts of that out into a common interface, which could be switched over to Bio::Seq when needed.  May work on that a bit...


More information about the Bioperl-l mailing list