Naming Conventions and the Future

From BioPerl

A summary of a discussion on method names and the future (see bioperl-l thread, thread continues); editor's comments in italics:

Naming inconsistencies in Bio::Seq and related namespaces

Kevin Brown raised the following issue (thread):

[I] have been seeing some oddities with the naming of methods. A good example would be in the Bio::Seq and Bio::SeqFeature::Generic. Both have a method called "seq" but in the latter case it returns an object (and expects an object when doing a Set) and in the former it returns a string and expects a string when doing a Set.

Raul Mendez Giraldez stumbled on another case in point (thread) (skip this):

Hi Jason,

Thank you so much for your suggestion, although it should have been
my $featseq = $seqin->trunc($feature->start, $feature->end);
since the ''subseq method just gives you a string with the sequence, whereas trunc
outputs a Seq object, which is what write_seq needs''.


On Fri, 08-05-2009 at 13:04 -0700, Jason Stajich wrote:
> The sequence isn't part of the report - or at least isn't parsed but
> you can just do this (pseudo-y-code here).
my $seqout = Bio::SeqIO->new(-format => 'fasta');
for my $feature ( @features ) {
  my $featseq = $seqin->subseq($feature->start, $feature->end);
  $seqout->write_seq($featseq);
}
> On May 8, 2009, at 9:19 AM, Raul Mendez Giraldez wrote:
> > Hi,
> > 
> > I'm trying to get coiled-coil predictions for protein sequences using
> > Bob Russell's program ncoils, through the BioPerl interface
> > Bio::Tools::Run::Coil, but the only thing I can get from any element
> > of the features list is the sequence name and a few other, not very
> > useful, attributes.
> > 
> > I'm running the following script:
> > 
> > 
#!/home/rmendez/bin/perl -w
use strict;
use FileHandle;
use Data::Dumper;
use Bio::Tools::Run::Coil;
my $seqin = 'filein.fasta';    # name of the input FASTA file
my $factory=Bio::Tools::Run::Coil->new('-c');       
my @features=$factory->run($seqin);
print "Printing content of features[0]\n";
print Dumper $features[0];
> > And the output (the content of the first element of the features
> > array) is:
$VAR1 = bless( {
                 '_gsf_tag_hash' => {
                                      'percent_id' => [ ... ],
                                      'hid' => [ ... ],
                                      'evalue' => [ ... ]
                                    },
                 '_location' => bless( {
                                         '_location_type' => 'EXACT',
                                         '_start' => 138,
                                         '_end' => 172
                                       }, 'Bio::Location::Simple' ),
                 '_gsf_seq_id' => 'ENSDARP00000084927',
                 '_parse_h' => {},
                 '_root_cleanup_methods' => [
                                              sub { "DUMMY" }
                                            ],
                 '_source_tag' => 'Coils',
                 '_primary_tag' => 'ncoils',
                 '_root_verbose' => 0
               }, 'Bio::SeqFeature::Generic' );
> > Then how could I get the sequence itself with the coil annotation
> > 'xxx'?
> > 
> > Thanks,
> > 
> > Raul
> Jason Stajich
> jason -at- bioperl -dot- org

This led to a couple of suggested fixes, reiterated here.

Hilmar Lapp:

I agree, $seq->seq() could possibly be better named. Maybe $seq->seqstr()? ... You can test what kind of object you have using ref() or isa():
$seq = $obj->seq(); 
# we need the sequence string
$seq = $seq->seq() if ref($seq) && $seq->isa("Bio::PrimarySeqI");
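Hilmar's check can be folded into a small helper so calling code never has to care which flavour of seq() it hit. The sketch below is only an illustration: seq_as_string is a hypothetical helper (not part of BioPerl), and it swaps Hilmar's isa("Bio::PrimarySeqI") test for duck typing with can('seq') so that it runs without BioPerl installed. Mock::Seq and Mock::Feature are hypothetical stand-ins for Bio::Seq (seq() returns a string) and Bio::SeqFeature::Generic (seq() returns an object).

```perl
use strict;
use warnings;
use Scalar::Util qw(blessed);

# Hypothetical stand-in for Bio::Seq: seq() returns a plain string.
package Mock::Seq;
sub new { my ($class, %args) = @_; return bless { %args }, $class }
sub seq { return $_[0]{seq} }

# Hypothetical stand-in for Bio::SeqFeature::Generic: seq() returns an object.
package Mock::Feature;
sub new { my ($class, %args) = @_; return bless { %args }, $class }
sub seq { return Mock::Seq->new(seq => $_[0]{seq}) }

package main;

# Keep calling seq() while we hold an object that has one, so both the
# "string" and the "object" flavours of seq() end in a raw sequence string.
# The depth limit guards against a seq() that returns its own invocant.
sub seq_as_string {
    my ($thing) = @_;
    my $val = $thing;
    for (1 .. 5) {
        last unless blessed($val) && $val->can('seq');
        $val = $val->seq;
    }
    die "seq() did not yield a string\n" if ref $val;
    return $val;
}

my $seq  = Mock::Seq->new(seq => 'ATGC');
my $feat = Mock::Feature->new(seq => 'GGCC');
printf "%s\n", seq_as_string($seq);     # prints ATGC
printf "%s\n", seq_as_string($feat);    # prints GGCC
```

The helper hides the inconsistency rather than fixing it, which is why the thread moved on to renaming the methods themselves.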

Mark Jensen:

FWIW, my preference would be to have any object that has a seq object as a property return objects when a '..._seq' accessor is called. However, the seq objects themselves generally contain the sequence string in their seq() property. We wouldn't want to disrupt that, but would it be worth creating an alias getter/setter for the Seq classes' seq() property called 'seqstr'? We could then count on
$seq = $foo->bar_seq;            # an object
$str = $foo->bar_seq->seqstr;    # a string
$str2 = $foo->seqstr;            # a string (not necessarily the same as above)
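Mark's proposal amounts to giving an existing accessor a second, more explicit name. A minimal sketch, assuming hypothetical classes My::Seq (standing in for a BioPerl Seq class) and My::Thing (an object with a bar_seq accessor), neither of which exists in BioPerl:

```perl
use strict;
use warnings;

package My::Seq;    # hypothetical stand-in for a BioPerl Seq class

sub new { my ($class, %args) = @_; return bless { seq => $args{-seq} }, $class }

# Existing combined getter/setter: returns (and accepts) a raw string.
sub seq {
    my $self = shift;
    $self->{seq} = shift if @_;
    return $self->{seq};
}

# Alias: delegate to seq() so old and new names always agree.
sub seqstr { my $self = shift; return $self->seq(@_) }

package My::Thing;  # hypothetical object that holds a sequence object

sub new { my ($class, %args) = @_; return bless { bar => $args{-bar} }, $class }

# '..._seq' accessors return objects, per Mark's convention.
sub bar_seq { return $_[0]{bar} }

package main;

my $foo = My::Thing->new(-bar => My::Seq->new(-seq => 'ACGT'));
my $obj = $foo->bar_seq;            # an object
my $str = $foo->bar_seq->seqstr;    # a string
$obj->seqstr('TTTT');               # setting through the alias ...
print $obj->seq, "\n";              # ... is visible through seq()
```

Delegating with a method call, rather than aliasing the typeglob (*seqstr = \&seq), means a subclass that overrides seq() is automatically honoured by seqstr() as well.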

Chris Fields:

To me, seq() should always return a Bio::PrimarySeqI (derived from invocant PrimarySeqI class). However, this is currently inconsistent, as illustrated by [Kevin's] example. Changing this would require a deprecation cycle. A new method, seqstr()/str()/rawseq(), could be guaranteed to return a raw sequence. Similarly, bioseq() could always return a Bio::PrimarySeqI.
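Chris's idea is that the method name, not the class, should fix the return type. The sketch below illustrates that contract with two hypothetical classes (Demo::Seq and Demo::Feature, not BioPerl's): each keeps its legacy seq(), but both promise seqstr() for a raw string and bioseq() for a sequence object, so calling code needs no ref()/isa() checks.

```perl
use strict;
use warnings;

# Hypothetical sequence class: legacy seq() returns a raw string.
package Demo::Seq;
sub new    { my ($class, $str) = @_; return bless { str => $str }, $class }
sub seq    { return $_[0]{str} }    # legacy: raw string
sub seqstr { return $_[0]{str} }    # always a raw string
sub bioseq { return $_[0] }         # always a sequence object (itself)

# Hypothetical feature class: legacy seq() returns an object.
package Demo::Feature;
sub new    { my ($class, $seq) = @_; return bless { seq => $seq }, $class }
sub seq    { return $_[0]{seq} }                  # legacy: an object
sub bioseq { return $_[0]{seq} }                  # always a sequence object
sub seqstr { return $_[0]->bioseq->seqstr }       # always a raw string

package main;

for my $thing (Demo::Seq->new('ACGT'),
               Demo::Feature->new(Demo::Seq->new('GGCC'))) {
    # The method name alone determines what comes back.
    printf "%s: %s (%s)\n", ref($thing), $thing->seqstr, ref($thing->bioseq);
}
```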

The overall feeling, however, was that there is a need for a new review of naming conventions, not only for the Seq objects, but for the entire toolkit. The biggest hurdle to this, at least psychologically, is the desire to maintain backward compatibility. But as Chris put it, "I don't want to fall into the trap that perl 5.x had fallen into (and is working towards digging out of), namely fear of breaking old code."

There are a couple of possibilities. One is to use the trunk to improve consistency, by making judicious additions to the API, and including appropriate aliases to stay compatible. Another is to embark on BioPerl 2.0, starting with a clean slate and a well-thought-out wish list.

The Role of the Trunk

The trunk is there to make things go. Don't be afraid.

Chris's points:

  • I don't think anyone should feel afraid to change things on trunk, but I think significant changes should be discussed [on the bioperl-l list] so everyone has a chance to chime in.
  • API additions are not nearly as severe as having a method like seq() return a different value.
  • I personally don't have a problem with merging [such changes] to the 1.6 branch (others may disagree though). I consider it a 'bug fix' in a loose way.
  • [T]he reason I made a 1.6 branch is to maintain the snapshot of the code for API reasons. There is no reason we can't add in more explicit methods to main trunk. We can deprecate the use of more ambiguous methods down the road.
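Deprecating an ambiguous method "down the road" usually means it keeps working for a while but warns its callers. A minimal sketch, assuming a hypothetical Legacy::Feature class (not Bio::SeqFeature::Generic): the new explicit bioseq() is added, and the old seq() delegates to it while warning once per calling location via Carp::carp.

```perl
use strict;
use warnings;

package Legacy::Feature;    # hypothetical, not a BioPerl class
use Carp qw(carp);

my %warned;                 # warn once per calling location, not on every call

sub new { my ($class, %args) = @_; return bless { seq => $args{seq} }, $class }

# New, unambiguous accessor added on trunk.
sub bioseq { return $_[0]{seq} }

# Old, ambiguous accessor: still works during the deprecation cycle,
# but points callers at the explicit replacement.
sub seq {
    my $self = shift;
    my (undef, $file, $line) = caller;
    carp 'Legacy::Feature::seq() is deprecated; use bioseq() instead'
        unless $warned{"$file:$line"}++;
    return $self->bioseq(@_);
}

package main;

my $feat = Legacy::Feature->new(seq => { id => 'seq1' });
my $old  = $feat->seq;       # works, but emits a deprecation warning
my $new  = $feat->bioseq;    # silent
```

Once the warning has been in a release or two, the old method can be removed (or, for seq(), given its new meaning) with users already pointed at the replacement.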

Slouching towards BioPerl 2.0

Hilmar's thoughts:

There's been talk some (a long, actually) time ago about BioPerl 2.0 that would start on a clean slate and not be bothered by backwards compatibility demands. That effort never really took off, but maybe this is also a good time to ask the question again whether it's better to introduce the API changes we desire in add/deprecate/remove cycles, or in a more radical fashion starting on a clean slate.

The obvious advantage of the former is that we get API improvements sooner, but making them is possibly more dreadful, discouraging, or not even doable due to compatibility constraints. The disadvantage of the latter is that it really needs a committed crew of people to see it through or otherwise all the nice changes die in some grand but half-finished 2.0 construction site. I think Chris also had plans to branch off a Perl6 version of BioPerl - maybe those could be the same efforts?

I'm not trying to advocate one over the other here; rather, I'd like to help push on that front that is best able to capture the energy of volunteers, as that's what it takes in the end.

Chris rejoins:

I have been toying around with perl6 for a bit now (Rakudo on Parrot implementation). It's possible an alpha for perl6 will be available by Christmas this year; Rakudo is now passing over 11000 spec tests. Just to note, Perl6 is another beast altogether from Perl5. Yes, there is supposed to be a backwards compatibility mode, but no one has implemented that yet, and it likely won't be implemented in the near future. Based on that I'm not sure we could really call a bioperl in perl6 bioperl 2.0, more like bioperl6 1.0, as it would be a complete refactor.

As for perl5, it has a nice OO set of modules (Moose) that could be used for refactoring. It implements roles and a few other perl6-ish bits (along with MooseX modules). perl 5.10 also has a few things backported from p6: say(), given/when, state vars, etc. We could require Modern::Perl (perl5.10 with strict/warnings pragmas on) and Moose. I have played around with both and find them quite nice, so I suggest that if we were to start a 2.0 effort it should include Moose, and we should push most of the interfaces into roles.

Anyway, I grabbed the git repos for bioperl6 and biomoose (bioperl implemented in Moose) on github. We can set up something there using those namespaces if needed.

Depends on where everyone wants to place their efforts. May be less work to port the most important core classes over to Moose, and a simple test implementation will give us an idea on what works Role-wise and what doesn't. From there we could work on p6 variants; that would have to be a separate project altogether. We could also include a few other MooseX modules if it makes life easier.

Jason Stajich adds in another thread:

...[not]withstanding the seeming API confusion caused by ancient decisions to name both the Bio::SeqFeatureI and the Bio::PrimarySeq methods seq even though they return different types -- don't forget that Lincoln's Bio::DB::Fasta uses the seq method to return a sequence as a string as well, so major API changes here will in all likelihood create a big split between the branches that will make any new BioPerl not match up well with existing scripts or libraries that use it - hence the reason for no "great realigning" to a completely well-planned-out API rather than the organically grown whims of several generations of devs. I say this in jest a bit - I do want to see changes, but I think it really will have to be called something else besides BioPerl to avoid confusion and the fact that a lot of things will break that depend on the current APIs; "BioPerl2" or something indicating a Perl6 association.

15:02, 8 May 2009 (UTC)
