Feature Annotation rollback

This is a tracking page for the planned rollback of BioPerl SeqFeature/Annotation changes prior to a new stable release. The code is being tested on a CVS branch (tagname: featann_rollback) prior to merging back onto the main branch. It is hoped this can be completed within a relatively short period of time, though many of the changes introduced are complex and may require extensive testing prior to any release (let alone a stable one). Portions of this page will likely be incorporated into the final release notes; it is not meant to be permanent.

I plan on posting changes and test results in case I run into issues. I am limiting comments on this page to regular and core devs; however, anyone can use the discussion page to make suggestions at any time.

The plan is to roll back changes gradually (in rounds), fixing tests before the next round of rollbacks. Hopefully, once everything is done, Bio::FeatureIO will work without problems.

For all work, tests were run on Mac OS X (Tiger, Intel) using Perl 5.8.6.

First round

Tests

   Failed Test     Stat Wstat Total Fail  List of Failed
   -------------------------------------------------------------------------------
   t/BioGraphics.t    3   768    38    3  3-5
   t/DB.t           255 65280   116   31  101-116
   t/Genewise.t       3   768    53    3  37 41 45
   t/Sopma.t          2   512    16    2  8 15
    (2 subtests UNEXPECTEDLY SUCCEEDED), 7 tests and 33 subtests skipped.
   Failed 4/247 test scripts. 24/17250 subtests failed.
   Files=247, Tests=17250, 460 wallclock secs (130.49 cusr + 18.80 csys = 149.29 CPU)
   Failed 4/247 test programs. 24/17250 subtests failed.

Notes

Second round

Tests

   Failed Test          Stat Wstat Total Fail  List of Failed
   -------------------------------------------------------------------------------
   t/BioGraphics.t         3   768    38    3  3-5
   t/DB.t                255 65280   116   31  101-116
   t/Genewise.t            3   768    53    3  37 41 45
   t/Handler.t             2   512   546    2  242 315
   t/SeqFeatAnnotated.t    3   768    26    3  24-26
   t/Sopma.t               2   512    16    2  8 15
   t/genbank.t             1   256   244    1  242
   t/swiss.t               1   256   240    1  9
    (3 subtests UNEXPECTEDLY SUCCEEDED), 7 tests and 33 subtests skipped.
   Failed 8/247 test scripts. 31/17253 subtests failed.
   Files=247, Tests=17253, 396 wallclock secs (129.15 cusr + 18.68 csys = 147.83 CPU)
   Failed 8/247 test programs. 31/17253 subtests failed.

Notes

  • A few GenBank and SwissProt tests:
    • The genbank.t failure was due to a dropped AnnotationI (not sure why). The test was adjusted to account for the extra AnnotationI for now, but this is worth further investigation to ensure the extra AnnotationI is legitimate.
    • swiss.t doesn't roundtrip efficiently; this is due to changes in date formats. The tests have been marked as TODO for now; a more thorough set of roundtripping tests needs to be run.
  • Some of these are 'lazy' tests that use a Bio::AnnotationI object directly as if it were a string. These tests are being changed to make an explicit method call, with a comment ('no "" operator overloading') added to indicate that overloading is not permitted.
  • There appears to be some confusion about method deprecation in Bio::SeqFeatureI, which needs to be cleared up.
  • Bio::SeqFeature::Annotated doesn't appear to be complete, with some methods returning data types inconsistent with Bio::SeqFeature::Generic; needs a complete audit and revised (more strenuous) tests
    • Bio::SeqFeature::Annotated::score() changed to explicitly return textual output (no objects). More method changes need to be made for consistency.
  • genbank.t, swiss.t, Handler.t, and SeqFeatAnnotated.t now pass; need to address or file bugs on the above issues.
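
The difference between a 'lazy' test and an explicit one can be sketched as follows. This is a minimal illustration rather than BioPerl code: My::Annotation is a hypothetical stand-in for a Bio::AnnotationI implementation, and only the method name display_text is taken from this page.

```perl
use strict;
use warnings;
use Test::More;

# Hypothetical stand-in for a Bio::AnnotationI implementation (not BioPerl code).
{
    package My::Annotation;
    # Stringification overload of the kind the lazy tests depend on.
    use overload '""' => sub { $_[0]->display_text }, fallback => 1;
    sub new          { my ($class, %args) = @_; return bless {%args}, $class }
    sub display_text { return $_[0]->{value} }
}

my $ann = My::Annotation->new(value => 'kinase');

# Lazy: passes only because the object silently stringifies.
is("$ann", 'kinase', 'implicit stringification');

# Explicit: no "" operator overloading involved.
is($ann->display_text, 'kinase', 'explicit method call');

done_testing();
```

The explicit form keeps passing if the overload is later removed or made to throw; the lazy form does not.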

Third round

  • Stepping through the various Bio::AnnotationI implementations and adding exceptions to the overloads, to catch any place where overloading is still used.

Tests

Exceptions added to overloads:

   Failed Test          Stat Wstat Total Fail  List of Failed
   -------------------------------------------------------------------------------
   t/BioGraphics.t         3   768    38    3  3-5
   t/DB.t                255 65280   116   31  101-116
   t/GOterm.t            255 65280    61  108  8-61
   t/Genewise.t            3   768    53    3  37 41 45
   t/SeqFeatAnnotated.t  255 65280    26   42  6-26
   t/Sopma.t               2   512    16    2  8 15
   t/obo_parser.t        255 65280    45   86  3-45
   t/simpleGOparser.t    255 65280   102  202  2-102
    (2 subtests UNEXPECTEDLY SUCCEEDED), 7 tests and 37 subtests skipped.
   Failed 8/247 test scripts. 243/17248 subtests failed.
   Files=247, Tests=17248, 398 wallclock secs (122.08 cusr + 18.05 csys = 140.13 CPU)
   Failed 8/247 test programs. 243/17248 subtests failed.

Notes

  • Most errors were due to unexpected overloads:
    • if($ann) triggers the overloaded sub, but if(defined $ann) doesn't.
  • There is an API conflict: some modules use Bio::Ontology::Term::add_dblink_context incorrectly (passing values instead of Bio::Annotation::DBLink instances). Possible solution here. I added an exception to catch any caller not passing objects, which notably kills these tests:
   t/GOterm.t            255 65280    61  108  8-61
   t/SeqFeatAnnotated.t  255 65280    26   42  6-26
   t/obo_parser.t        255 65280    45   86  3-45
   t/simpleGOparser.t    255 65280   102  202  2-102
  • Several fix-me's found in Bio::FeatureIO::gff.
  • Noticed that OntologyStore.t tests are consistently failing to make server contact.
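
The point above about if($ann) versus if(defined $ann) can be reproduced with a small sketch. My::TrappedAnnotation is a hypothetical class, not one of the BioPerl modules; it assumes only the "" operator is overloaded and that the overload throws, as described for this round.

```perl
use strict;
use warnings;

# Hypothetical stand-in for a Bio::AnnotationI implementation (not BioPerl code).
{
    package My::TrappedAnnotation;
    use Carp qw(croak);
    # Only "" is overloaded. Because 'fallback' is left undefined, Perl may
    # autogenerate boolean conversion from "", so truth tests hit the trap too.
    use overload '""' => sub { croak 'stringification overload used' };
    sub new          { my ($class, %args) = @_; return bless {%args}, $class }
    sub display_text { return $_[0]->{value} }
}

my $ann = My::TrappedAnnotation->new(value => 'kinase');

# A plain truth test goes through the overloaded sub, so the trap fires.
my $bool_ok = eval { my $seen = $ann ? 1 : 0; 1 };
print "boolean test trapped: $@" unless $bool_ok;

# defined() only asks whether the scalar holds a value; no overload is called.
my $defined_ok = eval { my $seen = defined $ann ? 1 : 0; 1 };
print "defined test is safe\n" if $defined_ok;

# An explicit method call is also unaffected.
print 'display_text: ', $ann->display_text, "\n";
```

Code that only needs to know whether an annotation exists can therefore use defined and stay clear of the overloads.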

Fourth round

  • Bio::SeqFeature::Annotated cleanup will wait until after merging to the main branch (minor fixes only here)
  • Fix Ontology-related dblink method inconsistencies in various Bio::Ontology::TermI/Bio::OntologyIO classes and Bio::Annotation::OntologyTerm
    • As noted above, there were several modules which passed in simple scalars to Bio::Ontology::Term while others passed in Bio::Annotation::DBLink instances. Notably the method documentation is ambiguous as to what is required (some indicate scalar values for arguments, others Bio::Annotation::DBLink).
    • In order to rectify this we are reimplementing these methods to be more consistent and specifically allow both strings and Bio::Annotation::DBLink instances. Therefore, use of any Bio::Ontology::Term-related dblink method is deprecated in favor of the following methods:
      • get_dbxrefs (in place of get_dblinks). This method takes named parameters (-type and -context); -type can be used to get specific data types in cases where strings and Bio::Annotation::DBLink instances are mixed
      • add_dbxref (in place of add_dblink)
      • remove_dbxrefs (in place of remove_dblinks)
      • has_dbxref (in place of has_dblink)
      • add_dbxref_context (in place of add_dblink_context)
    • Any text comparison between two instances, or between a scalar and an instance, uses the text output of Bio::Annotation::DBLink::display_text (an explicit comparison, as opposed to the implicitly overloaded 'eq' comparisons).
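
As a rough sketch of the behaviour described above (accepting both strings and link objects, and comparing by explicit text), consider the following. My::Term and My::DBLink are hypothetical stand-ins and do not reproduce the Bio::Ontology::Term implementation; the -type values 'string' and 'object' and the sample identifiers are assumptions made for illustration, and -context is omitted.

```perl
use strict;
use warnings;

# Hypothetical stand-ins; these are NOT the BioPerl classes.
{
    package My::DBLink;
    sub new          { my ($class, %args) = @_; return bless {%args}, $class }
    sub display_text { my $self = shift; return "$self->{database}:$self->{primary_id}" }
}
{
    package My::Term;
    use Scalar::Util qw(blessed);
    sub new { return bless { dbxrefs => [] }, shift }

    # Accept plain strings and link objects alike.
    sub add_dbxref {
        my ($self, @refs) = @_;
        push @{ $self->{dbxrefs} }, @refs;
        return;
    }

    # Reduce either kind of entry to text via an explicit call,
    # never via an overloaded 'eq'.
    sub _as_text {
        my ($ref) = @_;
        return blessed($ref) ? $ref->display_text : $ref;
    }

    sub has_dbxref {
        my ($self, $query) = @_;
        my $want = _as_text($query);
        return scalar grep { _as_text($_) eq $want } @{ $self->{dbxrefs} };
    }

    # -type => 'string' or 'object' narrows the result (assumed values).
    sub get_dbxrefs {
        my ($self, %opt) = @_;
        my $type = $opt{-type} || '';
        return grep {
              $type eq 'string' ? !blessed($_)
            : $type eq 'object' ?  blessed($_)
            :                      1
        } @{ $self->{dbxrefs} };
    }
}

my $term = My::Term->new;
$term->add_dbxref('GO:0003677',
                  My::DBLink->new(database => 'EC', primary_id => '2.7.1.1'));

print 'has EC:2.7.1.1? ', ($term->has_dbxref('EC:2.7.1.1') ? 'yes' : 'no'), "\n";
print 'strings only: ', join(', ', $term->get_dbxrefs(-type => 'string')), "\n";
```

Routing every comparison through one helper keeps string and object entries interchangeable for callers without relying on operator overloading.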

Tests

   Failed Test     Stat Wstat Total Fail  List of Failed
   -------------------------------------------------------------------------------
   t/BioGraphics.t    3   768    38    3  3-5
   t/DB.t           255 65280   116   31  101-116
   t/Genewise.t       3   768    53    3  37 41 45
   t/Sopma.t          2   512    16    2  8 15
    (3 subtests UNEXPECTEDLY SUCCEEDED), 7 tests and 33 subtests skipped.
   Failed 4/246 test scripts. 24/17195 subtests failed.
   Files=246, Tests=17195, 371 wallclock secs (121.93 cusr + 17.29 csys = 139.22 CPU)
   Failed 4/246 test programs. 24/17195 subtests failed.

Notes

  • All tests now pass (the failures listed above also occur on MAIN). Will merge to main branch soon.

Cleanup

  • Implement Bio::SeqFeature::TypedSeqFeatureI using Bio::SeqFeature::Annotated
  • Fix roundtripping issue with swiss.t/Handler.t
  • Eventually remove Bio::AnnotationI overloads after testing
  • Add tests:
    • Term.t: test the new methods and the deprecation warnings
    • Annotation.t: display_text() (replacement for stringification overloads)
    • More rigorous tests for FeatureIO.t and SeqFeatAnnotated.t

Tests (CVS HEAD)

   Failed Test  Stat Wstat Total Fail  List of Failed
   -------------------------------------------------------------------------------
   t/DB.t        255 65280   116   31  101-116
   t/Genewise.t    3   768    53    3  37 41 45
    (3 subtests UNEXPECTEDLY SUCCEEDED), 7 tests and 33 subtests skipped.
   Failed 2/247 test scripts. 19/17250 subtests failed.
   Files=247, Tests=17250, 388 wallclock secs (127.72 cusr + 18.88 csys = 146.60 CPU)
   Failed 2/247 test programs. 19/17250 subtests failed.

Notes

  • Fixed some bugs with tests in CVS HEAD (DB.t may be a server-related issue).

Simple Benchmark

Though we all know benchmarks have issues, here is a simple benchmark using the following script and GenBank CP000473 (a 10 Mbp microbial genome), comparing the bioperl-live MAIN branch with the featann_rollback branch:

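use strict;
use warnings;
use Benchmark;
use Bio::SeqIO;

# GenBank file to parse, given as the first command-line argument
my $test = shift || die "Must supply file for benchmark\n";

# Time 10 full parses of the file
timethis(10, \&live);

# Parse every sequence record in the file and report the count
sub live {
    my $in = Bio::SeqIO->new(-format => 'genbank',
                             -file   => $test);
    my $ct = 0;
    while (my $seq = $in->next_seq) {
        $ct++;
    }
    print "Live : Parsed $ct seq(s)\n";
}
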
  • bioperl-live : 126 wallclock secs (122.79 usr + 0.85 sys = 123.64 CPU) @ 0.08/s (n=10)
  • rollback : 86 wallclock secs (84.87 usr + 0.57 sys = 85.44 CPU) @ 0.12/s (n=10)
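
For a rough sense of scale, the two CPU totals reported above can be compared directly. The snippet below only does arithmetic on those two numbers; it is a single-machine, single-file comparison and should not be read as a general performance claim.

```perl
use strict;
use warnings;

# CPU seconds reported above for 10 iterations on each branch.
my %cpu = (main => 123.64, rollback => 85.44);

my $speedup   = $cpu{main} / $cpu{rollback};
my $reduction = 100 * (1 - $cpu{rollback} / $cpu{main});

printf "rollback is %.2fx faster (%.1f%% less CPU time)\n", $speedup, $reduction;
```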