[Bioperl-l] possible bug printing GenBank feature qualfiers

Scott Markel smarkel at scitegic.com
Fri Mar 31 15:48:24 EST 2006


Chris,

I get clean runs with both a Windows build of Perl

     This is perl, v5.8.7 built for MSWin32-x86-multi-thread

and a cygwin build

     This is perl, v5.8.6 built for cygwin-thread-multi-64int

My test input file should be okay.  I started with a valid
file from NCBI and trimmed.  misc_feature is allowed to
have qualifiers.  To be DDBJ/EMBL/GenBank Feature Table
compliant, we could change "foo" to "note", but I don't think
BioPerl is doing any checks for valid qualifier names.

Scott

Chris Fields wrote:

> Well, I'm running off bioperl-live now using WinXP and latest ActivePerl
> (5.8.8.817) and, although all tests pass (SeqIO and genbank), I keep getting
> the same error and erroneous output using Scott's (albeit very simple)
> sequence example.  The problem seems to pop up on the input end, not output
> (the 'no feature key' in the below output only shows up when -verbose is
> turned on with the input SeqIO object).  
> 
> Questions:
> 
> 1) Is the below example Scott gave valid GenBank format?  I don't know, but
> it looks okay.
> 2) If so, should it work?  Yes, no question.
> 3) And if it is supposed to, why isn't it working here?  Don't know, but any
> of the mentioned fixes don't do anything (get rid of the error) for me.
> Scott gets it to work but my guess is that it is b/c he's using a Linux/UNIX
> flavor.  Can't wait 'til I get my MacTel (4 more months....)
> 
> I'm personally not too worried about it at the moment as anything I passed
> through SeqIO has worked w/o a problem, even on WinXP.  It's just a bit
> frustrating to see something fail here that seems to work elsewhere.
> 
> So here's what I did:
> 
> input.gbff
> ====================================
> LOCUS       MY_LOCUS                  10 aa            linear   UNK
> DEFINITION  my description.
> ACCESSION   12345
> FEATURES             Location/Qualifiers
>       misc_feature    1..10
>                       /foo="0"
> ORIGIN
>          1 atggagaact
> //
> ====================================
> 
> Run through this:
> ====================================
> use Bio::SeqIO;
> 
> my $inputFilename = "input.gbff";
> my $outputFilename = "output.gbff";
> 
> my $in  = Bio::SeqIO->new(-verbose  => 1,
>                           -file   => $inputFilename,
>                            -format => "genbank");
> 
> my $out = Bio::SeqIO->new(-verbose => 0,
>                           -file => ">$outputFilename",
>                            -format => "genbank");
> 
> my $sequence = $in->next_seq();
> $out->write_seq($sequence);
> ====================================
> 
> Gets this error:
> 
> ====================================
> C:\Perl\Scripts\gb_test>test.pl
> no feature key!
> 
> -------------------- WARNING ---------------------
> MSG: Unexpected error in feature table for  Skipping feature, attempting to
> recover
> STACK Bio::SeqIO::genbank::next_seq
> C:\Perl\src\bioperl\core/Bio\SeqIO\genbank.pm:583
> STACK toplevel C:\Perl\Scripts\gb_test\test.pl:18
> sequence length is 10
> ====================================
> 
> And this output, which is missing the feature, somewhat expected judging
> from the error (output.gbff)
> 
> ====================================
> LOCUS       MY_LOCUS                  10 aa            linear   linear 
> DEFINITION  my description.
> ACCESSION   12345
> KEYWORDS    .
> FEATURES             Location/Qualifiers
> ORIGIN      
>         1 atggagaact
> //
> ====================================
> 
> I wouldn't be a bit surprised if it is a WinXP-specific issue, so I'll give
> it a try this weekend on Mac OS X using the latest CVS to see what happens.
> 
> 
> Christopher Fields
> Postdoctoral Researcher - Switzer Lab
> Dept. of Biochemistry
> University of Illinois Urbana-Champaign 
> 
> 
>>-----Original Message-----
>>From: drycafe at gmail.com [mailto:drycafe at gmail.com] On Behalf Of Hilmar
>>Lapp
>>Sent: Friday, March 31, 2006 1:51 PM
>>To: Chris Fields
>>Cc: Scott Markel; bioperl-l at lists.open-bio.org
>>Subject: Re: [Bioperl-l] possible bug printing GenBank feature qualfiers
>>
>>The only problem with Heikki's version of the line is that if the
>>value is undefined you get a (ugly) warning from perl stating that you
>>printed an undefined value. Since normally tags should always have a
>>value (even if an empty string) this is a rather theoretical issue.
>>
>>   -hilmar
>>
>>On 3/31/06, Chris Fields <cjfields at uiuc.edu> wrote:
>>
>>>Sorry about that; stupid Outlook sent my mail before I had a chance to
>>>finish it up.
>>>
>>>The Bio::Annotation::Simple fix sounds best, but the problem is that CVS
>>>shows a fix on this line by Heikki after 1.5.1 was released:
>>>
>>>       fix to allow 0 values despite operator overload (Paul Mooney)
>>>
>>>which changed the overload to:
>>>
>>>         use overload '""' => sub { $_[0]->value};
>>>
>>>I'll try out your fix here to see if it breaks anything (can't see why
>>
>>it
>>
>>>would), but I may need to dig through the archives a little to see why
>>
>>this
>>
>>>latest change was made.  If everything works and passes tests I'll roll
>>
>>back
>>
>>>the commit I made to Bio::SeqIO::genbank earlier today.
>>>
>>>Christopher Fields
>>>Postdoctoral Researcher - Switzer Lab
>>>Dept. of Biochemistry
>>>University of Illinois Urbana-Champaign
>>>
>>>
>>>
>>>>-----Original Message-----
>>>>From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
>>>>bounces at lists.open-bio.org] On Behalf Of Hilmar Lapp
>>>>Sent: Friday, March 31, 2006 12:23 PM
>>>>To: Scott Markel
>>>>Cc: bioperl-l at lists.open-bio.org; Chris Fields
>>>>Subject: Re: [Bioperl-l] possible bug printing GenBank feature
>>
>>qualfiers
>>
>>>>Scott,
>>>>
>>>>your fix assumes that $value in reality is not a scalar but a hash ref
>>>>and that it has a key "value".
>>>>
>>>>Apparently in your test environment this is all indeed true, but there
>>>>is no guarantee that this will still be true tomorrow when you next
>>>>update from CVS (or install a new version).
>>>>
>>>>It seems to me that making feature tag values Bio::AnnotationI objects
>>>>and the stringification overload is what is interfering here. More
>>>>specifically, the broken overload in Bio::Annotation::SimpleValue
>>>>
>>>>      use overload '""' => sub { $_[0]->value || ''};
>>>>
>>>>will lead exactly to the behavior you see (b/c $_[0]->value evaluates
>>>>to false if the value is '0').
>>>>
>>>>You say you build and populate the feature dynamically - are you using
>>>>Bio::SeqFeature::Annotated for this? Bio::SeqFeature::Generic is
>>
>>slated
>>
>>>>to get this behavior reverted, i.e., will return to using scalars for
>>>>tag values. (Or so I recall ...)
>>>>
>>>>To fix the problem for you now, I suggest you either fix the overload
>>>>statement above to be
>>>>
>>>>      use overload '""' => sub { defined($_[0]->value) ? $_[0]->value
>>
>>: ''
>>
>>>>};
>>>>
>>>>I suppose this should in fact be committed to the repository - does
>>>>anybody see any damage from this change?
>>>>
>>>>Or, if you do want to mess with the GenBank format writer, protect the
>>>>conversion to string and use the object access method:
>>>>
>>>>      if (ref($value) && $value->isa("Bio::Annotation::SimpleValue"))
>>
>>{
>>
>>>>              # convert SimpleValue object to represented (string)
>>
>>value
>>
>>>>              $value = $value->value;
>>>>      }
>>>>
>>>>Hth,
>>>>
>>>>      -hilmar
>>>>
>>>>On Mar 31, 2006, at 9:31 AM, Scott Markel wrote:
>>>>
>>>>
>>>>>Chris,
>>>>>
>>>>>Looks like I made my test case too simple.  In our application,
>>>>>which calls BioPerl, I'm creating the feature with the zero-
>>>>>valued qualifier.  It's not being read in from a file, so
>>>>>my only issue is with writing GenBank files.  The real feature
>>>>>is one for a primer binding site.  The qualifier contains the
>>>>>number of mismatches.  The one line change of
>>>>>
>>>>>     $value = $value->{"value"}
>>>>>
>>>>>definitely fixes our problem and causes no regression
>>>>>failures in our application.
>>>>>
>>>>>Scott
>>>>>
>>>>>Chris Fields wrote:
>>>>>
>>>>>
>>>>>>I tried this on WinXP (I'm using bioperl-live) and got a warning:
>>>>>>
>>>>>>-------------------- WARNING ---------------------
>>>>>>MSG: Unexpected error in feature table for  Skipping feature,
>>>>>>attempting to
>>>>>>recover
>>>>>>---------------------------------------------------
>>>>>>
>>>>>>Running using debugging shows that no feature key was found in
>>>>>>_read_FTHelper_GenBank.  So I'm getting an error, but on input not
>>>>>>output.
>>>>>>In fact, turning on -verbose in the SeqIO input object gives the
>>>>>>below extra
>>>>>>output, whereas turning -verbose on only in the output object just
>>>>>>gives the
>>>>>>warning above.
>>>>>>
>>>>>>====================================
>>>>>>C:\Perl\Scripts\gb_test>test.pl
>>>>>>no feature key!
>>>>>>
>>>>>>-------------------- WARNING ---------------------
>>>>>>MSG: Unexpected error in feature table for  Skipping feature,
>>>>>>attempting to
>>>>>>recover
>>>>>>STACK Bio::SeqIO::genbank::next_seq
>>>>>>C:\Perl\src\bioperl\core/Bio\SeqIO\genbank.pm:583
>>>>>>STACK toplevel C:\Perl\Scripts\gb_test\test.pl:18
>>>>>>sequence length is 10
>>>>>>====================================
>>>>>>
>>>>>>The sequence came back w/o any features in the feature table, which
>>>>>>is what
>>>>>>I would expect from this error:
>>>>>>====================================
>>>>>>LOCUS       MY_LOCUS                  10 aa            linear
>>
>>linear
>>
>>>>>>DEFINITION  my description.
>>>>>>ACCESSION   12345
>>>>>>KEYWORDS    .
>>>>>>FEATURES             Location/Qualifiers
>>>>>>ORIGIN
>>>>>>        1 atggagaact
>>>>>>//
>>>>>>====================================
>>>>>>
>>>>>>Adding the extra line before the s/// didn't help any (warning
>>
>>still
>>
>>>>>>pops
>>>>>>up, no change in output).  Anybody out there with any ideas?
>>>>>>
>>>>>>Christopher Fields
>>>>>>Postdoctoral Researcher - Switzer Lab
>>>>>>Dept. of Biochemistry
>>>>>>University of Illinois Urbana-Champaign
>>>>>>
>>>>>>
>>>>>>

-- 
Scott Markel, Ph.D.
Principal Bioinformatics Architect  email:  smarkel at scitegic.com
SciTegic Inc.                       mobile: +1 858 205 3653
9665 Chesapeake Drive, Suite 401    voice:  +1 858 279 8800, ext. 253
San Diego, CA 92123                 fax:    +1 858 279 8804
USA                                 web:    http://www.scitegic.com



More information about the Bioperl-l mailing list