[Bioperl-l] possible bug printing GenBank feature qualfiers

Scott Markel smarkel at scitegic.com
Fri Mar 31 14:17:19 EST 2006


Hilmar,

Thanks for the detailed reply.  The dynamic features we create
are added using Bio::SeqFeature::Generic.

I'm happy to use any of your suggestions.  If the overload
statement in Bio::Annotation::SimpleValue could be changed
in the repository, that's probably the cleanest solution from
my point of view.  The GenBank format writer is just an
instance of SimpleValue's use.

Since I don't use bioperl-live in our product environment, I
typically handle issues like this either in our code or try
to be as faithful as I can to the direction bioperl-live is
heading.  In the latter case, I usually make the same change(s)
in our copy of the released version.

Scott

Hilmar Lapp wrote:

> Scott,
> 
> your fix assumes that $value in reality is not a scalar but a hash ref 
> and that it has a key "value".
> 
> Apparently in your test environment this is all indeed true, but there 
> is no guarantee that this will still be true tomorrow when you next 
> update from CVS (or install a new version).
> 
> It seems to me that making feature tag values Bio::AnnotationI objects 
> and the stringification overload is what is interfering here. More 
> specifically, the broken overload in Bio::Annotation::SimpleValue
> 
>     use overload '""' => sub { $_[0]->value || ''};
> 
> will lead exactly to the behavior you see (b/c $_[0]->value evaluates to 
> false if the value is '0').
> 
> You say you build and populate the feature dynamically - are you using 
> Bio::SeqFeature::Annotated for this? Bio::SeqFeature::Generic is slated 
> to get this behavior reverted, i.e., will return to using scalars for 
> tag values. (Or so I recall ...)
> 
> To fix the problem for you now, I suggest you either fix the overload 
> statement above to be
> 
>     use overload '""' => sub { defined($_[0]->value) ? $_[0]->value : '' };
> 
> I suppose this should in fact be committed to the repository - does 
> anybody see any damage from this change?
> 
> Or, if you do want to mess with the GenBank format writer, protect the 
> conversion to string and use the object access method:
> 
>     if (ref($value) && $value->isa("Bio::Annotation::SimpleValue")) {
>         # convert SimpleValue object to represented (string) value
>         $value = $value->value;
>     }
> 
> Hth,
> 
>     -hilmar
> 
> On Mar 31, 2006, at 9:31 AM, Scott Markel wrote:
> 
>> Chris,
>>
>> Looks like I made my test case too simple.  In our application,
>> which calls BioPerl, I'm creating the feature with the zero-
>> valued qualifier.  It's not being read in from a file, so
>> my only issue is with writing GenBank files.  The real feature
>> is one for a primer binding site.  The qualifier contains the
>> number of mismatches.  The one line change of
>>
>>      $value = $value->{"value"}
>>
>> definitely fixes our problem and causes no regression
>> failures in our application.
>>
>> Scott
>>
>> Chris Fields wrote:
>>
>>> I tried this on WinXP (I'm using bioperl-live) and got a warning:
>>>
>>> -------------------- WARNING ---------------------
>>> MSG: Unexpected error in feature table for  Skipping feature, 
>>> attempting to
>>> recover
>>> ---------------------------------------------------
>>>
>>> Running using debugging shows that no feature key was found in
>>> _read_FTHelper_GenBank.  So I'm getting an error, but on input not 
>>> output.
>>> In fact, turning on -verbose in the SeqIO input object gives the 
>>> below extra
>>> output, whereas turning -verbose on only in the output object just 
>>> gives the
>>> warning above.
>>>
>>> ====================================
>>> C:\Perl\Scripts\gb_test>test.pl
>>> no feature key!
>>>
>>> -------------------- WARNING ---------------------
>>> MSG: Unexpected error in feature table for  Skipping feature, 
>>> attempting to
>>> recover
>>> STACK Bio::SeqIO::genbank::next_seq
>>> C:\Perl\src\bioperl\core/Bio\SeqIO\genbank.pm:583
>>> STACK toplevel C:\Perl\Scripts\gb_test\test.pl:18
>>> sequence length is 10
>>> ====================================
>>>
>>> The sequence came back w/o any features in the feature table, which 
>>> is what
>>> I would expect from this error:
>>> ====================================
>>> LOCUS       MY_LOCUS                  10 aa            linear   linear
>>> DEFINITION  my description.
>>> ACCESSION   12345
>>> KEYWORDS    .
>>> FEATURES             Location/Qualifiers
>>> ORIGIN
>>>         1 atggagaact
>>> //
>>> ====================================
>>>
>>> Adding the extra line before the s/// didn't help any (warning still 
>>> pops
>>> up, no change in output).  Anybody out there with any ideas?
>>>
>>> Christopher Fields
>>> Postdoctoral Researcher - Switzer Lab
>>> Dept. of Biochemistry
>>> University of Illinois Urbana-Champaign
>>>
>>>
>>>
>>>> -----Original Message-----
>>>> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
>>>> bounces at lists.open-bio.org] On Behalf Of Scott Markel
>>>> Sent: Thursday, March 30, 2006 7:18 PM
>>>> To: bioperl-l at lists.open-bio.org
>>>> Subject: [Bioperl-l] possible bug printing GenBank feature qualfiers
>>>>
>>>> In our upgrade from BioPerl 1.4 to 1.5.1 we tripped over the
>>>> following.
>>>>
>>>> Annotation tags used by Bio::SeqIO::FTHelper were strings and
>>>> are now Bio::Annotation::SimpleValue.  In the _print_GenBank_FTHelper
>>>> subroutine of Bio::SeqIO::genbank the following code still
>>>> assumes that tags are strings.
>>>>
>>>>    foreach my $tag ( keys %{$fth->field} ) {
>>>>        foreach my $value ( @{$fth->field->{$tag}} ) {
>>>>        $value =~ s/\"/\"\"/g;
>>>>
>>>> If the tag value was a zero, an empty string is written.
>>>>
>>>> We think that
>>>>
>>>>            $value = $value->{"value"};
>>>>
>>>> should be added before the s/// call.
>>>>
>>>> Here's our test case.  Note that the qualifier value for "foo"
>>>> is changed to an empty string.
>>>>
>>>> Input file
>>>>
>>>> ====================================
>>>> LOCUS       MY_LOCUS                  10 aa            linear   UNK
>>>> DEFINITION  my description.
>>>> ACCESSION   12345
>>>> FEATURES             Location/Qualifiers
>>>>      misc_feature    1..10
>>>>                      /foo="0"
>>>> ORIGIN
>>>>         1 atggagaact
>>>> //
>>>> ====================================
>>>>
>>>> Perl code
>>>> ====================================
>>>> use strict;
>>>> use warnings;
>>>>
>>>> use Bio::SeqIO;
>>>>
>>>> my $inputFilename = "input.gbff";
>>>> my $outputFilename = "output.gbff";
>>>>
>>>> my $in  = Bio::SeqIO->new(-file   => $inputFilename,
>>>>                           -format => "genbank");
>>>> my $out = Bio::SeqIO->new(-file => ">$outputFilename",
>>>>                           -format => "genbank");
>>>>
>>>> my $sequence = $in->next_seq();
>>>> $out->write_seq($sequence);
>>>> ====================================
>>>>
>>>> Output file
>>>> ====================================
>>>> LOCUS       MY_LOCUS                  10 aa            linear   linear
>>>> DEFINITION  my description.
>>>> ACCESSION   12345
>>>> KEYWORDS    .
>>>> FEATURES             Location/Qualifiers
>>>>      misc_feature    1..10
>>>>                      /foo=""
>>>> ORIGIN
>>>>         1 atggagaact
>>>> //
>>>> ====================================
>>>>
>>>> I'll add this to bugzilla, but first I want to make sure
>>>> I'm not missing something obvious.
>>>>
>>>> Scott


-- 
Scott Markel, Ph.D.
Principal Bioinformatics Architect  email:  smarkel at scitegic.com
SciTegic Inc.                       mobile: +1 858 205 3653
9665 Chesapeake Drive, Suite 401    voice:  +1 858 279 8800, ext. 253
San Diego, CA 92123                 fax:    +1 858 279 8804
USA                                 web:    http://www.scitegic.com



More information about the Bioperl-l mailing list