[Bioperl-l] possible bug printing GenBank feature qualfiers

Hilmar Lapp hlapp at gmx.net
Fri Mar 31 13:22:30 EST 2006


Scott,

your fix assumes that $value in reality is not a scalar but a hash ref 
and that it has a key "value".

Apparently in your test environment this is all indeed true, but there 
is no guarantee that this will still be true tomorrow when you next 
update from CVS (or install a new version).

It seems to me that making feature tag values Bio::AnnotationI objects 
and the stringification overload is what is interfering here. More 
specifically, the broken overload in Bio::Annotation::SimpleValue

	use overload '""' => sub { $_[0]->value || ''};

will lead exactly to the behavior you see (b/c $_[0]->value evaluates 
to false if the value is '0').

You say you build and populate the feature dynamically - are you using 
Bio::SeqFeature::Annotated for this? Bio::SeqFeature::Generic is slated 
to get this behavior reverted, i.e., will return to using scalars for 
tag values. (Or so I recall ...)

To fix the problem for you now, I suggest you either fix the overload 
statement above to be

	use overload '""' => sub { defined($_[0]->value) ? $_[0]->value : '' };

I suppose this should in fact be committed to the repository - does 
anybody see any damage from this change?

Or, if you do want to mess with the GenBank format writer, protect the 
conversion to string and use the object access method:

	if (ref($value) && $value->isa("Bio::Annotation::SimpleValue")) {
		# convert SimpleValue object to represented (string) value
		$value = $value->value;
	}

Hth,

	-hilmar

On Mar 31, 2006, at 9:31 AM, Scott Markel wrote:

> Chris,
>
> Looks like I made my test case too simple.  In our application,
> which calls BioPerl, I'm creating the feature with the zero-
> valued qualifier.  It's not being read in from a file, so
> my only issue is with writing GenBank files.  The real feature
> is one for a primer binding site.  The qualifier contains the
> number of mismatches.  The one line change of
>
>      $value = $value->{"value"}
>
> definitely fixes our problem and causes no regression
> failures in our application.
>
> Scott
>
> Chris Fields wrote:
>
>> I tried this on WinXP (I'm using bioperl-live) and got a warning:
>>
>> -------------------- WARNING ---------------------
>> MSG: Unexpected error in feature table for  Skipping feature, 
>> attempting to
>> recover
>> ---------------------------------------------------
>>
>> Running using debugging shows that no feature key was found in
>> _read_FTHelper_GenBank.  So I'm getting an error, but on input not 
>> output.
>> In fact, turning on -verbose in the SeqIO input object gives the 
>> below extra
>> output, whereas turning -verbose on only in the output object just 
>> gives the
>> warning above.
>>
>> ====================================
>> C:\Perl\Scripts\gb_test>test.pl
>> no feature key!
>>
>> -------------------- WARNING ---------------------
>> MSG: Unexpected error in feature table for  Skipping feature, 
>> attempting to
>> recover
>> STACK Bio::SeqIO::genbank::next_seq
>> C:\Perl\src\bioperl\core/Bio\SeqIO\genbank.pm:583
>> STACK toplevel C:\Perl\Scripts\gb_test\test.pl:18
>> sequence length is 10
>> ====================================
>>
>> The sequence came back w/o any features in the feature table, which 
>> is what
>> I would expect from this error:
>> ====================================
>> LOCUS       MY_LOCUS                  10 aa            linear   linear
>> DEFINITION  my description.
>> ACCESSION   12345
>> KEYWORDS    .
>> FEATURES             Location/Qualifiers
>> ORIGIN
>>         1 atggagaact
>> //
>> ====================================
>>
>> Adding the extra line before the s/// didn't help any (warning still 
>> pops
>> up, no change in output).  Anybody out there with any ideas?
>>
>> Christopher Fields
>> Postdoctoral Researcher - Switzer Lab
>> Dept. of Biochemistry
>> University of Illinois Urbana-Champaign
>>
>>
>>
>>> -----Original Message-----
>>> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
>>> bounces at lists.open-bio.org] On Behalf Of Scott Markel
>>> Sent: Thursday, March 30, 2006 7:18 PM
>>> To: bioperl-l at lists.open-bio.org
>>> Subject: [Bioperl-l] possible bug printing GenBank feature qualfiers
>>>
>>> In our upgrade from BioPerl 1.4 to 1.5.1 we tripped over the
>>> following.
>>>
>>> Annotation tags used by Bio::SeqIO::FTHelper were strings and
>>> are now Bio::Annotation::SimpleValue.  In the _print_GenBank_FTHelper
>>> subroutine of Bio::SeqIO::genbank the following code still
>>> assumes that tags are strings.
>>>
>>>    foreach my $tag ( keys %{$fth->field} ) {
>>>        foreach my $value ( @{$fth->field->{$tag}} ) {
>>>        $value =~ s/\"/\"\"/g;
>>>
>>> If the tag value was a zero, an empty string is written.
>>>
>>> We think that
>>>
>>>            $value = $value->{"value"};
>>>
>>> should be added before the s/// call.
>>>
>>> Here's our test case.  Note that the qualifier value for "foo"
>>> is changed to an empty string.
>>>
>>> Input file
>>>
>>> ====================================
>>> LOCUS       MY_LOCUS                  10 aa            linear   UNK
>>> DEFINITION  my description.
>>> ACCESSION   12345
>>> FEATURES             Location/Qualifiers
>>>      misc_feature    1..10
>>>                      /foo="0"
>>> ORIGIN
>>>         1 atggagaact
>>> //
>>> ====================================
>>>
>>> Perl code
>>> ====================================
>>> use strict;
>>> use warnings;
>>>
>>> use Bio::SeqIO;
>>>
>>> my $inputFilename = "input.gbff";
>>> my $outputFilename = "output.gbff";
>>>
>>> my $in  = Bio::SeqIO->new(-file   => $inputFilename,
>>>                           -format => "genbank");
>>> my $out = Bio::SeqIO->new(-file => ">$outputFilename",
>>>                           -format => "genbank");
>>>
>>> my $sequence = $in->next_seq();
>>> $out->write_seq($sequence);
>>> ====================================
>>>
>>> Output file
>>> ====================================
>>> LOCUS       MY_LOCUS                  10 aa            linear   
>>> linear
>>> DEFINITION  my description.
>>> ACCESSION   12345
>>> KEYWORDS    .
>>> FEATURES             Location/Qualifiers
>>>      misc_feature    1..10
>>>                      /foo=""
>>> ORIGIN
>>>         1 atggagaact
>>> //
>>> ====================================
>>>
>>> I'll add this to bugzilla, but first I want to make sure
>>> I'm not missing something obvious.
>>>
>>> Scott
>
> -- 
> Scott Markel, Ph.D.
> Principal Bioinformatics Architect  email:  smarkel at scitegic.com
> SciTegic Inc.                       mobile: +1 858 205 3653
> 9665 Chesapeake Drive, Suite 401    voice:  +1 858 279 8800, ext. 253
> San Diego, CA 92123                 fax:    +1 858 279 8804
> USA                                 web:    http://www.scitegic.com
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>
>
-- 
----------------------------------------------------------
: Hilmar Lapp -:- San Diego, CA -:- hlapp at gmx dot net :
----------------------------------------------------------



More information about the Bioperl-l mailing list