[Bioperl-l] strange error parsing a specific NCBI gff file

Sendu Bala bix at sendu.me.uk
Wed Jun 28 04:25:52 EDT 2006

William Hsiao wrote:
> sub process_attributes {
>     my $attr_string = shift;
>     my @attributes = split (/\;/, $attr_string);
>     my %attr;
>     foreach (@attributes){
> 	my ($key, $value) = split /=/;
> 	if ($value=~/\:/){
> 	    my ($subkey, $subvalue) = split (/:/, $value);
             # assign hashref to $key, assign key => value pair to that
> 	    $attr{$key}{$subkey}=$subvalue;
> 	}
> 	else{
             # assign scalar $key
> 	    $attr{$key}=$value;
> 	}
>     }
>     return \%attr;
> }

> NC_005966.1	RefSeq	CDS	635836	636489	.	-	0	locus_tag=ACIAD0647;function=adaptation%20to%20stress;function=protection%20%28MultiFun:5.5%29;note=Multifun:5.6%0AEvidence%203%20:%20Function%20proposed%20based%20on%20presence%20of%20conserved%20amino%20acid%20motif%2C%20structural%20feature%20or%20limited%20homolgy;inference=non-experimental%20evidence%2C%20no%20additional%20details%20recorded;transl_table=11;product=putative%20antioxidant%20protein;protein_id=YP_045389.1;db_xref=GI:50083879;db_xref=GeneID:2878732;exon_number=1

>    They generate an error: Can't use string
> ("adaptation%20to%20stress") as a HASH ref while "strict refs" in use.
>  The strange part is that all I have to do is replace the word
> "function" in front of "=adaptation%20to%20stress;" with another word
> or simply change it to functions or functio or Function, etc, then the
> line parses properly.

The problem is that these lines contain function=x twice, and the
second x contains a colon.
So your code first assigns $attr{function} = $scalar, and then tries to
do $attr{function}{before_colon} = "after_colon".

Normally the latter would autovivify $attr{function} as a hash
reference: $attr{function} would become HASH(xyz), and before_colon =>
after_colon would be stored as a key/value pair in that hash. But in
this case $attr{function} already exists and holds the string
"adaptation%20to%20stress". Perl would have to treat that string as a
symbolic reference (the name of a hash), which "use strict" forbids,
hence the error. Renaming the first key to "functions" or "Function"
makes the two keys distinct, so they no longer clash.
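
As a minimal sketch, the failure can be reproduced with just the two
assignments your sub ends up making for that line. The key and values
are taken from your example; the eval wrapper and the $error variable
are only there so the script can print the message instead of dying:

```perl
use strict;
use warnings;

my %attr;

# First "function=..." attribute: no colon, so a plain scalar is stored.
$attr{function} = 'adaptation%20to%20stress';

# Second "function=..." attribute: the value contains a colon, so the code
# tries to use $attr{function} as a hash reference. It already holds a
# string, so under strict refs this dies at run time.
my $ok = eval {
    $attr{function}{'protection%20%28MultiFun'} = '5.5%29';
    1;
};
my $error = $ok ? '' : $@;

print "Assignment failed: $error" unless $ok;
```

If the first assignment is removed, the second one succeeds, because
$attr{function} is then undefined and Perl autovivifies it as a hash.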

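Basically, the data structure is fragile: it mixes scalars and hash
references as values of %attr, so a key that appears more than once
can either clash (as here) or silently overwrite an earlier value.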
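
One way around this, sketched here under my own assumptions rather than
as the definitive fix, is to store every value the same way: each key
maps to an array reference holding all values seen for that key. The
sub name parse_attributes is hypothetical, and decoding the %XX escapes
is an extra I've added; values containing a colon are kept whole rather
than split into subkeys:

```perl
use strict;
use warnings;

# Hypothetical replacement for process_attributes: every key maps to an
# array reference of values, so repeated keys never collide.
sub parse_attributes {
    my ($attr_string) = @_;
    my %attr;
    for my $pair (split /;/, $attr_string) {
        # A limit of 2 keeps any '=' inside the value intact.
        my ($key, $value) = split /=/, $pair, 2;
        next unless defined $key && defined $value;
        # Decode %XX escapes such as %20 (space) and %28 ("(").
        $value =~ s/%([0-9A-Fa-f]{2})/chr(hex($1))/ge;
        push @{ $attr{$key} }, $value;
    }
    return \%attr;
}

# A shortened version of the attribute column from the example line.
my $attr = parse_attributes(
    'locus_tag=ACIAD0647;function=adaptation%20to%20stress;'
  . 'function=protection%20%28MultiFun:5.5%29;'
  . 'db_xref=GI:50083879;db_xref=GeneID:2878732'
);

print "$_\n" for @{ $attr->{function} };
```

Every lookup then has the same shape, e.g. $attr->{locus_tag}[0] for a
single-valued key and @{ $attr->{db_xref} } for a repeated one.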

The solution may be to parse using Bioperl instead ;).
