[Bioperl-l] Bio::SeqIO::swiss species parsing bug?

David Gonzalez gonzaled at tcd.ie
Fri Aug 17 13:03:35 EDT 2007


	I had a problem with a Swiss-Prot file in which the genus and species
were left undefined, and I believe it could be a bug in the
swiss.pm module.

	When I tried to parse the file with Bio::SeqIO, I got the following
error messages:

Use of uninitialized value in pattern match (m//) at
/sw/lib/perl5/5.8.6/Bio/SeqIO/swiss.pm line 965, <GEN0> line 12.
Use of uninitialized value in string eq at
/sw/lib/perl5/5.8.6/Bio/SeqIO/swiss.pm line 967, <GEN0> line 12.

	The fields I wanted from the file (gene_id, etc.) were fine, however,
so the file was being parsed.

	I checked the output with Data::Dumper and found the following in the
species entry: the species is left undefined, and the common name is absent.

 	'species' => bless( {
                             '_ncbi_taxid' => 'Not',
                             '_classification' => [
                                     }, 'Bio::Species' ),

	The species line in the file is formatted according to the Swiss-Prot
specification and includes a common name:

OS   Aedes aegypti (yellow fever mosquito)
OC   Eukaryota; Metazoa; Arthropoda; Hexapoda; Insecta; Pterygota; Neoptera;
OC   Endopterygota; Diptera; Nematocera; Culicoidea; Culicidae; Culicinae;
OC   Culicini; Aedes.
OX   NCBI_TaxID=Not defined;

	I think the problem is on line 905 of swiss.pm:

902	if(/^OS\s+(\S.+)/ && (! defined($binomial))) {
903	    $osline .= " " if $osline;
904	    $osline .= $1;
905	    if($osline =~ s/(,|, and|\.)$//) {
906		($binomial, $descr) = $osline =~ /(\S[^\(]+)(.*)/;
907             ($ns_name) = $binomial;
908             $ns_name =~ s/\s+$//; #####

	The problem seems to be that the OS line has no trailing punctuation,
so the substitution on line 905 returns false and the block that sets
$binomial is skipped. I think the Swiss-Prot format does not require the
line to end in '.', although it normally does. After removing the
requirement that the substitution succeed, the Data::Dumper output
looked normal:

	'_common_name' => 'yellow fever mosquito',
        '_ncbi_taxid' => 'Not',
        '_classification' => [
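
	To make the behaviour easier to reproduce outside of BioPerl, here is
a minimal standalone sketch. It is not the code from swiss.pm: the two
subroutine names (parse_os_strict, parse_os_relaxed) are hypothetical, and
they only copy the two regular expressions quoted above. The first mimics
the condition on line 905; the second strips trailing punctuation when it
is present but does not require it, which is roughly the change I tried.

```perl
use strict;
use warnings;

# Hypothetical helper: mimics the quoted logic, where the OS line is only
# split into binomial and description if trailing punctuation was removed.
sub parse_os_strict {
    my ($osline) = @_;
    my ($binomial, $descr);
    if ($osline =~ s/(,|, and|\.)$//) {
        ($binomial, $descr) = $osline =~ /(\S[^\(]+)(.*)/;
        $binomial =~ s/\s+$// if defined $binomial;
    }
    return ($binomial, $descr);
}

# Hypothetical helper: same regexes, but the substitution is optional.
sub parse_os_relaxed {
    my ($osline) = @_;
    $osline =~ s/(,|, and|\.)$//;    # strip if present, do not require it
    my ($binomial, $descr) = $osline =~ /(\S[^\(]+)(.*)/;
    $binomial =~ s/\s+$// if defined $binomial;
    return ($binomial, $descr);
}

my $os = 'Aedes aegypti (yellow fever mosquito)';    # OS line from my file

my ($strict)           = parse_os_strict($os);
my ($relaxed, $descr)  = parse_os_relaxed($os);

print 'strict:  ', (defined $strict ? $strict : 'undef'), "\n";
print "relaxed: $relaxed | $descr\n";
```

	With the OS line from my file, the strict version leaves the binomial
undefined, while the relaxed version gives "Aedes aegypti" and
"(yellow fever mosquito)". A line that does end in '.' is handled by both.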

	I am using the Fink-installed BioPerl:
	bioperl-pm586   1.4-5   Perl module for biology

	I don't know if this has been reported or solved in the newer versions of
BioPerl.


David Gonzalez Knowles
Smurfit Institute of Genetics
Trinity College
