[Bioperl-l] A couple Eutilities questions
cjfields at uiuc.edu
Fri Sep 28 00:57:55 EDT 2007
On Sep 27, 2007, at 9:51 PM, Warren Gallin wrote:
> I've just started using Bio::DB::EUtilities and I have encountered
> two things that seem like problems.
> I am using the latest (retrieved Wednesday September 26, 2007) CVS
> version, running in an Apple Xserver.
> Problem 1: When I execute the following code:
> # Create new EUTILS object for retrieving sets of entries, given an
> # array of accession numbers
> my $gpeptfactory = Bio::DB::EUtilities->new( -eutil   => 'efetch',
>                                              -db      => 'protein',
>                                              -rettype => 'genbank',
>                                              -id      => \@pro_acc );
> my $file = 'temp_hold.gb';
> $gpeptfactory->get_Response(-file => $file);
> my $retr_seq = Bio::SeqIO->new( -file   => $file,
>                                 -format => 'genbank' );
> I get the following warning, consistently:
> Use of uninitialized value in concatenation (.) or string at
> /Library/Perl/5.8.1/Bio/DB/GenericWebAgent.pm line 92.
The above works for me without problems. The warning itself doesn't
make much sense; the line is:
    $self->ua(LWP::UserAgent->new(env_proxy => 1,
                                  agent     => ref($self).':'.$self->VERSION));
so either $self isn't a ref (which it appears to be) or there is no
version (which is odd but may be a perl bug). What happens if you
hard-code the version number to something simple?
Also, I noticed you're using perl 5.8.1; which version of Mac OS X
are you using? I remember something was off about that perl version
but I can't remember what it was...
> Also, about half the time I get a crash with the following error
> ------------- EXCEPTION: Bio::Root::Exception -------------
> MSG: Response Error
> Bad Gateway
> STACK: Error::throw
> STACK: Bio::Root::Root::throw /Library/Perl/5.8.1/Bio/Root/Root.pm:357
> STACK: Bio::DB::GenericWebAgent::get_Response /Library/Perl/5.8.1/Bio/
> STACK: gb_update_v4.pl:118
I have seen it pop up sometimes when the NCBI server is under heavy
load. It may also be related to your local ISP or setup; see
Supposedly this may pop up with mod_perl but I haven't seen/heard
anything myself related to this.
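If the Bad Gateway responses are transient, one way to cope is to retry the request a few times before giving up. Here is a minimal sketch of that idea in plain Perl. The helper name with_retries, the retry count, and the delay are my own assumptions for illustration, not part of BioPerl, and the flaky call below only simulates the failure.

```perl
use strict;
use warnings;

# Hypothetical helper (not a BioPerl function): run $code and retry
# up to $tries times if it dies, sleeping $delay seconds in between.
sub with_retries {
    my ($code, $tries, $delay) = @_;
    my $last_error;
    for my $attempt (1 .. $tries) {
        my $result;
        # eval traps the exception; the trailing 1 marks success.
        my $ok = eval { $result = $code->(); 1 };
        return $result if $ok;
        $last_error = $@;
        sleep $delay if $attempt < $tries;
    }
    die "Giving up after $tries attempts: $last_error";
}

# Simulated flaky request: fails twice, then succeeds.
my $calls = 0;
my $flaky = sub {
    $calls++;
    die "Response Error\nBad Gateway\n" if $calls < 3;
    return "ok after $calls calls";
};

# A delay of 0 keeps the demo fast; a real script would wait longer.
my $outcome = with_retries($flaky, 5, 0);
print "$outcome\n";
```

In your script the code reference would wrap the get_Response call from your example, something like sub { $gpeptfactory->get_Response(-file => $file) }, with a delay of a few seconds between attempts.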
> The other half of the time the script runs fine through to the end.
> I have no idea whether the crash is related to the warning or not. I
> looked at the line where the warning is generated, and it appears to
> be the "new" method for GenericWebAgent.pm. I can't see how
> the call to EUtilities can be passing an undefined value through
> to this method.
EUtilities is-a GenericWebAgent; the new() constructors are chained
using SUPER::new(). Also, VERSION is a UNIVERSAL method, so you can
call it on any object or class; it could be a problem there if VERSION
returns undef, though again I can't think why this would fail.
Regardless, the 'Use of uninitialized value' warning is not a fatal
error.
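To illustrate the constructor chaining and the VERSION lookup, here is a small sketch using stand-in classes. My::WebAgent and My::EUtil are hypothetical names, not the real BioPerl modules; the sketch only shows that a class without a $VERSION produces this kind of warning in the parent constructor, and that hard-coding a version silences it.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the parent class.
package My::WebAgent;
sub new {
    my ($class, %args) = @_;
    my $self = bless {}, $class;
    # Same shape as the line quoted earlier: class name, colon, version.
    $self->{agent} = ref($self) . ':' . $self->VERSION;
    return $self;
}

# Hypothetical stand-in for the child class; new() chains to the parent.
package My::EUtil;
our @ISA = ('My::WebAgent');
sub new {
    my ($class, %args) = @_;
    return $class->SUPER::new(%args);
}

package main;

# Collect warnings instead of printing them to STDERR.
my @warnings;
local $SIG{__WARN__} = sub { push @warnings, $_[0] };

# No $VERSION is set yet, so VERSION returns undef here.
my $no_version  = My::EUtil->new;
my $seen_before = scalar @warnings;

# Hard-code a version, then construct again.
{ package My::EUtil; our $VERSION = '1.0'; }
my $with_version = My::EUtil->new;
my $seen_after   = scalar(@warnings) - $seen_before;

print "warnings without a version: $seen_before\n";
print "warnings with a version: $seen_after\n";
print "agent string: $with_version->{agent}\n";
```

The object is still created in both cases, which is why the warning on its own should not stop the script.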
> Problem #2:
> When the code runs, I retrieve an incorrect record. I am retrieving
> using accessions, and accession I51532 retrieves two records. One is
> the record I am after, an ion channel protein, the other comes from a
> patent application; the problem is that, although the accession
> number for the unwanted record is AAB76204, the LOCUS entry in the
> record is I51532.
> So, is it possible that the efetch function is collecting on the
> basis of LOCUS, not ACCESSION? I realize that the two are almost
> always the same, but not apparently in this case.
> Any advice and/or explanation is appreciated.
> Warren Gallin
The only means NCBI guarantees to retrieve a unique record every time
is by using the primary ID, which for sequence records is the GI.
The accession works most of the time, and efetch accepts accs in the
place of GI (it's the only eutil that does). However, every once in
a while you get stung and retrieve multiple seqs.
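If you need to keep using accessions, one workaround is to check the ACCESSION of each record after retrieval and drop the ones you did not ask for. A minimal sketch follows; My::Record is a hypothetical stand-in for a sequence object, filter_by_accession is a name I made up, and the LOCUS value of the wanted record is assumed for illustration.

```perl
use strict;
use warnings;

# Hypothetical stand-in for a sequence object; it only provides the
# two accessors used below (LOCUS as display_id, ACCESSION as
# accession_number).
package My::Record;
sub new              { my ($class, %f) = @_; return bless {%f}, $class }
sub display_id       { $_[0]{display_id} }
sub accession_number { $_[0]{accession_number} }

package main;

# Keep only records whose ACCESSION (not LOCUS) is in the wanted list.
sub filter_by_accession {
    my ($wanted, @records) = @_;
    my %want = map { $_ => 1 } @$wanted;
    return grep { $want{ $_->accession_number } } @records;
}

# Two records as described: both carry LOCUS I51532, but only one has
# ACCESSION I51532 (the first record's LOCUS is an assumption).
my @retrieved = (
    My::Record->new(display_id => 'I51532', accession_number => 'I51532'),
    My::Record->new(display_id => 'I51532', accession_number => 'AAB76204'),
);

my @kept = filter_by_accession(['I51532'], @retrieved);
printf "kept %d of %d records\n", scalar @kept, scalar @retrieved;
```

With real data the records would come from looping over $retr_seq->next_seq, and the same check on accession_number should apply.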
BTW, I entered your sequence into Entrez and it popped up as
discontinued (which could be part of the problem); the current acc is