[Bioperl-l] Fetching > 500 sequences
rnilsson at clarku.edu
Fri Mar 26 10:55:22 EST 2004
Thank you very much for your help. I went through and took out every
reference to "retmax" in Bio::DB::Query::GenBank, Bio::DB::GenBank, and
Bio::DB::NCBIHelper. Our script first sends a query through
Bio::DB::Query::GenBank, and that works fine (it only returns the count
found, and that count is > 7000). However, when we then actually query
GenBank with Bio::DB::GenBank, it returns only 500 sequences even though it
should be returning 7000+ (and we really want them all). I compared the two
scripts, and they look very similar in what they pass to GenBank.
Could any other variables cause the Entrez scripts to return only 500? We
do use mindate/maxdate, and we fetch by query string (if that matters). Any
other ideas?
Thanks again for all of your help.
> > > It seems that I have problems with fetching more than 500 sequences
> > > from Genbank using Bioperl. It looks like the script (attached below)
> > > fetches all the 7000+ sequences, but only 500 make it to the output
> > > file. Is there any way to get all these 7000+ sequences written to the
> > > file - that is, is it possible to sidestep the 500 seq. limit?
> I actually debugged and fixed this problem recently for Biopython --
> it looks like a change in the way EUtils works. If you pass 'retmax'
> to the eutils URL then it will only give you back at max 500
> sequences, no matter what you pass for this parameter. The fix I
> found that worked was to not pass 'retmax'.
> The attached patch to Bio/DB/Query/GenBank.pm should fix the
> problem, if similar symptoms equal similar fixes in this case. An
> actual Perl/BioPerl person should look at this, though, as I'm not
> to be trusted for coding Perl :-).
> Hope this helps.
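For anyone hitting the same cap: one workaround that does not depend on how
"retmax" is treated is to ask for the records in windows of at most 500
using the E-utilities "retstart" and "retmax" parameters together with the
history server ("WebEnv" and "query_key"). Below is a minimal sketch in
Python, not what the BioPerl modules do internally. The helper names
(batch_windows, efetch_urls), the placeholder WebEnv value, and the record
counts are hypothetical; the code only builds the URLs and sends nothing.

```python
from urllib.parse import urlencode

# Public E-utilities efetch endpoint.
EFETCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def batch_windows(total, batch_size=500):
    """Yield (retstart, retmax) pairs that together cover `total` records."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    for start in range(0, total, batch_size):
        # The last window may be shorter than batch_size.
        yield start, min(batch_size, total - start)


def efetch_urls(total, web_env, query_key, db="nucleotide", batch_size=500):
    """Build one efetch URL per window (hypothetical helper, no network I/O).

    `web_env` and `query_key` are assumed to come from an earlier esearch
    call made with usehistory=y.
    """
    urls = []
    for retstart, retmax in batch_windows(total, batch_size):
        params = {
            "db": db,
            "WebEnv": web_env,
            "query_key": query_key,
            "retstart": retstart,
            "retmax": retmax,
            "rettype": "gb",
            "retmode": "text",
        }
        urls.append(EFETCH_URL + "?" + urlencode(params))
    return urls


if __name__ == "__main__":
    # 7200 stands in for the "7000+" count reported by the query step.
    for url in efetch_urls(7200, "WEBENV_PLACEHOLDER", "1"):
        print(url)
```

Each URL would then be fetched in turn and the results appended to the
output file, so no single request asks for more than 500 records.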