[Bioperl-l] how to prevent forced exit?

Chris Fields cjfields at illinois.edu
Tue Mar 15 10:44:23 EDT 2011


Ross,

Hope you're exaggerating; you really shouldn't use this service to retrieve 1 million records, as you'll likely find your IP banned by NCBI. They are starting to enforce stricter limits on web-based access to their servers now. Bio::DB::GenBank issues an HTTP GET request with URI-based parameters, which effectively limits a query to around 200-300 IDs per request, so you would have to split one large request into many. Thousands of repeated requests, even with a delay between them, may get your IP flagged as 'spam'. You can use something like Bio::DB::EUtilities to grab larger groups of seqs (~1000 IDs), because the latest EUtilities uses POST rather than GET requests for a large number of IDs, but you are still effectively limited by the number of requests.
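
To make the batching concrete, here is a minimal sketch, not tested against the live service. The helper names chunk_ids and fetch_batch, the 'protein' database and the batch size of 500 are assumptions for illustration only. Each request is wrapped in eval so that one failed batch produces a warning and does not end the whole run; BioPerl reports errors by throwing exceptions, and an uncaught die is consistent with the exit code 255 reported below.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Split a list of IDs into batches of at most $size elements.
sub chunk_ids {
    my ($ids, $size) = @_;
    die "batch size must be positive\n" unless $size && $size > 0;
    my @pending = @$ids;          # copy so the caller's array is untouched
    my @batches;
    push @batches, [ splice @pending, 0, $size ] while @pending;
    return @batches;
}

# Fetch one batch as FASTA text. Returns undef (and warns) on failure
# instead of letting the exception terminate the script.
sub fetch_batch {
    my ($batch, $email) = @_;
    my $fasta = eval {
        require Bio::DB::EUtilities;    # loaded lazily; needs BioPerl installed
        my $factory = Bio::DB::EUtilities->new(
            -eutil   => 'efetch',
            -db      => 'protein',      # assumption: protein accessions
            -rettype => 'fasta',
            -email   => $email,
            -id      => $batch,         # array reference of IDs
        );
        $factory->get_Response->content;
    };
    warn "batch starting with $batch->[0] failed: $@" unless defined $fasta;
    return $fasta;
}

# Usage sketch (commented out because it would contact NCBI):
# for my $batch (chunk_ids(\@accessions, 500)) {
#     my $fasta = fetch_batch($batch, 'you@example.org');
#     print $fasta if defined $fasta;
#     sleep 1;    # pause between requests
# }
```

Even with batching and pauses, the total number of requests still matters, so this only suits modest lists of accessions.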

Frankly, there are much better/faster ways to do this, not least of which is to just download a GenBank section and parse it directly, or use a BLAST-formatted database and fastacmd to get the seqs of interest in FASTA format.  Any reason why you are not doing this?
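
As a rough sketch of the local route, assuming you have already downloaded a GenBank flat file and have one accession per line in a text file: the sub names read_wanted and extract_local and the file arguments are hypothetical, and this has not been run against a real GenBank section.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Read wanted accessions (one per line) from a filehandle into a lookup hash.
sub read_wanted {
    my ($fh) = @_;
    my %wanted;
    while (my $line = <$fh>) {
        $line =~ s/^\s+|\s+$//g;      # trim surrounding whitespace and newline
        next if $line eq '';          # skip blank lines
        $wanted{$line} = 1;
    }
    return \%wanted;
}

# Stream a local GenBank flat file and write matching records as FASTA.
# Returns the number of records written.
sub extract_local {
    my ($genbank_file, $wanted, $out_file) = @_;
    require Bio::SeqIO;               # loaded lazily; needs BioPerl installed
    my $in  = Bio::SeqIO->new(-file => $genbank_file, -format => 'genbank');
    my $out = Bio::SeqIO->new(-file => ">$out_file",  -format => 'fasta');
    my $hits = 0;
    while (my $seq = $in->next_seq) {
        next unless $wanted->{ $seq->accession_number };
        $out->write_seq($seq);
        $hits++;
    }
    return $hits;
}

# Usage sketch (file names are placeholders):
# open my $fh, '<', 'accessions.txt' or die "cannot open list: $!";
# my $wanted = read_wanted($fh);
# my $n = extract_local('section.seq', $wanted, 'wanted.fasta');
# print "wrote $n records\n";
```

Accessions that are not present in the file are simply never matched, so there is no per-ID request that can fail.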

chris

On Mar 15, 2011, at 9:16 AM, Ross KK Leung wrote:

> The complete code is below, but the real problem is that get_Stream_by_acc cannot be used repeatedly: when I feed a list of accession numbers (e.g. 1 million records) to the perl script, the program exits with code 255 (likely equivalent to -1). I wonder whether anybody has encountered a similar problem and solved it...
> 
> 
> 
> 
> 
> #!/usr/bin/perl
> use warnings;
> 
> use Bio::DB::GenBank;
> 
> 
> 
> $gb = new Bio::DB::GenBank(-retrievaltype => 'tempfile', -format =>'Fasta');
> 
> 
> $allseqobj = $gb->get_Stream_by_acc("A3ZI37");
> 
> 
> print "HEELO";
> 
> while ($seqobj = $allseqobj->next_seq) {
>                       #$seqobj = $allseqobj->next_seq;
>                       $seq=$seqobj->seq;
> }
> print "222   HEELO";
> 
> 
> 
> 
> 
> 
> 
> 
> From: Dave Messina [mailto:David.Messina at sbc.su.se] 
> Sent: 15 March 2011 17:02
> To: Ross KK Leung
> Cc: bioperl-l at lists.open-bio.org
> Subject: Re: [Bioperl-l] how to prevent forced exit?
> 
> 
> 
> Hi Ross,
> 
> 
> 
> Your code is incomplete and you didn't provide the output from running it, so it's not easy to figure out where you're going wrong.
> 
> 
> 
> Try copying the example code directly from here
> 
> 
> 
>    http://doc.bioperl.org/releases/bioperl-current/bioperl-live/Bio/DB/GenBank.html
> 
> 
> 
> and making sure that works first before modifying it.
> 
> 
> 
> 
> 
> More documentation and examples here:
> 
> http://www.bioperl.org/wiki/HOWTO:Beginners
> 
> http://www.bioperl.org/wiki/Bioperl_scripts
> 
> 
> 
> 
> 
> Dave
> 
> 
> 
> 
> 
> 
> 
> On Tue, Mar 15, 2011 at 06:54, Ross KK Leung <ross at cuhk.edu.hk> wrote:
> 
> $gb = new Bio::DB::GenBank(-retrievaltype => 'tempfile', -format =>
> 'Fasta');
> $allseqobj = $gb->get_Stream_by_acc("A3ZI37");
>               l
> print "HEELO";
> while ($seqobj = $allseqobj->next_seq) {
>                       #$seqobj = $allseqobj->next_seq;
> 
>                       $seq=$seqobj->seq;
> 
>                       }
> 
>                   print "222   HEELO";
> 
> 
> 
> I find that the 1st HEELO is printed while the 2nd one is not. Googling did
> not turn up a way to check success/failure or the existence of the Seq
> object. Since the 1st HEELO is printed, no exception is thrown by
> get_Stream_by_acc. So what can I do? The real case does not hard-code
> A3ZI37 but reads a file that may contain a lot of these "illegitimate"
> accession numbers.
> 
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l



