[Bioperl-l] SeqHound

Susan J. Miller sjmiller at email.arizona.edu
Tue Feb 5 17:31:27 EST 2008

Chris Fields wrote:
> The URL has changed.  I'll fix this in bioperl-live.
> You can fix this in your script directly for now (though I hate globals):
> use Bio::DB::SeqHound;
> $Bio::DB::SeqHound::HOSTBASE = 
> 'http://dogboxonline.unleashedinformatics.com/';

Thanks Chris, that helps a little bit, but I'm still not having much 
luck with the SeqHound DB.  The CPAN SeqHound.pm documentation for the 
get_Stream_by_query method says:

   Title   : get_Stream_by_query
   Usage   : $seq = $db->get_Stream_by_query($query);
   Function: Retrieves Seq objects from Entrez 'en masse', rather than
             one at a time.  For large numbers of sequences, this is far
             superior than get_Stream_by_[id/acc]().
   Example : $query_string = 'Candida maltosa 26S ribosomal RNA gene';

However, when I try:

$query_string = 'drosophila simulans[orgn]';
$query = Bio::DB::Query::GenBank->new(-db=>'nucest',
                                       -query=>$query_string);
$stream = $sh->get_Stream_by_query($query);

I get the error:

------------- EXCEPTION: Bio::Root::Exception -------------
MSG: Id list has been truncated even after maxids requested
STACK: Error::throw
STACK: Bio::Root::Root::throw 
STACK: Bio::DB::Query::WebQuery::_fetch_ids 
STACK: Bio::DB::Query::WebQuery::ids 
STACK: Bio::DB::SeqHound::get_Stream_by_query 
STACK: SeqHoundQuery.pl:21

There are only 5013 sequences that match this query, so it seems odd that 
the Id list would be truncated... or am I using SeqHound improperly?

(My reason for trying SeqHound is that I want to set up a monthly cron 
job to download nucest FASTA sequences for Drosophila melanogaster.  
I've tried both NCBI E-Utilities and the script generated by the NCBI 
ebot, and in both cases some of the 570828 records get dropped, even 
after repeated attempts.)
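
To make the dropped-record problem concrete, here is a minimal sketch of a 
batched download that tracks which IDs actually came back and retries only 
the missing ones.  It is not tied to SeqHound or E-Utilities: fetch_all and 
the $fetch callback are hypothetical names invented for illustration, and 
the callback is assumed to take an array reference of IDs and return a hash 
reference of the records it really retrieved.  The flaky fetcher at the 
bottom is only a stand-in so the sketch runs without network access.

```perl
use strict;
use warnings;

# Fetch every ID in @$ids in batches, retrying only the IDs that were
# not returned. $fetch is a hypothetical callback (an assumption, not a
# BioPerl API): it takes an array ref of IDs and returns a hash ref
# mapping each ID it actually retrieved to its FASTA text.
sub fetch_all {
    my ($ids, $fetch, $batch_size, $max_rounds) = @_;
    my %got;
    my @pending = @$ids;
    for my $round (1 .. $max_rounds) {
        last unless @pending;
        my @todo = @pending;
        while (my @batch = splice(@todo, 0, $batch_size)) {
            # A failed request is treated as "nothing retrieved".
            my $records = eval { $fetch->(\@batch) } || {};
            @got{ keys %$records } = values %$records;
        }
        # Anything still missing goes into the next round.
        @pending = grep { !exists $got{$_} } @pending;
    }
    return (\%got, \@pending);
}

# Stand-in fetcher for demonstration only: it drops each ID divisible
# by 3 the first time that ID is requested.
my %seen;
my $flaky = sub {
    my ($batch) = @_;
    my %out;
    for my $id (@$batch) {
        next if $id % 3 == 0 && !$seen{$id}++;
        $out{$id} = ">$id\nACGT\n";
    }
    return \%out;
};

my ($got, $missing) = fetch_all([1 .. 10], $flaky, 4, 3);
printf "retrieved %d, still missing %d\n",
    scalar(keys %$got), scalar(@$missing);
```

In a real cron job the callback would wrap whichever retrieval call is in 
use, and any IDs left in the second return value after the last round could 
be logged so that a later run picks them up.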


Susan J. Miller
Manager, Scientific Data Analysis
Biotechnology Computing Facility
Arizona Research Laboratories
(520) 626-2597
