[Bioperl-l] ensembl perl API - very slow retrieval of data?

Ewan Birney birney at ebi.ac.uk
Mon Sep 11 04:55:31 EDT 2006


On 11 Sep 2006, at 03:00, zhihua li wrote:

> hi netters,
>
> has anyone had any experience in using the Ensembl Perl API (based on
> BioPerl) to retrieve and analyse data from Ensembl? I wanted to
> retrieve all the genes from the Ensembl core database. To do this I
> used a slice adaptor:
>

First off, this question is probably best asked on one of the Ensembl
lists, such as ensembl-dev at ebi.ac.uk, or sent to the Ensembl
helpdesk (helpdesk at ensembl.org).


Just to answer it directly: the Perl API is slower than BioMart.
BioMart is explicitly a denormalised, query-optimised system designed
to give quick responses, whereas the Perl API works against the
normalised data and is also designed to handle both reads and writes
(though of course you can't write to our databases). Therefore, if you
can solve a problem through the BioMart API, I would use that.


That said, this does look a bit slow. The public MySQL server
(ensembldb.ensembl.org) can get very heavily loaded, and perhaps that
was the problem. In addition, because the API does a lot of
lazy-loading, internet latency as well as throughput can be a problem:
if your connection is not ideal, this could be the cause.


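But I suspect the key reason is the "do something...." part.
Depending on what you are doing, the API might or might not be doing a
lot of lazy evaluation, and each lazily loaded object (for example a
gene's transcripts) means another query to the server.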
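To make the latency point concrete, here is a toy model in Perl. It is
not the Ensembl API: the hypothetical ToyServer class simply counts each
fetch() call as one round trip to a remote server, and the gene count
and 50 ms latency are made-up numbers for illustration only. Lazy
loading costs one round trip per gene; fetching everything for a slice
in one go costs a single round trip.

```perl
use strict;
use warnings;

# Hypothetical toy server (NOT the Ensembl API): every fetch() call
# stands for one round trip to a remote MySQL server.
package ToyServer;

sub new {
    my ($class) = @_;
    return bless({ queries => 0 }, $class);
}

# One call = one round trip, however many IDs are requested.
sub fetch {
    my ($self, @ids) = @_;
    $self->{queries}++;
    return map { "transcripts_of_$_" } @ids;
}

sub queries { return $_[0]{queries} }

package main;

my @gene_ids = map { "gene$_" } 1 .. 1000;    # hypothetical gene count

# Lazy loading: each gene triggers its own round trip.
my $lazy = ToyServer->new;
my @lazy_results;
push @lazy_results, $lazy->fetch($_) for @gene_ids;

# Eager loading: all transcripts for the slice in a single round trip.
my $eager = ToyServer->new;
my @eager_results = $eager->fetch(@gene_ids);

my $latency_ms = 50;                           # hypothetical latency
printf "lazy:  %d queries, ~%.2f s waiting on the network\n",
    $lazy->queries, $lazy->queries * $latency_ms / 1000;
printf "eager: %d queries, ~%.2f s waiting on the network\n",
    $eager->queries, $eager->queries * $latency_ms / 1000;
```

Both strategies return the same data; only the number of round trips
differs. In the real API, the optional third argument of
Slice::get_all_Genes (load_transcripts) is documented as loading
transcripts up front rather than lazy-loading them one by one; check
the Slice documentation for your API version before relying on it.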


Just for info: I regularly use ensembldb.ensembl.org and the Perl API
remotely, and a genome-wide job often takes an hour or two. Personally
I find the ease of writing code with the API worth that length of
time. As I said, if you can get the information from BioMart (not
everything that is accessible in the API is accessible in BioMart), go
for that.


Your next step is probably to describe the "do something" part and
post to either ensembl-dev or the helpdesk. Please don't ask me
directly, as I am often completely maxed out and will drop the
email :)



> use Bio::EnsEMBL::DBSQL::DBAdaptor;
> my $db = Bio::EnsEMBL::DBSQL::DBAdaptor->new(...);
> my $slice_adaptor = $db->get_SliceAdaptor();
> my @slices = @{ $slice_adaptor->fetch_all('chromosome') };
> foreach my $slice (@slices) {
>     my @genes = @{ $slice->get_all_Genes };
>     # do something......
> }
>
> It took several hours for the script to get all the genes from
> Ensembl. If I'd used the BioMart website for the same task, it
> would have been just a matter of minutes. So is there a better way
> of coding this, or are the Ensembl modules just extremely slow?
>
> Thanks a lot!
>
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l


