[Bioperl-l] Memory not sufficient when storing human chromosome 1 in BioSQL

Sendu Bala bix at sendu.me.uk
Fri Jul 4 06:10:53 EDT 2008

[CC:ing Gabrielle who had an identical problem]

Chris Fields wrote:
> On Jul 3, 2008, at 6:48 AM, Andreas Dräger wrote:
>> Recently I have successfully installed the latest version of BioPerl 
>> and BioSQL on my computer, which has 2 GB RAM. Both works fine, but 
>> when trying to insert the genbank file of the human chromosome 1, 
>> which I have downloaded from the NCBI website 
>> (ftp://ftp.ncbi.nih.gov/genomes/H_sapiens/CHR_01/hs_ref_chr1.gbk.gz) I 
>> receive the error message 'Out of memory'. This takes about one hour. 
>> My question is, how I can insert large genbank files in my BioSQL 
>> database using BioPerl. I do not know, what to do. Thank you for your 
>> help!!!
> Have you tried just loading the sequence into memory using Bio::SeqIO?  
> The problem may be the size of the file itself.

Just looping through:
perl -MBio::SeqIO -e '$i = Bio::SeqIO->new(-file => "hs_ref_chr1.gbk"); 
while ($seq = $i->next_seq) { $ac = $seq->accession; }'

This gave me a variable memory usage, typically around 360MB, peaking up 
to 980MB before dropping back down again. Seems a little high to me, but 
it doesn't seem to be a memory leak?
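
For anyone who wants to repeat the measurement, here is a rough script version of the loop above. It is only a sketch: the peak_rss_kb helper and the scan_file name are my own, the peak figure is read from /proc/self/status (so it assumes Linux), and I use accession_number and an explicit -format => 'genbank' rather than what the one-liner does.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Peak resident set size in kB, read from /proc (assumption: Linux).
# Returns undef where /proc/self/status is unavailable.
sub peak_rss_kb {
    open(my $fh, '<', '/proc/self/status') or return;
    while (my $line = <$fh>) {
        return $1 if $line =~ /^VmHWM:\s+(\d+)\s+kB/;
    }
    return;
}

# Stream the file one record at a time, keeping nothing but a count.
sub scan_file {
    my ($file) = @_;
    require Bio::SeqIO;    # loaded lazily so the helper above works alone
    my $in = Bio::SeqIO->new(-file => $file, -format => 'genbank');
    my $count = 0;
    while (my $seq = $in->next_seq) {
        my $acc = $seq->accession_number;
        $count++;
        # $seq goes out of scope here, so each record can be freed
    }
    return $count;
}

# Only scan when a file name is given on the command line.
if (@ARGV) {
    my $count = scan_file($ARGV[0]);
    my $peak  = peak_rss_kb();
    printf "%d records, peak RSS %s\n", $count,
        defined $peak ? "$peak kB" : "unknown";
}
```

Run it as "perl scan.pl hs_ref_chr1.gbk" (the script name is arbitrary). It reports only the high-water mark, so it will not show the up-and-down pattern I described, just the peak.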

Keeping every seq object in memory:
perl -MBio::SeqIO -e '$i = Bio::SeqIO->new(-file => "hs_ref_chr1.gbk"); 
@seqs; while ($seq = $i->next_seq) { push(@seqs, $seq); }'

This used up to 810MB. I didn't notice any peakiness, but it may have 
been there.

SeqIO by itself shouldn't be causing any out of memory errors on 2 and 
4GB machines.

What does bioperl-db do as it enters sequences into the db? How does it 
currently deal with species information?
