[Bioperl-l] DB_File and assembly IO

Chris Fields cjfields at illinois.edu
Fri Aug 29 10:30:49 EDT 2008

This is a known problem with Bio::Assembly and stems from having a  
DB_File tied (opened) for each Bio::Assembly::Contig (via a retained  
Bio::SeqFeature::Collection).  You can extend the number of open  
filehandles on UNIX'y flavors using ulimit (see following link), but  
I'm not sure about Win32.
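
As a rough sketch of the ulimit approach on a UNIX-like shell (the target value of 4096 is only an assumed example, and whether the change succeeds depends on the hard limit your system allows):

```shell
# Report the current soft and hard limits on open file descriptors.
soft=$(ulimit -S -n)
hard=$(ulimit -H -n)
echo "soft limit: $soft, hard limit: $hard"

# Try a different soft limit inside a subshell so the calling shell is
# left untouched. 4096 is an assumed example value, not a recommendation.
target=4096
( ulimit -S -n "$target" 2>/dev/null \
    && echo "soft limit inside subshell: $(ulimit -S -n)" ) \
    || echo "could not set soft limit to $target (hard limit is $hard)"
```

The setting only applies to the shell that runs it and to processes started from that shell, so the Perl script has to be launched from the same session.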


The general bug is reproducible using the following simple script.  If
needed, adjust the range end in the for loop so that it exceeds the
open-file limit (check it with 'ulimit -n').  Mac OS X 10.5 is set to 2560.

use strict;
use warnings;
use Bio::Assembly::Contig;

# Each contig ties its own DB_File, so the open handles accumulate.
my @contigs;

push @contigs, Bio::Assembly::Contig->new() for (1..10000);
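
Instead of raising the loop count, the limit can be lowered for a single run so the failure shows up sooner. This is only a sketch: run_with_limit is a hypothetical helper, and repro.pl is an assumed file name for the script above.

```shell
# Run a command with a lowered soft limit on open file descriptors.
# The subshell keeps the change from leaking into the calling shell.
run_with_limit() {
    limit=$1
    shift
    ( ulimit -S -n "$limit" && "$@" )
}

# Show the limit that a child process actually sees.
run_with_limit 64 sh -c 'ulimit -S -n'

# Hypothetical usage with the Perl script saved as repro.pl:
# run_with_limit 64 perl repro.pl
```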

I'll open a bug report on this for tracking (for release 1.7, along  
with any other Bio::Assembly issues).  That doesn't mean it won't get  
fixed sooner, just that we aren't under pressure with the next  
release, which already has a full plate.  IMO there doesn't need to be
one SF::Collection per contig; one instance should do for the entire
assembly, with the same SF::Collection passed in to each contig and
the contig distinguished by the SeqFeature seq_id.  It would also be
nice if we could change that to allow other SeqFeature::CollectionI
implementations (e.g. Bio::DB::SeqFeature::Store and the like).


On Aug 29, 2008, at 3:40 AM, Florent Angly wrote:

> Hi Joshua,
> I don't know the specifics of DB_File, but the 'Cannot open file  
> tree: Too many open files' is pretty explicit.
> If you're on Unix/Linux you can check the files that are open by  
> your program by typing:
>   lsof | grep name_of_program
> There is probably a filehandle that is not closed somewhere in your
> code or the BioPerl code.
> Best,
> Florent
> Joshua Udall wrote:
>> Bioperl -
>> I'm trying to read/parse a single cap3 ace file with several thousand
>> contigs.  I get a DB_File error at Contig247.  Here's the error:
>> ------------- EXCEPTION -------------
>> MSG: Unable to tie DB_File handle
>> STACK Bio::SeqFeature::Collection::new
>> /Users/jaudall/bin/src/bioperl-live/Bio/SeqFeature/Collection.pm:195
>> STACK Bio::Assembly::Contig::new
>> /Users/jaudall/bin/bioperl-live/Bio/Assembly/Contig.pm:256
>> STACK Bio::Assembly::IO::ace::next_assembly
>> /Users/jaudall/bin/src/bioperl-live/Bio/Assembly/IO/ace.pm:148
>> STACK toplevel /Users/jaudall/bin/read_ace.pl:214
>> -------------------------------------
>> Looking at the Collection::new, the error is on the middle line:
>>  $self->{'_btree'} = tie %{$self->{'_btreehash'}}, 'DB_File',
>>      $self->indexfile, O_RDWR|O_CREAT, 0640, $DB_BTREE;
>>      # or die "Cannot open file: $!\n" ;
>>  $self->{'_btree'} || $self->throw("Unable to tie DB_File handle");
>>  return $self;
>> If I uncomment the $! die statement that I inserted, I get this:
>> 'Cannot open file tree: Too many open files'
>> Apparently the Collection constructor is creating a new index file for
>> each one and the handles for each are sticking around?  That confuses me
>> because reading more about the Collection.pm and DB_File, it appeared to
>> me that no files were written by default (as I'm doing), rather the
>> Collection objects are all stored in memory.  I'm pretty sure the error
>> is not a permission error, and if it is not the open file-handles, what
>> else should I look for?
>> If I 'warn' the error instead of throwing it, I get:
>> Can't call method "get_dup" on an undefined value at
>> /Users/jaudall/bin/src/bioperl-live/Bio/SeqFeature/Collection.pm line 360
>> This kind of makes sense because the index appears not to be created and
>> it can't look stuff up in an undefined tied hash.  I'm stuck.
>> Thanks for any help and suggestions.
>> OSX, perl 5.8.8, bioperl-live (svn last week)
