[Bioperl-l] Bio::Restriction::Analysis::fragments() memory usage

Nathan S. Haigh n.haigh at sheffield.ac.uk
Fri Oct 27 10:23:20 EDT 2006


As you may be aware by now, i'm working with Bio::Restriction::Analysis
and friends.

I'm doing restriction analysis on large sequences - chromosomes. I need
to identify an appropriate enzyme based on the total length of fragments
that are of a certain size (e.g. 100 - 500 bp). However, the amount of
memory used by Bio::Restriction::Analysis::fragments() is prohibative. I
have the following code (bottom) which downloads 2 thaliana chromosomes
(mito and chloro - so pretty small) and runs an analysis and then loops
through the fragments for all enzymes in the default collection.

My memory usage just keep on climbing and none seems to get freed up
even when a $ra goes out of scope (start dealing with the next
sequence). Is this a memory leak of some sort, is there a way to free up
memory as I go? I'd appreciate any help/advice on how to reduce the
amount of memory being consumed as I'd like to use all the thaliana
chromosomes (not just mito and chloro), which at the moment probably
won't work.

Cheers
Nath

use strict;
use Bio::DB::GenBank;
use Bio::Restriction::Analysis;
use Bio::Restriction::EnzymeCollection;

my @seq_objs;
my @gis = ( 7525012,  26556996 );

my $db = Bio::DB::GenBank->new(-format => "fasta");
foreach my $gi (@gis) {
  print "Getting GI: $gi\n";
  push @seq_objs, $db->get_Seq_by_id($gi)
}

my $min_fragment_size = 100;
my $max_fragment_size = 500;
my $enz_Coll = Bio::Restriction::EnzymeCollection->new();

foreach my $seq (@seq_objs) {
  my $tot_size = 0;
  print "Processing ", $seq->primary_id,"\n";
  my $ra = Bio::Restriction::Analysis->new(
                                         -seq=>$seq,
                                         -enzymes=>$enz_Coll,
  );
 
  my @all_enzymes = $ra->cutters->each_enzyme;
  print "  Calc total length of fragments in range: $min_fragment_size -
$max_fragment_size\n";
  foreach my $enzyme ( @all_enzymes ) {
    # fragments() is a real memory hog
    foreach my $frag ($ra->fragments($enzyme)) {
      next if $min_fragment_size && (length $frag < $min_fragment_size);
      next if $max_fragment_size && (length $frag > $max_fragment_size);
      $tot_size += length $frag;
    }
    # do something based on value of $tot_size
    #print "    ", $enzyme->name, " total = $tot_size\n";
  }
  print "DONE\n";
}



More information about the Bioperl-l mailing list