[Bioperl-guts-l] [Bug 2823] Bio::SeqIO::embl->next_seq corrupted with "Segmentation fault" when parsing million-line entries

bugzilla-daemon at portal.open-bio.org
Mon May 4 15:58:23 EDT 2009


http://bugzilla.open-bio.org/show_bug.cgi?id=2823


cjfields at bioperl.org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Severity|blocker                     |normal
             Status|NEW                         |RESOLVED
         Resolution|                            |WONTFIX




------- Comment #2 from cjfields at bioperl.org  2009-05-04 15:58 EST -------
Not a bug, per se.  The problem here is the size of the sequences you are
trying to load into memory: they are full-length eukaryotic chromosome builds
together with all of their features.  The first record in the file you are
trying to load is:

ID   CH466519; SV 1; linear; genomic DNA; ANN; MUS; 112224630 BP.

So, yes, you'll very likely segfault after attempting to load all annotation,
features, and sequence information into memory.  As we can't determine the
memory footprint of any particular Bio::Seq until it's loaded, there really
isn't much we can do until we create a lazily implemented Bio::SeqI (and the
proper iterative interfaces for features).  That's not high on anyone's
priority list, as most consider the best option to be a relational database
that stores the data you need and can access segments of the sequence you
want w/o the memory overhead.

I personally use the Ensembl Perl API, but UCSC and Bio::DB::SeqFeature::Store
also come to mind.  
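
As a stopgap, the declared length on each ID line can be read without building
any sequence object, which at least shows which records are too large to load
whole.  The following is only a rough sketch in Python, not anything BioPerl
provides: the function names and the 10,000,000 bp cutoff are made up for
illustration, and it assumes ID lines that end in "<length> BP." as in the
record quoted above.

```python
import re

# Matches the trailing "<length> BP." field of an EMBL ID line.
_ID_LENGTH = re.compile(r"(\d+)\s+BP\.\s*$")


def embl_record_lengths(lines):
    """Yield (accession, declared_length) for each ID line.

    Only ID lines are inspected; annotation, features, and sequence
    are skipped, so memory use stays small however large the records are.
    """
    for line in lines:
        if line.startswith("ID   "):
            match = _ID_LENGTH.search(line)
            if match:
                accession = line[5:].split(";")[0].strip()
                yield accession, int(match.group(1))


def oversized(lines, max_bp=10_000_000):
    """Return records whose declared length exceeds max_bp.

    max_bp is an arbitrary, hypothetical cutoff; pick one that suits
    the memory actually available.
    """
    return [(acc, n) for acc, n in embl_record_lengths(lines) if n > max_bp]
```

Passing an open file handle works, since the handle is read one line at a
time, e.g. `oversized(open("records.embl"))` with a placeholder file name.
Records flagged this way are the ones better served from a database.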

