[Bioperl-l] working with large alignments

Heikki Lehvaslaiho heikki at ebi.ac.uk
Mon Feb 2 06:41:23 EST 2004

Albert Vilella, who is visiting me here at EBI, works with really big genomic 
sequence alignments. I've committed several of his modules into CVS for that 
purpose. The most important additions are:

* Bio::Seq::LargeLocatableSeq
    Bio::RangeI compliant version of Bio::Seq::LargePrimarySeq; 
    uses File::Temp for storing the sequence
* Bio::Seq::LargeSeqI
    Interface class for LargeSeq implementations
* Bio::AlignIO::largemultifasta
    IO class creating Bio::Seq::LargeLocatableSeq and SimpleAlign objects
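
The idea of keeping sequence data in a temporary file rather than in memory 
can be sketched as below. This is an illustration in Python, not the BioPerl 
code: the class name FileBackedSeq and its methods append, length and subseq 
are hypothetical, and the 1-based inclusive coordinates are an assumption 
chosen to match the usual BioPerl convention.

```python
import tempfile


class FileBackedSeq:
    """Hypothetical file-backed sequence; illustrative only, not BioPerl's API."""

    def __init__(self, seq_id):
        self.seq_id = seq_id
        # The sequence lives in an anonymous temporary file, not in a string.
        self._fh = tempfile.TemporaryFile(mode="w+b")
        self._length = 0

    def append(self, chunk):
        """Append a chunk of residues (or gap characters) to the stored sequence."""
        data = chunk.encode("ascii")
        self._fh.seek(0, 2)  # move to end of file before writing
        self._fh.write(data)
        self._length += len(data)

    def length(self):
        return self._length

    def subseq(self, start, end):
        """Return residues start..end (1-based, inclusive), reading only that slice."""
        if not 1 <= start <= end <= self._length:
            raise ValueError("coordinates out of range")
        self._fh.seek(start - 1)
        return self._fh.read(end - start + 1).decode("ascii")

    def close(self):
        # Closing the handle also removes the temporary file.
        self._fh.close()


if __name__ == "__main__":
    seq = FileBackedSeq("seq1")
    seq.append("ACGT")
    seq.append("--TTGA")
    print(seq.length(), seq.subseq(3, 6))
    seq.close()
```

With this layout only the requested slice is ever held in memory, so memory 
use stays low as long as callers do not ask for the whole sequence as a 
single string.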

The LargeLocatableSeq is based on code from Bio::Seq::LargePrimarySeq. 
Everything seems to work, but if we run the tests added to the end of the 
t/AlignIO.t file with larger files, the process still uses a large amount 
of memory. We'd be interested in hearing from anyone who can suggest 
improvements.

If you are willing to test the code with larger data sets, I've put two files at:
http://www.ebi.ac.uk/~lehvasla/bioperl/medium.largemultifasta (1.3M)
http://www.ebi.ac.uk/~lehvasla/bioperl/large.largemultifasta (31M)


	-Heikki  and Albert
