[Bioperl-l] added Bio::SeqIO::largefasta

Ewan Birney birney@ebi.ac.uk
Tue, 5 Dec 2000 10:22:37 +0000 (GMT)

On Mon, 4 Dec 2000, Jason Stajich wrote:

> I have added support for reading in a large fasta file and making it a
> Bio::Seq::LargePrimarySeq.  Some more testing and debugging will
> need to be done to ensure all the weird fasta cases are handled.
> I cannot use the same patterns that are possible in the fasta.pm
> module, because I can only read one line at a time in order to meet
> our requirement of not holding the sequence in memory.
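
To make the one-line-at-a-time constraint concrete, here is a minimal sketch in core Perl only. It is not the largefasta code itself: the names spool_first_record and spooled_subseq are made up for illustration, and the header handling is deliberately simple. It copies the residues of the first record into a temp file as each line arrives, then reads a region back with seek/read, so the whole sequence is never held in memory.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Read the first FASTA record from $in one line at a time, appending the
# residues to a temp file instead of building the sequence in memory.
# Returns (id, temp filehandle, sequence length).
# $tmpdir is optional; without it File::Temp picks the system temp dir.
sub spool_first_record {
    my ($in, $tmpdir) = @_;
    my @opts = (UNLINK => 1);
    push @opts, DIR => $tmpdir if defined $tmpdir;
    my ($tmp_fh, $tmp_name) = tempfile(@opts);

    my ($id, $length) = (undef, 0);
    while (my $line = <$in>) {
        $line =~ s/\s+$//;            # drop newline and trailing blanks
        next if $line eq '';          # skip empty lines
        if ($line =~ /^>\s*(\S*)/) {
            last if defined $id;      # a second header ends the first record
            $id = $1;
            next;
        }
        next unless defined $id;      # ignore text before the first header
        $line =~ s/\s+//g;            # keep residues only
        print {$tmp_fh} $line;
        $length += length $line;
    }
    return ($id, $tmp_fh, $length);
}

# Fetch residues $start..$end (1-based, inclusive) from the spooled file.
sub spooled_subseq {
    my ($tmp_fh, $start, $end) = @_;
    seek($tmp_fh, $start - 1, 0) or die "seek failed: $!";
    my $got = read($tmp_fh, my $buf, $end - $start + 1);
    die "read failed: $!" unless defined $got;
    return $buf;
}
```

Note that this sketch consumes the second header line when it stops, so a real reader that wants the next record would have to remember that line.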


> Please note that currently next_seq will return a PrimarySeq 
> until I decide if we can have or need a LargeSeq class or just a wrapper 
> as well. Also the Bio::Seq::LargePrimarySeq implementation means that it
> will make a copy of the fasta file to your tmpdir (as defined by
> File::Spec->tmpdir) which if overly large could make your machine very
> unhappy as it could run out of swap space.  You can override the location
> of the tmp file by setting 
> $Bio::Seq::LargePrimarySeq::DEFAULT_TEMP_DIR = 'somedir' 
> BEFORE you instantiate a new LargePrimarySeq object.
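
Since the order matters here, a small sketch of how one might set the override safely. The helper names pick_temp_dir and open_large_fasta are hypothetical, and I am assuming the format name 'largefasta' follows from the module name Bio::SeqIO::largefasta. The variable is the one Jason names above; the sketch simply assigns it before any stream or sequence object is created.

```perl
use strict;
use warnings;
use File::Spec;

# Return $wanted if it is an existing, writable directory; otherwise fall
# back to the system default that File::Spec reports.
sub pick_temp_dir {
    my ($wanted) = @_;
    return $wanted if defined $wanted && -d $wanted && -w $wanted;
    return File::Spec->tmpdir;
}

# Hypothetical wrapper: set the override first, then open the stream, so
# every LargePrimarySeq created afterwards sees the chosen directory.
sub open_large_fasta {
    my ($file, $wanted_tmpdir) = @_;
    {
        no warnings 'once';    # the variable belongs to a package loaded later
        $Bio::Seq::LargePrimarySeq::DEFAULT_TEMP_DIR =
            pick_temp_dir($wanted_tmpdir);
    }
    require Bio::SeqIO;        # loaded lazily; pick_temp_dir needs no BioPerl
    return Bio::SeqIO->new(-file => $file, -format => 'largefasta');
}
```

Calling open_large_fasta('big.fa', '/scratch/tmp') and then next_seq on the result would be the intended use; both the file name and the directory are placeholders.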

I am with Hilmar that this should return a Seq object which has-a

> The test, largefasta.t, has been added as well, and some additional routines
> were added to LargePrimarySeq to bring it up to the PrimarySeqI spec.
> One likely use, at least from my perspective, is the ability to read in
> a large sequence file and chop it into smaller, manageable chunks for some
> specific tasks.
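
The chopping use case mostly comes down to computing coordinate ranges. A minimal sketch, where chunk_ranges is a made-up helper name and the coordinates are assumed to be 1-based and inclusive:

```perl
use strict;
use warnings;

# Return [start, end] pairs (1-based, inclusive) that tile a sequence of
# $length residues in pieces of at most $chunk_size residues.
sub chunk_ranges {
    my ($length, $chunk_size) = @_;
    die "chunk size must be positive\n" unless $chunk_size > 0;
    my @ranges;
    for (my $start = 1; $start <= $length; $start += $chunk_size) {
        my $end = $start + $chunk_size - 1;
        $end = $length if $end > $length;    # last chunk may be shorter
        push @ranges, [$start, $end];
    }
    return @ranges;
}
```

Each pair could then be handed to something like $seq->subseq($start, $end) on the object returned by next_seq, so only one chunk is pulled out at a time.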

Also for adding features placed on a massive coordinate scale (perhaps
produced by some database group somewhere...) and then dumping out the
sequence associated with them efficiently.

BTW - so that people know, LargePrimarySeq relies on the fact that 
people use the 


methods to get out regions, not


> This will likely not be on the 0.7 branch as it is new code so we'll have
> to omit it from the branch.

I, personally, think this is fine on the branch, but Hilmar is branch
king, so he has the final say ...

I don't think this is going to break anything.

> Suggestions and Comments are always appreciated.
> -Jason
> Jason Stajich
> jason@chg.mc.duke.edu
> Center for Human Genetics
> Duke University Medical Center 
> http://www.chg.mc.duke.edu/ 
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l@bioperl.org
> http://bioperl.org/mailman/listinfo/bioperl-l

Ewan Birney. Mobile: +44 (0)7970 151230, Work: +44 1223 494420