[Bioperl-l] Retrieving from an indexed fasta file into a LargeSeq object

Christopher Porter cporter at ohri.ca
Mon Sep 27 15:07:31 EDT 2004

I don't think I followed how the call to set the factory should be 
used. After creating the Bio::Index::Fasta object, I called the 
_get_SeqIO_object method as described, and my script exited with the 
exception below (I didn't supply an index to the _get_SeqIO_object 
call):

------------- EXCEPTION  -------------
MSG: Can't get filename for index :
STACK Bio::Index::Abstract::_file_handle 
STACK Bio::Index::AbstractSeq::_get_SeqIO_object 
STACK toplevel ./parseBLAST.pl:172
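
The message ends with an empty index ("for index :"), which suggests
the method wants to be told which indexed file to open. A minimal,
untested sketch under that assumption: the argument 0 (first file in
the index), the index file name and the accession are all hypothetical,
Bio::Seq::SeqFactory is the factory class as far as I understand the
BioPerl API, and _get_SeqIO_object is a private method that may change
between releases.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Bio::Index::Fasta;
use Bio::Seq::SeqFactory;

# Hypothetical index file, built earlier with Bio::Index::Fasta.
my $hcindex = 'contigs.idx';
my $idx = Bio::Index::Fasta->new(-filename => $hcindex);

# Factory that builds LargeSeq objects instead of the default Bio::Seq.
my $factory = Bio::Seq::SeqFactory->new(-type => 'Bio::Seq::LargeSeq');

# Assumption: the argument is the position of the FASTA file within the
# index (0 for the first file). Calling the method with no argument is
# what appears to trigger "Can't get filename for index :".
$idx->_get_SeqIO_object(0)->sequence_factory($factory);

# Hypothetical accession; fetch should reuse the cached SeqIO object
# and therefore the factory set above.
my $seqobj = $idx->fetch('contig_1');
print ref($seqobj), "\n" if defined $seqobj;
```

If the index covers more than one FASTA file, the same call would
presumably be needed once per file position.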

This is somewhat of academic interest; I'm going to try to rewrite 
using the Bio::DB::Fasta module instead - getting a slice of a large 
sequence is exactly what I need to do.
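
A minimal sketch of the Bio::DB::Fasta route, assuming a FASTA file
called contigs.fa and an accession contig_1 (both hypothetical, as are
the coordinates). It uses only the documented seq($id, $start, $stop)
call and is not tested against my data.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Bio::DB::Fasta;

# Hypothetical path. Bio::DB::Fasta builds its index on first use and
# stores it alongside the FASTA file.
my $fasta = 'contigs.fa';
my $db    = Bio::DB::Fasta->new($fasta);

# Hypothetical accession and coordinates (1-based, inclusive).
my ($acc, $start, $end) = ('contig_1', 1_000, 2_000);

# Request only the slice, rather than loading the whole contig.
my $slice = $db->seq($acc, $start, $end);
die "No sequence found for $acc\n" unless defined $slice;

print length($slice), " bp retrieved from $acc\n";
```

The slice comes back as a plain string; $db->get_Seq_by_id($acc)
returns an object with a subseq method if a sequence object is
preferred.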


On 27-Sep-04, at 9:35 AM, Jason Stajich wrote:

> Depends on how many games you want to play here and why you really 
> want a LargeSeq. i.e. are you still going to call 'seq' on the large 
> seq object to get the sequence as a string?  Yes it can be done by 
> changing the factory which SeqIO uses to create the sequence - if you 
> look at the Index::AbstractSeq object.
> you'd want to call:
> (not pretty I know)
> $idx->_get_SeqIO_object->sequence_factory(Bio::Seq::SeqFactory->new(-type => 'Bio::Seq::LargeSeq'));
> However, Lincoln's Bio::DB::Fasta module is better for handling this 
> sort of thing I think as you can request virtual slices of the 
> sequence data.  I bet it will be much faster than how the LargeSeq 
> implementation works although the two use the same idea of using the 
> filesystem instead of memory for the seq storage.  Just make sure your 
> Fasta file is consistently formatted (all sequence lines are the same 
> length, a quick
> 'sreformat fasta fafile > newfafile; mv  newfafile fafile;'  can take 
> care of that).
> -jason
> On Sep 24, 2004, at 4:39 PM, Christopher Porter wrote:
>> I have a fasta file containing large contig sequences, which I have 
>> indexed using Bio::Index::Fasta. Is there a way to use the index to 
>> retrieve sequences into a Bio::Seq::LargeSeq object rather than 
>> Bio::Seq?
>> What I'm currently doing is essentially:
>> #!/usr/bin/perl
>> use strict;
>> use Bio::SeqIO;
>> use Bio::Index::Fasta;
>> my $idx = Bio::Index::Fasta->new('-filename'=>$hcindex);
>> foreach my $acc(keys %$foo){
>> 	my $seqobj = $idx->fetch($acc);
>> 	...
>> }
>> How can I force $seqobj to be a LargeSeq?
>> (At another point in the script I'm using SeqIO to read short 
>> sequences from a non-indexed fasta file - I don't really want to use 
>> LargeSeq for that part.)
>> Thanks,
>> Chris
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at portal.open-bio.org
>> http://portal.open-bio.org/mailman/listinfo/bioperl-l
> --
> Jason Stajich
> jason.stajich at duke.edu
> http://www.duke.edu/~jes12/