[Bioperl-l] Bio::SeqIO -- Reading Formated sequence file(Fasta) into Array

Chris Fields cjfields at uiuc.edu
Tue Feb 28 13:50:50 EST 2006


Is there any particular reason why you aren't opening the file directly with
Bio::SeqIO?  

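 use strict;
 use warnings;
 use Bio::SeqIO;

 sub get_sequence_from_fasta {
       my $file = shift;
       my @seqs = ();
       # Let Bio::SeqIO open (and later close) the file itself
       my $in = Bio::SeqIO->new(-format => 'fasta',
                                -file => "<$file");
       # Collect the raw sequence string of each record
       while ( my $seq = $in->next_seq() ) {
          push @seqs, $seq->seq();
       }
       return @seqs;
 }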

I'm not completely sure of your intent here, but if you want to use a
globbed filehandle this way, I think you need to open the file before
entering the sub and then pass the filehandle to the sub.  As written, you
pass the file name, open the file, attach the filehandle, parse the seqs,
and return an array, and I'm not sure why.  Or am I missing something here?
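
If a pre-opened handle really is what you want, a minimal sketch might look
like the following.  The helper name get_sequences_from_fh and the inline
FASTA data are made up for illustration; -fh and -noclose are the same
Bio::SeqIO options used in your original code, and the sketch assumes
BioPerl is installed.

```perl
use strict;
use warnings;
use Bio::SeqIO;

# Hypothetical helper: takes an already-open filehandle, not a file name.
sub get_sequences_from_fh {
    my $fh = shift;
    # -noclose => 1 leaves the handle open, so the caller stays in charge of it
    my $in = Bio::SeqIO->new(-format  => 'fasta',
                             -noclose => 1,
                             -fh      => $fh);
    my @seqs;
    while ( my $seq = $in->next_seq() ) {
        push @seqs, $seq->seq();
    }
    return @seqs;
}

# Demo on made-up data: the caller opens the handle, passes it in, closes it.
my $fasta = ">seq1\nACGT\n>seq2\nGGCC\nTT\n";
open my $fh, '<', \$fasta or die "$0: Can't open in-memory FASTA: $!";
my @seqs = get_sequences_from_fh($fh);
close $fh;
print scalar(@seqs), " sequences read\n";
```

With a real file, the caller would open it with the three-argument form of
open and pass that handle in the same way.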

Also, read:

http://www.bioperl.org/wiki/HOWTO:SeqIO#Working_Examples

which explains that loading arrays can be memory-intensive if the seqs are
big.
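
One way around the memory issue is to handle each sequence as it is read
instead of collecting them all first.  The following is only a sketch: the
helper name for_each_sequence, the callback interface, and the throwaway
FASTA file are assumptions made for illustration, and it assumes BioPerl is
installed.

```perl
use strict;
use warnings;
use Bio::SeqIO;
use File::Temp qw(tempfile);

# Hypothetical helper: run $callback on each sequence as it is read, so
# only one sequence object needs to be held in memory at a time.
sub for_each_sequence {
    my ($file, $callback) = @_;
    my $in = Bio::SeqIO->new(-format => 'fasta',
                             -file   => $file);
    my $count = 0;
    while ( my $seq = $in->next_seq() ) {
        $callback->($seq);
        $count++;
    }
    return $count;
}

# Demo on a small temporary FASTA file (contents made up for illustration).
my ($tmp_fh, $tmp_name) = tempfile(SUFFIX => '.fasta', UNLINK => 1);
print {$tmp_fh} ">seq1\nACGTACGT\n>seq2\nGGCC\n";
close $tmp_fh;

# Sum the sequence lengths without ever storing the sequences themselves.
my $total = 0;
my $n = for_each_sequence($tmp_name, sub { $total += $_[0]->length });
print "$n sequences, $total residues\n";
```

Returning an array is still fine for small files; the callback form just
keeps memory use flat as the input grows.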

Christopher Fields
Postdoctoral Researcher - Switzer Lab
Dept. of Biochemistry
University of Illinois Urbana-Champaign 


> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> bounces at lists.open-bio.org] On Behalf Of Jason Stajich
> Sent: Tuesday, February 28, 2006 11:37 AM
> To: Edward WIJAYA
> Cc: bioperl-l at lists.open-bio.org
> Subject: Re: [Bioperl-l] Bio::SeqIO -- Reading Formated sequence
> file(Fasta) into Array
> 
> 
> On Feb 28, 2006, at 5:01 AM, Edward WIJAYA wrote:
> 
> > Hi,
> >
> > Does Bio::SeqIO have a method specifically designed for
> > reading all the sequences from a fasta file into an array?
> >
> No, but feel free to contribute one.
> > What I currently have is this subroutine, which seems to me
> > __very inefficient__. I was wondering
> > whether there is a better way to achieve this.
> >
> Do you have a reason to think this is the slow part of your algorithm,
> or are you just going on a gut reaction?  There is certainly overhead
> in calling a method, but I am pretty sure that it isn't that
> significant; I guess it depends on how many sequences you are reading in.
> 
> Just write a next_seq_array method and have it put the seqs onto an
> array within the method and do a benchmark test to show that it is
> faster.
> 
> -jason
> >
> > sub get_sequence_from_fasta {
> >      my $file = shift;
> >      my @seqs= ();
> >
> >      open INFILE, "<$file" or die "$0:  Can't open file $file: $!";
> >      my $in = Bio::SeqIO->new(-format => 'fasta',
> >                               -noclose => 1 ,
> >                               -fh => \*INFILE);
> >
> >      while ( my $seq = $in->next_seq() ) {
> >         push @seqs, $seq->seq();
> >      }
> >      return @seqs;
> > }
> >
> >
> > BTW, I have also tried the following. I thought
> > it might be a better way to do the above job,
> > but it doesn't work.
> >
> > sub get_sequence_from_fasta_that_doesnot_work {
> >      my $file = shift;
> >       open my fh, "<$file" or die "$0:  Can't open file $file: $!";
> >      my $in = Bio::SeqIO->newFh( -format => 'fasta', -fh => $fh );
> >      return <$in>;
> > }
> >
> > Hope to hear from you again.
> >
> > --
> > Regards,
> > Edward WIJAYA
> > SINGAPORE
> > _______________________________________________
> > Bioperl-l mailing list
> > Bioperl-l at lists.open-bio.org
> > http://lists.open-bio.org/mailman/listinfo/bioperl-l
> 
> --
> Jason Stajich
> Duke University
> http://www.duke.edu/~jes12
> 
> 


