[Bioperl-l] improve speed in extracting Fasta sequence

Siaw Ling Lo siawlinglo at yahoo.com
Mon Dec 27 20:50:44 EST 2004

I am new to BioPerl and I need to extract FASTA sequences from UniProt
using a list of accession numbers in a file. The list contains thousands
of accession numbers, and the response time is very slow (about 60
sequences extracted per hour). Is there a way to improve the speed?
The following is the code:
use strict;
use warnings;
use Bio::SeqIO;

my $file   = 'uniprot';
my $format = 'Fasta';

# read in accession no input file
open(my $acc_fh, '<', 'acc.txt')
    or die "an error occurred while reading acc file: $!";

# loop through the input file and store each accession
my @accs;
while (<$acc_fh>) {
    # strip off all trailing white space - tabs, spaces, newlines and returns
    s/\s+$//;
    # skip blank lines: an empty pattern would match every description
    push @accs, $_ if length;
}
close $acc_fh;

# open write-out file - Fasta sequence file
open(my $out_fh, '>', 'uniprot_fasta.txt')
    or die "cannot open out file for writing: $!";

my $inseq = Bio::SeqIO->new(-file => "<$file", -format => $format);

# get sequence
while (my $seq = $inseq->next_seq) {
    my $desc = defined $seq->desc ? $seq->desc : '';
    # search for the acc in the description and extract it
    for my $acc (@accs) {
        # if match (\Q..\E treats the accession literally), print the record
        if ($desc =~ /\Q$acc\E/) {
            print {$out_fh} ">", $desc, "\n";
            print {$out_fh} $seq->seq, "\n";
            # break out of loop when found
            last;
        }
    }
}
close $out_fh;
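
The main cost in the loop above is that every sequence in the UniProt file is tested against every accession with a separate regex, so the work grows with (number of sequences) x (number of accessions). A minimal sketch of the usual alternative: load the accessions into a hash once, then make a single pass over the FASTA file with one hash lookup per record. It is plain Perl without BioPerl so it runs on its own. `header_accession`, `read_accessions` and `filter_fasta` are hypothetical helper names, and the sketch ASSUMES the accession is the first token of each header line, either bare (`>P12345 ...`) or in `db|ACC|NAME` form (`>sp|P12345|...`). Check the header layout of your own file before relying on it.

```perl
use strict;
use warnings;

# Hypothetical helper: pull the accession out of a FASTA header line.
# ASSUMPTION: the first token after '>' is either the bare accession
# ("P12345") or "db|ACC|NAME" ("sp|P12345|ABC_HUMAN").
sub header_accession {
    my ($header) = @_;
    my ($id) = $header =~ /^>(\S+)/ or return;
    my @parts = split /\|/, $id;
    return @parts >= 2 ? $parts[1] : $parts[0];
}

# Read one accession per line into a hash, trimming surrounding whitespace
# and skipping blank lines.
sub read_accessions {
    my ($fh) = @_;
    my %wanted;
    while (my $line = <$fh>) {
        $line =~ s/^\s+|\s+$//g;
        $wanted{$line} = 1 if length $line;
    }
    return \%wanted;
}

# Single pass over the FASTA stream: copy every record whose accession is
# a key in %$wanted. Each record costs one hash lookup, not a loop of regexes.
sub filter_fasta {
    my ($in_fh, $out_fh, $wanted) = @_;
    my $keep = 0;
    while (my $line = <$in_fh>) {
        if ($line =~ /^>/) {
            my $acc = header_accession($line);
            $keep = defined $acc && exists $wanted->{$acc};
        }
        print {$out_fh} $line if $keep;
    }
}

# Small in-memory demo; with real data, open 'acc.txt' and 'uniprot' instead.
my $acc_text   = "P12345\nQ67890 \n";
my $fasta_text = ">sp|P12345|AAA_HUMAN Protein A\nMKV\nLLA\n"
               . ">sp|P99999|BBB_HUMAN Protein B\nMTT\n"
               . ">Q67890 Protein C\nMGG\n";

open(my $acc_fh, '<', \$acc_text)   or die $!;
open(my $in_fh,  '<', \$fasta_text) or die $!;
my $result = '';
open(my $out_fh, '>', \$result)     or die $!;

filter_fasta($in_fh, $out_fh, read_accessions($acc_fh));
close $out_fh;
print $result;
```

With Bio::SeqIO the same idea applies: build the hash once, then inside the `next_seq` loop test `exists` on whichever field carries the accession (for example `$seq->display_id`) instead of looping over `@accs`. Note that a hash lookup is an exact match, unlike the substring regex in the original code, so the extracted ID must match the listed accession exactly.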
 Any advice is much appreciated.
 Thank you,
 Siaw Ling


More information about the Bioperl-l mailing list