[Bioperl-l] match patterns to retrieve seqs

Philipp Schiffer philipp.schiffer at googlemail.com
Sun Mar 18 12:49:11 EDT 2012


I am trying to write a small script to retrieve such sequences 
(including headers) from a fasta file that match the headers I listed in 
another file.
The fasta file looks like:
 >maker-scaffold15249_size4120-snap-gene-0.2-mRNA-1 protein 
AED:0.0448532948532949 eAED:0.0448532948532949 QI:42|-1|0|1|-1|0|1|1|191
 >snap_masked-scaffold15249_size4120-abinit-gene-0.1-mRNA-1 protein 
AED:1 eAED:1 QI:0|0|0|0|1|1|4|0|108
 >snap_masked-scaffold15249_size4120-abinit-gene-0.0-mRNA-1 protein 
AED:1 eAED:1 QI:55|0|0|0|1|1|2|227|5

while the list file is:

My perl so far would be:

#! /usr/bin/perl -w

use Bio::SeqIO;

my $usage = "list_from_fasta.pl fastafile format headerlistfile\n";
my $file   = shift or die $usage;
my $format = shift or die $usage;
my $headerlist = shift or die $usage;

open LIST, "<$headerlist" or die $!;
my $queries = <LIST>; #might be a hash as well, don't know what's better

my $fastafile = Bio::SeqIO->new(-file => "<$file", -format => $format);
while (my $fastaseq = $fastafile->next_seq){

     if ($fastaseq ->id =~ /(@queries)/) # here is the problem, after 
much trying it seems actually to look for matches, but does not find 
any, still there must be

     {print $fastaseq-> id,"\n" and print $fastaseq->seq,"\n";}

So I am wondering, is this the right way to tackle the problem. Will it 
work? Should I use another bioPerl module? My thinking is, that the 
pattern matching operation after the 'if' statement is wrong.

Any help highly appreciated!

Thanks a lot.


More information about the Bioperl-l mailing list