[Bioperl-l] extract sequences and save into files by genes

Cook, Malcolm MEC at stowers.org
Mon Feb 27 10:47:51 EST 2012


You don't need BioPerl for this one.

The following Perl one-liner will do it for you:

perl -p -e 'if (1==$.) {($species = $ARGV) =~ s|\.txt$||}; if (s/^>(.*)/">${species}"/e) {$gene=$1; open($O{$gene},qq{>> ${gene}.txt}); select($O{$gene})} ; close ARGV  if eof' *.txt
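For anyone who finds the one-liner hard to read, the same idea can be written out as a short standalone script. This is a minimal sketch, not a drop-in replacement. The script name split_by_gene.pl and the subroutine split_by_gene are hypothetical. The script assumes the gene name is the first word after '>', and it skips any lines that appear before a file's first header.

```perl
#!/usr/bin/env perl
# Sketch of the one-liner as a script (hypothetical name: split_by_gene.pl).
# Assumptions: each input file is named <species>.txt, FASTA headers look
# like ">gene ...", and only the first word after '>' is the gene name.
# Every record is appended to <gene>.txt with its header replaced by
# ">species".
use strict;
use warnings;
use File::Basename qw(fileparse);

sub split_by_gene {
    my (@files) = @_;
    my %out;    # gene name => open output filehandle (kept open across files)
    for my $file (@files) {
        # Species name = file name without directory and trailing .txt
        my ($species) = fileparse($file, qr/\.txt/);
        open my $in, '<', $file or die "Cannot read $file: $!";
        my $fh;    # handle of the gene currently being copied (reset per file)
        while (my $line = <$in>) {
            if ($line =~ /^>\s*(\S+)/) {
                my $gene = $1;
                unless ($out{$gene}) {
                    # Append mode, like the '>>' in the one-liner
                    open $out{$gene}, '>>', "$gene.txt"
                        or die "Cannot append to $gene.txt: $!";
                }
                $fh = $out{$gene};
                print {$fh} ">$species\n";    # header becomes the species name
            }
            elsif (defined $fh) {
                print {$fh} $line;    # sequence line of the current record
            }
            # Lines before the first header of a file are skipped.
        }
        close $in;
    }
    close $_ for values %out;
    return sort keys %out;    # names of the gene files written
}

split_by_gene(@ARGV) if @ARGV;
```

Run it as `perl split_by_gene.pl *.txt` in the directory holding the species files. Like the one-liner, it appends to <gene>.txt in the current directory, so running it twice duplicates records, and a second `*.txt` glob would also pick up the gene files. Writing the outputs to a separate directory avoids both problems.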


~Malcolm

> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> bounces at lists.open-bio.org] On Behalf Of yang liu
> Sent: Saturday, February 25, 2012 12:52 AM
> To: bioperl-l at lists.open-bio.org
> Subject: [Bioperl-l] extract sequences and save into files by genes
> 
> Dear colleagues,
> 
> I have multiple files, each named after a species. Each file contains about 100
> different genes. I want to extract the sequences and save them into one file per gene.
> In each output file, the header of each sequence should be the species name
> instead of the gene name. How should I do this?
> 
> The input files look like this (file names Acidosasa.txt,
> Acorus.txt, ...):
> 
> >rps12
> ATGCCAACGGTTAAACAACTTATTAGAAACGCAAGACAGCCAATACGAAATGCT
> AGAAAATCGCCCGCGC
> TTAAGGGATGTCCTCAGCGTCGAGGAACATGTGCTAGGGTGTATACTATCAACCC
> CAAAAAACCCAACTC
> >psbA
> TTATCCATTAAGAGATGGAACTTCAAGAACAGCTAGGTCTAGAGGGAAGTTGTG
> AGCATTACGTTCGTGC
> ATTACCTCCATACCAAGATTAGCACGGTTGATGATATCAGCCCAAGTATTAATAAC
> GCGACCTTGGCTAT
> .....
> 
> I would like the output files to look like this (file names rps12.txt, psbA.txt, ...).
> 
> Within rps12.txt, the sequences would look like this:
> 
> >Acidosasa
> ATGCCAACGGTTAAACAACTTATTAGAAACGCAAGACAGCCAATACGAAATGCT
> AGAAAATCGCCCGCGC
> TTAAGGGATGTCCTCAGCGTCGAGGAACATGTGCTAGGGTGTATACTATCAACCC
> CAAAAAACCCAACTC
> >Acorus
> ATGCCAACTATTAAACAACTTATTAGAAACACAAGACAGCCAATCCGAAATGTC
> 
> I am not sure I have explained this clearly.
> 
> Thanks.
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l