[Bioperl-l] how to rename genbank header in fasta file?

Jason Stajich jason.stajich at gmail.com
Sat Oct 20 01:43:29 EDT 2012


Are you parsing exactly this file? It is in FASTA format, not GenBank.

You don't need bioperl for this:
perl -i -p -e 's/>.+\[gene=([^\]]+)\].+/>$1/' file.fa

I'd read up on regular expressions and Perl to learn more about string replacement and how to do this better.


On Oct 19, 2012, at 11:23 PM, yang liu <yang.liu0508 at gmail.com> wrote:

> Hello,
> 
> I am a new user of BioPerl, can anyone help with this? I have multiple
> sequences in a fasta file like the following,
> 
>> lcl|NC_014487.1_cdsid_YP_003875479.1 [gene=cox1] [protein=cytochrome c
> oxidase subunit 1] [protein_id=YP_003875479.1] [location=1..1575]
> ATGACAAATCTGATTCGATGGCTCTTCTCTACTAATCACAAGGATATAGGGACTCTCTATTTCATCTTCG
> GCGCCATTGCTGGAGTGATGGGCACATGCTTTTCAGTACTGATTCGTATGGAATTAGCACGCCCCGGCGA
>> lcl|NC_014487.1_cdsid_YP_003875480.1 [gene=cox3] [protein=cytochrome c
> oxidase subunit 3] [protein_id=YP_003875480.1]
> [location=complement(13218..14015)]
> ATGATTGAATCTCAACGGCATTCTTTTCATTTGGTAGATCCAAGTCCATGGCCTATTTCGGGTTCACTCG
> GAGCTTTGGCAACCACCGTAGGAGGTGTGATGTACATGCACTCATTTCAAGGGGGTGCAACACTTCTCAG
> 
>> lcl|NC_014487.1_cdsid_YP_003875481.1 [gene=atp8] [protein=ATPase subunit
> 8] [protein_id=YP_003875481.1] [location=complement(15042..15548)]
> ATGCCTCAACTGGATAAATTTACTTATTTCACACAATTCTTCTGGTCATGCCTTTTTTTCTTTACTTTCT
> ATATTCTAATATGCAATGATAGAGATGGAGTACTTGGGATCAGCAGAATTCTAAAACTACGAAATCAACT
> 
> I hope to rename the sequences by gene name, such as:
> 
>> cox1
> ATGACAAATCTGATTCGATGGCTCTTCTCTACTAATCACAAGGATATAGGGACTCTCTATTTCATCTTCG
> GCGCCATTGCTGGAGTGATGGGCACATGCTTTTCAGTACTGATTCGTATGGAATTAGCACGCCCCGGCGA
>> cox3
> ATGATTGAATCTCAACGGCATTCTTTTCATTTGGTAGATCCAAGTCCATGGCCTATTTCGGGTTCACTCG
> GAGCTTTGGCAACCACCGTAGGAGGTGTGATGTACATGCACTCATTTCAAGGGGGTGCAACACTTCTCAG
> 
> Can anyone help? Thanks.
> 
> Yang.
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

Jason Stajich
jason.stajich at gmail.com
jason at bioperl.org



