[Bioperl-l] how to rename genbank header in fasta file?

Jason Stajich jason.stajich at gmail.com
Sat Oct 20 20:15:01 EDT 2012


> perl -i -p -e 's/>.+\[gene=([^\]]+)\].+/>$1/' file.fa

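It should have been -e, not -s, in my example. Without -e, perl treats the first argument as the name of a script file, which is why you got the "Can't open perl script" error.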

You can name the file whatever you want; just replace that part in the command above. It sounds like you are new to Perl in general, so if you are also new to programming and running scripts I would recommend some basic books first. Try Unix and Perl to the Rescue at http://unixandperl.com

Jason
On Oct 20, 2012, at 5:56 PM, yang liu <yang.liu0508 at gmail.com> wrote:

> Hello Jason,
>  
> Thanks for your help. I tried the script, it returned:
> Can't open perl script "s/>.+\[gene=([^\]]+)\].+/>$1/": No such file or directory
>  
> Don't know why.
>  
> I named the fasta file as file.fa
>  
> Yang.
> 
> On Sat, Oct 20, 2012 at 1:43 AM, Jason Stajich <jason.stajich at gmail.com> wrote:
> Are you parsing exactly this file? It is in FASTA format, not GenBank.
> 
> You don't need bioperl for this:
> perl -i -p -s 's/>.+\[gene=([^\]]+)\].+/>$1/' file.fa
> 
> I'd read up on regular expressions and Perl to learn more about string replacement and how to do this better.
> 
> 
> On Oct 19, 2012, at 11:23 PM, yang liu <yang.liu0508 at gmail.com> wrote:
> 
>> Hello,
>> 
>> I am a new user of BioPerl, can anyone help with this? I have multiple
>> sequences in a fasta file like the following,
>> 
>>> lcl|NC_014487.1_cdsid_YP_003875479.1 [gene=cox1] [protein=cytochrome c
>> oxidase subunit 1] [protein_id=YP_003875479.1] [location=1..1575]
>> ATGACAAATCTGATTCGATGGCTCTTCTCTACTAATCACAAGGATATAGGGACTCTCTATTTCATCTTCG
>> GCGCCATTGCTGGAGTGATGGGCACATGCTTTTCAGTACTGATTCGTATGGAATTAGCACGCCCCGGCGA
>>> lcl|NC_014487.1_cdsid_YP_003875480.1 [gene=cox3] [protein=cytochrome c
>> oxidase subunit 3] [protein_id=YP_003875480.1]
>> [location=complement(13218..14015)]
>> ATGATTGAATCTCAACGGCATTCTTTTCATTTGGTAGATCCAAGTCCATGGCCTATTTCGGGTTCACTCG
>> GAGCTTTGGCAACCACCGTAGGAGGTGTGATGTACATGCACTCATTTCAAGGGGGTGCAACACTTCTCAG
>> 
>>> lcl|NC_014487.1_cdsid_YP_003875481.1 [gene=atp8] [protein=ATPase subunit
>> 8] [protein_id=YP_003875481.1] [location=complement(15042..15548)]
>> ATGCCTCAACTGGATAAATTTACTTATTTCACACAATTCTTCTGGTCATGCCTTTTTTTCTTTACTTTCT
>> ATATTCTAATATGCAATGATAGAGATGGAGTACTTGGGATCAGCAGAATTCTAAAACTACGAAATCAACT
>> 
>> I hope to rename the sequences by gene name, such as:
>> 
>>> cox1
>> ATGACAAATCTGATTCGATGGCTCTTCTCTACTAATCACAAGGATATAGGGACTCTCTATTTCATCTTCG
>> GCGCCATTGCTGGAGTGATGGGCACATGCTTTTCAGTACTGATTCGTATGGAATTAGCACGCCCCGGCGA
>>> cox3
>> ATGATTGAATCTCAACGGCATTCTTTTCATTTGGTAGATCCAAGTCCATGGCCTATTTCGGGTTCACTCG
>> GAGCTTTGGCAACCACCGTAGGAGGTGTGATGTACATGCACTCATTTCAAGGGGGTGCAACACTTCTCAG
>> 
>> Can anyone help? Thanks.
>> 
>> Yang.
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
> 
> Jason Stajich
> jason.stajich at gmail.com
> jason at bioperl.org
> 
> 

Jason Stajich
jason.stajich at gmail.com
jason at bioperl.org



