[Bioperl-l] modify sequence name
huangyifeicmb at gmail.com
Fri Mar 9 16:50:09 EST 2012
It is fairly easy to do that in Perl. You could write a Perl script like this:
Step 1: read file 2 line by line and use the 'split' function to separate
the sequence IDs from the taxon names. Then build a hash in which the keys
are sequence IDs and the values are taxon names.
Step 2: read file 1 line by line. For each line starting with '>', use a
regular expression to extract its sequence ID and look up the corresponding
taxon name in the hash. Then reformat the sequence ID and print the new
header (starting with '>'). Print each line that does not start with '>'
unchanged.
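The two steps above can be sketched as a small Perl script. This is a minimal sketch under stated assumptions, because the exact header layout of file 1 and the desired output format are not shown: it assumes each header line begins with the DNA number (e.g. ">2863"), and it writes new headers in a hypothetical ">2863_Gelidium" form. The subroutine names read_taxa and rename_fasta are illustrative, not from any library.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Step 1: build a hash mapping DNA number => taxon name.
sub read_taxa {
    my ($fh) = @_;
    my %taxon_for;
    while (my $line = <$fh>) {
        $line =~ s/\r?\n\z//;                   # strip Unix or Windows line ending
        next unless $line =~ /\S/;              # skip blank lines
        # Split on the first run of whitespace (the tab in file 2);
        # the limit of 2 keeps multi-word taxon names intact.
        my ($id, $taxon) = split ' ', $line, 2;
        $taxon_for{$id} = $taxon if defined $taxon;
    }
    return \%taxon_for;
}

# Step 2: copy the sequence file, rewriting each '>' header line.
# ASSUMPTION: the header starts with the DNA number; the new header
# format ">ID_Taxon" is hypothetical, and any text after the ID is dropped.
sub rename_fasta {
    my ($in, $out, $taxon_for) = @_;
    while (my $line = <$in>) {
        if ($line =~ /^>(\S+)/) {
            my $id = $1;
            if (exists $taxon_for->{$id}) {
                print {$out} ">${id}_$taxon_for->{$id}\n";
            } else {
                warn "No taxon name for '$id'; header left unchanged\n";
                print {$out} $line;
            }
        } else {
            print {$out} $line;                 # sequence lines pass through unchanged
        }
    }
}

sub main {
    my ($seq_file, $taxa_file) = @_;
    open my $taxa_fh, '<', $taxa_file or die "Cannot open $taxa_file: $!\n";
    my $taxon_for = read_taxa($taxa_fh);
    close $taxa_fh;
    open my $seq_fh, '<', $seq_file or die "Cannot open $seq_file: $!\n";
    rename_fasta($seq_fh, \*STDOUT, $taxon_for);
    close $seq_fh;
}

# Run only when executed directly as a script, not when loaded by other code.
if (!caller) {
    if (@ARGV == 2) { main(@ARGV) }
    else { warn "usage: $0 file1.txt file2.txt > renamed.txt\n" }
}
```

Run it as "perl rename.pl file1.txt file2.txt > renamed.txt" (the script name is arbitrary). A hash gives a constant-time lookup per header, so hundreds of sequences are handled in one pass; headers with no matching DNA number are left as they are and reported on STDERR.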
If you are not very familiar with Perl, I suggest learning it on your own.
"Beginning Perl for Bioinformatics" is a good book for biologists.
On Fri, Mar 9, 2012 at 2:25 PM, yang liu <yang.liu0508 at gmail.com> wrote:
> Dear colleagues,
> When I do Sanger sequencing, I get hundreds of sequences, for several
> genes, named by DNA number. I have to add the taxon name to each sequence
> manually. Is there a way to change the names automatically?
> I have two .txt files.
> file 1, with sequences named by DNA number:
> file 2, with DNA numbers and taxon names, separated by tabs:
> 2863 Gelidium
> 2864 Poa
> I would like the final file to look like this:
> Any ideas? Any help would be appreciated.
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
Department of Biology