[Bioperl-l] How to change a fasta format alignment into clustalw format?

Fields, Christopher J cjfields at illinois.edu
Wed Sep 12 09:37:46 EDT 2012


The script below worked fine for me with the latest bioperl-live, and the "2" markers come through unchanged in the clustalw output. Are you using an older version?
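To check which release is installed, one option is to ask BioPerl's Bio::Root::Version module for its version number. This is a minimal sketch that assumes your BioPerl release ships that module. The helper name bioperl_version is hypothetical, and the eval keeps the script from dying if BioPerl cannot be loaded:

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical helper: return the installed BioPerl version, or undef if
# Bio::Root::Version cannot be loaded (assumes the release ships that module).
sub bioperl_version {
    my $version = eval {
        require Bio::Root::Version;
        Bio::Root::Version->VERSION;    # UNIVERSAL::VERSION reads $VERSION
    };
    return $version;
}

my $v = bioperl_version();
print defined $v
    ? "BioPerl version: $v\n"
    : "Bio::Root::Version not found; is BioPerl on \@INC?\n";
```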

chris

[cjfields at pyrimidine-laptop clustalw]$ cat convert.pl 
#!/usr/bin/env perl
use Modern::Perl;
use Bio::AlignIO;

my $in = Bio::AlignIO->new(-file => shift,
                           -format => 'fasta');

my $out = Bio::AlignIO->new(-format => 'clustalw');

while (my $aln = $in->next_aln) {
    $out->write_aln($aln);
}

[cjfields at pyrimidine-laptop clustalw]$ cat test.fa 
>SPOG_04578#scry
MESRMTNSVRIRSITKKDVSVVFQFI2IELADFEDARDQVEATEESLLHAFGFT-
>SOCG_01498#soct
----MTNSVRVRPITNKDISTVIQFI2IELADFEEARDQVEATEESLLNVFGFNE
>SPAC1002.07c#spom
-----MGSVRIRSVIKEDLPTVYQFI2KELAEFEKCEDQVEATIPNLEVAFGFID
>SJAG_03288#sjap
--MTNKTTAVVRRLKREDCPVVLQFI2KELAEYQKEPQQVEATVEKLEKAFGFVE

[cjfields at pyrimidine-laptop clustalw]$ perl convert.pl test.fa 
CLUSTAL W (1.81) multiple sequence alignment


SPOG_04578#scry/1-54   MESRMTNSVRIRSITKKDVSVVFQFI2IELADFEDARDQVEATEESLLHAFGFT-
SOCG_01498#soct/1-51   ----MTNSVRVRPITNKDISTVIQFI2IELADFEEARDQVEATEESLLNVFGFNE
SPAC1002.07c#spom/1-50 -----MGSVRIRSVIKEDLPTVYQFI2KELAEFEKCEDQVEATIPNLEVAFGFID
SJAG_03288#sjap/1-53   --MTNKTTAVVRRLKREDCPVVLQFI2KELAEYQKEPQQVEATVEKLEKAFGFVE
                              :. :* : .:* ..* **** ***:::.  :*****  .*  .***  
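If upgrading is not an option, one way around the validation error is to skip BioPerl's sequence objects altogether and write CLUSTAL-style blocks directly. The following is a minimal pure-Perl sketch, not BioPerl's clustalw writer. The subroutine names read_aligned_fasta and clustal_text and the script name fasta2clustal.pl are hypothetical. It assumes the input is already aligned (all sequences the same length), it does not compute the conservation line or append the /start-end suffixes BioPerl adds to names, and because it never validates residues, markers such as "2" pass through unchanged:

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Sketch only: assumes an already-aligned FASTA input (equal-length sequences).
# No residue validation is done, so non-standard markers like '2' are kept.

# Read aligned FASTA from a filehandle; returns a list of [id, sequence] pairs.
sub read_aligned_fasta {
    my ($fh) = @_;
    my @records;
    while (my $line = <$fh>) {
        chomp $line;
        $line =~ s/\r$//;                 # tolerate Windows line endings
        next unless length $line;
        if ($line =~ /^>(\S+)/) {
            push @records, [$1, ''];      # new record: id is the first word
        } elsif (@records) {
            $line =~ s/\s+//g;            # drop any embedded whitespace
            $records[-1][1] .= $line;
        }
    }
    return @records;
}

# Format records as CLUSTAL-style blocks of $width columns (default 60).
sub clustal_text {
    my ($records, $width) = @_;
    $width ||= 60;
    my $len = length $records->[0][1];
    for my $r (@$records) {
        die "Sequence $r->[0] has length " . length($r->[1]) . ", expected $len\n"
            unless length($r->[1]) == $len;
    }
    # Pad names to the longest id so the sequence columns line up.
    my $name_w = 0;
    for my $r (@$records) {
        $name_w = length $r->[0] if length $r->[0] > $name_w;
    }
    my $out = "CLUSTAL W multiple sequence alignment\n\n";
    for (my $start = 0; $start < $len; $start += $width) {
        $out .= "\n";                     # blank line before each block
        for my $r (@$records) {
            $out .= sprintf "%-*s %s\n", $name_w, $r->[0],
                substr($r->[1], $start, $width);
        }
    }
    return $out;
}

# Command-line entry point: perl fasta2clustal.pl aligned.fa > aligned.aln
unless (caller) {
    if (!@ARGV) {
        warn "usage: $0 aligned.fa\n";
    } else {
        my $file = $ARGV[0];
        open my $fh, '<', $file or die "Cannot open $file: $!\n";
        my @records = read_aligned_fasta($fh);
        close $fh;
        die "No sequences found in $file\n" unless @records;
        print clustal_text(\@records);
    }
}
```

The design choice here is plain string handling instead of sequence objects; the trade-off is that nothing beyond the equal-length check catches malformed input.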

On Sep 12, 2012, at 7:28 AM, Tao Zhu <taozhu at mail.bnu.edu.cn> wrote:

> Hello, everyone
> 
> I have a multiple protein sequence alignment in FASTA format:
> 
>> SPOG_04578#scry
> MESRMTNSVRIRSITKKDVSVVFQFI2IELADFEDARDQVEATEESLLHAFGFT-
>> SOCG_01498#soct
> ----MTNSVRVRPITNKDISTVIQFI2IELADFEEARDQVEATEESLLNVFGFNE
>> SPAC1002.07c#spom
> -----MGSVRIRSVIKEDLPTVYQFI2KELAEFEKCEDQVEATIPNLEVAFGFID
>> SJAG_03288#sjap
> --MTNKTTAVVRRLKREDCPVVLQFI2KELAEYQKEPQQVEATVEKLEKAFGFVE
> 
> I want to convert it to CLUSTALW format. It should have been easy:
> 
> use strict;
> use warnings;
> use Bio::AlignIO;
> 
> # Input (FASTA) and output (clustalw) file names from the command line
> my $in  = shift;
> my $out = shift;
> my $alignio = Bio::AlignIO->new(-file=>$in, -format=>'fasta');
> my $writeio = Bio::AlignIO->new(-file=>">$out", -format=>'clustalw');
> while ( my $align_obj = $alignio->next_aln ) {
>    $writeio->write_aln($align_obj);
> }
> 
> That looks OK. However, it doesn't work: it fails with "seq doesn't validate".
> 
> In fact, the alignment contains the letter "2". I added each "2" on purpose
> to mark a phase-2 intron at that position. I would like to keep these
> markers in the clustalw output. Is there any way to do this?
> 
> -- 
> Tao Zhu, College of Life Sciences, Beijing Normal University, Beijing
> 100875, China
> Email: tzhu at mail.bnu.edu.cn
> 
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l



