[Bioperl-l] Dumping a MSA from BLAST results

Chris Fields cjfields at uiuc.edu
Tue Feb 19 12:18:34 EST 2008


One could use one of the alternative blastall output formats (-m1 through
-m6), which give various query-anchored alignments.  As far as I know,
BioPerl does not parse any of these; it might be worth getting something
up and running if there is enough interest in it.

chris

PS. Here's example output using
'blastall -p blastp -i test2.faa -d CP000560.faa -m6',
which is query-anchored, flat, blunt ends:

BLASTP 2.2.16 [Mar-25-2007]


Reference: Altschul, Stephen F., Thomas L. Madden, Alejandro A. Schaffer,
Jinghui Zhang, Zheng Zhang, Webb Miller, and David J. Lipman (1997),
"Gapped BLAST and PSI-BLAST: a new generation of protein database search
programs",  Nucleic Acids Res. 25:3389-3402.

Query= gi|1373160|gb|AAB57770.1| PyrR
          (173 letters)

Database: CP000560.faa
            3693 sequences; 1,147,568 total letters

Searching..................................................done



                                                         Score    E
Sequences producing significant alignments:              (bits) Value

gb|ABS73893.1| PyrR [Bacillus amyloliquefaciens FZB42]     322   1e-89
gb|ABS75590.1| ComFC [Bacillus amyloliquefaciens FZB42]     37   6e-04
gb|ABS72500.1| Prs [Bacillus amyloliquefaciens FZB42]       28   0.22
gb|ABS72703.1| YcdA [Bacillus amyloliquefaciens FZB42]      27   0.49
gb|ABS74832.1| Apt [Bacillus amyloliquefaciens FZB42]       26   1.1
gb|ABS75734.1| Upp [Bacillus amyloliquefaciens FZB42]       26   1.4
gb|ABS74081.1| NrdE [Bacillus amyloliquefaciens FZB42]      25   1.9
gb|ABS76054.1| RocD [Bacillus amyloliquefaciens FZB42]      24   4.1
gb|ABS74744.1| Gpr [Bacillus amyloliquefaciens FZB42]       24   5.4
gb|ABS74336.1| UvrX [Bacillus amyloliquefaciens FZB42]      23   7.1
gb|ABS74825.1| YrvM [Bacillus amyloliquefaciens FZB42]      23   9.2
gb|ABS72555.1| RpoB [Bacillus amyloliquefaciens FZB42]      23   9.2

1_0      1    MNQKAVILDEQAIRRALTRIAHEMIERNKGMNNCILVGIKTRGIYLAKR---LAER---- 53
ABS73893 1    MNQKAVILDEQAIRRALTRIAHEMIERNKGMNDCILVGIKTRGIYLAKR---LAER---- 53
ABS75590 116  -------------------------------NTHTLIPIPLSGERLAERGFNQSEL---- 140
ABS72500 164  ----------------------------KDLKDIVIVSPDHGGVTRARK---LADR---- 188
ABS72703 55   ----------------------------------------------ALK---VTVT---- 61
ABS74832      ------------------------------------------------------------
ABS75734      ------------------------------------------------------------
ABS74081      ------------------------------------------------------------
ABS74081 502  -----------------------------------------RSAELAKE---KGET---- 513
ABS76054 305  ------VLEEEGLAERSLQLGRYFKEELEKIDNPIIKDVRGRGLFIGVE---LTEAARPY 355
ABS74744 43   -------------------------ERDKG-------GIKVRTVDITKE---GAEL---- 63
ABS74336      ------------------------------------------------------------
ABS74825      ------------------------------------------------------------
ABS72555      ------------------------------------------------------------

1_0      54   IEQIEGNPVTVGEIDITLYRDDLSKKTSNDEPLVKGADIP-V--DITD---QKVILVDDV 107
ABS73893 54   IEQIEGNPVTVGEIDITLYRDDLTKKTSNEEPLVKGADIP-A--DITD---QKVIVVDDV 107
ABS75590 141  LASLLGMPVISPLIRLNNEKQSKKSKTDRLSAEKKFSAAE-N--SATG---MNVILIDDI 194
ABS72500 189  LKA----PIAI---------IDKRRPRPNE---VEVMNIV-G--NVEG---KTAILIDDI 226
ABS72703 62   VKNTGKDPLTVKSSDFSLYQDD--AKTAK-----------------TD---KEDLMQSGT 99
ABS74832 112  ---------------------------------------------------QRVLITDDL 120
ABS75734 100  ---------------VGLYRDPETLK-----PVEYYVKLP-S--DVEE---REFIVVDPM 133
ABS74081 386  -----------------LQASQVSAYTDYDEEDEIGLDIS-C--NLGS---LNILNVMKH 422
ABS74081 514  FEHYEGSTYATGEYFNKYIEKEFSPAYEKIAALFEGMHIP-TIEDWKE---LKAFVAENG 569
ABS76054 356  CEKLKGEGLLCKETHDTVIR---------------------------------------- 375
ABS74744 64   SGKKQGRYVTIEAQGVREHDSDMQEKVT-------------------------------- 91
ABS74336 109  ----------------------------------KTIDLP-T--NITMDIYRYCLILFDK 131
ABS74825 199  -------------------REDVRKEVGNDEAKIRKAQMP-------------------- 219
ABS72555 1090 -----GAAYTLQEI-LTVKSDDVVGRVKTYEAIVKGDNVPEP--GVPE---SFKVLIKEL 1138

1_0      108  LYTGRTVRAGMDALVDVGRPSSIQLAVLVDRGHRELPIRADYIGKNIPTSKSEKVMVQLD 167
ABS73893 108  LYTGRTVRAAMDALVDVGRPSSIQLAVLVDRGHRELPIRADYIGKNIPTSKAEKVMVQLS 167
ABS75590 195  YTTGATLHQAAEVLLTAGKASSVSSFTLI------------------------------- 223
ABS72500 227  IDTAGTITLAANALVENG------------------------------------------ 244
ABS72703 100  LHAGKTVTGNLYFTADEGK----------------------------------------- 118
ABS74832 121  LATGGTIEATIKLVEELG------------------------------------------ 138
ABS75734 134  LATGGSAVEAINSL---------------------------------------------- 147
ABS74081 423  KSIERTVKLATDSLTHVSETTDIRNAPAVRRANKAM------------------------ 458
ABS74081 570  MY---------------------------------------------------------- 571
ABS76054 374  ------------------------------------------------------------ 375
ABS74744 90   ------------------------------------------------------------ 91
ABS74336 132  FYTGKTVRS--------------------------------------------------- 140
ABS74825 218  ------------------------------------------------------------ 219
ABS72555 1139 ------QSLGMDVKILSGDEEEIEMRDLED------------------------------ 1162

1_0      168  EVDQND 173
ABS73893 168  EVDQTD 173
ABS75590 222  ------ 223
ABS72500 243  ------ 244
ABS72703 117  ------ 118
ABS74832 137  ------ 138
ABS75734 146  ------ 147
ABS74081 457  ------ 458
ABS74081 570  ------ 571
ABS76054 374  ------ 375
ABS74744 90   ------ 91
ABS74336 139  ------ 140
ABS74825 218  ------ 219
ABS72555 1161 ------ 1162
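
If someone does want to get a parser for this layout going, a starting point could look like the Python sketch below. It is only a rough illustration under a few assumptions: each row sits on a single line (any mail-client wrapping has been undone), rows come in the same order in every block, and blocks are separated by a blank line. Because one hit can contribute more than one HSP row (ABS74081 shows up twice above), rows are keyed by name plus position within the block. ROW, parse_anchored_blocks and the toy input are names made up here; none of this is BioPerl or BLAST code.

```python
import re
from collections import defaultdict

# A row is either "name start residues end" or, for a hit with nothing
# aligned in this block, "name" followed by dashes only (no coordinates).
ROW = re.compile(r"^(\S+)\s+(?:(\d+)\s+([A-Za-z*-]+)\s+(\d+)|(-+))\s*$")


def parse_anchored_blocks(text):
    """Join the per-block rows of flat query-anchored output into full rows.

    Returns a list of ((name, nth_row_for_that_name), gapped_sequence).
    """
    chunks = {}               # (name, n) -> list of per-block pieces
    seen = defaultdict(int)   # rows seen per name within the current block
    for line in text.splitlines():
        m = ROW.match(line)
        if m is None:
            seen.clear()      # blank or non-row line: next block starts fresh
            continue
        name = m.group(1)
        seq = m.group(3) or m.group(5)
        key = (name, seen[name])
        seen[name] += 1
        chunks.setdefault(key, []).append(seq)
    return [(key, "".join(parts)) for key, parts in chunks.items()]


# Toy input modelled on the output above (shortened by hand, not real BLAST output).
SAMPLE = """\
1_0      1    MNQKAV--- 6
ABS73893 1    MNQKAV--- 6
ABS74832      ---------
ABS74081      ---------
ABS74081 502  ---RSAELA 507

1_0      7    ILDEQA 12
ABS73893 7    ILDEQA 12
ABS74832 112  ---QRV 114
ABS74081 386  ----LQ 387
ABS74081 514  FEHYEG 519
"""

rows = parse_anchored_blocks(SAMPLE)
for (name, n), seq in rows:
    # Gapped FASTA-style dump; every row has the same length.
    print(f">{name}_{n}\n{seq}")
```

Since every block is padded to the same width, the joined rows all end up the same length and can be written out as a gapped FASTA alignment. The start/end coordinates are matched but thrown away here; a real parser would probably want to keep them.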


On Feb 19, 2008, at 10:17 AM, Jason Stajich wrote:

> All the individual pairwise alignments won't necessarily be an  
> alignment of the same region and the gap insertions can be different  
> in each instance of the query sequence that is participating in the  
> pairwise alns so it won't fit into an MSA.
>
> It makes more sense to extract the aligned part of the hit sequences  
> identified and a subsequence of the query which is the min and max  
> region aligned.  Run this through a MSA program.
>
> -jason
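
For what it's worth, the extraction step suggested here might look roughly like the following in Python. This is a sketch only: Hsp is a hypothetical stand-in for whatever a BLAST parser hands back (it is not a BioPerl class), coordinates are assumed to be 1-based and inclusive as in BLAST reports, and the sequences are made-up toy data. The FASTA text it produces would then be fed to whichever MSA program you prefer.

```python
from dataclasses import dataclass


@dataclass
class Hsp:
    """Hypothetical minimal HSP record (illustration only)."""
    hit_id: str
    query_start: int   # 1-based, inclusive
    query_end: int     # 1-based, inclusive
    hit_aligned: str   # hit side of the pairwise alignment, may contain '-'


def msa_input_fasta(query_id, query_seq, hsps):
    """Build unaligned FASTA: query min..max region plus each hit fragment."""
    lo = min(h.query_start for h in hsps)
    hi = max(h.query_end for h in hsps)
    records = [(f"{query_id}/{lo}-{hi}", query_seq[lo - 1:hi])]
    for n, h in enumerate(hsps, 1):
        # Strip pairwise gaps; the MSA program will place its own.
        records.append((f"{h.hit_id}_hsp{n}", h.hit_aligned.replace("-", "")))
    return "".join(f">{name}\n{seq}\n" for name, seq in records)


# Toy data: two HSPs covering overlapping parts of a 20-residue query.
query = "MNQKAVILDEQAIRRALTRI"
hsps = [
    Hsp("hitA", 3, 10, "QKA-VILDE"),
    Hsp("hitB", 8, 18, "LDEQAIRRALT"),
]
fasta = msa_input_fasta("query", query, hsps)
print(fasta, end="")
```

Numbering the HSPs in the record names keeps two HSPs from the same hit from colliding in the FASTA file.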
> On Feb 19, 2008, at 6:42 AM, Johan Nilsson wrote:
>
>> Hello,
>>
>> I have a question regarding the conversion from a Blast search  
>> result (PSI-blast using blastpgp, to be more exact) to a multiple  
>> sequence alignment file. I'm running the  
>> Bio::Tools::Run::StandAloneBlast and I retrieve the HSPs from the  
>> resulting Bio::Search::Hit::HitI objects. I have no problems  
>> obtaining each HSP alignment using $hit->get_aln. However, rather  
>> than dumping many local alignments, I would like to write a single  
>> result file where the HSPs are interleaved.
>>
>> I guess this shouldn't be too hard, but nevertheless I haven't  
>> found out how to do this in a simple way. Any suggestions would be  
>> highly appreciated!
>>
>> Best Regards
>> /Johan Nilsson
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign




