[Bioperl-l] Bioperl-run: Testing alignments generated externally

Nathan Haigh n.haigh at sheffield.ac.uk
Thu Oct 26 06:04:54 EDT 2006


Sendu Bala wrote:
> Nathan Haigh wrote:
>> I'm thinking that it's not wise to test for things like
>> overall_percentage_identity etc in alignments that are generated by
>> external software like T-Coffee, Clustalw etc. Changes to software
>> algorithms/efficiency, bug fixes etc may well alter the quality of the
>> alignment produced in different versions and thus affect the value
>> returned by such methods. Therefore, I think these methods should only
>> be tested from alignments loaded directly from t/data.
>
> Did you discover some specific problem cases?
My messages seem to be taking a while to come through, but yes. It may
be due to the software changing its default parameters, but it makes
testing the output for specific details difficult and inconsistent. For
example, with T-Coffee, the following command from t/TCoffee.t
results in a slightly different alignment:
$aln = $factory->run(
    '-type'    => 'profile',
    '-profile' => $aln1,
    '-seq'     => Bio::Root::IO->catfile("t", "data", "cysprot1b.fa"),
);

Of particular note are the gaps on the last line of each sequence. In
4.45 there are two gaps in CATH_RAT/1-333 ('gk-nm---cg'), whereas in
versions before 4.45 there is a single gap ('gkn----mcg').
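
To make the difference concrete, here is a minimal sketch using only the two fragments quoted above. The helper names ungapped() and gap_runs() are made up for illustration and are not part of BioPerl. It suggests which properties might stay stable across T-Coffee versions (the residues themselves and the aligned length) and which do not (where the gaps fall):

```perl
use strict;
use warnings;
use Test::More;

# Fragments quoted above from the last line of the CATH_RAT row.
my $v445 = 'gk-nm---cg';   # T-Coffee 4.45
my $pre  = 'gkn----mcg';   # T-Coffee before 4.45

# Hypothetical helper: residues of an aligned string with gaps removed.
sub ungapped {
    my ($aligned) = @_;
    (my $residues = $aligned) =~ tr/-//d;
    return $residues;
}

# Hypothetical helper: number of runs of consecutive gap characters.
sub gap_runs {
    my ($aligned) = @_;
    my $runs = () = $aligned =~ /-+/g;
    return $runs;
}

# Gap placement is version dependent, so the raw strings differ...
isnt($v445, $pre, 'gap placement differs between versions');

# ...but the underlying residues and the aligned length do not.
is(ungapped($v445), ungapped($pre), 'underlying residues are identical');
is(length($v445), length($pre), 'aligned lengths match');

is(gap_runs($v445), 2, '4.45 fragment has two gap runs');
is(gap_runs($pre),  1, 'older fragment has one gap run');

done_testing();
```

Only the first assertion depends on the T-Coffee version; whether the aligned length stays equal in general is an assumption that happens to hold for this fragment.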

T-Coffee v4.45 returns the following alignment:

>CATH_RAT/1-333
------mwtalpllcagawllsagat----------aeltvnaiek------------fh
ftswmkqhqktyss-reyshrlqvfannwrkiqahn----qrnhtfkmglnqfsdmsfae
ikhkylwsepqncsat--ksnylrgtgp--ypssmdwrkkgnvvspvknqgacgscwtfs
ttgalesavaiasgkmmtlaeqqlvdcaqnfnnh--------gcqgglpsqafeyilynk
gimgedsypyigkngqckfnpekavafvknvv-nitlndeaamveavalynpvsfafevt
-edfmmyksgvyssnschktpdkvnhavlavgygeqn-----gllywivknswgsnwgnn
gyfliergk-nm---cglaacasypipqv
>CATL_HUMAN/1-333
--------------------------------mnptlilaafclgiasatltfdhsleaq
wtkwkamhnrlygmnee-gwrravweknmkmielhnqeyregkhsftmamnafgdmtsee
frqvmngfqnrkpr----kgkvfqeplfyeaprsvdwrekg-yvtpvknqgqcgscwafs
atgalegqmfrktgrlislseqnlvdcsgpqgn--------egcngglmdyafqyvqdng
gldseesypyeateesckynpkysvandtgfv-dip-kqekalmkavatvgpisvaidag
hesflfykegiyfepdc--ssedmdhgvlvvgygfestesdnn-kywlvknswgeewgmg
gyvkmakdrrnh---cgiasaasyptv--
>CATL_RAT/1-334
--------------------------------mtpllllavlclgtalatpkfdqtfnaq
whqwksthrrlygtnee-ewrravweknmrmiqlhngeysngkhgftmemnafgdmtnee
frqivngyrhqkhk----kgrlfqeplmlqipktvdwrekg-cvtpvknqgqcgscwafs
asgclegqmflktgklislseqnlvdcshdqgn--------qgcngglmdfafqyikeng
gldseesypyeakdgsckyraeyavandtgfv-dip-qqekalmkavatvgpisvamdas
hpslqfyssgiyyepnc--sskdldhgvlvvgygyegtdsnkd-kywlvknswgkewgmd
gyikiakdrnnh---cglataasypivn-
>PAPA_CARPA/1-345
mamipsiskllfvaiclfvymglsfg-------------dfsivgysqndltsterliql
feswmlkhnkiyknidekiyrfeifkdnlkyidetn----kknnsywlglnvfadmsnde
fkekytgsiagnytttelsyeevlndgdvnipeyvdwrqkg-avtpvknqgscgscwafs
avvtiegiikirtgnlneyseqelldc----------drrsygcnggypwsalqlvaqy-
gihyrntypyegvqrycrsrekgpyaaktdgvrqvqpynegallysian-qpvsvvleaa
gkdfqlyrggifvgpcgnk----vdhavaavgygpn---------yiliknswgtgwgen
gyirikrgtgnsygvcglytssfypvkn-
>ALEU_HORVU/1-362
maharvlllalavlataavavassssfadsnpirpvtdraastlesavlgalgrtrhalr
farfavrygksyesaaevrrrfrifsesleevrstn----rkglpyrlginrfsdmswee
fqatrlg-aaqtcsatlagnhlmrdaaa--lpetkdwredg-ivspvknqahcgscwtfs
ttgaleaaytqatgknislseqqlvdcaggfnnf--------gcngglpsqafeyikyng
gidteesypykgvngvchykaenaavqvldsv-nitlnaedelknavglvrpvsvafqvi
-dgfrqyksgvytsdhcgttpddvnhavlavgygven-----gvpywliknswgadwgdn
gyfkmemgk-nm---caiatcasypvvaa
>CATH_HUMAN/1-335
------mwatlpllcagawllg--------vpvcgaaelsvnslek------------fh
fkswmskhrktys-teeyhhrlqtfasnwrkinahn----ngnhtfkmalnqfsdmsfae
ikhkylwsepqncsatks--nylrgtgp--yppsvdwrkkgnfvspvknqgacgscwtfs
ttgalesaiaiatgkmlslaeqqlvdcaqdfnny--------gcqgglpsqafeyilynk
gimgedtypyqgkdgyckfqpgkaigfvkdva-nitiydeeamveavalynpvsfafevt
-qdfmmyrtgiysstschktpdkvnhavlavgygekn-----gipywivknswgpqwgmn
gyfliergk-nm---cglaacasypiplv
>CYS1_DICDI/1-343
-----mkvillfvlavftvfvs---------------srgippeeq------------sq
flefqdkfnkkys-heeylerfeifksnlgkieelnliainhkadtkfgvnkfadlssde
fknyylnnkeaiftddlpvadylddefinsiptafdwrtrg-avtpvknqgqcgscwsfs
ttgnvegqhfisqnklvslseqnlvdcdhecmeyegeeacdegcngglqpnaynyiikng
giqtessypytaetgtqcnfnsanigakisnf-tmipknetvmagyivstgplaiaadav
-e-wqfyiggvfdipcn---pnsldhgilivgysakntifrknmpywivknswgadwgeq
gyiylrrgk-nt---cgvsnfvstsii--

T-Coffee versions before 4.45 returned:

>CATH_RAT/1-333
----------mwtalpllcagawllsagat----------aeltvnaiek----------
--fhftswmkqhqktyss-reyshrlqvfannwrkiqahn----q----rnhtfkmglnq
fsdmsfaeikhkylwsepqncsat--ksnylrgtgp--ypssmdwrkkgnvvspvknqga
cgscwtfsttgalesavaiasgkmmtlaeqqlvdcaqnfnnh--------gcqgglpsqa
feyilynkgimgedsypyigkngqckfnpekavafvknvvn-itlndeaamveavalynp
vsfafevt-edfmmyksgvyssnschktpdkvnhavlavgygeqn-----gllywivkns
wgsnwgnngyfliergkn----mcglaacasypipqv
>PAPA_CARPA/1-345
mamipsiskllfvaiclfvymglsfgdfsivgysqndltsterliqlfeswml-------
-------------khnkiyknidekiyrf-----eifkdnlkyidetnkknnsywlglnv
fadmsndefkekytgsiagnytttelsyeevlndgdvnipeyvdwrqkg-avtpvknqgs
cgscwafsavvtiegiikirtgnlneyseqelldc----------drrsygcnggypwsa
lq-lvaqygihyrntypyegvqrycrsrekgpyaaktdgvrqvqpynegallysia-nqp
vsvvleaagkdfqlyrggifvgpcgnk----vdhavaavgygpn---------yilikns
wgtgwgengyirikrgtgnsygvcglytssfypvkn-
>CATL_HUMAN/1-333
-----------------------------------------mnptlilaafclgiasatl
tfdhsleaqwtkwkamhnrlygmneegwrravweknmkmielhnqeyregkhsftmamna
fgdmtseefrqvmngfqnrkprkgkvfqeplf----yeaprsvdwrekg-yvtpvknqgq
cgscwafsatgalegqmfrktgrlislseqnlvdcsgpqgn--------egcngglmdya
fqyvqdnggldseesypyeateesckynpkysvandtgfvd--ipkqekalmkavatvgp
isvaidaghesflfykegiyfepdc--ssedmdhgvlvvgygfestesdnn-kywlvkns
wgeewgmggyvkmakdrrnh---cgiasaasyptv--
>CATL_RAT/1-334
-----------------------------------------mtpllllavlclgtalatp
kfdqtfnaqwhqwksthrrlygtneeewrravweknmrmiqlhngeysngkhgftmemna
fgdmtneefrqivngyrhqkhkkgrlfqeplm----lqipktvdwrekg-cvtpvknqgq
cgscwafsasgclegqmflktgklislseqnlvdcshdqgn--------qgcngglmdfa
fqyikenggldseesypyeakdgsckyraeyavandtgfvd--ipqqekalmkavatvgp
isvamdashpslqfyssgiyyepnc--sskdldhgvlvvgygyegtdsnkd-kywlvkns
wgkewgmdgyikiakdrnnh---cglataasypivn-
>ALEU_HORVU/1-362
----maharvlllalavlataavavassssfadsnpirpvtdraastlesavlgalgrtr
halrfarfavrygksyesaaevrrrfrifsesleevrstn----r----kglpyrlginr
fsdmsweefqatrlg-aaqtcsatlagnhlmrdaaa--lpetkdwredg-ivspvknqah
cgscwtfsttgaleaaytqatgknislseqqlvdcaggfnnf--------gcngglpsqa
feyikynggidteesypykgvngvchykaenaavqvldsvn-itlnaedelknavglvrp
vsvafqvi-dgfrqyksgvytsdhcgttpddvnhavlavgygven-----gvpywlikns
wgadwgdngyfkmemgkn----mcaiatcasypvvaa
>CATH_HUMAN/1-335
----------mwatlpllcagawllg--------vpvcgaaelsvnslek----------
--fhfkswmskhrktys-teeyhhrlqtfasnwrkinahn----n----gnhtfkmalnq
fsdmsfaeikhkylwsepqncsatks--nylrgtgp--yppsvdwrkkgnfvspvknqga
cgscwtfsttgalesaiaiatgkmlslaeqqlvdcaqdfnny--------gcqgglpsqa
feyilynkgimgedtypyqgkdgyckfqpgkaigfvkdvan-itiydeeamveavalynp
vsfafevt-qdfmmyrtgiysstschktpdkvnhavlavgygekn-----gipywivkns
wgpqwgmngyfliergkn----mcglaacasypiplv
>CYS1_DICDI/1-343
---------mkvillfvlavftvfvs---------------srgippeeq----------
--sqflefqdkfnkkys-heeylerfeifksnlgkieelnliain----hkadtkfgvnk
fadlssdefknyylnnkeaiftddlpvadylddefinsiptafdwrtrg-avtpvknqgq
cgscwsfsttgnvegqhfisqnklvslseqnlvdcdhecmeyegeeacdegcngglqpna
ynyiiknggiqtessypytaetgtqcnfnsanigakisnft-mipknetvmagyivstgp
laiaadav-e-wqfyiggvfdipcn---pnsldhgilivgysakntifrknmpywivkns
wgadwgeqgyiylrrgkn----tcgvsnfvstsii--
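
Note that the two outputs also list the sequences in a different order, so any comparison probably needs to be keyed on sequence ID rather than position. As a rough sketch of a version-tolerant check, the following plain-Perl snippet (no BioPerl calls; ungapped_by_id() is a hypothetical helper, and the two-record inputs are toy stand-ins rather than the real output) compares two gapped FASTA texts by ID after stripping gaps:

```perl
use strict;
use warnings;

# Hypothetical helper: parse gapped FASTA text into { id => ungapped residues }.
sub ungapped_by_id {
    my ($fasta) = @_;
    my (%seq, $id);
    for my $line (split /\n/, $fasta) {
        if ($line =~ /^>(\S+)/) {
            $id = $1;
            $seq{$id} = '';
        }
        elsif (defined $id) {
            # Drop gap characters and stray whitespace before appending.
            (my $residues = $line) =~ tr/- \t\r//d;
            $seq{$id} .= $residues;
        }
    }
    return \%seq;
}

# Toy stand-ins for the two outputs: different order, different gap placement.
my $new = ">A/1-6\ngk-nm---cg\n>B/1-4\nac--gt\n";
my $old = ">B/1-4\na--cgt\n>A/1-6\ngkn----mcg\n";

my ($n, $o) = (ungapped_by_id($new), ungapped_by_id($old));

# Same IDs and same residues per ID, regardless of order or gaps.
my $same = join(',', sort keys %$n) eq join(',', sort keys %$o)
        && !grep { $n->{$_} ne $o->{$_} } keys %$n;

print $same ? "same sequences\n" : "sequences differ\n";
```

A check like this passes for both versions of the alignment, while value-based checks such as overall_percentage_identity would be better reserved for alignments loaded from t/data.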

