[Bioperl-l] PAML/Codeml parsing
stefan.kirov at bms.com
Wed Dec 5 09:24:20 EST 2007
When there is a gapless alignment we have a differently formatted output
kirovs at horta:~/AESIG> head -n 10 feJRfxQl8D/mlc
seed used = 492211105
ENSRNOE00000058637 GCG AGC AAG TGT GAC AGC CAT GGC ACC CAC
CTA GCA GGT GTG GTC AGC GGC CGG GAT GCT GGT GTG GCC AAG GGC ACC AGT CTG
CAC AGT CTG CGT GTG CTC AAC TGT CAA GGG AAG GGC ACA GTC AGC GGC ACC CTC ATA
ENSMUSE00000366347 GCG AGC AAG TGT GAC AGC CAC GGC ACC CAC
CTG GCA GGT GTG GTC AGC GGC CGG GAT GCT GGT GTG GCC AAG GGC ACC AGC CTG
CAC AGC CTG CGT GTG CTC AAC TGT CAA GGG AAG GGC ACA GTC AGC GGC ACC CTC ATA
ENSE00001279150 GCC AGC AAG TGT GAC AGT CAT GGC ACC CAC
CTG GCA GGG GTG GTC AGC GGC CGG GAT GCC GGC GTG GCC AAG GGT GCC AGC ATG
CGC AGC CTG CGC GTG CTC AAC TGC CAA GGG AAG GGC ACG GTT AGC GGC ACC CTC ATA
And parsing this fails...
The next one has gaps and works fine:
kirovs at horta:~/AESIG> head -n 10 4z6ZX7s1B6/mlc
seed used = 492252697
Before deleting alignment gaps
ENSMUSE00000460297 AAT ATC GAT ACA TTT TAC AAG GAG GCA GAA
AAG AAG CTT ATA CAC GTG CTT GAG GGA GAC AGT CCC AAG TGG TCC ACA CCG AAC
AAA GAC CCC ACC CGA GAG CCC CAT GCA GCC TCC ACT TGC TGT GCT TCA GAT CTC
CTT GGT TCA GGA GGT CAG TTC CTG
ENSE00000939192 AAT ATT GAC ATA CTT TGC AAT GAA GCA GAA
AAC AAG CTT ATG CAT ATA CTG CAT GCA AAT GAT CCC AAG TGG TCC ACC CCA ACT
AAA GAC TGT ACT TCA GGG CCG TAC ACT GCT CAA ATC --- --- --- --- --- ATT
CCT GGT ACA GGA AAC AAG CTT CTG
I will send both whole files as an attachment with another mail (I do
not know if these are going to pass through).
My guess is that the whole _parse_summary method has to be re-worked as
there is no tag to look for before the sequences start. Ugly.
I am not sure what else could become broken if I try to fix it, so I
will leave it to you.
> should be fixed.
> $ cvs log -r HEAD Bio/Tools/Phylo/PAML.pm
> revision 1.56
> date: 2007/11/01 14:52:56; author: jason; state: Exp; lines: +21 -14
> Parsing PAML4 and PAML3.15 should work now. Dealing with variable
> order for the sequences and summary results in
> the top of the MLC files
> On Dec 4, 2007, at 11:25 AM, Stefan Kirov wrote:
>> Jason Stajich wrote:
>>> PAML4 breaks our PAML parser right now because the order of things in
>>> the result file has changed. Now sequences precede the information
>>> about the version or the program run. This means that $result-
>>>> get_seqs() fails because we don't parse the sequences.
>>> We'll see what we can do, but as usual with supporting 3rd party
>>> programs it is brittle when file formats change. Th
>>> Jason Stajich
>>> jason at bioperl.org
>>> Bioperl-l mailing list
>>> Bioperl-l at lists.open-bio.org
>> I saw a commit after this post on codeml, but not on PAML.pm- I assume
>> this is not fixed, am I correct?
More information about the Bioperl-l