[Bioperl-l] PAML/Codeml parsing
stefan.kirov at bms.com
Wed Dec 5 09:35:23 EST 2007
Here are the files.
Stefan Kirov wrote:
> When the alignment is gapless, codeml produces differently formatted
> output:
> kirovs@horta:~/AESIG> head -n 10 feJRfxQl8D/mlc
> seed used = 492211105
> 3 141
> ENSRNOE00000058637 GCG AGC AAG TGT GAC AGC CAT GGC ACC CAC
> CTA GCA GGT GTG GTC AGC GGC CGG GAT GCT GGT GTG GCC AAG GGC ACC AGT CTG
> CAC AGT CTG CGT GTG CTC AAC TGT CAA GGG AAG GGC ACA GTC AGC GGC ACC CTC ATA
> ENSMUSE00000366347 GCG AGC AAG TGT GAC AGC CAC GGC ACC CAC
> CTG GCA GGT GTG GTC AGC GGC CGG GAT GCT GGT GTG GCC AAG GGC ACC AGC CTG
> CAC AGC CTG CGT GTG CTC AAC TGT CAA GGG AAG GGC ACA GTC AGC GGC ACC CTC ATA
> ENSE00001279150 GCC AGC AAG TGT GAC AGT CAT GGC ACC CAC
> CTG GCA GGG GTG GTC AGC GGC CGG GAT GCC GGC GTG GCC AAG GGT GCC AGC ATG
> CGC AGC CTG CGC GTG CTC AAC TGC CAA GGG AAG GGC ACG GTT AGC GGC ACC CTC ATA
> And parsing this fails...
> The next one has gaps and works fine:
> kirovs@horta:~/AESIG> head -n 10 4z6ZX7s1B6/mlc
> seed used = 492252697
> Before deleting alignment gaps
> 2 162
> ENSMUSE00000460297 AAT ATC GAT ACA TTT TAC AAG GAG GCA GAA
> AAG AAG CTT ATA CAC GTG CTT GAG GGA GAC AGT CCC AAG TGG TCC ACA CCG AAC
> AAA GAC CCC ACC CGA GAG CCC CAT GCA GCC TCC ACT TGC TGT GCT TCA GAT CTC
> CTT GGT TCA GGA GGT CAG TTC CTG
> ENSE00000939192 AAT ATT GAC ATA CTT TGC AAT GAA GCA GAA
> AAC AAG CTT ATG CAT ATA CTG CAT GCA AAT GAT CCC AAG TGG TCC ACC CCA ACT
> AAA GAC TGT ACT TCA GGG CCG TAC ACT GCT CAA ATC --- --- --- --- --- ATT
> CCT GGT ACA GGA AAC AAG CTT CTG
> I will send both whole files as an attachment with another mail (I do
> not know if these are going to pass through).
> My guess is that the whole _parse_summary method has to be re-worked as
> there is no tag to look for before the sequences start. Ugly.
> I am not sure what else could become broken if I try to fix it, so I
> will leave it to you.
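A minimal sketch of one way around the missing tag, written in Python rather than the Perl of PAML.pm, and not the change that went into revision 1.56. It assumes, based only on the two excerpts quoted above, that the sequence block is introduced by a line holding two integers (number of sequences, then alignment length in nucleotides: "3 141" is followed by 47 codons per sequence), and that the optional "Before deleting alignment gaps" line can simply be skipped. The names read_leading_alignment, HEADER and CODON are made up for illustration, and the sketch further assumes that no sequence name is itself a three-letter string over the codon alphabet.

```python
import re

# Assumed header shape: "<number of sequences> <length in nucleotides>".
HEADER = re.compile(r"^\s*(\d+)\s+(\d+)\s*$")
# A codon token as printed in the excerpts, including gap codons ("---").
CODON = re.compile(r"^[ACGTUN?-]{3}$")


def read_leading_alignment(lines):
    """Return {name: sequence} for the first alignment block in an mlc file.

    The block is located by its numeric header line instead of a text tag,
    so it does not matter whether "Before deleting alignment gaps" is present.
    """
    it = iter(lines)
    nseq = length = None
    for line in it:
        m = HEADER.match(line)
        if m:
            nseq, length = int(m.group(1)), int(m.group(2))
            break
    if nseq is None:
        return {}

    seqs = {}
    current = None
    for line in it:
        tokens = line.split()
        if not tokens:
            continue
        if not CODON.match(tokens[0]):
            # First token is not a codon, so treat it as a sequence name.
            if len(seqs) == nseq:
                break  # already have every sequence; this is later output
            current = tokens[0]
            seqs[current] = ""
            tokens = tokens[1:]
        if current is None or not all(CODON.match(t) for t in tokens):
            break  # not sequence data after all
        seqs[current] += "".join(tokens)
        # Stop once every expected sequence has reached the stated length.
        if len(seqs) == nseq and all(len(s) >= length for s in seqs.values()):
            break
    return seqs


if __name__ == "__main__":
    with open("mlc") as handle:  # path is a placeholder
        for name, seq in read_leading_alignment(handle).items():
            print(name, len(seq))
```

Using the stated length to decide where the block ends means the reader never needs a closing tag either. Whether it would disturb the rest of _parse_summary is a separate question.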
>> should be fixed.
>> $ cvs log -r HEAD Bio/Tools/Phylo/PAML.pm
>> revision 1.56
>> date: 2007/11/01 14:52:56; author: jason; state: Exp; lines: +21 -14
>> Parsing PAML4 and PAML3.15 should work now. Dealing with variable
>> order for the sequences and summary results in
>> the top of the MLC files
>> On Dec 4, 2007, at 11:25 AM, Stefan Kirov wrote:
>>> Jason Stajich wrote:
>>>> PAML4 breaks our PAML parser right now because the order of things in
>>>> the result file has changed. Now sequences precede the information
>>>> about the version or the program run. This means that
>>>> $result->get_seqs() fails because we don't parse the sequences.
>>>> We'll see what we can do, but as usual with supporting 3rd party
>>>> programs it is brittle when file formats change.
>>>> Jason Stajich
>>>> jason at bioperl.org
>>>> Bioperl-l mailing list
>>>> Bioperl-l at lists.open-bio.org
>>> I saw a commit after this post on codeml, but not on PAML.pm. I assume
>>> this is not fixed, am I correct?
-------------- next part --------------
A non-text attachment was scrubbed...
Size: 3237 bytes
Desc: not available
Url : http://lists.open-bio.org/pipermail/bioperl-l/attachments/20071205/bd77cde1/attachment.gz