Infernal

From BioPerl

Infernal is Sean Eddy's suite of programs [1, 2, 3] for building covariance models (CMs) from RNA alignments for which a shared secondary structure is known. The package can then use these models to search other sequences for similar structures. It is often used together with the Rfam database [4] to locate non-coding RNAs in sequence annotation pipelines.

The Infernal/Rfam relationship parallels that of Eddy's HMMER package and the Pfam database, which are used together to annotate protein sequences. In fact, CMs are comparable to profile hidden Markov models (HMMs), except that they include secondary structure information along with sequence consensus information.
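To make the structural side of that comparison concrete: cmsearch prints the model's consensus secondary structure above each alignment (see the raw outputs on this page), with matching brackets marking base-paired columns. The Python sketch below was written for this page and is not taken from Infernal or BioPerl; it recovers the base pairs from such a structure line. It assumes only the bracket characters `<>`, `()`, `[]` and `{}` denote pairs and treats every other character as unpaired, so pseudoknot annotation is not handled. The function name `base_pairs` is hypothetical.

```python
from __future__ import annotations

# Bracket pairs treated as base-pair annotations (assumption for this sketch).
OPEN_TO_CLOSE = {"<": ">", "(": ")", "[": "]", "{": "}"}
CLOSE_TO_OPEN = {close: open_ for open_, close in OPEN_TO_CLOSE.items()}


def base_pairs(structure: str) -> list[tuple[int, int]]:
    """Return 1-based (left, right) column pairs, sorted by left column.

    Any character that is not one of the brackets above (':', ',', '_',
    '-', '.', ...) is treated as an unpaired column.
    """
    stacks: dict[str, list[int]] = {opener: [] for opener in OPEN_TO_CLOSE}
    pairs: list[tuple[int, int]] = []
    for col, ch in enumerate(structure, start=1):
        if ch in OPEN_TO_CLOSE:
            stacks[ch].append(col)  # remember where this stem opens
        elif ch in CLOSE_TO_OPEN:
            stack = stacks[CLOSE_TO_OPEN[ch]]
            if not stack:
                raise ValueError(f"unmatched {ch!r} at column {col}")
            pairs.append((stack.pop(), col))  # innermost open bracket pairs first
    unclosed = sorted(c for stack in stacks.values() for c in stack)
    if unclosed:
        raise ValueError(f"unclosed bracket(s) at columns {unclosed}")
    return sorted(pairs)


# Consensus structure of the purine model, joined from its two alignment rows.
consensus = (
    ":::::::::::::::::((((((((,,,<<<<<<<_______>>>>>>>,,,,,,,,<<<"
    "<<<<_______>>>>>>>,,))))))))::::::::::::::"
)
pairs = base_pairs(consensus)
print(len(pairs))  # 22
print(pairs[0])    # (18, 88): the outermost pair of the (((...))) helix
```

Separate stacks per bracket type let differently bracketed helices nest inside each other, which is how the `(((...)))` helix encloses the two `<<<...>>>` hairpins in this model.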

The Infernal package is nearing its final 1.0 release; the current version is 1.0rc1. Because Infernal is still under active development, BioPerl support will remain experimental until the Infernal API stabilizes. At present, very little filtering of HSPs (high-scoring segment pairs) is performed, so the output can be quite verbose.

A Bio::SearchIO module is available for parsing and filtering raw output from cmsearch, the Infernal program that searches sequences with a covariance model. Parsers for the output of other Infernal programs may be added over time.
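As an illustration of the kind of HSP filtering the parser currently leaves to the user, here is a minimal Python sketch. It is not the Bio::SearchIO code (which is Perl), and the `Hit` class, its fields and `filter_hits` are hypothetical names. It keeps hits at or above a bit-score cutoff and, among hits that overlap on the same strand of the same sequence, keeps only the highest-scoring one. It assumes minus-strand hits are recorded with start > end, as in the `Target = 270 - 171` line of the 0.81 output on this page.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Hit:
    """One cmsearch hit (hypothetical structure, not a BioPerl class)."""

    seq_id: str
    start: int    # target coordinate of the first aligned residue
    end: int      # target coordinate of the last aligned residue
    score: float  # bit score reported by cmsearch

    @property
    def strand(self) -> int:
        # Assumption: minus-strand hits are reported with start > end.
        return 1 if self.start <= self.end else -1

    @property
    def span(self) -> tuple[int, int]:
        return min(self.start, self.end), max(self.start, self.end)


def filter_hits(hits: list[Hit], min_score: float) -> list[Hit]:
    """Keep hits scoring >= min_score, dropping any that overlap a better hit."""
    kept: list[Hit] = []
    # Visit hits best-first so an overlapping hit is always compared
    # against the higher-scoring hit that was already kept.
    for hit in sorted(hits, key=lambda h: h.score, reverse=True):
        if hit.score < min_score:
            break  # every remaining hit scores lower still
        lo, hi = hit.span
        overlaps_better = any(
            k.seq_id == hit.seq_id
            and k.strand == hit.strand
            and lo <= k.span[1]
            and k.span[0] <= hi
            for k in kept
        )
        if not overlaps_better:
            kept.append(hit)
    return kept


hits = [
    Hit("gi|633168|emb|X83878.1|", 168, 267, 79.36),  # plus-strand hit
    Hit("gi|633168|emb|X83878.1|", 270, 171, 2.25),   # minus-strand hit
]
print(filter_hits(hits, min_score=10.0))  # only the 79.36-bit hit survives
```

Treating opposite strands as non-overlapping is a design choice for this sketch; a pipeline that wants one call per locus would drop the strand comparison.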

Raw Output Format (Infernal 0.72, cmsearch)

The following is partial cmsearch output from a search with a purine riboswitch covariance model.

 CPU time (band calc): 0.00u 0.00s 00:00:00.00 Elapsed: 00:00:00
 sequence: gi|2239287|gb|U51115.1|BSU51115
 hit 0   :  15589  15691    78.40 bits
            :::::::::::::::::((((((((,,,<<<<<<<_______>>>>>>>,,,,,,,,<<<
          1 aAaaauaaAaaaaaaaauaCuCgUAUAaucucgggAAUAUGGcccgagaGUuUCUACCaG 60      
             A+ A+A+ AAAA A   :CUC:UAUAAU: :GGGAAUAUGGCCC: :AGUUUCUACC:G
      15589 CAUGAAAUCAAAACACGACCUCAUAUAAUCUUGGGAAUAUGGCCCAUAAGUUUCUACCCG 15648   
 
            <<<<_______>>>>>>>,,)))).))))::::::::::::::
         61 gcaaCCGUAAAuugcCuGACUAcG.aGuaAauauuaaauauuu 102     
            GCAACCGUAAAUUGCC:GACUA:G AG: AA + ++  +++++
      15649 GCAACCGUAAAUUGCCGGACUAUGcAGGGAAGUGAUCGAUAAA 15691   
 
 hit 1   :  11655  11756    81.29 bits
            :::::::::::::::::((((((((,,,<<<<<<<_______>>>>>>>,,,,,,,,<<<
          1 aAaaauaaAaaaaaaaauaCuCgUAUAaucucgggAAUAUGGcccgagaGUuUCUACCaG 60      
            A AAAU AAA+AA A+   : CGUAUAAU::CG:GAAUAUGGC:CG::AGU UCUACCA:
      11655 AGAAAUCAAAUAAGAUGAAUUCGUAUAAUCGCGGGAAUAUGGCUCGCAAGUCUCUACCAA 11714   
 
            <<<<_______>>>>>>>,,))))))))::::::::::::::
         61 gcaaCCGUAAAuugcCuGACUAcGaGuaAauauuaaauauuu 102     
            GC ACCGUAAAU GC:UGACUACG :   AU+U +++  UUU
      11715 GCUACCGUAAAUGGCUUGACUACGUAAACAUUUCUUUCGUUU 11756       
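In this format, the `sequence:` line names the target and each `hit N : start end score bits` line carries the target coordinates and bit score of one alignment. The following minimal Python sketch assumes exactly the layout shown above and pulls those fields out while ignoring the structure and alignment rows. The `HitHeader` name and the regular expressions are this example's own, not part of BioPerl or Infernal; it needs Python 3.8+ for the `:=` operator.

```python
from __future__ import annotations

import re
from typing import NamedTuple

# Patterns for the two header lines seen in the 0.72 output above.
SEQ_RE = re.compile(r"^\s*sequence:\s*(\S+)")
HIT_RE = re.compile(
    r"^\s*hit\s+(\d+)\s*:\s*(\d+)\s+(\d+)\s+(-?\d+(?:\.\d+)?)\s+bits"
)


class HitHeader(NamedTuple):
    seq_id: str
    index: int  # "hit 0", "hit 1", ... within one sequence
    start: int
    end: int
    bits: float


def parse_v072_headers(text: str) -> list[HitHeader]:
    """Collect one HitHeader per 'hit N : start end score bits' line."""
    headers: list[HitHeader] = []
    seq_id = ""
    for line in text.splitlines():
        if m := SEQ_RE.match(line):
            seq_id = m.group(1)  # applies to the hit lines that follow it
        elif m := HIT_RE.match(line):
            idx, start, end, bits = m.groups()
            headers.append(
                HitHeader(seq_id, int(idx), int(start), int(end), float(bits))
            )
    return headers
```

Applied to the output above, this would yield two headers for `gi|2239287|gb|U51115.1|BSU51115`: hit 0 at 15589-15691 (78.40 bits) and hit 1 at 11655-11756 (81.29 bits).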

Raw Output Format (Infernal 0.81, cmsearch)

CM 1: Purine
CM lambda and K undefined -- no statistics
Using CM score cutoff of 0.00
>gi|633168|emb|X83878.1|

  Plus strand results:

 Query = 1 - 102, Target = 168 - 267
 Score = 79.36, GC =  46

           :::::::::::::::::((((((((,,,<<<<<<<_______>>>>>>>,,,,,,,,<<<
         1 aAaaauaaAaaaaaaaauaCuCgUAUAaucucgggAAUAUGGcccgagaGUuUCUACCaG 60      
           + A A++A AA A  AA:AC+C:UAUAAU::CG:G AUAUGGC:CG::AGUUUCUACC:G
       168 UUACAAUAUAAUAGGAACACUCAUAUAAUCGCGUGGAUAUGGCACGCAAGUUUCUACCGG 227     

           <<<<_______>>>>>>>,,))))))))::::::::::::::
        61 gcaaCCGUAAAuugcCuGACUAcGaGuaAauauuaaauauuu 102     
            CA CCGUAAA UG C:GACUA:G+GU:A  A+U  A+    
       228 GCA-CCGUAAA-UGUCCGACUAUGGGUGAGCAAUGGAACCGC 267     


  Minus strand results: 

 Query = 1 - 102, Target = 270 - 171
 Score = 2.25, GC =  48

           :::::::::::::::::((((((((,,,<<<<<<<_______>>>>>>>,,,,,,,,<<<
         1 aAaaauaaAaaaaaaaauaCuCgUAUAaucucgggAAUAUGGcccgagaGUuUCUACCaG 60      
             +    +   A    +:AC C:UA  +::: ::   UA GG :: :::GU    AC: G
       270 CGUGCGGUUCCAUUGCUCACCCAUA-GUCGGACAU-UUACGG-UGCCCGGUAGAAACUUG 214     

           <<<<_______>>>>>>>,,.))))))))::::::::::::::
        61 gcaaCCGUAAAuugcCuGAC.UAcGaGuaAauauuaaauauuu 102     
           ::::CC UA  ::::C :   UA:G GU: +  U+++AUAUU 
       213 CGUGCCAUAUCCACGCGAUUaUAUGAGUGUUCCUAUUAUAUUG 171     


//
Fin

References

  1. Griffiths-Jones S, Bateman A, Marshall M, Khanna A, and Eddy SR. Rfam: an RNA family database. Nucleic Acids Res. 2003 Jan 1;31(1):439-41. PubMed ID: 12520045.
  2. Eddy SR. A memory-efficient dynamic programming algorithm for optimal alignment of a sequence to an RNA secondary structure. BMC Bioinformatics. 2002 Jul 2;3:18. PubMed ID: 12095421.
  3. Eddy SR and Durbin R. RNA sequence analysis using covariance models. Nucleic Acids Res. 1994 Jun 11;22(11):2079-88. PubMed ID: 8029015.
  4. Griffiths-Jones S, Moxon S, Marshall M, Khanna A, Eddy SR, and Bateman A. Rfam: annotating non-coding RNAs in complete genomes. Nucleic Acids Res. 2005 Jan 1;33(Database issue):D121-4. DOI: 10.1093/nar/gki081. PubMed ID: 15608160.