ERPIN

From BioPerl

ERPIN is Daniel Gautheret and André Lambert's suite of programs [1] for finding RNA structural motifs using secondary structure profiles (SSPs). Its main advantage over descriptor-based programs such as RNAMotif is that it computes E-values for matches.

ERPIN is considered to be under active development, so any future BioPerl support will be experimental.

It can be found here.

Raw Output Format

 Training set:	"tbox.epn":
     140 sequences of length 97
 Cutoff:		0.00  15.00  
 
 Database:	"B_sub.fas"
     4214574 nucleotides to be processed in 1 sequence
     ATGC ratios: 0.282  0.283  0.217  0.218
 E-value at cutoff 15.0 for 4.2Mb double strand data: 4.20e+00
 
 >gi|50812173|ref|NC_000964.2| Bacillus subtilis subsp. subtilis str. 168, complete genome
 FW   1   20780..20845    44.01  9.62e-09
 GTTTTCAA.TCAGGG.TGGCAAC.GCGAGA.gc------------.TCTCGT.CCCTTT.atggggatgagggctc------------------.TTTTTATTT
 FW   2  112686..112757   36.80  4.25e-05
 CTTTTCAA.ACAGAG.TGGAACC.GCGCGG.ttaaa---------.GCGTCT.CTGTCA.tgtttacatgcagagacgc---------------.TTTTTTTAT
 FW   3  277013..277073   36.72  4.56e-05
 GCTGTTAA.TAAAGG.TGGTACC.GCGAGA.ccc-----------.TCGTCC.TTTGCA.taggacggggg-----------------------.TTTTTTGT-
 FW   4 1377705..1377775  45.16  1.54e-09
 GCGTTCAA.TCAAGG.TGGTACC.ACGGAA.accca---------.TTTCGT.CCTTAT.gaatcaggatgaaatggg----------------.TTTTTTTAT
 FW   5 1612561..1612629  46.76  8.26e-11
 TTTTTCTA.AAAGGG.TGGTACC.GCGAGA.taagctt-------.TCTCGT.CCCTTA.tgggatgagagggc--------------------.TTTTTTTAT
 FW   6 2472254..2472316  44.19  7.31e-09
 TGTGCTAA.TGAGGG.TGGTACC.GCGAAC.ct------------.TTTCGT.CCTTTA.cgtgatgaaaagg---------------------.TTTTTTGTT
 FW   7 3946092..3946165  31.51  2.37e-03
 GTCTTCAA.CCAGGG.TGGTACC.GCGTGC.attgagccacg---.TCCCTT.ATTGGG.atgggctcttttttgtg-----------------.TTTGTA-A-
 >gi|50812173|ref|NC_000964.2| Bacillus subtilis subsp. subtilis str. 168, complete genome
 RC   1 1218449..1218517  40.41  1.11e-06
 TGGAATAA.TCAGGG.TGGTACC.ACGGTT.catt----------.CGTCCC.TTTTTT.acaggggaagaatgagcc----------------.TTTTTT-AT
 RC   2 2607929..2608008  43.70  1.54e-08
 CTCAGCAA.CTAGGG.TGGAACC.GCGGGA.gaac----------.TCTCGT.CCCTAT.gtttgcggctggcaagcatagagacgggag----.TTTTTTG--
 RC   3 2800080..2800160  44.88  2.43e-09
 GCCCGTAA.TCAGGG.TGGTACC.GCGAGA.cagc----------.TCTCGT.CCCTGT.gtaaacgttggtttgcatagggggagggc-----.TTTTTTGCT
 RC   4 2817094..2817169  42.69  6.46e-08
 TATTCCAA.CTAGGG.TGGCACC.ACGGGT.ataac---------.TCTCGT.CCCTAC.tatcatgtatagtaggggcgggag----------.TTTTTTTC-
 RC   5 2868569..2868636  42.06  1.49e-07
 TTCATGAA.AAAAGG.TGGTACC.GCGAAA.gagct---------.TTTCGT.CCTTTT.acagggatgaagagctc-----------------.TTTTTT-C-
 RC   6 2896140..2896207  34.68  2.48e-04
 GCCGTAAA.CAAGGG.TGGTACC.GCGGAA.agaaaagcct----.TTTCGC.CCCTTT.tagctatcgcag----------------------.TTACT-GC-
 RC   7 2929584..2929652  31.83  1.91e-03
 GTCTGAAA.TAAGGG.TGGTACC.GCGGCC.acaactcgtc----.CCTTGT.ACAAGG.gacgggtttttt----------------------.TTATTTTC-
 RC   8 2960251..2960324  42.11  1.40e-07
 TGCGGAAA.AAAGGG.TGGAACC.ACGATT.ccgtttattcaa--.CCTCGT.CCCTTT.catagggggcgggg--------------------.TTTTTATAT
 RC   9 3036940..3037006  37.30  2.70e-05
 TCTTATTA.GTAGGG.TGGTACC.GCGATA.atcaat--------.CGTCCC.TTCGTG.taaacgaaggggcg--------------------.TTTTTT-AT
 RC  10 3104180..3104252  46.93  5.85e-11
 CGCGCTAA.CGAGGG.TGGTACC.GCGGGA.aaacgaaagtc---.TCTCGT.CCCTTT.ttgggatgagggagt-------------------.TTTTTTTA-
 RC  11 3490315..3490403  43.60  1.79e-08
 AGAATCAA.CAAGAG.TGGTACC.GCGGTC.agccgaaggct---.CGTCGT.CTCTTT.atctattagattaggtaggagacggcgggc----.TTTTTTGTT
 RC  12 3855224..3855296  44.61  3.79e-09
 GTCGGGAA.CTTGGG.TGGAACC.ACGGGT.taatcacac-----.ACTCGT.CCCTAT.ctgcgggacgggtgtg------------------.TTTTTTTAT
 RC  13 3855477..3855549  44.61  3.79e-09
 GTCGGGAA.CTTGGG.TGGAACC.ACGGGT.taatcacac-----.ACTCGT.CCCTAT.ctgtgggacgggtgtg------------------.TTTTTTTAT
 RC  14 3855732..3855804  43.74  1.44e-08
 ATCGGGAA.TTTGGG.TGGAACC.ACGGAT.gatcaacac-----.ATTCGT.CCCTTT.tagagggatgggtgtg------------------.TTTTTTTAT
 
 -------- at level 1 --------
 8429234 bases processed
 cutoff: 0.00
 13 config. per site
 84448 hits
 -------- at level 2 --------
 cutoff: 15.00
 120 config. per site
 21 hits
 21 independent hits
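
The hit records in the output above could be read with a short script. The following is a minimal Python sketch, not part of ERPIN or BioPerl. It assumes, based only on the sample above, that each hit consists of a header line (strand `FW` or `RC`, hit number, `start..end` coordinates, score, E-value) followed by one alignment line, and that `>` lines name the database sequence. The names `ErpinHit` and `parse_erpin_hits` are hypothetical.

```python
import re
from dataclasses import dataclass

# Hit header, e.g. "FW   1   20780..20845    44.01  9.62e-09".
# The column meanings are inferred from the sample output, not from ERPIN docs.
HIT_RE = re.compile(
    r"^(FW|RC)\s+(\d+)\s+(\d+)\.\.(\d+)\s+"
    r"(-?\d+(?:\.\d+)?)\s+(\d+(?:\.\d+)?(?:[eE][-+]?\d+)?)$"
)


@dataclass
class ErpinHit:
    seq_id: str      # first word of the preceding ">" line
    strand: str      # "FW" or "RC"
    index: int       # hit number within the strand
    start: int
    end: int
    score: float
    evalue: float
    alignment: str   # aligned motif, "." between elements, "-" for gaps


def parse_erpin_hits(text):
    """Return a list of ErpinHit records found in raw ERPIN output."""
    hits = []
    seq_id = None
    pending = None  # regex match for a header still waiting for its alignment
    for line in text.splitlines():
        stripped = line.strip()
        if not stripped:
            continue
        if stripped.startswith(">"):
            parts = stripped[1:].split()
            seq_id = parts[0] if parts else ""
            continue
        match = HIT_RE.match(stripped)
        if match:
            pending = match
            continue
        if pending is not None:
            strand, idx, start, end, score, evalue = pending.groups()
            hits.append(ErpinHit(seq_id, strand, int(idx), int(start),
                                 int(end), float(score), float(evalue),
                                 stripped))
            pending = None
    return hits


# Two records copied from the sample output above.
SAMPLE = """\
 >gi|50812173|ref|NC_000964.2| Bacillus subtilis subsp. subtilis str. 168, complete genome
 FW   1   20780..20845    44.01  9.62e-09
 GTTTTCAA.TCAGGG.TGGCAAC.GCGAGA.gc------------.TCTCGT.CCCTTT.atggggatgagggctc------------------.TTTTTATTT
 >gi|50812173|ref|NC_000964.2| Bacillus subtilis subsp. subtilis str. 168, complete genome
 RC   1 1218449..1218517  40.41  1.11e-06
 TGGAATAA.TCAGGG.TGGTACC.ACGGTT.catt----------.CGTCCC.TTTTTT.acaggggaagaatgagcc----------------.TTTTTT-AT
"""

if __name__ == "__main__":
    for hit in parse_erpin_hits(SAMPLE):
        print(hit.strand, hit.start, hit.end, hit.score, hit.evalue)
```

Header lines such as `Training set:` and the per-level summary are skipped because they match neither pattern. Once parsed, hits can be filtered on the E-value column, for example `[h for h in hits if h.evalue < 1e-6]`.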

References

  1. Gautheret D, Lambert A. Direct RNA motif definition and identification from multiple sequence alignments using secondary structure profiles. J Mol Biol. 2001 Nov 9;313(5):1003-11. DOI:10.1006/jmbi.2001.5102. PubMed ID:11700055.