[Bioperl-l] ORF identification/prediction

Fernan Aguero fernan@iib.unsam.edu.ar
Mon, 8 Jan 2001 19:58:17 -0300

Currently I am calling getorf (from the EMBOSS package) in my scripts to
do this for me.
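
For anyone who wants to do the same from a script, here is a minimal sketch in Python rather than the language of my own scripts. It assumes getorf is installed and on the PATH, and it uses only qualifiers that appear in the -h output below. The helper names (build_getorf_command, run_getorf) and the file names are made up for illustration; they are not part of EMBOSS or bioperl.

```python
import subprocess


def build_getorf_command(sequence, outseq, minsize=None, find=None, table=None):
    """Assemble an argument list for getorf.

    Only qualifiers listed in the 'getorf -h' output are used. The values
    for -find and -table are passed through untouched, since the short
    help does not say what they should be.
    """
    cmd = ["getorf", "-sequence", sequence, "-outseq", outseq]
    if table is not None:
        cmd += ["-table", str(table)]
    if minsize is not None:
        cmd += ["-minsize", str(minsize)]
    if find is not None:
        cmd += ["-find", str(find)]
    return cmd


def run_getorf(sequence, outseq, **options):
    """Run getorf and return the output file name.

    Raises FileNotFoundError if getorf is not installed and
    subprocess.CalledProcessError if it exits with an error.
    """
    cmd = build_getorf_command(sequence, outseq, **options)
    subprocess.run(cmd, check=True)
    return outseq
```

Calling run_getorf("contigs.fasta", "orfs.fasta", minsize=300) would then run getorf on a (hypothetical) contigs.fasta and leave the ORFs in orfs.fasta for the rest of the script to parse.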

[fernan@iib4 fernan]$ getorf -h 
Mandatory qualifiers:
[-sequence]          seqall     Sequence database USA
[-outseq]            seqoutall  Output sequence(s) USA

Optional qualifiers:
-table              list       Code to use
-minsize            integer    Minimum nucleotide size of ORF to report
-find               list       This is a small menu of possible output
                               options. The first four options are to
                               select either the protein translation or the
                               original nucleic acid sequence of the open
                               reading frame. There are two possible
                               definitions of an open reading frame: it can
                               either be a region that is free of STOP
                               codons or a region that begins with a START
                               codon and ends with a STOP codon. The last
                               three options are probably only of interest
                               to people who wish to investigate the
                               statistical properties of the regions around
                               potential START or STOP codons. The last
                               option assumes that ORF lengths are
                               calculated between two STOP codons.

Advanced qualifiers:
-[no]methionine     bool       START codons at the beginning of protein
                               products will usually code for Methionine,
                               despite what the codon will code for when it
                               is internal to a protein. This qualifier
                               sets all such START codons to code for
                               Methionine by default.
-circular           bool       Is the sequence circular
-[no]reverse        bool       Set this to be false if you do not wish to
                               find ORFs in the reverse complement of the
                               sequence.
-flanking           integer    If you have chosen one of the options of the
                               type of sequence to find that gives the
                               flanking sequence around a STOP or START
                               codon, this allows you to set the number of
                               nucleotides either side of that codon to
                               output. If the region of flanking
                               nucleotides crosses the start or end of the
                               sequence, no output is given for this codon.
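
The two ORF definitions mentioned under -find can be sketched in a few lines of Python. This is only an illustration of the definitions, not getorf's actual implementation. It assumes upper-case DNA and the standard genetic code (ATG as START; TAA, TAG and TGA as STOP), and all function names are invented here. Coordinates are 0-based, half-open, and exclude the STOP codon itself.

```python
STOPS = {"TAA", "TAG", "TGA"}  # assumption: standard genetic code
START = "ATG"                  # assumption: ATG is the only START codon


def codons(seq, frame):
    """Yield (position, codon) for each complete codon in frame 0, 1 or 2."""
    for i in range(frame, len(seq) - 2, 3):
        yield i, seq[i:i + 3]


def stop_free_regions(seq, frame, minsize=0):
    """Definition 1: maximal runs of codons that contain no STOP codon."""
    regions = []
    begin = None
    end = frame
    for pos, codon in codons(seq, frame):
        if codon in STOPS:
            if begin is not None and pos - begin >= minsize:
                regions.append((begin, pos))
            begin = None
        elif begin is None:
            begin = pos
        end = pos + 3
    # A run that reaches the end of the sequence is also reported.
    if begin is not None and end - begin >= minsize:
        regions.append((begin, end))
    return regions


def start_to_stop_regions(seq, frame, minsize=0):
    """Definition 2: from the first START after a STOP up to the next STOP.

    In this sketch, a START with no STOP before the end of the sequence
    is not reported.
    """
    regions = []
    begin = None
    for pos, codon in codons(seq, frame):
        if begin is None and codon == START:
            begin = pos
        elif begin is not None and codon in STOPS:
            if pos - begin >= minsize:
                regions.append((begin, pos))
            begin = None
    return regions


def reverse_complement(seq):
    """Reverse complement, for scanning the other strand."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]
```

For the toy sequence CCCATGAAATTTTAAGGG in frame 0, the first definition gives (0, 12) and (15, 18), while the second gives only (3, 12), i.e. ATGAAATTT. Scanning all six frames means looping over frames 0 to 2 on both the sequence and its reverse_complement.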

What I find annoying about EMBOSS apps is that the -h (-help) option
prints limited information (unless an option is 'boolean' or 'integer',
you don't know what to put there). You have to go to the EMBOSS web site
to look for extended help!

Hope this helps,


On Mon, 08 Jan 2001 18:10:26 Jason Stajich wrote:
> To the best of my knowledge, we don't currently have bioperl modules that
> predict/identify (depending on your confidence in the software =) Open
> Reading Frames. Eric and I were thinking of working on a bioperl module
> for this.  Any suggestions, known pitfalls, etc are welcomed.
> Jason Stajich
> jason@chg.mc.duke.edu
> Center for Human Genetics
> Duke University Medical Center 
> http://www.chg.duke.edu/ 


# --------------------------------------------------------- #
#                                            _              #
#   Fernan Aguero            |              / \             #
#   Bioinformatics           |       ASCII  \ /  against    #
#   IIB-UNSAM                |      ribbon   /   HTML       #
#   fernan@iib.unsam.edu.ar  |    campaign  / \  email      #
#   ICQ 100325972            |             /   \            #
#                                                           #
# --------------------------------------------------------- #