[Bioperl-l] RFC: Bio::App::SELEX::RNAmotifAnalysis
bottomsc at missouri.edu
Thu Sep 6 18:34:17 EDT 2012
Dear BioPerl Community,
I welcome further comments on Bio::App::SELEX::RNAmotifAnalysis. Below
this message is my updated "perldoc" for it.
A special thanks to Leon Timmermans and Chris Fields for their prior feedback.
Leon, I made several improvements based on your feedback. Now, a
wrapper script is used instead of the module file itself. FASTQ files
are an acceptable input format. And the installer was improved. I
found that installing Module::Build before our module cleared up
issues with several dependencies. I think that our instructions for
using cpanminus effectively give the same results as using local::lib.
As for Alien packages, I would really like to work on them, but the
time (i.e. funding) is currently too limited.
Chris, yes, the "App" part of the proposed name was chosen because
this is designed more to be an application than to be modules to be
reused. I fully intend to support this distribution myself. If you
think it better to separate it more from the BioPerl namespace, I have
considered calling it App::Bio::SELEX::RNAmotifAnalysis. What do
you think? I welcome additional feedback.
Perldoc for Bio::App::SELEX::RNAmotifAnalysis:
RNAmotifAnalysis --fastq seqs.fq --cpus 4 --run
This module pipelines steps in the analysis of SELEX (Systematic Evolution
of Ligands through EXponential enrichment) data.
This main module creates scripts to do the following:
(1) Cluster similar sequences based on edit distance.
(2) Align sequences within each cluster (using mafft).
(3) Calculate the secondary structure of the aligned sequences (using
RNAalifold, from the Vienna RNA package)
(4) Build covariance models using cmbuild from Infernal.
Another useful utility installed with this distribution is
"selex_covarianceSearch" for doing iterative refinements of
If you want to use files that simply list sequences, then use
the "--simple" flag instead of the "--fastq" flag.
This script assumes that you've already done all of the quality
control of your sequences beforehand. If the FASTQ format is
used, quality scores are ignored.
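Since quality scores are ignored, a FASTQ file carries no more information for this purpose than a plain list of sequences. As a minimal sketch, assuming strict four-line FASTQ records (header, sequence, '+', qualities) with unwrapped sequences, the conversion could look like this. The function name "fastq_to_simple" is hypothetical and is not part of this distribution.

```shell
# Hypothetical helper (not part of the distribution): print only the
# sequence line of each FASTQ record.
# Assumes every record is exactly four lines, so the sequence is
# always the second line of each group of four.
fastq_to_simple() {
    awk 'NR % 4 == 2' "$1"
}
```

For example, "fastq_to_simple seqs.fq > seqs.txt" would write one sequence per line to seqs.txt, which is the kind of file the "--simple" flag is described as accepting.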
RNAmotifAnalysis --infile seqs.fq --cpus 4 --run
This will cluster the sequences found in 'seqs.fq' and create a FASTA file
for each cluster. The FASTA files will be grouped into batches (i.e. one per
cpu requested) that will be placed in a separate directory for
and processed within that directory. At the end of processing, for each
cluster there will be a covariance model and postscript illustration
files. The batch script used to process each batch will be located in the
respective batch directory. To produce the scripts without running them,
simply exclude the --run flag from the command line.
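The grouping of cluster FASTA files into one batch per CPU can be pictured with the sketch below. It is only an illustration under stated assumptions: the round-robin assignment, the function name "make_batches", and the directory names "batch_1", "batch_2", and so on are hypothetical, and the scheme the module actually uses is not described here.

```shell
# Hypothetical illustration: distribute FASTA files round-robin into
# N batch directories (batch_1 .. batch_N) under the current directory.
# Usage: make_batches N file1.fasta file2.fasta ...
make_batches() {
    n_batches=$1
    shift
    i=0
    for fasta in "$@"; do
        # Files 1, N+1, 2N+1, ... go to batch_1, and so on.
        batch_dir="batch_$(( i % n_batches + 1 ))"
        mkdir -p "$batch_dir"
        mv "$fasta" "$batch_dir/"
        i=$(( i + 1 ))
    done
}
```

With five cluster files and "make_batches 2 ...", the first, third, and fifth files would land in batch_1 and the second and fourth in batch_2.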
CONFIGURATION AND ENVIRONMENT
As written, this code makes heavy use of UNIX utilities and is
therefore only supported on UNIX-like environments (e.g. Linux, UNIX, Mac OS X).
Install Infernal, MAFFT, and the Vienna RNA package ahead of time and add
the directories containing their executables to your PATH, so that the
first time you run RNAmotifAnalysis.pm the configuration file
that is generated will have all of the correct parameters. Otherwise,
you'll need to update the configuration file manually.
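Before the first run, it may help to confirm that the external programs can actually be found on your PATH. The sketch below is a hypothetical helper, not something shipped with this distribution. The executable names used in the usage note (mafft, RNAalifold, cmbuild) are the ones named in the description above.

```shell
# Hypothetical helper: report which of the named executables cannot be
# found on PATH. Prints one missing name per line and returns 1 if any
# are missing, 0 otherwise.
missing_tools() {
    status=0
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "$tool"
            status=1
        fi
    done
    return $status
}
```

For example, "missing_tools mafft RNAalifold cmbuild" prints nothing when all three are reachable, and otherwise lists the ones whose directories still need to be added to PATH.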
To update the PATH environment variable with the directory
update your .bashrc file, thus:
echo 'export PATH=/usr/local/myapps/bin:$PATH' >> ~/.bashrc
Now, every time you open a new terminal window, the PATH environment
variable will contain '/usr/local/myapps/bin/'. To make your new .bashrc
file effective immediately (i.e. without having to open a new terminal
window), use the following command:
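In bash, one common way to apply changes from .bashrc to the current session is "source ~/.bashrc". The sketch below combines the append and the reload in a single hypothetical function, "append_and_reload", which is not part of this distribution. It takes the rc file as an argument so that it can be tried on a scratch file before touching the real ~/.bashrc.

```shell
# Hypothetical helper: append a PATH export for directory $1 to rc file
# $2, then load that file into the current shell.
# Assumes the rc file contains only syntax the running shell understands.
append_and_reload() {
    dir=$1
    rc_file=$2
    # \$PATH is escaped so that the literal text $PATH is written to
    # the file and expanded each time the file is loaded.
    echo "export PATH=$dir:\$PATH" >> "$rc_file"
    # "." is the portable spelling of bash's "source".
    . "$rc_file"
}
```

For example, "append_and_reload /usr/local/myapps/bin ~/.bashrc" would have the same effect as the echo command above followed by a reload.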
These installation instructions assume being able to open and use a
terminal window on Linux.
(0) Some systems need several dependencies installed ahead of time.
You may be able to skip this step. However, if subsequent steps do not
work, then be sure that some basic libraries are installed, as shown
below (or ask a system administrator to take care of it). For the
applicable distribution, open a terminal and then type the following:
For RedHat or CentOS 5.x systems (tested on CentOS 5.5)
sudo yum install gcc
For RedHat or CentOS 6.x systems (tested on "Minimal Desktop" CentOS 6.0)
sudo yum install gcc
sudo yum install perl-devel
For Ubuntu systems (tested on Ubuntu 12.04 LTS)
sudo apt-get install curl
For Debian 5.x systems:
sudo apt-get install build-essential
(1) Install the non-Perl dependencies:
(Versions shown are those that we've tested. Please contact us if
newer versions do not work.)
Infernal 1.0.2 (http://infernal.janelia.org/)
Vienna RNA package 1.8.4 (http://www.tbi.univie.ac.at/~ivo/RNA/)
After installing these, make sure all of the following executables are
in directories within your PATH:
(2) Either (a) download and run our installer or (b) use a CPAN client
to install Bio::App::SELEX::RNAmotifAnalysis. Note that the installer
creates the directory 'perl5' inside your home directory. This
directory is for holding Perl modules, including this module and any
Perl module dependencies not already included on your system. The
installer also appends commands to your .bashrc file to make it easy
for the Perl runtime to find these new modules (i.e. it includes the
local 'perl5/lib/perl5' directory in the PERL5LIB environment variable).
(a) Installation method: Use the installer
i. Download installer (and name it "installer")
curl -o installer -L
ii. Make it executable
chmod u+x installer
iii. Run it.
(b) Installation method: Use a CPAN client. Here we demonstrate
the use of cpanminus to install it to a local Perl module
directory. These instructions assume absolutely no experience
i. Download cpanminus
curl -LOk http://xrl.us/cpanm
ii. Make it executable
chmod u+x cpanm
iii. Make a local perl5 directory (if it doesn't already exist)
mkdir -p ~/perl5
iv. Add relevant directories to your PERL5LIB and PATH environment
variables by adding the following text to your ~/.bashrc
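# Set PERL5LIB if it doesn't already exist
if [ -z "$PERL5LIB" ]; then
    export PERL5LIB="$HOME/perl5/lib/perl5"
# Prepend to PERL5LIB if directory not already found in PERL5LIB
elif ! echo "$PERL5LIB" | grep -Eq "(^|:)$HOME/perl5/lib/perl5($|:)"; then
    export PERL5LIB="$HOME/perl5/lib/perl5:$PERL5LIB"
fi
# Prepend to PATH if directory not already found in PATH
if ! echo "$PATH" | grep -Eq "(^|:)$HOME/perl5/bin($|:)"; then
    export PATH="$HOME/perl5/bin:$PATH"
fi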
v. Update environment variables immediately
vi. Install Module::Build
./cpanm -l ~/perl5 Module::Build
vii. Install Bio::App::SELEX::RNAmotifAnalysis
./cpanm -l ~/perl5 Bio::App::SELEX::RNAmotifAnalysis
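After either installation method, a quick check that Perl can locate the module may save some debugging later. The sketch below is a hypothetical helper, not part of this distribution. It relies only on perl's standard -M and -e switches and assumes PERL5LIB already includes the local 'perl5/lib/perl5' directory.

```shell
# Hypothetical helper: succeed (exit status 0) if perl can load the
# named module, fail otherwise. Output from perl is discarded.
perl_module_available() {
    perl -M"$1" -e 1 >/dev/null 2>&1
}
```

For example:

    perl_module_available Bio::App::SELEX::RNAmotifAnalysis && echo "module found"

If nothing is printed, the PERL5LIB setting from ~/.bashrc has probably not been loaded into the current terminal yet.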
Please contact the author if, after consulting this documentation and
searching Google with error messages, you still encounter difficulties
during the installation process using one of these two methods.
BUGS AND LIMITATIONS
There are no known bugs in this module.
Please report problems to molecules <at> cpan <dot> org
Patches are welcome.
Ditzler et al. Manuscript currently in review.