Installing Bioperl on Windows
=============================

   1) Quick instructions for the impatient
   2) Bioperl
   3) Perl on Windows
   4) Bioperl on Windows
   5) Beyond the Core
   6) Bioperl in Cygwin
   7) Cygwin tips

This installation guide was written by Barry Moore, Nathan Haigh and
other Bioperl authors based on the original work of Paul Boutros.
Please report problems and/or fixes to the bioperl mailing list,
bioperl-l@bioperl.org


1) Quick instructions for the impatient, lucky, or experienced user.
====================================================================

Download the ActivePerl MSI from
http://www.activestate.com/Products/ActivePerl/

Run the ActivePerl Installer (accepting all defaults is fine).

Open a command prompt (Start->Run, then type cmd) and run the PPM
shell (C:\>ppm).

Add three new PPM repositories with the following commands:

   ppm> rep add Bioperl http://bioperl.org/DIST
   ppm> rep add Kobes http://theoryx5.uwinnipeg.ca/ppms
   ppm> rep add Bribes http://www.Bribes.org/perl/ppm

Install Bioperl with the following commands:

   ppm> search Bioperl

This returns a numbered list of packages with "Bioperl" in their name,
along with their version numbers.

   ppm> install <number>

Here <number> is the number of the relevant package and version in the
list obtained above.

Go to http://www.bioperl.org and start reading documentation.


2) Bioperl
==========

Bioperl is a large collection of Perl modules (extensions to the Perl
language) that aid in the task of writing Perl code to deal with
sequence data in a myriad of ways. Bioperl provides objects for various
types of sequence data and their associated features and annotations.
It provides interfaces for analysis of these sequences with a wide
variety of external programs (BLAST, fasta, clustalw and EMBOSS to name
just a few). It provides interfaces to various types of databases both
remote (GenBank, EMBL etc.) and local (MySQL, flat files, GFF etc.) for
storage and retrieval of sequences.
And finally with its associated documentation and mailing list Bioperl
represents a community of bioinformatics professionals working in Perl
who are committed to supporting both development of Bioperl and the new
users who are drawn to the project.

While most bioinformatics and computational biology applications are
developed in Unix/Linux environments, more and more programs are being
ported to other operating systems like Windows, and many users (often
biologists with little background in programming) are looking for ways
to automate bioinformatics analyses in the Windows environment.

Perl and Bioperl can be installed natively on Windows NT/2000/XP. Most
of the functionality of Bioperl is available with this type of install.
Much of the heavy lifting in bioinformatics is done by programs
originally developed in lower level languages like C and Pascal (e.g.
BLAST, clustalw, Staden etc.). Bioperl simply acts as a wrapper for
running and parsing output from these external programs. Some of those
programs (BLAST for example) are ported to Windows. These can be
installed and work quite happily with Bioperl in the native Windows
environment. Some external programs such as Staden and the EMBOSS suite
of programs can only be installed on Windows by using Cygwin and its
gcc C compiler (see Bioperl in Cygwin, below).

If you have a fairly simple project in mind, want to start using
Bioperl quickly, only have access to a computer running Windows, and/or
don't mind bumping up against some limitations then Bioperl on Windows
may be a good place for you to start. For example, downloading a bunch
of sequences from GenBank and sorting out the ones that have a
particular annotation or feature works great. Running a bunch of your
sequences against remote or local BLAST, parsing the output and storing
it in a MySQL database would be fine also.

Be aware that most Bioperl developers are working in some type of a
UNIX environment (Linux, OSX, Cygwin).
If you have problems with Bioperl that are specific to the Windows
environment, you may be blazing new ground and your pleas for help on
the Bioperl mailing list may get few responses, simply because no one
knows the answer to your Windows-specific problem. If this is or
becomes a problem for you then you are better off working in some type
of UNIX-like environment. One solution to this problem that will keep
you working on a Windows machine is to install Cygwin, a UNIX emulation
environment for Windows. A number of Bioperl users are using this
approach successfully and it is discussed in more detail below.


3) Perl on Windows
==================

There are a couple of ways of installing Perl on a Windows machine. The
most common and easiest is to get the most recent build from
ActiveState. ActiveState is a software company
(http://www.activestate.com) that provides free builds of Perl for
Windows users. The current (December 2004) build is ActivePerl
5.8.4.810 (ActivePerl 5.6.1.638 is also available and should work just
fine). To install ActivePerl on Windows:

   Download the ActivePerl MSI from
   http://www.activestate.com/Products/ActivePerl/

   Run the ActivePerl Installer (accepting all defaults is fine).

You can also build Perl yourself (which requires a C compiler) or
download one of the other binary distributions. The Perl source for
building it yourself is available from CPAN (http://www.cpan.org), as
are a few other binary distributions that are alternatives to
ActiveState. This approach is not recommended unless you have specific
reasons for doing so and know what you're doing. If that's the case you
probably don't need to be reading this guide.

Cygwin is a UNIX emulation environment for Windows and comes with its
own copy of Perl. Information on Cygwin and Bioperl is found below.


4) Bioperl on Windows
=====================

Perl is a programming language that has been extended a lot by the
addition of external modules.
These modules work with the core language to extend the functionality
of Perl. Bioperl is one such extension to Perl. These modular
extensions to Perl sometimes depend on the functionality of other Perl
modules and this creates a dependency. You can't install module X
unless you have already installed module Y. Some Perl modules are so
fundamentally useful that the Perl developers have included them in the
core distribution of Perl; if you've installed Perl then these modules
are already installed. Other modules are freely available from CPAN,
but you'll have to install them yourself if you want to use them.
Bioperl has such dependencies.

Bioperl is actually a large collection of Perl modules (over 1000
currently) and these modules are split into six packages. These six
packages are:

Bioperl Group        Functions
-----------------------------------------------------------------
bioperl (the core)   Most of the main functionality of Bioperl.
bioperl-run          Wrappers to a lot of external programs.
bioperl-ext          Interaction with some alignment functions and
                     the Staden package.
bioperl-db           Using bioperl with BioSQL and local relational
                     databases.
bioperl-microarray   Microarray specific functions.
bioperl-gui          Some preliminary work on a graphical user
                     interface to some Bioperl functions.

The Bioperl core is what most new users will want to start with.
Bioperl (the core) and the Perl modules that it depends on can be
easily installed with PPM. PPM (Programmer's Package Manager, formerly
known as the Perl Package Manager) is an ActivePerl utility for
installing Perl modules on systems using ActivePerl. The PPM commands
shown in this document are for PPM version 3; if you use PPM version 2
the commands you require will be different. PPM will look online (you
have to be connected to the internet of course) for files (these files
end with .ppd) that tell it how to install the modules you want and
what other modules your new modules depend on.
It will then download and install your modules and all dependent
modules for you.

These .ppd files are stored online in PPM repositories. ActiveState
maintains the largest PPM repository and when you installed ActivePerl
PPM was installed with directions for using the ActiveState
repositories. Unfortunately the ActiveState repositories are far from
complete and other ActivePerl users maintain their own PPM repositories
to fill in the gaps. Installing Bioperl requires you to direct PPM to
look in three new repositories. You do this by opening a Windows
command prompt, typing ppm to start the PPM shell and then typing the
following three commands:

   ppm> rep add Bioperl http://bioperl.org/DIST
   ppm> rep add Kobes http://theoryx5.uwinnipeg.ca/ppms
   ppm> rep add Bribes http://www.Bribes.org/perl/ppm

Once PPM knows where to look for Bioperl and its dependencies you
simply tell PPM to search for packages with Bioperl in their name, and
then which of these to install. This is done with the following
commands:

   ppm> search Bioperl

This returns a numbered list of packages with "Bioperl" in their name,
along with their version numbers.

   ppm> install <number>

Here <number> is the number of the relevant package and version in the
list obtained above.


5) Beyond the Core
==================

You may find that you want some of the features of other Bioperl groups
like bioperl-run or bioperl-db. There are currently no PPM packages for
installing these parts of Bioperl (but check this by doing a Bioperl
search at the PPM shell):

   ppm> search bioperl

If they are not present, you will have to install these manually from
source. For this you will need a Windows version of the program make
called nmake
(http://download.microsoft.com/download/vc15/Patch/1.52/W95/EN-US/Nmake15.exe).
You will also want to have a willingness to experiment. You'll have to
read the installation documents for each component that you want to
install, and use nmake where the instructions call for make.
You will have to determine from the installation documents what
dependencies are required and you will have to get them, read their
documentation and install them first. The details of this are beyond
the scope of this guide. Read the documentation. Search Google. Try
your best, and if you get stuck consult with others on the bioperl
mailing list.


6) Bioperl in Cygwin
====================

Cygwin is a Unix emulator and shell environment available free at
www.cygwin.com. Bioperl v. 1.* runs well within Cygwin. Some users
claim that installation of Bioperl is easier within Cygwin than within
Windows, but these may be users with Unix backgrounds. A note on
Cygwin: it doesn't write to your Registry, it doesn't alter your system
or your existing files in any way, it doesn't create partitions, it
simply creates a cygwin/ directory and writes all of its files to that
directory. To uninstall Cygwin just delete that directory.

One advantage of using Bioperl in Cygwin is that all the external
modules are available through CPAN; the same cannot be said of
ActiveState's PPM utility.

To get Bioperl running first install the basic Cygwin package as well
as the Cygwin Perl, make, binutils, and gcc packages. Clicking the
"View" button in the upper right of the installer window enables you to
see details on the various packages. Then start up Cygwin and follow
the Bioperl installation instructions for Unix in Bioperl's INSTALL
file (for example, THE BIOPERL BUNDLE and INSTALLING BIOPERL THE EASY
WAY USING CPAN).


7) Cygwin tips
==============

If you can, install Cygwin on a drive or partition that's
NTFS-formatted, not FAT32-formatted. When you install Cygwin on a FAT32
partition you will not be able to set permissions and ownership
correctly. In most situations this probably won't make any difference
but there may be occasions where this is a problem.

If you're trying to use some application or resource "outside" of
Cygwin and you're having a problem, remember that Cygwin's path syntax
may not be the correct one. Cygwin understands '/home/jacky' or
'/cygdrive/e/cygwin/home/jacky' (when referring to the E: drive) but
the external resource may want 'E:/cygwin/home/jacky'. So your *rc
files may end up with paths written in both syntaxes, depending on
which program reads them.


MySQL and DBD::mysql
====================

You may want to install a relational database in order to use
bioperl-db, BioSQL or OBDA. The easiest way to install MySQL is to use
the Windows binaries available at www.mysql.com. Note that Windows does
not have Unix domain sockets, so you need to force the MySQL
connections to use TCP/IP instead. Do this by using the "-h", or host,
option from the command line. Example:

   > mysql -h 127.0.0.1 -u <user> -p

Alternatively you could install postgres instead of MySQL; postgres is
already a package in Cygwin.

One known issue is that DBD::mysql can be tricky to install in Cygwin
and this module is required for the bioperl-db, BioSQL, and
bioperl-pipeline external packages. Fortunately there are some good
instructions online:

   http://search.cpan.org/src/JWIED/DBD-mysql-2.1025/INSTALL.html#windows/cygwin

It may be that these issues have been resolved in versions later than
2.9.


Expat
=====

Note that expat comes with Cygwin (it's used by the module XML::Parser,
which is used by certain Bioperl modules).


Directory for temporary files
=============================

Set the environment variable TMPDIR; programs like BLAST and clustalw
need a place to create temporary files. E.g.:

   setenv TMPDIR e:/cygwin/tmp     # csh, tcsh
   export TMPDIR=e:/cygwin/tmp     # sh, bash

Note that this is not the syntax that Cygwin understands, which would
be something like "/cygdrive/e/cygwin/tmp", but this is the syntax that
a Perl module expects on Windows.
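
The two path syntaxes can be converted mechanically. The following is a
minimal sketch, assuming an sh-compatible shell; to_win_path is a
hypothetical helper written for this guide, not part of Cygwin or
Bioperl. It only rewrites paths under /cygdrive, because mapping a path
like /home/jacky depends on where Cygwin itself is installed. Cygwin
also ships a cygpath utility (cygpath -m) that handles the general
case.

```shell
#!/bin/sh
# Hypothetical helper: turn /cygdrive/<letter>/rest into <letter>:/rest,
# the form the guide says Perl modules expect on Windows.
to_win_path() {
    case "$1" in
        /cygdrive/[A-Za-z]/*)
            rest=${1#/cygdrive/}     # e.g. e/cygwin/tmp
            drive=${rest%%/*}        # e.g. e
            printf '%s:/%s\n' "$drive" "${rest#*/}"
            ;;
        /cygdrive/[A-Za-z])
            # Bare drive root, e.g. /cygdrive/e
            printf '%s:/\n' "${1#/cygdrive/}"
            ;;
        *)
            # Not under /cygdrive: left unchanged, since the real mapping
            # depends on the Cygwin installation directory.
            printf '%s\n' "$1"
            ;;
    esac
}

# Set TMPDIR from a Cygwin-style path (the path is the guide's example).
TMPDIR=$(to_win_path /cygdrive/e/cygwin/tmp)
export TMPDIR
echo "TMPDIR=$TMPDIR"
```

With the example path from this guide, the script prints
TMPDIR=e:/cygwin/tmp.
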
If this variable is not set correctly you'll see errors like this when
you run Bio::Tools::Run::StandAloneBlast:

   ------------- EXCEPTION: Bio::Root::Exception -------------
   MSG: Could not open /tmp/gXkwEbrL0a: No such file or directory
   STACK: Error::throw
   ..........


BLAST
=====

If you want to use BLAST we recommend that the Windows binary be
obtained from NCBI (ftp://ftp.ncbi.nih.gov/blast/executables/LATEST-BLAST;
the file will be named something like blast-2.2.6-ia32-win32.exe). Then
follow the Windows instructions in README.bls.


Compiling C code
================

Although we've recommended using the BLAST and MySQL binaries you
should be able to compile just about everything else from source code
using Cygwin's gcc. You'll notice when you're installing Cygwin that
many different libraries are also available (gd, jpeg, etc.).
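
Before compiling anything it can help to confirm that the Cygwin
packages this guide asks for (Perl, make, binutils, gcc) are actually
on your PATH. The following is a minimal sketch, assuming an
sh-compatible shell; check_tools is a hypothetical helper, and using ld
to stand in for the binutils package is an assumption made for
illustration.

```shell
#!/bin/sh
# Hypothetical helper: report whether each named tool is found on PATH.
# Returns non-zero if at least one tool is missing.
check_tools() {
    missing=0
    for tool in "$@"; do
        if command -v "$tool" >/dev/null 2>&1; then
            echo "found:   $tool"
        else
            echo "missing: $tool"
            missing=1
        fi
    done
    return "$missing"
}

# ld is used here as a representative binutils program (assumption).
check_tools perl make gcc ld ||
    echo "Install the missing packages with the Cygwin installer."
```

Any tool reported as missing can be added by rerunning the Cygwin
installer and selecting the corresponding package.
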