BioPerl scripts

Introduction

These scripts have been contributed by the developers and users of BioPerl. They are organized into directories roughly mirroring those in the BioPerl Bio/ directory. The scripts live in two top-level directories, scripts/ and examples/. The scripts in scripts/ are production-quality scripts that have POD documentation and accept command-line arguments; with a few exceptions, they carry the .PLS suffix. The scripts in examples/ are useful examples of BioPerl code but have been written more casually.

You can install the scripts in the scripts/ directory if you'd like; simply follow the instructions for make install. The installation directory is specified by the INSTALLSCRIPT variable in the Makefile; the default directory is /usr/bin. Installation copies the scripts to the specified directory, changes the .PLS suffix to .pl, and prepends bp_ to the script name if it isn't already so named.
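
The renaming rule described above can be sketched as follows. This is a minimal illustration in Python, not the installer's own Perl code; the function name installed_name is hypothetical, and the sketch assumes the rule is applied to the file's base name only.

```python
import os


def installed_name(script_path):
    """Return the file name a script would get after installation.

    Assumed rule (taken from the description above): drop the directory,
    replace a trailing .PLS with .pl, and prepend bp_ unless the name
    already starts with it.
    """
    name = os.path.basename(script_path)
    if name.endswith(".PLS"):
        name = name[: -len(".PLS")] + ".pl"
    if not name.startswith("bp_"):
        name = "bp_" + name
    return name


if __name__ == "__main__":
    for path in ("scripts/seq/split_seq.PLS", "scripts/index/bp_fetch.PLS"):
        print(path, "->", installed_name(path))
```

Under this rule, scripts/seq/split_seq.PLS would be installed as bp_split_seq.pl, while scripts/index/bp_fetch.PLS, which already has the prefix, would become bp_fetch.pl.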

Please contact the BioPerl mailing list if you are interested in contributing your own script.

Production Scripts

Installation

scripts/install_bioperl_scripts.pl
This script installs scripts from the scripts/ directory upon make install.

Bio::DB::SeqFeature::Store

scripts/Bio-SeqFeature-Store/bp_seqfeature_gff3.PLS
Dumps GFF3 output for selected database features.
scripts/Bio-SeqFeature-Store/bp_seqfeature_load.PLS
This script loads a MySQL Bio::DB::SeqFeature::Store database with the features contained in a list of GFF files.

Bio::DB::GFF

scripts/Bio-DB-GFF/bulk_load_gff.PLS
This script loads a MySQL Bio::DB::GFF database with the features contained in a list of GFF files. It cannot do incremental loads.
scripts/Bio-DB-GFF/genbank2gff.PLS
This script loads a Bio::DB::GFF database with the features contained in either a local GenBank file or an accession that is fetched from GenBank.
scripts/Bio-DB-GFF/fast_load_gff.PLS
This script does a rapid load of a MySQL Bio::DB::GFF database using files as source. It probably works only on Unix, as it relies on pipes.
scripts/Bio-DB-GFF/genbank2gff3.PLS
This script uses Bio::SeqFeature::Tools::Unflattener to convert GenBank flatfiles to GFF3 with gene hierarchies mapped for optimal display in GBrowse.
scripts/Bio-DB-GFF/generate_histogram.PLS
Create a GFF-formatted histogram of the density of the indicated set of feature types.
scripts/Bio-DB-GFF/load_gff.PLS
This script loads a Bio::DB::GFF database with the features contained in a list of GFF files. It works with all database adaptors supported by Bio::DB::GFF, namely MySQL, Oracle, and PostgreSQL.
scripts/Bio-DB-GFF/pg_bulk_load_gff.PLS
Bulk-load a PostgreSQL Bio::DB::GFF database from GFF files.
scripts/Bio-DB-GFF/process_gadfly.PLS
Transforms Gadfly GFF files into the correct format.
scripts/Bio-DB-GFF/process_sgd.PLS
Transforms SGD format annotations into GFF format.
scripts/Bio-DB-GFF/process_wormbase.PLS
Transforms WormBase's GFF files into the correct format. Requires Ace.

Bio::Biblio

scripts/biblio/biblio.PLS
A fully-featured script that uses Bio::Biblio, a module for accessing and querying bibliographic repositories like MEDLINE.

DB

scripts/DB/bioflat_index.pl
Create or update a biological sequence database indexed with the Bio::DB::Flat indexing scheme.
scripts/DB/flanks.PLS
Fetch a sequence and find the sequences flanking a variant or SNP in it, given the variant's position.
scripts/DB/biofetch_genbank_proxy.PLS
A CGI script that queries NCBI's eutils to provide database access according to the BioFetch protocol. Requires Cache::FileCache.
scripts/DB/biogetseq.PLS
Sequence retrieval using the OBDA registry.

DB-HIV

scripts/DB-HIV/hivq.PLS
A command-line interface to the Los Alamos HIV sequence database, based on Bio::DB::HIV and Bio::DB::Query::HIVQuery.

Index

scripts/index/bp_fetch.PLS
Fetch sequences from a local indexed database or over the network and reformat them using Bio::Index::* and Bio::DB::*.
scripts/index/bp_index.PLS
Indexes local databases; the companion to bp_fetch.pl.
scripts/index/bp_seqret.PLS
Index a local FASTA database and fetch sequences using the same syntax as the EMBOSS seqret tool. It does not support the full EMBOSS db config files; it is designed to support fast fetching of sequences from a FASTA database.

PopGen

scripts/popgen/composite_LD.PLS
An easy way to calculate composite linkage disequilibrium (LD).
scripts/popgen/heterogeneity_test.PLS
A test for distinguishing between selection and population expansion.

SearchIO

scripts/searchio/filter_search.PLS
Simple script to filter by Bio::SearchIO criteria and print.
scripts/searchio/search2table.PLS
Turn Bio::SearchIO reports into a tabular format like blastall's "-m 9" output.
scripts/searchio/fastam9_to_table.PLS
Turn FASTA -m9 reports into a tabular format like blastall's "-m 9" output. It does not actually use Bio::SearchIO, so it is very fast.
scripts/searchio/hmmer_to_table.PLS
Turn HMMER reports into a tabular format. It does not actually use Bio::SearchIO, so it is very fast.
scripts/searchio/parse_hmmsearch.PLS
Parse single or multiple HMMER hmmsearch results file(s) with different output options.

Seq

scripts/seq/extract_feature_seq.PLS
Extract the sequence for a specified feature type.
scripts/seq/make_mrna_protein.PLS
Translate a DNA or RNA sequence to protein using Bio::Seq's translate() method.
scripts/seq/seqconvert.PLS
BioPerl sequence format converter.
scripts/seq/split_seq.PLS
Split a sequence in a file into chunks of equal size with an optional overlapping range.
scripts/seq/translate_seq.PLS
A simple BioPerl translator.
scripts/seq/unflatten_seq.PLS
Unflatten a GenBank or GenBank-style feature file into a nested Bio::SeqFeatureI hierarchy. Uses Bio::SeqFeature::Tools::Unflattener.
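
To make the idea of splitting a sequence into equal-size chunks with an optional overlap concrete, here is a minimal sketch in Python. It is not the source of split_seq.PLS: the function name split_sequence, the 1-based start coordinates, and the choice to let the final chunk be shorter than the rest are all assumptions made for illustration.

```python
def split_sequence(sequence, chunk_size, overlap=0):
    """Split a sequence into chunks of chunk_size residues.

    Consecutive chunks share `overlap` residues. Returns a list of
    (start, chunk) pairs, where start is the 1-based position of the
    chunk in the original sequence. The last chunk may be shorter.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be >= 0 and smaller than chunk_size")

    step = chunk_size - overlap
    chunks = []
    start = 0
    while start < len(sequence):
        chunks.append((start + 1, sequence[start:start + chunk_size]))
        # Stop once the current chunk reaches the end of the sequence.
        if start + chunk_size >= len(sequence):
            break
        start += step
    return chunks


if __name__ == "__main__":
    for start, chunk in split_sequence("ACGTACGTAC", 4, overlap=2):
        print(start, chunk)
```

With an overlap of zero the chunks simply tile the sequence; a non-zero overlap shortens the step between chunk starts to chunk_size minus overlap.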

SeqStats

scripts/seqstats/aacomp.PLS
Prints out the count of amino acids over all protein sequences in the input file.
scripts/seqstats/chaos_plot.PLS
Produce a PNG or JPEG chaos plot given a DNA sequence using GD.
scripts/seqstats/bp_gccalc.pl
Prints out the GC content for every nucleotide sequence in the input file.
scripts/seqstats/oligo_count.PLS
Use this script to determine what primers would be useful for frequent priming of nucleic acid for random labeling.
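
As an illustration of the kind of calculation a GC content script reports, here is a minimal sketch in Python (not the Perl source of the script listed above). It assumes FASTA input and defines GC content as the fraction of G and C among all residues; the function names parse_fasta and gc_content are hypothetical, and the real script may treat ambiguity codes differently.

```python
def parse_fasta(text):
    """Yield (identifier, sequence) pairs from FASTA-formatted text."""
    seq_id, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if seq_id is not None:
                yield seq_id, "".join(chunks)
            # The identifier is the first word after '>'.
            parts = line[1:].split()
            seq_id = parts[0] if parts else ""
            chunks = []
        elif seq_id is not None:
            chunks.append(line)
    if seq_id is not None:
        yield seq_id, "".join(chunks)


def gc_content(sequence):
    """Return the fraction of G and C residues (0.0 for an empty sequence)."""
    if not sequence:
        return 0.0
    upper = sequence.upper()
    return (upper.count("G") + upper.count("C")) / len(upper)


if __name__ == "__main__":
    example = ">seq1 demo\nGGCC\nAATT\n>seq2\nATAT\n"
    for seq_id, seq in parse_fasta(example):
        print(f"{seq_id}\t{gc_content(seq):.2f}")
```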

Taxonomy

scripts/taxa/local_taxonomydb_query.PLS
Script that accesses a local taxonomy database and retrieves species or TaxonIDs.
scripts/taxa/query_entrez_taxa.PLS
Demonstrate how to retrieve the NCBI TaxonID for a given species. Also retrieve TaxonID for a given accession number.
scripts/taxa/taxid4species.PLS
Retrieve the NCBI TaxonID for a given species.
scripts/taxa/classify_hits_kingdom.PLS
Classify hits on the taxonomy hierarchy from a tab-delimited (-m9/-m8) BLAST report, using a local copy of the taxonomy database downloaded from NCBI.

Tree

scripts/tree/blast2tree.PLS
Builds a phylogenetic tree based on a sequence search (FASTA, BLAST, HMMER).
scripts/tree/nexus2nh.PLS
Convert Nexus tree format trees to New Hampshire tree format, but maintain long taxon names.
scripts/tree/tree2pag.PLS
Convert Bio::TreeIO parseable trees to Pagel tree format.

Utilities

scripts/utilities/bp_mrtrans.PLS
Perl implementation of Bill Pearson's mrtrans to project protein alignment back into cDNA coordinates.
scripts/utilities/bp_nrdb.PLS
Make a non-redundant database based on sequence, not id. Requires Digest::MD5.
scripts/utilities/bp_sreformat.PLS
Perl implementation of Sean Eddy's sreformat, a sequence and alignment converter.
scripts/utilities/dbsplit.PLS
Splits one or more sequence files into subfiles with specified numbers of sequences, any sequence format.
scripts/utilities/download_query_genbank.PLS
Use Bio::DB::Query::GenBank to download files from NCBI.
scripts/utilities/mask_by_search.PLS
Masks parts of a sequence based on significant matches to that sequence as contained in a Bio::SearchIO-compatible report file.
scripts/utilities/mutate.PLS
Randomly mutagenize a single protein or DNA sequence. Specify percentage mutated and number of resulting mutagenized sequences.
scripts/utilities/pairwise_kaks.PLS
Takes DNA sequences as input, aligns them as proteins, projects the alignment back into DNA and estimates the Ka (non-synonymous) and Ks (synonymous) substitutions.
scripts/utilities/remote_blast.PLS
This script executes a remote BLAST search using Bio::Tools::Run::RemoteBlast.
scripts/utilities/revtrans_motif.PLS
Reverse translate a Profam-like protein motif.
scripts/utilities/search2BSML.PLS
Turns SearchIO-compatible reports into a BSML report.
scripts/utilities/search2alnblocks.PLS
Turns SearchIO-compatible reports into alignments in formats supported by Bio::AlignIO.
scripts/utilities/search2tribe.PLS
This script will turn a protein SearchIO-compatible report (BLAST's blastp, FASTA's FASTP and SSEARCH) into a Markov Matrix for TribeMCL clustering.
scripts/utilities/search2gff.PLS
Turn SearchIO parseable report(s) into a GFF report.
scripts/utilities/seq_length.PLS
Reports the total number of residues and total number of individual sequences in a specified sequence database file.
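
The idea behind building a non-redundant database "based on sequence, not id" with an MD5 digest can be sketched as follows. This is a minimal Python illustration, not the source of bp_nrdb.PLS: the function name non_redundant is hypothetical, and comparing sequences case-insensitively and collecting the identifiers of duplicates are assumptions made for the example.

```python
import hashlib


def non_redundant(records):
    """Collapse records that share an identical sequence.

    `records` is an iterable of (identifier, sequence) pairs. Returns a
    list of (identifiers, sequence) pairs in first-seen order, where
    `identifiers` lists every ID that carried that sequence.
    """
    by_digest = {}
    for seq_id, seq in records:
        # Key on the MD5 digest of the sequence, never on the identifier.
        digest = hashlib.md5(seq.upper().encode("ascii")).hexdigest()
        if digest in by_digest:
            by_digest[digest][0].append(seq_id)
        else:
            by_digest[digest] = ([seq_id], seq)
    return list(by_digest.values())


if __name__ == "__main__":
    demo = [("a", "ACGT"), ("b", "acgt"), ("c", "TTTT")]
    for ids, seq in non_redundant(demo):
        print(",".join(ids), seq)
```

Keying on a fixed-length digest rather than the full sequence keeps the lookup table small even when individual sequences are long.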

Example Scripts

Alignment

examples/align/align_on_codons.pl
Aligns nucleotide sequences based on codons in a specified reading frame.
examples/align/aligntutorial.pl
Examples using EMBOSS, Bio::Tools::pSW, Clustalw, TCoffee, and BLAST to align sequences.
examples/align/clustalw.pl
A demonstration of the various uses of Bio::Tools::Run::Alignment::Clustalw.
examples/align/simplealign.pl
A script that demonstrates some uses of Bio::AlignIO.
examples/align/FastAlign.pl
A script which uses the heuristic method of tfasty36 and provides a more intuitive output to find exon-intron junctions.

Bio::Biblio

examples/biblio/biblio.pl
A script that shows how to query bibliographic databases (such as MEDLINE) using ids, keywords, and other fields using the Bio::Biblio module.
examples/biblio/biblio_soap.pl
Connect to and test a SOAP server using a Bio::Biblio object.

DB

examples/db/dbfetch
Creates a Web page to query a local SRS server and fetch sequences.
examples/db/est_tissue_query.pl
Fetch EST sequences from local files or GenBank filtered by tissue using Bio::DB::* or Bio::Index::*.
examples/db/gb2features.pl
Shows how to extract all the features from a GenBank file using Bio::Seq.
examples/db/getGenBank.pl
Retrieves GenBank entries over the Web using Bio::DB::GenBank.
examples/db/get_seqs.pl
Fetches and formats sequences from GenBank, EMBL, or SwissProt over the network using Bio::DB::*.
examples/db/rfetch.pl
A script that uses Bio::DB::Registry to retrieve sequences from EMBL, reformat them, and print them.
examples/db/use_registry.pl
Script that shows how to use Bio::DB::Registry, part of BioPerl's integration with OBDA, the Open Bio Database Access registry scheme. See Bio::DB::Registry for more information.
examples/db/gff/
Scripts that reformat sequence to GFF and load GFF format files into an indexed database using Bio::DB::GFF.

SearchIO

examples/searchio/blast_example.pl
Print out all parsed values from a BLAST report.
examples/searchio/custom_writer.pl
Demonstrates how to extract data from BLAST reports and output as tab-delimited data.
examples/searchio/hitwriter.pl
Demonstrates how to extract data from BLAST reports and output as tab-delimited data.
examples/searchio/hspwriter.pl
Demonstrates how to extract data from BLAST reports and output as tab-delimited data.
examples/searchio/htmlwriter.pl
Demonstrates how to extract data from BLAST reports and output as HTML.
examples/searchio/psiblast_features.pl
Illustrates how to grab a set of SeqFeatures from a PSI-BLAST (blastpgp) report.
examples/searchio/psiblast_iterations.pl
Demonstrates the use of a SearchIO parser for processing the iterations within a PSI-BLAST report.
examples/searchio/rawwriter.pl
Shows how to print out raw BLAST alignment data for each HSP.
examples/searchio/resultwriter.pl
Demonstrates how to extract data from BLAST reports and output as tab-delimited data.
examples/searchio/waba2gff.pl
Convert raw WABA output to one type of GFF.

Tools

examples/tools/extract_genes.pl
Simple solution to the problem of extracting genomic regions corresponding to genes; uses files from NCBI and Bio::DB::Fasta.
examples/tools/gb_to_gff.pl
Extracts top-level sequence features from GenBank-formatted sequence files using Bio::Tools::GFF.
examples/tools/reverse-translate.pl
Reverse-translates a protein sequence using the most frequent codons; uses Bio::CodonUsage::Table and Bio::Tools::CodonTable.
examples/tools/gff2ps.pl
Takes an input file in GFF format and draws its genes and features as Postscript using Bio::Tools::GFF.
examples/tools/parse_codeml.pl
Script that parses output from codeml, one of the PAML programs, using Bio::Tools::Phylo::PAML.
examples/tools/psw.pl
Example code for using the Ext package for comparing proteins using Smith-Waterman.
examples/tools/run_genscan.pl
Run GENSCAN on multiple sequences and create output sequence files using Bio::Tools::Genscan.
examples/tools/seq_pattern.pl
A script that shows how to use sequences as regular expressions using Bio::Tools::SeqPattern.
examples/tools/standaloneblast.pl
The many uses of Bio::Tools::Run::StandAloneBlast, including BLAST and PSI-BLAST.

Bio::Root

examples/root/exceptions1.pl
A simple tester script for demonstrating how to throw and catch Error objects.
examples/root/exceptions2.pl
This shows how Error.pm-based objects can be thrown by Bio::Root::Root::throw() when Error.pm is available.
examples/root/exceptions3.pl
This shows that Error objects can be subclassed into more specialized types.
examples/root/exceptions4.pl
This shows how the examples work when Error.pm isn't installed.

Other

examples/bioperl.pl
A BioPerl shell!
examples/cluster/dbsnp.pl
How to parse a dbSNP XML file. See Bio::ClusterIO for details.
examples/contributed/nmrpdb_parse.pl
Extracts individual conformers from an NMR-derived PDB file.
examples/contributed/prosite2perl.pl
Convert Prosite motifs to Perl regular expressions.
examples/contributed/rebase2list.pl
Script to convert a REBASE file to a format compatible with Bio::Tools::RestrictionEnzyme.
examples/generate_random_seq.pl
Writes random RNA, DNA, or protein sequence of given length.
examples/liveseq/change_gene.pl
A script showing how to use Bio::LiveSeq::Mutator and Bio::LiveSeq::Mutation.
examples/longorf.pl
A script that finds the longest ORF in one or more nucleotide sequences.
examples/make_primers.pl
Design PCR primers given a sequence and the positions of the start and stop codons in the sequence's ORF.
examples/popgen/parse_calc_stats.pl
Shows how to read data from a Bio::PopGen::IO object.
examples/rev_and_trans.pl
Examples using Bio::Seq for reversing and translating sequences.
examples/revcom_dir.pl
Return reverse complement sequences of all sequences in the current directory and save them in the same directory.
examples/sirna/rnai_finder.cgi
CGI script for RNAi reagent design. See Bio::Tools::SiRNA for more information.
examples/seq/extract_cds.pl
Extract the CDS features from a GenBank file.
examples/seqstats/aacomp.pl
Calculate amino acid composition of a protein using Bio::Tools::IUPAC and Bio::Tools::CodonTable.
examples/structure/struct-io.ps
How to examine details of the 3D structure of a protein by parsing a PDB using Bio::Structure::IO.
examples/subsequence.cgi
CGI script to fetch a sequence from GenBank and extract a subsequence using Bio::DB::GenBank.
examples/tk/gsequence.pl
Create a Protein Sequence Control Panel GUI with Gtk.
examples/tk/hitdisplay.pl
Create a GUI for displaying BLAST results using Bio::Tk::HitDisplay from the GUI package.
examples/tree/paup2phylip.pl
Convert a PAUP tree block to PHYLIP format.