hlapp at gmx.net
Thu Jan 19 18:11:22 EST 2006
I added a couple of capabilities to the scripts/utilities/search2gff
script written by Jason. In a nutshell, there are now options for
controlling the score, location, and method of the feature that
represents each HSP, as well as options for printing the parent,
choosing which parent to use, and skipping all but the first HSP of
each hit.

As an example application, you can BLAST SNP assay primers and use
these options to create SNP features covering the single base pair at
the end of each primer, ready to be piped to a GBrowse GFF3 loader.

I tried to preserve the original functionality in its entirety; that
is, if you don't use any of the new options, the script should work as
before. If it doesn't, please let me know.

POD is attached.
: Hilmar Lapp -:- San Diego, CA -:- hlapp at gmx dot net :
Usage: search2gff [-o outputfile] [-f reportformat] [-i inputfilename]
OR file1 file2 ..
This script will turn a protein search report (BLASTP, FASTP, SSEARCH,
AXT, WABA) into a GFF file.
The options are:
    -i infilename       - (optional) input filename; will read
                          either ARGV files or from STDIN
    -o filename         - the output filename [default STDOUT]
    -f format           - search result format (blast, fasta, waba, axt)
                          (ssearch is fasta format); default is blast
    -t/--type seqtype   - whether you want to see query or hit
                          information in the GFF report
    -s/--source         - specify the source (otherwise the algorithm
                          name, such as BLASTN, is used)
    --method            - the method tag (primary_tag) of the features
                          (default is similarity)
    --scorefunc         - a string or a file that when parsed evaluates
                          to a closure which will be passed a feature
                          object and which returns the score to be
                          printed
    --locfunc           - a string or a file that when parsed evaluates
                          to a closure which will be passed two
                          features, query and hit, and which returns
                          the location (Bio::LocationI compliant) for
                          the GFF3 feature created for each HSP; the
                          closure may use the clone_loc() and
                          create_loc() functions for convenience, see
                          their PODs
    --onehsp            - only print the first HSP feature for each hit
    -p/--parent         - the parent to which HSP features should refer
                          if not the name of the hit or query (depending
                          on --type)
    --target/--notarget - whether to always add the Target tag or not
    -h                  - this help menu
    --version           - GFF version to use (put a 3 here to use GFF3)
    --component         - generate GFF component fields (chromosome)
    -m/--match          - generate a 'match' line which is a container
                          of all the similarity HSPs
    --addid             - add ID tag in the absence of --match
    -c/--cutoff         - specify an e-value cutoff
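
To illustrate what "a string that when parsed evaluates to a closure"
can look like for --scorefunc, here is a minimal sketch. It assumes the
string is compiled with eval and the resulting code reference is called
with one feature object. The MockFeature class, the sample score, and
the scaling by 10 are hypothetical and only stand in for a real BioPerl
feature; the script itself may process the string differently.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the feature object the script would pass in;
# it only needs a score() method for this illustration.
package MockFeature;
sub new   { my ($class, %args) = @_; return bless {%args}, $class }
sub score { return $_[0]->{score} }

package main;

# The kind of string one might pass to --scorefunc (assumed form):
# an anonymous sub that takes a feature and returns the score to print.
my $scorefunc_src =
    'sub { my $feat = shift; return sprintf("%.1f", $feat->score / 10); }';

# Compile the string into a code reference and make sure we got one.
my $scorefunc = eval $scorefunc_src;
die "could not compile score function: $@"
    unless ref($scorefunc) eq 'CODE';

# Call the closure the way a caller holding a feature would.
my $feat      = MockFeature->new(score => 425);
my $gff_score = $scorefunc->($feat);
print "score column: $gff_score\n";
```

The same pattern would apply to a file: its contents would have to
evaluate to a single anonymous sub.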
In addition, specify the names of the files you want to process on the
command line. If no files are specified, input is read from STDIN, for
example: search2gff < file1
Jason Stajich, jason-at-bioperl-dot-org
Hilmar Lapp, hlapp-at-gmx-dot-net
 Title   : clone_loc
 Usage   : my $l = clone_loc($feature->location);
 Function: Helper function to simplify the task of cloning locations
           for --locfunc closures.

           Presently simply implemented using Storable::dclone().
 Returns : A L<Bio::LocationI> object of the same type and with the
           same properties as the argument, but physically different.
           All structured properties will be cloned as well.
 Args    : A L<Bio::LocationI> compliant object
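
As a sketch of why cloning matters in a --locfunc closure, the example
below collapses a hit location to its last base, roughly as in the SNP
primer use case, without touching the original location. It calls
Storable::dclone() directly as a stand-in for clone_loc(). The
MockLocation and MockFeature classes and the coordinates are
hypothetical, and the sketch assumes a forward-strand hit whose last
base is at the end coordinate.

```perl
use strict;
use warnings;
use Storable qw(dclone);

# Hypothetical stand-ins for BioPerl location and feature objects.
package MockLocation;
sub new   { my ($class, %args) = @_; return bless {%args}, $class }
sub start { my $self = shift; $self->{start} = shift if @_; return $self->{start} }
sub end   { my $self = shift; $self->{end}   = shift if @_; return $self->{end} }

package MockFeature;
sub new      { my ($class, %args) = @_; return bless {%args}, $class }
sub location { return $_[0]->{location} }

package main;

# Assumed shape of a --locfunc closure: it receives the query and hit
# features and returns the location to use for the GFF feature.
my $locfunc = sub {
    my ($query, $hit) = @_;
    my $loc = dclone($hit->location);   # deep copy, stand-in for clone_loc()
    $loc->start($loc->end);             # collapse to the last base (forward strand assumed)
    return $loc;
};

my $query = MockFeature->new(location => MockLocation->new(start => 1,    end => 20));
my $hit   = MockFeature->new(location => MockLocation->new(start => 1000, end => 1019));

my $snp_loc = $locfunc->($query, $hit);

# The clone was modified; the hit's own location is unchanged.
printf "SNP location: %d..%d, hit location: %d..%d\n",
    $snp_loc->start, $snp_loc->end,
    $hit->location->start, $hit->location->end;
```

Modifying the hit's location in place instead of a clone would also
change whatever else is reported for that HSP, which is presumably why
a cloning helper is offered.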

 Title   : create_loc
 Usage   : my $l = create_loc("10..12");
 Function: Helper function to simplify the task of creating locations
           for --locfunc closures. Creates a location from a
           feature-table formatted string.
 Returns : A L<Bio::LocationI> object representing the location given
           as formatted string.
 Args    : A GenBank feature-table formatted string.
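
To show what kind of information a feature-table string such as
"10..12" carries, here is a deliberately simplified parser. It is not
the script's create_loc(): the function name parse_simple_ft_location
is hypothetical, it returns a plain hash rather than a Bio::LocationI
object, and it handles only "start..end", a single position, and an
outer complement(...) wrapper.

```perl
use strict;
use warnings;

# Hypothetical helper, NOT the script's create_loc(): parses only a
# small subset of the GenBank feature-table location syntax.
sub parse_simple_ft_location {
    my ($str) = @_;
    my $strand = 1;

    # complement(...) marks the reverse strand; unwrap it.
    if ($str =~ /^complement\((.+)\)$/) {
        $strand = -1;
        $str    = $1;
    }

    # Accept "start..end" or a single position "n".
    my ($start, $end) = $str =~ /^(\d+)(?:\.\.(\d+))?$/
        or die "unsupported location string: $str";
    $end = $start unless defined $end;

    return { start => $start, end => $end, strand => $strand };
}

for my $ft ('10..12', 'complement(1000..1019)', '42') {
    my $loc = parse_simple_ft_location($ft);
    printf "%-24s start=%d end=%d strand=%d\n",
        $ft, $loc->{start}, $loc->{end}, $loc->{strand};
}
```

Real feature-table strings can also contain joins, fuzzy boundaries,
and remote references, none of which this sketch attempts to handle.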