[Bioperl-guts-l] bioperl-live/Bio/Tools/Blast HSP.pm, 1.23, 1.24 HTML.pm, 1.19, 1.20
Mauricio Herrera Cuadra
mauricio at dev.open-bio.org
Sun Sep 24 11:36:34 EDT 2006
- Previous message: [Bioperl-guts-l] bioperl-network MANIFEST,1.3,1.4
- Next message: [Bioperl-guts-l] bioperl-live/Bio/Tools Blast.pm, 1.37, 1.38 SeqAnal.pm, 1.18, 1.19 SeqPattern.pm, 1.22, 1.23 WWW.pm, 1.17, 1.18
Update of /home/repository/bioperl/bioperl-live/Bio/Tools/Blast
In directory dev.open-bio.org:/tmp/cvs-serv15026/Bio/Tools/Blast
Modified Files:
HSP.pm HTML.pm
Log Message:
Updating URLs
Index: HTML.pm
===================================================================
RCS file: /home/repository/bioperl/bioperl-live/Bio/Tools/Blast/HTML.pm,v
retrieving revision 1.19
retrieving revision 1.20
diff -C2 -d -r1.19 -r1.20
*** HTML.pm 4 Jul 2006 22:23:25 -0000 1.19
--- HTML.pm 24 Sep 2006 15:36:32 -0000 1.20
***************
*** 6,10 ****
# STATUS : Alpha
# REVISION: $Id$
! #
# For the latest version and documentation, visit the distribution site:
# http://bio.perl.org/Projects/Blast/
--- 6,10 ----
# STATUS : Alpha
# REVISION: $Id$
! #
# For the latest version and documentation, visit the distribution site:
# http://bio.perl.org/Projects/Blast/
***************
*** 20,24 ****
#
# Copyright (c) 1996-98 Steve Chervitz. All Rights Reserved.
! # This module is free software; you can redistribute it and/or
# modify it under the same terms as Perl itself.
#-------------------------------------------------------------------------------
--- 20,24 ----
#
# Copyright (c) 1996-98 Steve Chervitz. All Rights Reserved.
! # This module is free software; you can redistribute it and/or
# modify it under the same terms as Perl itself.
#-------------------------------------------------------------------------------
***************
*** 28,32 ****
use Exporter;
! use Bio::Tools::WWW qw(:obj);
use vars qw( @ISA @EXPORT @EXPORT_OK %EXPORT_TAGS
--- 28,32 ----
use Exporter;
! use Bio::Tools::WWW qw(:obj);
use vars qw( @ISA @EXPORT @EXPORT_OK %EXPORT_TAGS
***************
*** 91,95 ****
Swiss-Prot, PIR, PDB, SGD).
! This module is intended for use by Bio::Tools::Blast.pm and related modules,
which provides a front-end to the methods in Bio::Tools::Blast::HTML.pm.
--- 91,95 ----
Swiss-Prot, PIR, PDB, SGD).
! This module is intended for use by Bio::Tools::Blast.pm and related modules,
which provides a front-end to the methods in Bio::Tools::Blast::HTML.pm.
***************
*** 111,121 ****
Bio::Tools::WWW.pm - URL repository.
! http://bio.perl.org/Projects/modules.html - Online module documentation
! http://bio.perl.org/Projects/Blast/ - Bioperl Blast Project
http://bio.perl.org/ - Bioperl Project Homepage
=head1 FEEDBACK
! =head2 Mailing Lists
User feedback is an integral part of the evolution of this and other
--- 111,120 ----
Bio::Tools::WWW.pm - URL repository.
! http://bio.perl.org/Projects/Blast/ - Bioperl Blast Project
http://bio.perl.org/ - Bioperl Project Homepage
=head1 FEEDBACK
! =head2 Mailing Lists
User feedback is an integral part of the evolution of this and other
***************
*** 132,136 ****
web:
! http://bugzilla.open-bio.org/
=head1 AUTHOR
--- 131,135 ----
web:
! http://bugzilla.open-bio.org/
=head1 AUTHOR
***************
*** 141,145 ****
Copyright (c) 1998-2000 Steve Chervitz. All Rights Reserved.
! This module is free software; you can redistribute it and/or
modify it under the same terms as Perl itself.
--- 140,144 ----
Copyright (c) 1998-2000 Steve Chervitz. All Rights Reserved.
! This module is free software; you can redistribute it and/or
modify it under the same terms as Perl itself.
***************
*** 175,180 ****
: raw Blast report line-by-line.
: Utility method used by to_html() in Bio::Tools::Blast.pm.
! Returns : Reference to an anonymous function to be used while reading in
! : the raw report.
: The function itself operates on the Blast report line-by-line
: HTML-ifying it and printing it to STDOUT (or saving in the supplied
--- 174,179 ----
: raw Blast report line-by-line.
: Utility method used by to_html() in Bio::Tools::Blast.pm.
! Returns : Reference to an anonymous function to be used while reading in
! : the raw report.
: The function itself operates on the Blast report line-by-line
: HTML-ifying it and printing it to STDOUT (or saving in the supplied
***************
*** 184,190 ****
: If no argument is supplied, HTML output is sent to STDOUT.
Throws : Croaks if an argument is supplied and is not an array ref.
! : The anonymous function returned by this method croaks if
: the Blast output appears to be HTML-formatted already.
! Comments : Adapted from a script by Keith Robison November 1993
: krobison at nucleus.harvard.edu; http://golgi.harvard.edu/gilbert.html
: Modified extensively by Steve Chervitz and Mike Cherry.
--- 183,189 ----
: If no argument is supplied, HTML output is sent to STDOUT.
Throws : Croaks if an argument is supplied and is not an array ref.
! : The anonymous function returned by this method croaks if
: the Blast output appears to be HTML-formatted already.
! Comments : Adapted from a script by Keith Robison November 1993
: krobison at nucleus.harvard.edu; http://golgi.harvard.edu/gilbert.html
: Modified extensively by Steve Chervitz and Mike Cherry.
***************
*** 207,214 ****
my $found_data = 0; # Nothing is done until this is true
my $skip = 0; # Skipping various items in the report header
! my $ref_skip = 0; # so we can include nice HTML versions
! # (e.g., references for the BLAST program).
! my $getNote = 0;
! my $getGenBankAlert = 0;
my $str = '';
my $gi_link = \$_gi_link;
--- 206,213 ----
my $found_data = 0; # Nothing is done until this is true
my $skip = 0; # Skipping various items in the report header
! my $ref_skip = 0; # so we can include nice HTML versions
! # (e.g., references for the BLAST program).
! my $getNote = 0;
! my $getGenBankAlert = 0;
my $str = '';
my $gi_link = \$_gi_link;
***************
*** 234,238 ****
$ref_skip = 0 if /^\s+$/;
}
! if($getNote) {
## SAC: created this test since we are no longer reading from STDIN.
$out_aref ? push(@$out_aref, $_) : print $_;
--- 233,237 ----
$ref_skip = 0 if /^\s+$/;
}
! if($getNote) {
## SAC: created this test since we are no longer reading from STDIN.
$out_aref ? push(@$out_aref, $_) : print $_;
***************
*** 251,255 ****
$prog = $2;
if($prog =~ /BLASTN/) {
! ## Prevent the error at Entrez when you ask for a nucl
## entry with a protein GI number.
$$gi_link = $DbUrl{'gb_n'}; # nucleotide
--- 250,254 ----
$prog = $2;
if($prog =~ /BLASTN/) {
! ## Prevent the error at Entrez when you ask for a nucl
## entry with a protein GI number.
$$gi_link = $DbUrl{'gb_n'}; # nucleotide
***************
*** 268,273 ****
&_markup_database(\$_);
$out_aref ? push(@$out_aref, $_) : print $_;
! if ( /non-redundant genbank/i and $prog =~ /TBLAST[NX]/i) {
! $getGenBankAlert = 1;
}
$skip = 1;
--- 267,272 ----
&_markup_database(\$_);
$out_aref ? push(@$out_aref, $_) : print $_;
! if ( /non-redundant genbank/i and $prog =~ /TBLAST[NX]/i) {
! $getGenBankAlert = 1;
}
$skip = 1;
***************
*** 286,293 ****
$found_table = 1;
$skip = 0;
! $out_aref ? push(@$out_aref, $refs) : print $refs;
! if($getGenBankAlert) {
$str = &_genbank_alert;
! $out_aref ? push(@$out_aref, $str) : print $str;
}
$str = "\n<p><pre>";
--- 285,292 ----
$found_table = 1;
$skip = 0;
! $out_aref ? push(@$out_aref, $refs) : print $refs;
! if($getGenBankAlert) {
$str = &_genbank_alert;
! $out_aref ? push(@$out_aref, $str) : print $str;
}
$str = "\n<p><pre>";
***************
*** 315,319 ****
: to raw Blast output.
Returns : n/a
! Comments : These items need be set only once.
See Also : L<get_html_func()|get_html_func>
--- 314,318 ----
: to raw Blast output.
Returns : n/a
! Comments : These items need be set only once.
See Also : L<get_html_func()|get_html_func>
***************
*** 327,331 ****
%SGDUrl = $BioWWW->sgd_url('all');
! $Signif = '[\de.-]{3,}'; # Regexp for a P-value or Expect value.
$Int = ' *\d\d*'; # Regexp for an integer.
$Descrip = ' +.* {2,}?'; # Regexp for a description line.
--- 326,330 ----
%SGDUrl = $BioWWW->sgd_url('all');
! $Signif = '[\de.-]{3,}'; # Regexp for a P-value or Expect value.
$Int = ' *\d\d*'; # Regexp for an integer.
$Descrip = ' +.* {2,}?'; # Regexp for a description line.
***************
*** 333,337 ****
$Pir_acc = '[A-Z][A-Z0-9]{5,}'; # Regexp for PIR accession number
$Word = '[\w_.]+'; # Regexp for a word. Include dot for version.
!
$_set_markup = 1;
}
--- 332,336 ----
$Pir_acc = '[A-Z][A-Z0-9]{5,}'; # Regexp for PIR accession number
$Word = '[\w_.]+'; # Regexp for a word. Include dot for version.
!
$_set_markup = 1;
}
***************
*** 345,349 ****
Comments : This is used for converting local database IDs into
: understandable terms. At present, it only recognizes
! : databases used locally at SGD.
See Also : L<get_html_func()|get_html_func>
--- 344,348 ----
Comments : This is used for converting local database IDs into
: understandable terms. At present, it only recognizes
! : databases used locally at SGD.
See Also : L<get_html_func()|get_html_func>
***************
*** 378,383 ****
: to accomodate reports produced by other servers/sites.
:
! : This function is simply a collection of substitution regexps
! : that recognize and modify the relevant lines of the Blast report.
: All non-header lines of the report are passed through this function,
: only the ones that match will get modified.
--- 377,382 ----
: to accomodate reports produced by other servers/sites.
:
! : This function is simply a collection of substitution regexps
! : that recognize and modify the relevant lines of the Blast report.
: All non-header lines of the report are passed through this function,
: only the ones that match will get modified.
***************
*** 396,400 ****
: For the alignment sections in the body of the report:
:
! : DB:SEQUENCE_ID (Back | Top) DESCRIPTION
: DB = links to the indicated database (if not Gen/Embl/Ddbj).
: SEQUENCE_ID = links to GenBank entry for the sequence.
--- 395,399 ----
: For the alignment sections in the body of the report:
:
! : DB:SEQUENCE_ID (Back | Top) DESCRIPTION
: DB = links to the indicated database (if not Gen/Embl/Ddbj).
: SEQUENCE_ID = links to GenBank entry for the sequence.
***************
*** 413,417 ****
: Parsing HTML-formatted reports is dependent on the specific structure
: of the HTML and is generally not recommended.
! :
: Note that since URLs can change without notice, links will need updating.
: The links are obtained from Bio::Tools::WWW.pm updating that module
--- 412,416 ----
: Parsing HTML-formatted reports is dependent on the specific structure
: of the HTML and is generally not recommended.
! :
: Note that since URLs can change without notice, links will need updating.
: The links are obtained from Bio::Tools::WWW.pm updating that module
***************
*** 433,448 ****
local $_ = $$line_ref;
##
! ## REGEXPS FOR ALIGNMENT SECTIONS (within the body of the report,
## the text above the list of HSPs).
##
## If the HSP alignment sections don't start with a '>' we have no way
! ## of finding them. This occurs with reports saved from HTML-formatted
## web pages, which we shouldn't be processing here anyway.
## To facilitate parsing of HTML-formatted reports by Bio::Tools::Blast.pm,
! ## the <a name=...> anchors should be added at the BEGINNING of the HSP
## alignment section lines and at the END of the description section lines.
! # Removing " ! " addded by GCG.
s/ ! / /;
--- 432,447 ----
local $_ = $$line_ref;
##
! ## REGEXPS FOR ALIGNMENT SECTIONS (within the body of the report,
## the text above the list of HSPs).
##
## If the HSP alignment sections don't start with a '>' we have no way
! ## of finding them. This occurs with reports saved from HTML-formatted
## web pages, which we shouldn't be processing here anyway.
## To facilitate parsing of HTML-formatted reports by Bio::Tools::Blast.pm,
! ## the <a name=...> anchors should be added at the BEGINNING of the HSP
## alignment section lines and at the END of the description section lines.
! # Removing " ! " addded by GCG.
s/ ! / /;
***************
*** 524,528 ****
##
## Not using bold face to highlight the sequence id's since this can throw off
! ## off formatting of the line when the IDs are different lengths. This lead to
## the scores and P/Expect values not lining up properly.
--- 523,527 ----
##
## Not using bold face to highlight the sequence id's since this can throw off
! ## off formatting of the line when the IDs are different lengths. This lead to
## the scores and P/Expect values not lining up properly.
***************
*** 569,573 ****
## Mike Cherry's markups. SAC note: added back database name to allow
## the HTML-formatted version to be parsable by Blast.pm.
!
s#^ ?(GB_$Word:)($Word)( *)($Acc)($Descrip)($Int) ( *$Signif) ( *\d*)$#GenBank\|<a href="$_gi_link$4">$2</A>\|$4 $3$5$6 <a href="\#$2_$4_A">$7</A> $8<a name="$2_$4_H"></A>#o;
--- 568,572 ----
## Mike Cherry's markups. SAC note: added back database name to allow
## the HTML-formatted version to be parsable by Blast.pm.
!
s#^ ?(GB_$Word:)($Word)( *)($Acc)($Descrip)($Int) ( *$Signif) ( *\d*)$#GenBank\|<a href="$_gi_link$4">$2</A>\|$4 $3$5$6 <a href="\#$2_$4_A">$7</A> $8<a name="$2_$4_H"></A>#o;
***************
*** 594,598 ****
s#^ ?(UTR5_SC_[0-9]*:)(\S*)($Descrip)($Int) ($Signif) ($Int)$#UTR5:$2 $3 $4 <a href="\#$2_A">$5</a> $6<a name="$2_H"></a>#o;
! # Hits without a db identifier.
s@^ ?($Word)($Descrip)($Int) ($Signif)(.*)$@$1$2$3 <A href="\#$1_A">$4</a>$5<a name="$1_H"></a>@o;
--- 593,597 ----
s#^ ?(UTR5_SC_[0-9]*:)(\S*)($Descrip)($Int) ($Signif) ($Int)$#UTR5:$2 $3 $4 <a href="\#$2_A">$5</a> $6<a name="$2_H"></a>#o;
! # Hits without a db identifier.
s@^ ?($Word)($Descrip)($Int) ($Signif)(.*)$@$1$2$3 <A href="\#$1_A">$4</a>$5<a name="$1_H"></a>@o;
***************
*** 619,631 ****
<p>
<small>
! <b>References:</b>
<ol>
! <li>Altschul, Stephen F., Warren Gish, Webb Miller, Eugene W. Myers, and David J. Lipman (1990).
Basic local alignment search tool.
<a href="http://www.ncbi.nlm.nih.gov/htbin-post/Entrez/query?uid=2231712&form=6&db=m&Dopt=r">J. Mol. Biol. 215: 403-10</a>.
! <li>Altschul et al. (1997), Gapped BLAST and PSI-BLAST:
! a new generation of protein database search programs.
<a href="http://www.ncbi.nlm.nih.gov/htbin-post/Entrez/query?uid=9254694&form=6&db=m&Dopt=r">Nucl. Acids Res. 25: 3389-3402</a>.
! <li><b>Program Descriptions</b>:
<a href="http://www.ncbi.nlm.nih.gov/BLAST/newblast.html">BLAST2</a> |
<a href="http://blast.wustl.edu/">WU-BLAST2</a> |
--- 618,630 ----
<p>
<small>
! <b>References:</b>
<ol>
! <li>Altschul, Stephen F., Warren Gish, Webb Miller, Eugene W. Myers, and David J. Lipman (1990).
Basic local alignment search tool.
<a href="http://www.ncbi.nlm.nih.gov/htbin-post/Entrez/query?uid=2231712&form=6&db=m&Dopt=r">J. Mol. Biol. 215: 403-10</a>.
! <li>Altschul et al. (1997), Gapped BLAST and PSI-BLAST:
! a new generation of protein database search programs.
<a href="http://www.ncbi.nlm.nih.gov/htbin-post/Entrez/query?uid=9254694&form=6&db=m&Dopt=r">Nucl. Acids Res. 25: 3389-3402</a>.
! <li><b>Program Descriptions</b>:
<a href="http://www.ncbi.nlm.nih.gov/BLAST/newblast.html">BLAST2</a> |
<a href="http://blast.wustl.edu/">WU-BLAST2</a> |
***************
*** 641,645 ****
# Not really a reference for the Blast algorithm itself but an interesting usage.
! #<li>Gish, Warren, and David J. States (1993). Identification of protein coding regions by database similarity search.
#<a href="http://www.ncbi.nlm.nih.gov/htbin-post/Entrez/query?uid=8485583&form=6&db=m&Dopt=r">Nature Genetics 3:266-72</a>.
--- 640,644 ----
# Not really a reference for the Blast algorithm itself but an interesting usage.
! #<li>Gish, Warren, and David J. States (1993). Identification of protein coding regions by database similarity search.
#<a href="http://www.ncbi.nlm.nih.gov/htbin-post/Entrez/query?uid=8485583&form=6&db=m&Dopt=r">Nature Genetics 3:266-72</a>.
***************
*** 661,670 ****
#------------------
return << "QQ_GENBANK_QQ";
! <p><b><font color="red">CAUTION: Hits reported on this page may be derived from DNA sequences
! that contain more than one gene.
</font>To avoid mis-interpretation, always check database entries
! for any sequence of interest to verify that the similarity
occurs within the described sequence. (E.g., A DNA sequence
! for gene X as reported in GenBank may contain a 5' or 3'
fragment of coding sequence for a neighboring gene Y, yet will
be listed as gene X, since gene Y had not yet been identified). </b>
--- 660,669 ----
#------------------
return << "QQ_GENBANK_QQ";
! <p><b><font color="red">CAUTION: Hits reported on this page may be derived from DNA sequences
! that contain more than one gene.
</font>To avoid mis-interpretation, always check database entries
! for any sequence of interest to verify that the similarity
occurs within the described sequence. (E.g., A DNA sequence
! for gene X as reported in GenBank may contain a 5' or 3'
fragment of coding sequence for a neighboring gene Y, yet will
be listed as gene X, since gene Y had not yet been identified). </b>
***************
*** 687,701 ****
Comments : Based on code originally written by Alex Dong Li
: (ali at genet.sickkids.on.ca).
! : This method does some Blast-specific stripping
! : (adds back a '>' character in front of each HSP
: alignment listing).
! :
: THIS METHOD IS HIGHLY ERROR-PRONE!
:
: Removal of the HTML tags and accurate reconstitution of the
: non-HTML-formatted report is highly dependent on structure of
! : the HTML-formatted version. For example, it assumes that first
: line of each alignment section (HSP listing) starts with a
! : <a name=..> anchor tag. This permits the reconstruction of the
: original report in which these lines begin with a ">".
: This is required for parsing.
--- 686,700 ----
Comments : Based on code originally written by Alex Dong Li
: (ali at genet.sickkids.on.ca).
! : This method does some Blast-specific stripping
! : (adds back a '>' character in front of each HSP
: alignment listing).
! :
: THIS METHOD IS HIGHLY ERROR-PRONE!
:
: Removal of the HTML tags and accurate reconstitution of the
: non-HTML-formatted report is highly dependent on structure of
! : the HTML-formatted version. For example, it assumes that first
: line of each alignment section (HSP listing) starts with a
! : <a name=..> anchor tag. This permits the reconstruction of the
: original report in which these lines begin with a ">".
: This is required for parsing.
***************
*** 719,726 ****
# 2) if a tag is split over multiple lines and this method is
# used to process one line at a time.
!
my $string_ref = shift;
! ref $string_ref eq 'SCALAR' or
croak ("Can't strip HTML: ".
"Argument is should be a SCALAR reference not a ${\ref $string_ref}");
--- 718,725 ----
# 2) if a tag is split over multiple lines and this method is
# used to process one line at a time.
!
my $string_ref = shift;
! ref $string_ref eq 'SCALAR' or
croak ("Can't strip HTML: ".
"Argument is should be a SCALAR reference not a ${\ref $string_ref}");
***************
*** 729,737 ****
my $stripped = 0;
! # Removing "<a name =...>" and adding the '>' character for
# HSP alignment listings.
$str =~ s/(\A|\n)<a name ?=[^>]+> ?/>/sgi and $stripped = 1;
! # Removing all "<>" tags.
$str =~ s/<[^>]+>| //sgi and $stripped = 1;
--- 728,736 ----
my $stripped = 0;
! # Removing "<a name =...>" and adding the '>' character for
# HSP alignment listings.
$str =~ s/(\A|\n)<a name ?=[^>]+> ?/>/sgi and $stripped = 1;
! # Removing all "<>" tags.
$str =~ s/<[^>]+>| //sgi and $stripped = 1;
Index: HSP.pm
===================================================================
RCS file: /home/repository/bioperl/bioperl-live/Bio/Tools/Blast/HSP.pm,v
retrieving revision 1.23
retrieving revision 1.24
diff -C2 -d -r1.23 -r1.24
*** HSP.pm 4 Jul 2006 22:23:25 -0000 1.23
--- HSP.pm 24 Sep 2006 15:36:32 -0000 1.24
***************
*** 13,17 ****
#
# Copyright (c) 1996-2000 Steve Chervitz. All Rights Reserved.
! # This module is free software; you can redistribute it and/or
# modify it under the same terms as Perl itself.
#----------------------------------------------------------------------------
--- 13,17 ----
#
# Copyright (c) 1996-2000 Steve Chervitz. All Rights Reserved.
! # This module is free software; you can redistribute it and/or
# modify it under the same terms as Perl itself.
[...1096 lines suppressed...]
*** 1847,1855 ****
Query: 285 QNSAPWGLARISHRERLNLGSFNKYLYDDDAG
Q +APWGLARIS G+ + Y YD+ AG
! ^^^^^^^^^^^^^
! INHERITED DATA MEMBERS
! _name : From Bio::Root::Object.pm.
:
_parent : From Bio::Root::Object.pm. This member contains a reference to the
--- 1846,1854 ----
Query: 285 QNSAPWGLARISHRERLNLGSFNKYLYDDDAG
Q +APWGLARIS G+ + Y YD+ AG
! ^^^^^^^^^^^^^
! INHERITED DATA MEMBERS
! _name : From Bio::Root::Object.pm.
:
_parent : From Bio::Root::Object.pm. This member contains a reference to the