[Bioperl-guts-l] [15708] bioperl-dev/trunk/Bio/Search/Tiling/TilingI.pm: expanded prescribed methods to include most of those
Mark Allen Jensen
maj at dev.open-bio.org
Fri May 22 23:51:36 EDT 2009
Revision: 15708
Author: maj
Date: 2009-05-22 23:51:36 -0400 (Fri, 22 May 2009)
Log Message:
-----------
expanded prescribed methods to include most of those
analogous to the HSP::HSPI statistics methods
also expanded the intro POD to put a bunch of
algorithm-specific report information all in
one place, as an aid to devs
Modified Paths:
--------------
bioperl-dev/trunk/Bio/Search/Tiling/TilingI.pm
Modified: bioperl-dev/trunk/Bio/Search/Tiling/TilingI.pm
===================================================================
--- bioperl-dev/trunk/Bio/Search/Tiling/TilingI.pm 2009-05-23 01:02:38 UTC (rev 15707)
+++ bioperl-dev/trunk/Bio/Search/Tiling/TilingI.pm 2009-05-23 03:51:36 UTC (rev 15708)
@@ -18,14 +18,39 @@
=head1 SYNOPSIS
-Not used directly.
+Not used directly. Useful POD here for developers, however.
+The interface is designed to make the following code conversion as
+simple as possible:
+
+From:
+
+ # Bio::Search::SearchUtils-based
+ while ( my $hit = $result->next_hit ) {
+ printf( "E-value: %g; Fraction aligned: %f; Number identical: %d\n",
+ $hit->significance, $hit->frac_aligned_query, $hit->num_identical);
+ }
+
+To:
+
+ # TilingI-based
+ while ( my $hit = $result->next_hit ) {
+ my $tiling = Bio::Search::Tiling::MyTiling->new($hit);
+ printf( "E-value: %g; Fraction aligned: %f; Number identical: %d\n",
+ $hit->significance, $tiling->frac_aligned_query, $tiling->num_identical);
+ }
+
+
+
=head1 DESCRIPTION
This module provides strong suggestions for any intended HSP tiling
object implementation. An object subclassing TilingI should override
the methods defined here according to their descriptions below.
+See the section STATISTICS METHODS for hints on implementing methods
+that are valid across different algorithms and report types.
+
=head1 FEEDBACK
=head2 Mailing Lists
@@ -79,47 +104,65 @@
use base qw(Bio::Root::Root);
-=head2 next_tiling
+=head2 STATISTICS METHODS
- Title : next_tiling
- Usage : @hsps = $self->next_tiling($type);
- Function: Obtain a tiling of HSPs over the $type ('hit', 'subject',
- 'query') sequence
- Example :
- Returns : an array of HSPI objects
- Args : scalar $type: one of 'hit', 'subject', 'query', with
- 'subject' an alias for 'hit'
+The tiling statistics can be thought of as global counterparts to
+similar statistics defined for the individual HSPs. We therefore
+prescribe definitions for many of the synonymous methods defined in
+L<Bio::Search::HSP::HSPI>.
-=cut
+The tiling statistics must be able to keep track of the coordinate
+systems in which both the query and subject sequences exist; i.e.,
+either nucleotide or amino acid. This information is typically
+inferred from the name of the algorithm used to perform the original
+search (contained in C<$hit_object-E<gt>algorithm>). Here is a table
+of algorithm information that may be useful (if you trust us).
-sub next_tiling{
- my ($self,$type, @args) = @_;
- $self->throw_not_implemented;
-}
+ algorithm   query on hit   coordinates(q/h)
+ ---------   ------------   ----------------
+ blastn      dna on dna     dna/dna
+ blastp      aa on aa       aa/aa
+ blastx      xna on aa      dna/aa
+ tblastn     aa on xna      aa/dna
+ tblastx     xna on xna     dna/dna
+ fasta       dna on dna     dna/dna
+ fasta       aa on aa       aa/aa
+ fastx       xna on aa      dna/aa
+ fasty       xna on aa      dna/aa
+ tfasta      aa on xna      aa/dna
+ tfasty      aa on xna      aa/dna
+ megablast   dna on dna     dna/dna
+ xna: translated nucleotide data
- Title : rewind_tilings
- Usage : $self->rewind_tilings($type)
- Function: Reset the next_tiling($type) iterator
- Example :
- Returns : True on success
- Args : scalar $type: one of 'hit', 'subject', 'query', with
- 'subject' an alias for 'hit'
+Statistics methods must also be aware of differences in reporting
+among the algorithms. Hit attributes are not necessarily normalized
+over all algorithms. Devs, please feel free to add examples to the
+list below.
-=cut
+=over
-sub rewind_tilings{
- my ($self, $type, @args) = @_;
- $self->throw_not_implemented;
-}
+=item NCBI BLAST vs WU-BLAST (AB-BLAST) lengths
-#alias
-sub rewind { shift->rewind_tilings(@_) }
+The total length of the alignment is reported differently between these two flavors. C<$hit_object-E<gt>length()> will contain the number in the denominator of the stats line; i.e., 120 in
+ Identical = 34/120 Positives = 67/120
+
+NCBI BLAST uses the total length of the query sequence as input by the user (a.k.a. "with gaps"). WU-BLAST uses the length of the query sequence actually aligned by the algorithm (a.k.a. "without gaps").
+
+=back
+
+Finally, developers should remember that sequence data may or may not
+be associated with the HSPs contained in the hit object. This will
+typically depend on whether a full report (e.g., C<blastall -m0>) or a
+summary (e.g., C<blastall -m8>) was parsed. Statistics methods that
+depend directly on the sequence data will need to check that
+the data are present.
+
=head2 identities
Title : identities
+ Alias : num_identical
Usage : $num_identities = $tiling->identities()
Function: Return the estimated or exact number of identities in the
tiling, accounting for overlapping HSPs
@@ -134,9 +177,13 @@
$self->throw_not_implemented;
}
+#HSPI synonym
+sub num_identical { shift->identities( @_ ) }
+
=head2 conserved
Title : conserved
+ Alias : num_conserved
Usage : $num_conserved = $tiling->conserved()
Function: Return the estimated or exact number of conserved sites in the
tiling, accounting for overlapping HSPs
@@ -151,14 +198,16 @@
$self->throw_not_implemented;
}
+#HSPI synonym
+sub num_conserved { shift->conserved( @_ ) }
+
=head2 length
Title : length
Usage : $max_length = $tiling->length($type)
Function: Return the total number of residues of the subject or query
sequence covered by the tiling
- Example :
- Returns :
+ Returns : number of "raw" residues covered (see logical_length() )
Args : scalar $type, one of 'hit', 'subject', 'query'
=cut
@@ -168,9 +217,172 @@
$self->throw_not_implemented;
}
+=head2 frac_identical
+
+ Title : frac_identical
+ Usage : $tiling->frac_identical($type)
+ Function: Return the fraction of sequence length consisting
+ of identical pairs
+ Returns : scalar float
+ Args : scalar $type, one of 'hit', 'subject', 'query'
+ Note : This method must take account of the $type coordinate
+ system and the length reporting method (see STATISTICS
+ METHODS above)
-#
-# more desired methods here as nec
-#
+=cut
+sub frac_identical {
+ my ($self, $type, @args) = @_;
+ $self->throw_not_implemented;
+}
+
+=head2 percent_identity
+
+ Title : percent_identity
+ Usage : $tiling->percent_identity($type)
+ Function: Return the fraction of sequence length consisting
+ of identical pairs as a percentage
+ Returns : scalar float
+ Args : scalar $type, one of 'hit', 'subject', 'query'
+
+=cut
+
+sub percent_identity {
+ my ($self, $type, @args) = @_;
+ return $self->frac_identical($type, @args) * 100;
+}
+
+=head2 frac_conserved
+
+ Title : frac_conserved
+ Usage : $tiling->frac_conserved($type)
+ Function: Return the fraction of sequence length consisting
+ of conserved pairs
+ Returns : scalar float
+ Args : scalar $type, one of 'hit', 'subject', 'query'
+ Note : This method must take account of the $type coordinate
+ system and the length reporting method (see STATISTICS
+ METHODS above)
+
+=cut
+
+sub frac_conserved{
+ my ($self, $type, @args) = @_;
+ $self->throw_not_implemented;
+}
+
+=head2 percent_conserved
+
+ Title : percent_conserved
+ Usage : $tiling->percent_conserved($type)
+ Function: Return the fraction of sequence length consisting
+ of conserved pairs as a percentage
+ Returns : scalar float
+ Args : scalar $type, one of 'hit', 'subject', 'query'
+
+=cut
+
+sub percent_conserved {
+ my ($self, $type, @args) = @_;
+ return $self->frac_conserved($type, @args) * 100;
+}
+
+
+=head2 frac_aligned
+
+ Title : frac_aligned
+ Usage : $tiling->frac_aligned($type)
+ Function: Return the fraction of B<input> sequence length
+ that was aligned by the algorithm
+ Returns : scalar float
+ Args : scalar $type, one of 'hit', 'subject', 'query'
+ Note : This method must take account of the $type coordinate
+ system and the length reporting method (see STATISTICS
+ METHODS above)
+
+=cut
+
+sub frac_aligned{
+ my ($self, $type, @args) = @_;
+ $self->throw_not_implemented;
+}
+
+# aliases for back compat
+sub frac_aligned_query { shift->frac_aligned('query', @_) }
+sub frac_aligned_hit { shift->frac_aligned('hit', @_) }
+
+=head2 range
+
+ Title : range
+ Usage : $tiling->range($type)
+ Function: Returns the extent of the longest tiling
+ as ($start_coord, $end_coord)
+ Returns : array of two scalar integers
+ Args : scalar $type, one of 'hit', 'subject', 'query'
+
+=cut
+
+sub range {
+ my ($self, $type, @args) = @_;
+ $self->throw_not_implemented;
+}
+
+=head2 logical_length
+
+ Title : logical_length
+ Usage : $tiling->logical_length($type)
+ Function: Get the logical length of the hit sequence,
+ i.e., the length of the pretranslated nucleotide
+ sequence if necessary.
+ Returns : scalar integer
+ Args : scalar $type, one of 'hit', 'subject', 'query'
+ Note : This is a key internal function for the frac_* methods.
+
+=cut
+
+sub logical_length{
+ my ($self, $type, @args) = @_;
+ $self->throw_not_implemented;
+}
+
+=head2 TILING ITERATORS
+
+=head2 next_tiling
+
+ Title : next_tiling
+ Usage : @hsps = $self->next_tiling($type);
+ Function: Obtain a tiling of HSPs over the $type ('hit', 'subject',
+ 'query') sequence
+ Example :
+ Returns : an array of HSPI objects
+ Args : scalar $type: one of 'hit', 'subject', 'query', with
+ 'subject' an alias for 'hit'
+
+=cut
+
+sub next_tiling{
+ my ($self,$type, @args) = @_;
+ $self->throw_not_implemented;
+}
+
+=head2 rewind_tilings
+
+ Title : rewind_tilings
+ Usage : $self->rewind_tilings($type)
+ Function: Reset the next_tiling($type) iterator
+ Example :
+ Returns : True on success
+ Args : scalar $type: one of 'hit', 'subject', 'query', with
+ 'subject' an alias for 'hit'
+
+=cut
+
+sub rewind_tilings{
+ my ($self, $type, @args) = @_;
@@ Diff output truncated at 10000 characters. @@
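
The algorithm table in the STATISTICS METHODS POD can be turned into a
lookup, which is roughly what an implementation would need before it
computes any frac_* value. The following is only a sketch, not code from
this commit: the helper name coordinate_systems() is hypothetical, and it
assumes the algorithm string matches a table entry exactly once
lowercased. 'fasta' is deliberately left out, because the table lists it
for both dna/dna and aa/aa searches and the name alone cannot
distinguish them.

```perl
use strict;
use warnings;

# Algorithm => [ query coordinates, hit coordinates ], transcribed from
# the table in the STATISTICS METHODS POD. 'fasta' is omitted on purpose:
# it appears twice in that table, so the name alone is ambiguous.
my %COORDS = (
    blastn    => [ 'dna', 'dna' ],
    blastp    => [ 'aa',  'aa'  ],
    blastx    => [ 'dna', 'aa'  ],
    tblastn   => [ 'aa',  'dna' ],
    tblastx   => [ 'dna', 'dna' ],
    fastx     => [ 'dna', 'aa'  ],
    fasty     => [ 'dna', 'aa'  ],
    tfasta    => [ 'aa',  'dna' ],
    tfasty    => [ 'aa',  'dna' ],
    megablast => [ 'dna', 'dna' ],
);

# Hypothetical helper (not part of TilingI): returns the (query, hit)
# coordinate types for an algorithm name, or an empty list when the name
# is unknown or ambiguous.
sub coordinate_systems {
    my ($algorithm) = @_;
    return unless defined $algorithm;
    my $coords = $COORDS{ lc $algorithm };
    return $coords ? @$coords : ();
}

my ( $query_coords, $hit_coords ) = coordinate_systems('TBLASTN');
print "tblastn: query=$query_coords hit=$hit_coords\n";
```

A caller that gets an empty list back (as for 'fasta') would have to
fall back on some other evidence, such as the sequence data itself when
a full report was parsed.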
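
For readers who want to see how the prescribed methods fit together, here
is a self-contained sketch of the delegation pattern in the diff: the
synonyms and percent_* methods are defined once in the interface and
work as soon as a subclass overrides the abstract methods. The packages
My::TilingI and My::ToyTiling are hypothetical stand-ins (the real
TilingI inherits throw_not_implemented() from Bio::Root::Root), and the
toy frac_identical() simply divides identities by length, ignoring the
coordinate-system and length-reporting issues the POD warns about.

```perl
use strict;
use warnings;

# Stand-in for the interface. The real TilingI gets
# throw_not_implemented() from Bio::Root::Root; here it just dies.
package My::TilingI;

sub new { my ( $class, %args ) = @_; return bless {%args}, $class }
sub throw_not_implemented { die "Abstract method not implemented\n" }

# Abstract methods: subclasses are expected to override these.
sub identities     { shift->throw_not_implemented }
sub length         { shift->throw_not_implemented }
sub frac_identical { shift->throw_not_implemented }

# Concrete methods that delegate to the abstract ones.
sub num_identical { shift->identities(@_) }    # HSPI synonym

sub percent_identity {
    my ( $self, $type, @args ) = @_;
    return $self->frac_identical( $type, @args ) * 100;
}

# Hypothetical toy implementation backed by precomputed numbers.
package My::ToyTiling;
our @ISA = ('My::TilingI');

sub identities { $_[0]{identities} }

sub length {
    my ( $self, $type ) = @_;
    $type = 'hit' if $type eq 'subject';       # 'subject' aliases 'hit'
    return $self->{length}{$type};
}

# Simplification: plain ratio, no coordinate-system correction.
sub frac_identical {
    my ( $self, $type ) = @_;
    return $self->identities / $self->length($type);
}

package main;

my $tiling = My::ToyTiling->new(
    identities => 34,
    length     => { query => 120, hit => 136 },
);
printf "%d identical, %.1f%% of query\n",
    $tiling->num_identical, $tiling->percent_identity('query');
```

Calling frac_identical() on the stand-in interface itself dies, which
mirrors how the unimplemented methods in the diff behave until a
subclass supplies them.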