[Bioperl-l] coloring of HSPs in blast panel

Steve Chervitz sac at bioperl.org
Mon Nov 26 20:41:59 EST 2007


Chris,

Cood catch. You're on track here with one exception: WU blast and NCBI
blast behave differently in what they report in the hit table: WU
blast puts the raw score in the table not the bit score as NCBI blast
does (see example below for reference). WU blast also swaps their
location in the HSP header relative to how NCBI reports it. It would
be good to verify that the blast parser isn't befuddled by this. A
quick look at SearchIO::blast and it appears that data from the hit
table is always getting stored as score, not bits for WU blast. Not
sure if the HSP section data are parsed correctly. I'd recommend
looking into these things when you do your fixes.

So in the end, WU blast HSPs that are built from the hit table should
report a value for raw_score and punt on bits, but NCBI HSPs so
constructed should do the opposite. The downside to this arrangement
is that code that works for NCBI blast hits will need modification to
work for WU blast hits, but that is just the nature of the data. It
shouldn't be an issue for the majority of users that stick with one
flavor of blast and don't switch back and forth, or for users that get
their HSP scoring data from HSP sections rather than relying on the
hit table.

Ideally, the HSP object would know whether it was NCBI or WU-based and
issue an informative warning when attempting to access data it doesn't
have. One solution might be for the parser to put a 'WU-' in front of
the algorithm name for WU blast reports, so it would then be available
for the contained hit/hsp objects. This could break anything dependent
on algorithm name, so it would need some testing.

Steve

Example WU blast table and HSP header:
                                                                     Smallest
                                                                       Sum
                                                              High  Probability
Sequences producing High-scoring Segment Pairs:              Score  P(N)      N

gb|AAC73113.1| (AE000111) aspartokinase I, homoserine deh...  4141  0.0       1
gb|AAC76922.1| (AE000468) aspartokinase II and homoserine...   844  3.1e-86   1
gb|AAC76994.1| (AE000475) aspartokinase III, lysine sensi...   483  2.8e-47   1
gb|AAC73282.1| (AE000126) uridylate kinase [Escherichia c...    97  0.0010    1

>gb|AAC73113.1| (AE000111) aspartokinase I, homoserine dehydrogenase I
            [Escherichia coli]
        Length = 820

 Score = 4141 (1462.8 bits), Expect = 0.0, P = 0.0
 Identities = 820/820 (100%), Positives = 820/820 (100%)


Example NCBI blast table and HSP header:

                                                                 Score    E
Sequences producing significant alignments:                      (bits) Value

ENSP00000350182 pep:novel clone::BX322644.8:4905:15090:-1 gene:E...
120   3e-27
ENSP00000350182 pep:novel clone::BX322644.8:4905:15090:-1 gene:E...
120   3e-27
ENSP00000327738 pep:known-ccds chromosome:NCBI36:4:189297592:189...
115   8e-26

>ENSP00000350182 pep:novel clone::BX322644.8:4905:15090:-1 gene:ENSG00000137397
           transcript:ENST00000357569
          Length = 425

 Score =  120 bits (301), Expect = 3e-27
 Identities = 76/261 (29%), Positives = 140/261 (53%), Gaps = 21/261 (8%)


On Nov 26, 2007 10:59 AM, Chris Fields <cjfields at uiuc.edu> wrote:
> Steve, Bernd, (and Jason, since you may have some input on this as
> well),
>
> I am now looking into the bug Bernd submitted and it seems there is a
> serious discrepancy with the way the hit raw_score, bits, and
> significance is determined for Hit objects.  Unless I am mistaken
> these should always come from the best HSP when they are present,
> falling back to the hit table data only when no HSP alignments are
> present.  Under the latter conditions a minimal Hit object is made
> from data in the hit table, which reports the rounded bit score, not
> the raw score, so in those cases the raw score would be undefined
> (and you probably should get a nasty warning indicating there are no
> HSPs present to get the data from).
>
> What is occurring now, though, is the raw_score and significance is
> explicitly set from the hit table in the BLAST parser for the Hit
> object at all times, while the bits are correctly derived from the
> best HSP (no fallback to the hit table).  Changing to the behavior
> above results in several tests failing via SearchIO.t, with each
> failed test reporting the expected (read:correct) raw score.
>
> I'll look through the tests just in case, but I am planning on
> committing changes to the BLAST parsers, GenericHit, and SearchIO.t
> (to reflect the correct expected data) in the next day or two unless
> there are any objections.
>
> chris
>
>
> On Nov 21, 2007, at 12:43 PM, Steve Chervitz wrote:
>
> > On Nov 21, 2007 9:38 AM, Bernd Web <bernd.web at gmail.com> wrote:
> >> [snip]
> >>
> >> Further is is possible to get the raw_score of a hit. $hit->raw_score
> >> actually gets the bitscore (w/o decimal point).
> >
> > Hmmm. raw_score should not be the same as bit score. So given an
> > example blast hit line such as:
> >
> >        Score = 60.0 bits (30), Expect = 1e-06
> >
> > $hit->raw_score() should return 30, not 60, as you seem to be getting.
> >
> > Could you submit a bug report for this?  http://www.bioperl.org/
> > wiki/Bugs
> >
> > Thanks,
> > Steve
> >
> >>
> >> On Nov 21, 2007 5:42 PM, Bernd Web <bernd.web at gmail.com> wrote:
> >>> Hi Russell,
> >>>
> >>> I came across your question. At first I thought all was well on my
> >>> system, but indeed I also have these colouring problems.
> >>> I noted that scrore in the bgcolor callback gets a different value!
> >>> Printing score during hit parsing($hit->raw_score) gives the same
> >>> score as -description
> >>> my $score = $feature->score; However, printing score in the bgcolor
> >>> sub gives 2573!
> >>> All scores in the bgcolor routine all different and higher than the
> >>> real scores. Were you able to solve this colouring issue?
> >>>
> >>> Regards,
> >>> Bernd
> >>>
> >>>
> >>>> Hi all,
> >>>> I'm using a modified version of Lincoln's tutorial
> >>>> (http://www.bioperl.org/wiki/
> >>>> HOWTO:Graphics#Parsing_Real_BLAST_Output)
> >>>> and I'm colouring the HSPs by setting the -bgcolor by score with
> >>>> a sub
> >>>> to give a similar image to that from NCBI but for some reason, my
> >>>> colours are coming out wrong (see attached example)
> >>>> They seem to be off by one but I can't see why.
> >>>>
> >>>> Any ideas?
> >>>>
> >>>> I can't be certain but I think it's only started doing this
> >>>> since our
> >>>> BLAST upgrade to 2.2.17 a few weeks ago.
> >>>>
> >>>> Here's the colouring code:
> >>>> -------------------------------------------------------------------
> >>>> -----
> >>>> -------
> >>>> my $track = $panel->add_track(
> >>>>                               -glyph       => 'segments',
> >>>>                               -label       => 1,
> >>>>                               -connector   => 'dashed',
> >>>>                               -bgcolor     => sub {
> >>>>                                 my $feature = shift;
> >>>>                                 my $score = $feature->score;
> >>>>                         return 'red'       if $score >= 200;
> >>>>                                     return 'fuchsia' if $score
> >>>> >= 80;
> >>>>                                     return 'lime'      if $score
> >>>> >= 50;
> >>>>                         return 'blue'      if $score >= 40;
> >>>>                                     return 'black';
> >>>>                                },
> >>>>                               -font2color  => 'gray',
> >>>>                               -sort_order  => 'high_score',
> >>>>                               -description => sub {
> >>>>                                 my $feature = shift;
> >>>>                                 return unless
> >>>> $feature->has_tag('description');
> >>>>                                 my ($description) =
> >>>> $feature->each_tag_value('description');
> >>>>                                 my $score = $feature->score;
> >>>>                                 "$description, score=$score";
> >>>>                                },
> >>>>                              );
> >>>> -------------------------------------------------------------------
> >>>> -----
> >>>> ---------
> >>>>
> >>>>
> >>>> Thanx,
> >>>>
> >>>> Russell Smithies
> >>>>
> >>>>
> >>>>
> >>>>
> >>>> ===================================================================
> >>>> ====
> >>>> Attention: The information contained in this message and/or
> >>>> attachments
> >>>> from AgResearch Limited is intended only for the persons or
> >>>> entities
> >>>> to which it is addressed and may contain confidential and/or
> >>>> privileged
> >>>> material. Any review, retransmission, dissemination or other use
> >>>> of, or
> >>>> taking of any action in reliance upon, this information by
> >>>> persons or
> >>>> entities other than the intended recipients is prohibited by
> >>>> AgResearch
> >>>> Limited. If you have received this message in error, please
> >>>> notify the
> >>>> sender immediately.
> >>>> ===================================================================
> >>>> ====
> >>>>
> >>>> _______________________________________________
> >>>> Bioperl-l mailing list
> >>>> Bioperl-l at lists.open-bio.org
> >>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
> >>>>
> >>>
> >> _______________________________________________
> >> Bioperl-l mailing list
> >> Bioperl-l at lists.open-bio.org
> >> http://lists.open-bio.org/mailman/listinfo/bioperl-l
> >>
> > _______________________________________________
> > Bioperl-l mailing list
> > Bioperl-l at lists.open-bio.org
> > http://lists.open-bio.org/mailman/listinfo/bioperl-l
>
> Christopher Fields
> Postdoctoral Researcher
> Lab of Dr. Robert Switzer
> Dept of Biochemistry
> University of Illinois Urbana-Champaign
>
>
>
>


More information about the Bioperl-l mailing list