[Bioperl-l] What does Expect(2) mean in a blast result?

Amir Karger akarger at CGR.Harvard.edu
Fri Nov 9 09:53:02 EST 2007


When I tblastn ENSP00000349467 against the human genome, I get a few
hits on chr10, among which are:


 Score =  192 bits (487), Expect(2) = 5e-64
 Identities = 99/109 (90%), Positives = 99/109 (90%)
 Frame = +2

Query: 40       LGQNPTEAELQDMINEVDADGNGTIDFPEFLTMMARKMKDTDSEEEIREAFRVFDKDGNG 99
                L QNPTEAELQDMINEVDADGNGTIDFPEFLTMMARKMKDTDSEEEIRE F VFDKDGNG
Sbjct: 71593562 LRQNPTEAELQDMINEVDADGNGTIDFPEFLTMMARKMKDTDSEEEIRETFCVFDKDGNG 71593741

Query: 100      YISAAELRHVMTNLGEKLTDEEVDEMIREADIDGDGQVNYEEFVQMMTA 148
                YIS  EL HVMTNLG KLTDEEVD MIREAD DGDGQVNY EFVQMMTA
Sbjct: 71593742 YISGVELHHVMTNLGVKLTDEEVD*MIREADPDGDGQVNY-EFVQMMTA 71593885



 Score = 75.1 bits (183), Expect(2) = 5e-64
 Identities = 36/43 (83%), Positives = 39/43 (90%)
 Frame = +1

Query: 1        MADQLTEEQIAEFKEAFSLFDKDGDGTITTKELGTVMRSLGQN 43
                MADQLTEEQI EFKE FSLFDKDGDGTITTK+LGTVMRS  ++
Sbjct: 71593447 MADQLTEEQIVEFKEVFSLFDKDGDGTITTKKLGTVMRSQAES 71593575



As you can see from the Sbjct lines, these two hits are basically
contiguous.
I was surprised to see that the bit scores, identities, and alignment
lengths here are totally different, yet the expectation values are
identical.
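
To make "basically contiguous" concrete, here is a small Python sketch that uses only the coordinates printed in the two hits above. The helper names are made up for illustration, and the frame formula is an assumed convention for plus-strand hits; it happens to agree with the Frame lines in this report.

```python
# Coordinates copied from the two HSPs in the report above (inclusive ranges).
hsp_frame1 = {"query": (1, 43), "sbjct": (71593447, 71593575)}
hsp_frame2 = {"query": (40, 148), "sbjct": (71593562, 71593885)}


def overlap(first, second):
    """Number of positions shared by two inclusive ranges (0 if disjoint)."""
    return max(0, min(first[1], second[1]) - max(first[0], second[0]) + 1)


def forward_frame(sbjct_start):
    """Assumed convention: plus-strand frame derived from the 1-based start."""
    return (sbjct_start - 1) % 3 + 1


query_overlap = overlap(hsp_frame1["query"], hsp_frame2["query"])
sbjct_overlap = overlap(hsp_frame1["sbjct"], hsp_frame2["sbjct"])

print("query residues shared:", query_overlap)
print("subject bases shared:", sbjct_overlap)
print("frames:", forward_frame(71593447), forward_frame(71593562))
```

By this arithmetic the two hits overlap by a few query residues and a few subject bases, and they sit in different reading frames.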

After a bit of grepping in the BLAST source, I found reference to "sum
segments" and "a collection [of] multiple distinct alignments with
asymmetric gaps between the alignments" and decided it was time to cry
for help. When does BLAST decide that two or more alignments belong
"together", and how does that affect the E-value? Is the E-value really
showing how good those two alignments are combined, despite the
frameshift? (It so happens that that's what I want.)

And does anyone know off-hand if Bioperl will tell me when situations
like this happen? I thought the Bio::Search::HSP::BlastHSP::n subroutine
would help, but I just get a bunch of empty strings for that, whether or
not there's a (2) in the Expect string. (hsp->n is empty, hsp->{"_n"} is
undef.)
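
As a possible stopgap that does not depend on Bioperl at all, the count can be read straight from the Score lines of the raw text report. The sketch below is in Python; the function name `expect_fields` is hypothetical, and treating a missing "(n)" as n = 1 is an assumption rather than something confirmed from the BLAST source.

```python
import re

# Matches both "Expect = 0.001" and "Expect(2) = 5e-64"; the "(n)" part is optional.
EXPECT_RE = re.compile(r"Expect(?:\((\d+)\))?\s*=\s*([0-9.eE+-]+)")


def expect_fields(score_line):
    """Return (n, evalue_text) for a BLAST Score line, or None if no Expect field."""
    match = EXPECT_RE.search(score_line)
    if match is None:
        return None
    # Assumption: an Expect field without "(n)" means the HSP stands alone.
    n = int(match.group(1)) if match.group(1) else 1
    # The E-value is kept as text so the report's exact formatting is preserved.
    return n, match.group(2)


print(expect_fields(" Score =  192 bits (487), Expect(2) = 5e-64"))
print(expect_fields(" Score = 75.1 bits (183), Expect = 0.001"))
```

Running this over every line of a report and keeping the results with n greater than 1 would flag the HSPs that BLAST has grouped.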

Thanks,

- Amir Karger
Research Computing
Life Sciences Division
Harvard University


