[Bioperl-l] Blast Output and frac_aligned_query
Jason Stajich
jason at cgt.duhs.duke.edu
Mon Jul 19 09:33:35 EDT 2004
On Mon, 19 Jul 2004, James Wasmuth wrote:
> First, apologies if this has been debated before; I didn't see it in the
> archive and have been away for a while, so I'm unclear on the current state
> of affairs.
>
> I have a bl2seq output (below) and when I extract its statistics, I am
> told that 156% of the query is aligned.
>
> This is probably because multiple HSPs are produced, as the protein appears
> highly repetitive. Would this mess up the HSP tiling in its
> current implementation?
I guess so. SteveC is the HSP tiling guru, so we would have to see what he
thinks.
I think a lot of people out there have HSP tiling code - it would be nice
to be able to incorporate more solutions to this problem so that one
could try different strategies...
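To make the idea concrete, here is a minimal sketch of one such strategy:
merge the overlapping query ranges of all HSPs and count each residue only
once, so the aligned fraction can never exceed 1. This is not the BioPerl
tiling code, just an illustration in Python. The names merge_intervals and
frac_aligned are made up for this example, and the coordinates are the Query
ranges copied from the bl2seq report quoted below.

```python
from typing import List, Tuple

# 1-based, inclusive (start, end) coordinates, as printed in a BLAST report.
Interval = Tuple[int, int]


def merge_intervals(intervals: List[Interval]) -> List[Interval]:
    """Collapse overlapping or directly adjacent ranges into disjoint ranges."""
    merged: List[Interval] = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1] + 1:
            # Overlaps (or abuts) the previous range: extend it if needed.
            prev_start, prev_end = merged[-1]
            merged[-1] = (prev_start, max(prev_end, end))
        else:
            merged.append((start, end))
    return merged


def frac_aligned(intervals: List[Interval], seq_len: int) -> float:
    """Fraction of the sequence covered by at least one HSP."""
    covered = sum(end - start + 1 for start, end in merge_intervals(intervals))
    return covered / seq_len


# Query ranges of the nine HSPs in the quoted report (query length = 80).
query_hsps = [
    (1, 47), (50, 73), (40, 72), (9, 73), (40, 68),
    (43, 76), (27, 73), (42, 75), (49, 62),
]

if __name__ == "__main__":
    print(merge_intervals(query_hsps))                 # [(1, 76)]
    print(f"{frac_aligned(query_hsps, 80):.2f}")       # 0.95
```

Counting every residue once gives 76 of 80 query residues (0.95) for this
report. Note that this ignores whether the HSPs are mutually consistent on the
hit sequence, which is exactly where different strategies would differ.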
You might also try using WU-BLAST with -links turned on, which provides
consistent groups of HSPs. We haven't (yet) incorporated interpreting the
link information as a way to tile HSPs, but it would be a good project for
someone to try out (or for someone to donate if they have already solved
this).
-jason
>
>
> cheers
> -james
>
>
> e = 2e-19
> s = 205
> b = 83.6
> aln_q = 1.56 !
> aln_h = 0.09
> id = 0.208
> cons = 0.256
> len = 332
>
>
>
> > Query= prediction
> > (80 letters)
> >
> > >wormpep
> > Length = 2592
> >
> > Score = 83.6 bits (205), Expect = 2e-19
> > Identities = 41/47 (87%), Positives = 43/47 (91%)
> >
> > Query: 1 SIRDEFSMNSAADSPMSTTGRPMVLTKAAMKAFNSTPPKKKNSSSGQ 47
> > SIRDEFSMNSAADSPMSTTGRPMVLTKAAMKAFNSTPPKK+ + Q
> > Sbjct: 1528 SIRDEFSMNSAADSPMSTTGRPMVLTKAAMKAFNSTPPKKETDQAVQ 1574
> >
> >
> >
> > Score = 29.3 bits (64), Expect = 0.004
> > Identities = 13/24 (54%), Positives = 18/24 (75%)
> >
> > Query: 50 SSSGSSSDSSSXDGSTSSDDSXDD 73
> > S S SSSDS S +GS+SS++ D+
> > Sbjct: 493 SGSDSSSDSDSEEGSSSSNEDSDE 516
> >
> >
> >
> > Score = 26.2 bits (56), Expect = 0.036
> > Identities = 13/33 (39%), Positives = 20/33 (60%), Gaps = 1/33 (3%)
> >
> > Query: 40 KKNSSSGQHDSSSGSSSDSSSXDGSTSSDDSXD 72
> > ++N++SG DSSS S S+ S + SD+ D
> > Sbjct: 488 QENNASGS-DSSSDSDSEEGSSSSNEDSDEQND 519
> >
> >
> >
> > Score = 23.5 bits (49), Expect = 0.24
> > Identities = 14/69 (20%), Positives = 31/69 (44%), Gaps = 4/69 (5%)
> >
> > Query: 9 NSAADSPMSTTGRPMV----LTKAAMKAFNSTPPKKKNSSSGQHDSSSGSSSDSSSXDGS 64
> > + + SP S+ R + T+++++ + ++ N+S S S S SSS +
> > Sbjct: 454 DQGSSSPSSSRDRQNLHDPLQTRSSVEHHTNQEDQENNASGSDSSSDSDSEEGSSSSNED 513
> >
> > Query: 65 TSSDDSXDD 73
> > + + D+
> > Sbjct: 514 SDEQNDVDE 522
> >
> >
> >
> > Score = 21.9 bits (45), Expect = 0.68
> > Identities = 10/29 (34%), Positives = 16/29 (55%)
> >
> > Query: 40 KKNSSSGQHDSSSGSSSDSSSXDGSTSSD 68
> > + +S + + ++ GSSS SSS D D
> > Sbjct: 443 RSSSPTSKSENDQGSSSPSSSRDRQNLHD 471
> >
> >
> >
> > Score = 21.2 bits (43), Expect = 1.2
> > Identities = 13/34 (38%), Positives = 16/34 (47%)
> >
> > Query: 43 SSSGQHDSSSGSSSDSSSXDGSTSSDDSXDDXVP 76
> > S S ++SGS S S + STSS S P
> > Sbjct: 2327 SRSSTMGNNSGSPSASGTTSPSTSSSISSGPDSP 2360
> >
> >
> >
> > Score = 21.2 bits (43), Expect = 1.2
> > Identities = 12/47 (25%), Positives = 17/47 (36%)
> >
> > Query: 27 KAAMKAFNSTPPKKKNSSSGQHDSSSGSSSDSSSXDGSTSSDDSXDD 73
> > K KA KKK+ D S S+D D S+ + +
> > Sbjct: 1144 KVRKKAEKEKLKKKKHRKGDSSDESDSDSNDELDLDVRKSTKEMTQE 1190
> >
> >
> >
> > Score = 20.0 bits (40), Expect = 2.6
> > Identities = 11/35 (31%), Positives = 18/35 (51%), Gaps = 1/35 (2%)
> >
> > Query: 42 NSSSGQHDSSSGSSS-DSSSXDGSTSSDDSXDDXV 75
> > + SS DS GSSS + S + + ++ +D V
> > Sbjct: 495 SDSSSDSDSEEGSSSSNEDSDEQNDVDEEDDEDVV 529
> >
> >
> >
> > Score = 18.5 bits (36), Expect = 7.6
> > Identities = 7/14 (50%), Positives = 9/14 (64%)
> >
> > Query: 49 DSSSGSSSDSSSXD 62
> > +SS+G SDS D
> > Sbjct: 1252 NSSNGEESDSEKAD 1265
> >
> >
> > Lambda K H
> > 0.294 0.109 0.279
> >
> > Gapped
> > Lambda K H
> > 0.267 0.0410 0.140
> >
> >
> > Matrix: BLOSUM62
> > Gap Penalties: Existence: 11, Extension: 1
> > Number of Hits to DB: 2307
> > Number of Sequences: 0
> > Number of extensions: 39
> > Number of successful extensions: 11
> > Number of sequences better than 10.0: 1
> > Number of HSP's better than 10.0 without gapping: 1
> > Number of HSP's successfully gapped in prelim test: 0
> > Number of HSP's that attempted gapping in prelim test: 0
> > Number of HSP's gapped (non-prelim): 10
> > length of query: 80
> > length of database: 115,000
> > effective HSP length: 56
> > effective length of query: 24
> > effective length of database: 114,944
> > effective search space: 2758656
> > effective search space used: 2758656
> > T: 11
> > A: 40
> > X1: 17 ( 7.2 bits)
> > X2: 38 (14.6 bits)
> > X3: 64 (24.7 bits)
> > S1: 35 (18.0 bits)
> > S2: 35 (18.1 bits)
>
>
>
--
Jason Stajich
Duke University
jason at cgt.mc.duke.edu
