[Bioperl-l] Setting Theoretical Database size for bl2seq

khoueiry khoueiry at ibdm.univ-mrs.fr
Wed Oct 26 12:44:05 EDT 2005


I did a quick search and found the following:
    I have bl2seq installed on my machine, so a simple "man bl2seq"
gave me an idea about the parameters. It says that -d corresponds
to:
        "-d N (bl2seq)--- Use theoretical DB size of N (zero stands for
the real size)"
A quick Google search gave a similar result, which you can find at
this link: 'http://hits.isb-sib.ch/doc/motif_score.shtml'. Briefly, they
say that when calculating an E-value, especially when converting from a
normalized score, you have to use the database size in residues.
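
To illustrate why the database size matters, here is a minimal Python
sketch. It assumes the standard Karlin-Altschul relation between a bit
score S' and the E-value, E = m * n * 2^(-S'), with m the query length
and n the database length in residues. The function name, the query
length, the genome length and the bit score are hypothetical values
chosen for illustration; only the nt database length comes from the
quoted blastall output. Real BLAST uses edge-corrected "effective"
lengths, so its reported numbers will differ somewhat.

```python
def evalue_from_bits(bit_score, query_len, db_len):
    """Approximate E-value for a bit score: E = m * n * 2**(-S').

    Uses raw lengths and ignores the edge-effect correction that
    BLAST applies when it computes effective lengths.
    """
    return query_len * db_len * 2.0 ** (-bit_score)


QUERY_LEN = 1000            # hypothetical query length
NT_DB_LEN = 12254801043     # "length of database" from the quoted output
GENOME_LEN = 5000000        # hypothetical length of genome "g"
BIT_SCORE = 50.0            # hypothetical bit score of one hit

e_nt = evalue_from_bits(BIT_SCORE, QUERY_LEN, NT_DB_LEN)
e_genome = evalue_from_bits(BIT_SCORE, QUERY_LEN, GENOME_LEN)

print("E-value against nt-sized database: %.3g" % e_nt)
print("E-value against genome only:       %.3g" % e_genome)
print("ratio: %.1f" % (e_nt / e_genome))
```

The same bit score gives an E-value that grows linearly with the
database length, which is why two searches are only comparable when
they assume the same database size.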

So I think that in your case, -d corresponds to the "length of
database" value: 12,254,801,043.
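
As a rough sketch of how that value could be passed on, the Python
snippet below strips the commas from the blastall figure and assembles
a bl2seq command line. The file names and both helper functions are
hypothetical. The -d flag is the one described in the man page quoted
above; -i, -j and -p are assumed here to be the usual legacy bl2seq
options for the two input sequences and the program name, so please
check them against the usage message of your own installation. The
command is only printed, not executed.

```python
def parse_blast_count(text):
    """Turn a comma-grouped count such as '12,254,801,043' into an int."""
    return int(text.replace(",", ""))


def bl2seq_command(query_fasta, subject_fasta, db_size, program="blastn"):
    """Build an argument list for bl2seq with a theoretical DB size (-d)."""
    return [
        "bl2seq",
        "-i", query_fasta,     # first sequence (assumed flag)
        "-j", subject_fasta,   # second sequence (assumed flag)
        "-p", program,         # BLAST program name (assumed flag)
        "-d", str(db_size),    # theoretical DB size, from the man page
    ]


db_size = parse_blast_count("12,254,801,043")
cmd = bl2seq_command("q.fasta", "g.fasta", db_size)
print(" ".join(cmd))
```

Keeping the command as a list of arguments makes it easy to hand to
subprocess.run later without any shell quoting problems.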

I hope this is the right answer, and that others can give you more
details if possible. 


On Wed, 2005-10-26 at 11:12 -0400, Waibhav Tembe wrote:

> Hello List,
> This is not a BioPerl question, but I could not find a satisfactory answer
> from other sources and would appreciate any help.
> I am trying to use bl2seq for comparing query "q" and another genome "g".
> Now, for "q" I already have blastall output from an nt database
> containing >2 million sequences. I understand that to get compatible
> e values, I need to set -d parameter for bl2seq to the theoretical data
> size of that nt database. Which number from the following 4 (taken from
> blastall output) should be used for -d ?
> length of database: 12,254,801,043
> effective length of database: 12,167,805,299
> effective search space: 48671221196
> effective search space used: 48671221196
> Any pointers/website/docs will be appreciated.
> Thank you.
> Tembe
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at portal.open-bio.org
> http://portal.open-bio.org/mailman/listinfo/bioperl-l
