[Bioperl-l] Setting Theoretical Database size for bl2seq

Joseph Bedell jbedell at oriongenomics.com
Wed Oct 26 13:08:54 EDT 2005

Hi Tembe,

>-----Original Message-----
>From: bioperl-l-bounces at portal.open-bio.org [mailto:bioperl-l-
>bounces at portal.open-bio.org] On Behalf Of Waibhav Tembe
>Sent: Wednesday, October 26, 2005 10:13 AM
>To: bioperl-l
>Subject: [Bioperl-l] Setting Theoretical Database size for bl2seq
>Hello List,
>This is not a BioPerl question, but I could not find a satisfactory
>answer from other sources and would appreciate any help.
>I am trying to use bl2seq to compare query "q" with another genome.
>For "q" I already have blastall output from an nt database containing
>more than 2 million sequences. I understand that to get comparable
>E-values, I need to set the -d parameter for bl2seq to the theoretical
>size of that nt database. Which of the following 4 numbers (taken from
>the blastall output) should be used for -d?
>length of database: 12,254,801,043
>effective length of database: 12,167,805,299
>effective search space: 48671221196
>effective search space used: 48671221196

I believe you would set -d to "length of database" rather than
"effective length of database". However, the best bet is probably to set
-Y to the "effective search space" value, since the -Y usage statement
explicitly asks for the effective length of the search space. I'm not
completely sure whether -d expects the effective size or the actual size
of the database. At the end of the day, the two numbers differ by so
little (under 1%) that you probably won't see a meaningful difference in
the E-values between actual and effective database size.
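
As a rough check of that last point, here is a small Python sketch using
only the numbers from the blastall output quoted above. It relies on the
standard Karlin-Altschul relation, in which the E-value for a given score
is proportional to the search space, so the ratio of the two database
sizes is also the ratio of the E-values they would produce. The bl2seq
command at the end is illustrative only: the file names q.fa and genome.fa
are placeholders, and -p blastn is an assumption, since the thread does not
say which program was used.

```python
# Numbers copied from the blastall output quoted above.
db_length = 12_254_801_043          # "length of database"
eff_db_length = 12_167_805_299      # "effective length of database"
eff_search_space = 48_671_221_196   # "effective search space"

# For a fixed score, E is proportional to the search space, so the ratio of
# database sizes is also the ratio of the resulting E-values.
ratio = db_length / eff_db_length
percent_diff = (ratio - 1.0) * 100.0

# Effective search space = effective query length * effective database length,
# so integer division recovers the effective query length implied by the output.
eff_query_length = eff_search_space // eff_db_length

print(f"actual / effective database length: {ratio:.5f}")
print(f"E-value change from using actual size: about {percent_diff:.2f}%")
print(f"implied effective query length: {eff_query_length}")

# Hypothetical bl2seq call: file names are placeholders and -p blastn is an
# assumption, neither comes from the thread.
cmd = ["bl2seq", "-p", "blastn", "-i", "q.fa", "-j", "genome.fa",
       "-Y", str(eff_search_space)]
print(" ".join(cmd))
```

The difference comes out at roughly 0.7%, which is far too small to move a
hit across any sensible E-value cutoff. Passing the search space directly
with -Y also sidesteps the question of whether -d wants the actual or the
effective database size.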


Joseph A Bedell, Ph.D.         office: 314-615-6979 
Director, Bioinformatics         fax:    314-615-6975 
Orion Genomics                   cell:   314-518-1343
4041 Forest Park Ave
St. Louis, MO 63108

>Any pointers/website/docs will be appreciated.
>Thank you.
