[Bioperl-l] Genome Information

Chris Fields cjfields at illinois.edu
Tue Oct 26 12:11:05 EDT 2010


I don't know of a quick one-step way to get this information from NCBI without wrangling with query term limit magic, and even then you will be bound to whatever version of the genome is present in the database of interest.

For instance, via eutils you can get summary information for various taxa, genomes, and genome projects using the following example code (as written it prints the first 10 archaeal genome summaries; set the '-db' parameter to one of 'genomeprj', 'taxonomy', or 'genome'):


use strict;
use warnings;
use Bio::DB::EUtilities;

my $term = "Archaea[ORGN]";

# esearch with history on, so the hits stay on the NCBI server
my $eutil = Bio::DB::EUtilities->new(-eutil => 'esearch',
                                     -db    => 'genome',
                                     -email => 'cjfields at bioperl.org', # use your own address
                                     -usehistory => 'y',
                                     -term  => $term);

my $hist = $eutil->next_History || die "No history returned";

# reuse the same object to pull summaries for the stored hits
$eutil->set_parameters(-eutil   => 'esummary',
                       -history => $hist,
                       -retmax  => 10);

$eutil->print_all; # print summary info to STDOUT
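
Since the original question starts from a list of taxon IDs rather than a name like "Archaea", here is a rough sketch of how the search terms could be built. It assumes the usual Entrez "txidNNNN[ORGN]" syntax works for the database you pick; taxid_terms() and the batch size are made up for illustration and are not part of Bio::DB::EUtilities.

```perl
use strict;
use warnings;

# Hypothetical helper: turn a list of NCBI taxon IDs into esearch terms,
# at most $batch_size IDs per term so no single query gets too long.
sub taxid_terms {
    my ($taxids, $batch_size) = @_;
    $batch_size ||= 50;          # arbitrary default, tune as needed
    my @ids = @$taxids;          # copy, so the caller's list is left alone
    my @terms;
    while (my @batch = splice(@ids, 0, $batch_size)) {
        push @terms, join ' OR ', map { sprintf 'txid%d[ORGN]', $_ } @batch;
    }
    return @terms;
}

# Three IDs in batches of two gives two terms
print "$_\n" for taxid_terms([2157, 2, 9606], 2);
```

Each term returned could then be passed as '-term' in the esearch call above, one request per batch.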


GC% and coding % don't appear to be stored in any of the above databases, but they are displayed in the genome overview.  You could probably use something like WWW::Mechanize to grab the summary table displayed for a given Genome UID.
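
A minimal sketch of the scraping idea, with loud caveats: the page URL is left to the caller, the table layout (two-cell rows of label and value) is an assumption, and the sample markup and its values are invented just to exercise the parser. fetch_page() and parse_summary_table() are hypothetical names. A regex is used here only to keep the sketch dependency-free; a real HTML parser would be sturdier.

```perl
use strict;
use warnings;

# Fetch a page with WWW::Mechanize. The module is loaded lazily so the
# parser below can be tried without it installed.
sub fetch_page {
    my ($url) = @_;
    require WWW::Mechanize;
    my $mech = WWW::Mechanize->new(timeout => 30, autocheck => 1);
    $mech->get($url);
    return $mech->content;
}

# Collect label => value pairs from two-cell table rows.
# ASSUMPTION: rows look like <tr><td>Label:</td><td>Value</td></tr>.
sub parse_summary_table {
    my ($html) = @_;
    my %info;
    while ($html =~ m{<tr[^>]*>\s*<t[dh][^>]*>(.*?)</t[dh]>\s*<t[dh][^>]*>(.*?)</t[dh]>\s*</tr>}gis) {
        my ($label, $value) = ($1, $2);
        for ($label, $value) {
            s/<[^>]+>//g;       # drop nested tags
            s/^\s+|\s+$//g;     # trim surrounding whitespace
        }
        $label =~ s/:$//;       # drop a trailing colon from the label
        $info{$label} = $value;
    }
    return \%info;
}

# Invented sample markup, not copied from any NCBI page
my $sample = <<'HTML';
<table>
<tr><td>GC Content:</td><td>40%</td></tr>
<tr><td>% Coding:</td><td>90%</td></tr>
</table>
HTML

my $info = parse_summary_table($sample);
print "$_ => $info->{$_}\n" for sort keys %$info;
```

With a real page you would call parse_summary_table(fetch_page($url)) and then check which labels actually come back before relying on any of them.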


Just don't spam the server with a billion requests (use a timeout between requests!) or you'll find yourself blocked.  I may pop an email to NCBI to see if this information is programmatically accessible.
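
One way to keep the request rate down, as a sketch only: fetch_throttled() is a made-up helper that takes any code ref as the fetcher and sleeps so that calls are at least $delay seconds apart. The one-second default is my assumption, not an NCBI-stated figure.

```perl
use strict;
use warnings;
use Time::HiRes qw(sleep time);

# Call $fetch->($id) for each ID, leaving at least $delay seconds
# between the end of one call and the start of the next.
sub fetch_throttled {
    my ($ids, $fetch, $delay) = @_;
    $delay = 1 unless defined $delay;    # assumed default
    my (%result, $last);
    for my $id (@$ids) {
        if (defined $last) {
            my $wait = $delay - (time() - $last);
            sleep($wait) if $wait > 0;
        }
        # A failing fetch is reported and stored as undef; the loop goes on
        $result{$id} = eval { $fetch->($id) };
        warn "fetch failed for $id: $@" if $@;
        $last = time();
    }
    return \%result;
}

# Stand-in fetcher; in practice this would wrap the page or eutils request
my $res = fetch_throttled([101, 102, 103], sub { "summary for $_[0]" }, 0.2);
print "$_: $res->{$_}\n" for sort keys %$res;
```

Passing the fetcher in as a code ref means the same loop works whether the request goes through eutils or through WWW::Mechanize.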


On Oct 26, 2010, at 9:09 AM, shalabh sharma wrote:

> Hi All,
>       I have thousands of taxaIds and i need to find out the following
> information regarding genomes:
> 1) Taxonomy information
> 2) GC%
> 3) total coding genes %
> I can easily find the taxonomy info by using Bio::DB::Taxonomy but for the
> other two i am stuck.
> Is there any way i can find this info?
> I would really appreciate your help.
> Thanks
> Shalabh
> -------------------------------
> Shalabh Sharma
> Scientific Computing Professional Associate (Bioinformatics Specialist)
> Department of Marine Sciences
> University of Georgia
> Athens, GA 30602-3636
