[Bioperl-l] taxonomy ID

shalabh sharma shalabh.sharma7 at gmail.com
Thu Apr 2 15:50:58 EDT 2009

Thanks a lot, everyone. The information is really useful and it solved my
problem.


On Wed, Apr 1, 2009 at 8:00 AM, Sendu Bala <bix at sendu.me.uk> wrote:

> Smithies, Russell wrote:
>> The taxonomy information isn't in the blast output unless you created
>> custom fasta headers for your blast database. The easiest way to get
>> the tax_id for your accessions would be to download the gi->tax_id
>> list from ftp://ftp.ncbi.nih.gov/pub/taxonomy/gi_taxid_nucl.dmp.gz. If
>> you load that file into a hash, parse the accessions out of the
>> blast hits then lookup the tax_id from that hash, I think it should
>> be fairly fast.
>> Checking which are prokaryotes and which are eukaryotes based on
>> tax_id is a separate problem  :-) If you grab the taxdump.tar.gz file
>> from the same site, the nodes.dmp file contained within lists what
>> division each tax_id belongs to (Bacteria, Invertebrates, Mammals,
>> Phages, Plants, etc) so you can probably work it out from that.
> Check out the synopsis for Bio::Taxon
> http://doc.bioperl.org/bioperl-live/Bio/Taxon.html
> If the division() function doesn't tell you what you need, you could use
> get_lineage_nodes() and check the oldest ancestors to see if it's a
> prokaryote or a eukaryote.
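
The file-based lookup Russell describes above can be sketched roughly as
follows. This is only a minimal illustration in Python, not BioPerl code, and
it rests on some assumptions: gi_taxid_nucl.dmp is taken to hold two
whitespace-separated columns (gi, tax_id), and nodes.dmp and division.dmp are
taken to use the "<tab>|<tab>" field separator of the NCBI taxdump, with the
division id as the fifth field of nodes.dmp. All function names, the optional
`wanted` filter, and the GI numbers in the sample data are made up for the
example.

```python
import io


def load_gi_to_taxid(handle, wanted=None):
    """Build a gi -> tax_id dict from gi_taxid_nucl.dmp-style lines.

    `wanted` is an optional set of GIs (a hypothetical convenience); if
    given, only those GIs are kept, which keeps memory use small.
    """
    gi_to_taxid = {}
    for line in handle:
        fields = line.split()
        if len(fields) != 2:
            continue  # skip blank or malformed lines
        gi, taxid = int(fields[0]), int(fields[1])
        if wanted is None or gi in wanted:
            gi_to_taxid[gi] = taxid
    return gi_to_taxid


def parse_dmp(handle):
    """Yield the fields of each taxdump line (assumed '\\t|\\t' separated)."""
    for line in handle:
        line = line.rstrip("\n")
        if not line:
            continue
        if line.endswith("\t|"):
            line = line[:-2]  # drop the end-of-row marker
        yield [field.strip() for field in line.split("\t|\t")]


def load_taxid_to_division(nodes_handle):
    """Map tax_id -> division id (assumed to be the fifth nodes.dmp field)."""
    return {int(f[0]): int(f[4]) for f in parse_dmp(nodes_handle)}


def load_division_names(division_handle):
    """Map division id -> division name (assumed third division.dmp field)."""
    return {int(f[0]): f[2] for f in parse_dmp(division_handle)}


def division_for_gi(gi, gi_to_taxid, taxid_to_division, division_names):
    """Return the division name for a GI, or None if any lookup fails."""
    taxid = gi_to_taxid.get(gi)
    division_id = taxid_to_division.get(taxid)
    return division_names.get(division_id)


if __name__ == "__main__":
    # Tiny hand-made stand-ins for the real files; the GI numbers are invented.
    gi_file = io.StringIO("1001\t562\n1002\t9606\n")
    nodes_file = io.StringIO(
        "562\t|\t561\t|\tspecies\t|\tEC\t|\t0\t|\n"
        "9606\t|\t9605\t|\tspecies\t|\tHS\t|\t5\t|\n"
    )
    division_file = io.StringIO(
        "0\t|\tBCT\t|\tBacteria\t|\t\t|\n"
        "5\t|\tPRI\t|\tPrimates\t|\t\t|\n"
    )
    gi_map = load_gi_to_taxid(gi_file, wanted={1001, 1002})
    node_map = load_taxid_to_division(nodes_file)
    names = load_division_names(division_file)
    for gi in (1001, 1002):
        print(gi, division_for_gi(gi, gi_map, node_map, names))
```

For the real files, the same functions could be fed handles from
gzip.open(path, "rt") or open(path). Passing the set of GIs parsed from the
blast hits as `wanted` avoids holding the whole gi->tax_id list in memory,
which is probably worthwhile given how large that file is.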
