[Bioperl-l] How to get from gi/ref/gb to genomic coordinates ?

Cui, Wenwu (NIH/NLM/NCBI) [C] cuiw at ncbi.nlm.nih.gov
Thu Feb 1 09:47:38 EST 2007

This is a simple test from gene ID 3632373 (protein is 46100068) to
contig coordinates: 

perl -MLWP::Simple -e 'map {print $_, "\n" if
/<(Gene-source_src.*?>)(.*)?<$1/} (split "\n",

You need to translate the protein id to a gene id first, though.
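A minimal sketch of that translation step, assuming the NCBI E-utilities elink service is queried with dbfrom=protein and db=gene. The helper names elink_url and linked_gene_ids are hypothetical, and the regex-based XML handling is only in the spirit of the one-liner above; a real XML parser would be more robust.

```perl
use strict;
use warnings;

# Build an E-utilities elink URL asking for Gene records linked to a protein GI.
# Assumption: the elink endpoint with dbfrom=protein and db=gene.
sub elink_url {
    my ($protein_gi) = @_;
    return 'https://eutils.ncbi.nlm.nih.gov/entrez/eutils/elink.fcgi'
         . "?dbfrom=protein&db=gene&id=$protein_gi";
}

# Collect the IDs found inside <LinkSetDb> blocks of an elink XML reply.
# The query ID echoed in <IdList> sits outside those blocks and is skipped.
sub linked_gene_ids {
    my ($xml) = @_;
    my @ids;
    while ($xml =~ m{<LinkSetDb>(.*?)</LinkSetDb>}sg) {
        my $block = $1;
        push @ids, $block =~ m{<Id>(\d+)</Id>}g;
    }
    return @ids;
}

# Only touch the network when a protein GI is given on the command line.
if (@ARGV) {
    require LWP::Simple;    # https URLs also need LWP::Protocol::https
    my $xml = LWP::Simple::get(elink_url($ARGV[0]));
    die "no response from elink\n" unless defined $xml;
    print "$_\n" for linked_gene_ids($xml);
}
```

Saved as a script (any file name will do), it would be called with the protein GI as its only argument, e.g. 46100068, and prints one linked gene id per line.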

If the genome is available at Map Viewer, try (the contig name is
NW_101115 from last step)

Wenwu Cui, PhD

-----Original Message-----
From: Rainer Machne [mailto:raim at tbi.univie.ac.at] 
Sent: Wednesday, January 31, 2007 4:10 PM
To: bioperl-l at lists.open-bio.org
Subject: [Bioperl-l] How to get from gi/ref/gb to genomic coordinates ?

Dear Bioperl list,

Hoping not to be on the wrong email list, I have a short question:

Is there a standard way, or are there nice (Bioperl) tools, to get from a
gene id (gi) or other ids (see below) to the genomic coordinates of the
respective gene?

We have Fasta files retrieved from NCBI protein Blast in fungal genomes:

 >gi|46100068|gb|EAK85301.1| hypothetical protein UM04252.1 [Ustilago maydis 521]
 >gi|50292953|ref|XP_448909.1| unnamed protein product [Candida

(we only have gi, ref and gb in my set).
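For reference, headers of this shape can be split into their identifier fields before any lookup. The following is only a sketch: parse_ncbi_header is a hypothetical helper, and it assumes every header follows the ">gi|number|db|accession| description" layout shown above.

```perl
use strict;
use warnings;

# Parse an NCBI-style FASTA header of the form
#   >gi|<number>|<db>|<accession>| <description>
# Returns a hash ref (gi, db, acc, desc), or nothing if the line does not match.
sub parse_ncbi_header {
    my ($line) = @_;
    return unless $line =~ /^\s*>gi\|(\d+)\|(\w+)\|([^|\s]*)\|\s*(.*?)\s*$/;
    return { gi => $1, db => $2, acc => $3, desc => $4 };
}

my $hdr = '>gi|46100068|gb|EAK85301.1| hypothetical protein UM04252.1 [Ustilago maydis 521]';
my $rec = parse_ncbi_header($hdr);

# Print gi, database tag and accession as tab-separated fields.
print join("\t", @{$rec}{qw(gi db acc)}), "\n" if $rec;
```

The gi field is the number that would then be handed to whatever id-to-coordinate lookup is used.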

I retrieved all my fasta files from whole fungal genomes with available 
protein sequences at

As I only searched whole finished genomes (not shotgun), I thought it 
would then be easy to get the genomic coordinates and retrieve upstream 
sequences, but we have failed so far to find a consistent way to do this 
automatically. Many of the gi entries refer to mRNAs or partial mRNAs, 
and the way to the coordinates seems to differ for each case.

Any suggestions would be appreciated.

with kind regards,
Rainer Machne

University of Vienna
Department for Theoretical Chemistry
Theoretical Biochemistry Group