[Bioperl-l] How to get from gi/ref/gb to genomic coordinates ?
raim at tbi.univie.ac.at
Thu Feb 1 07:54:21 EST 2007
Barry and Jason,
thanks for your quick and very helpful replies.
I guess we should have done (or should repeat) our BLAST search at
to get a better mapping from proteins to genomes?
As I retrieved all my proteins via whole-genome BLAST searches, we should
find (most of) them in the GenBank files ... a good opportunity for me to
learn some Bioperl and the other packages you mentioned, in case we want
to do more complex analysis later :-)
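
A minimal sketch of that lookup, written in plain Python (standard library
only) so it stands alone; it is not the CGL or Bioperl approach described
below. It assumes the GenBank flat files carry a /protein_id qualifier on
each CDS feature and follow the usual column layout (feature key at column
6, location and qualifiers at column 22). The function names
parse_defline, find_cds_location, location_span and upstream_window are
made up for illustration.

```python
import re

# NCBI-style defline, e.g. "gi|46100068|gb|EAK85301.1| description [Organism]"
DEFLINE_RE = re.compile(r"^>?\s*gi\|(\d+)\|(\w+)\|([^|\s]+)\|\s*(.*)$")


def parse_defline(line):
    """Split a defline into gi number, database tag, accession and description."""
    m = DEFLINE_RE.match(line.strip())
    if m is None:
        raise ValueError(f"unrecognized defline: {line!r}")
    gi, db, accession, description = m.groups()
    return {"gi": gi, "db": db, "accession": accession, "description": description}


def find_cds_location(genbank_text, protein_id):
    """Return the location string of the CDS whose /protein_id matches, else None.

    Assumption: a qualifier value that wraps onto a new line never starts
    with "/" there; real files can break this, so treat it as a sketch.
    """
    wanted = f'/protein_id="{protein_id}"'
    in_features = False   # inside a FEATURES table?
    in_location = False   # still reading a (possibly wrapped) location?
    key, location = None, ""
    for line in genbank_text.splitlines():
        if line.startswith("FEATURES"):
            in_features = True
            continue
        if not in_features:
            continue
        if line and not line[0].isspace():
            # A new top-level section (ORIGIN, CONTIG, //) ends the table.
            in_features, key = False, None
            continue
        if line[:5] == "     " and line[5:6].strip():
            # New feature line: key first, location after the padding.
            parts = line.split(None, 1)
            key = parts[0]
            location = parts[1].strip() if len(parts) > 1 else ""
            in_location = True
        elif line.startswith(" " * 21):
            body = line.strip()
            if body.startswith("/"):
                in_location = False
                if key == "CDS" and body == wanted:
                    return location
            elif in_location:
                location += body  # wrapped location continues here
    return None


def location_span(location):
    """Reduce a location string to (start, end, strand), 1-based inclusive.

    Assumption: the location has no remote references such as
    "J00194.1:100..202"; their digits would corrupt the result.
    """
    strand = -1 if location.startswith("complement(") else 1
    positions = [int(p) for p in re.findall(r"\d+", location)]
    return min(positions), max(positions), strand


def upstream_window(start, end, strand, length):
    """Coordinates of the `length` bases upstream of a gene, or None if none fit.

    The minus-strand window is not clamped to the sequence length, which
    this sketch does not know.
    """
    if strand == 1:
        lo, hi = max(1, start - length), start - 1
        return (lo, hi) if hi >= lo else None
    return (end + 1, end + length)


if __name__ == "__main__":
    # The two deflines quoted in the original question.
    for header in (
        ">gi|46100068|gb|EAK85301.1| hypothetical protein UM04252.1 [Ustilago maydis 521]",
        ">gi|50292953|ref|XP_448909.1| unnamed protein product [Candida glabrata]",
    ):
        print(parse_defline(header))
```

The accession from parse_defline would be passed to find_cds_location
together with the text of the matching genome file, and the resulting span
to upstream_window. A minus-strand window still has to be
reverse-complemented once the sequence is extracted.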
Thank you very much!
Barry Moore wrote:
> We use a Perl library called CGL written by Mark Yandell and colleagues
> (which in turn uses Chris Mungall's BioChaos and Unflattener.pm referred
> to by Jason) for this type of task. The basic pipeline is to convert
> GenBank files to Chaos XML, then use CGL with those XML files to get
> nice object-oriented access to exons, transcripts, proteins,
> coordinates and more for those genes. I am currently using this
> with good success on most GenBank genomes (unfortunately I haven't been
> working with the fungal genomes, but it should work fine). The Ensembl
> API provides similar functionality for Ensembl genomes - but not very
> many fungi there.
> Feel free to contact Mark or myself directly if you are interested in
> using CGL.
> On Jan 31, 2007, at 2:09 PM, Rainer Machne wrote:
>> Dear Bioperl list,
>> hoping not to be on the wrong email list, I have a short question:
>> Is there a standard way, or are there nice (Bioperl) tools, to get from a
>> gene id (gi) or other ids (see below) to the genomic coordinates of the
>> respective gene?
>> We have Fasta files retrieved from NCBI protein Blast in fungal genomes:
>> >gi|46100068|gb|EAK85301.1| hypothetical protein UM04252.1 [Ustilago maydis 521]
>> >gi|50292953|ref|XP_448909.1| unnamed protein product [Candida glabrata]
>> (we only have gi, ref and gb in my set).
>> I retrieved all my fasta files from whole fungal genomes with available
>> protein sequences at
>> As I only searched whole finished genomes (not shotgun), I thought it
>> would then be easy to get the genomic coordinates and retrieve upstream
>> sequences, but we have failed so far to find a consistent way to do this
>> automatically. Many of the gi entries refer to mRNAs or partial mRNAs
>> and the way to the coordinates seems to differ for each case.
>> Any suggestions would be appreciated.
>> with kind regards,
>> Rainer Machne
>> University of Vienna
>> Department for Theoretical Chemistry
>> Theoretical Biochemistry Group