[Bioperl-l] Extracting Gene Names from Gene Ontology (GO) with Perl

Sean Davis sdavis2 at mail.nih.gov
Mon Apr 16 11:55:14 EDT 2007


> > On 4/16/07, Wijaya Edward <ewijaya at i2r.a-star.edu.sg> wrote:
> > > Dear all,
> > >
> > > Given a GO id, is there a way to extract all
> > > the related gene names from that id with Perl?

This is a pretty simple problem if you have the data in a usable format.  The 
data that you need are available here:

ftp://ftp.ncbi.nih.gov/gene/DATA

The README file gives details, but the files in this directory are all 
tab-delimited text.  Download the gene2go.gz file, which contains a mapping 
from Entrez Gene ID to GO accession.  Then, download the gene_info.gz file, 
which contains the information about the Entrez Gene ID, including 
description, gene symbol, etc.  If you need to link to other data, you can of 
course download the respective files from NCBI.  You can either load the data 
into a SQL database of some type for general queries, or you can simply read 
them into Perl directly (with appropriate data structures) to do your mapping.  
Since they are tab-delimited text, they are easy to bulk-load into a database, 
so I would choose the database route and then use SQL and DBI to run whatever 
queries you like.
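For the read-it-into-Perl route, a minimal sketch might look like the following. It assumes both files have already been decompressed with gunzip, and that the column order matches the README (tax_id, GeneID, GO_ID, ... for gene2go; tax_id, GeneID, Symbol, ... for gene_info). Check that against the current README before relying on it. The script name go2genes.pl and the subroutine names are hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Collect the Entrez Gene IDs annotated with a given GO accession.
# Assumed gene2go column order (verify against the README):
#   tax_id, GeneID, GO_ID, Evidence, Qualifier, GO_term, PubMed, Category
sub gene_ids_for_go {
    my ($fh, $go_id) = @_;
    my %ids;
    while (my $line = <$fh>) {
        next if $line =~ /^#/;    # skip the header line
        chomp $line;
        my ($tax_id, $gene_id, $go) = split /\t/, $line;
        # A hash collapses repeated rows (one row per evidence code)
        $ids{$gene_id} = 1 if defined $go && $go eq $go_id;
    }
    return \%ids;
}

# Look up the gene symbol for each selected GeneID.
# Assumed gene_info column order (verify against the README):
#   tax_id, GeneID, Symbol, LocusTag, Synonyms, dbXrefs, chromosome, ...
sub symbols_for_ids {
    my ($fh, $ids) = @_;
    my %symbol;
    while (my $line = <$fh>) {
        next if $line =~ /^#/;    # skip the header line
        chomp $line;
        my ($tax_id, $gene_id, $sym) = split /\t/, $line;
        $symbol{$gene_id} = $sym if defined $gene_id && $ids->{$gene_id};
    }
    return \%symbol;
}

# Hypothetical usage: perl go2genes.pl GO_ID gene2go gene_info
if (@ARGV == 3) {
    my ($go_id, $gene2go, $gene_info) = @ARGV;

    open my $go_fh, '<', $gene2go or die "Cannot open $gene2go: $!";
    my $ids = gene_ids_for_go($go_fh, $go_id);
    close $go_fh;

    open my $info_fh, '<', $gene_info or die "Cannot open $gene_info: $!";
    my $symbols = symbols_for_ids($info_fh, $ids);
    close $info_fh;

    # Print GeneID and symbol, tab-separated, in numeric GeneID order
    for my $gene_id (sort { $a <=> $b } keys %$symbols) {
        print join("\t", $gene_id, $symbols->{$gene_id}), "\n";
    }
}
```

Both files cover all species, so you may want to add a check on tax_id (9606 for human) inside the loops. The sketch matches only rows carrying exactly the GO ID you give it. For repeated or more general queries, the database route above is likely the better fit.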

Sean


More information about the Bioperl-l mailing list