[Bioperl-l] Entrez Gene parser questions

Stefan Kirov skirov at utk.edu
Thu Jun 2 13:39:21 EDT 2005

I am sorry to say there is no good documentation yet, since I am still 
evaluating and debugging the code and I have a bunch of other stuff to deal 
with right now. I apologize, but there will not be comprehensive 
docs for at least another 2 weeks.
First, install:
I think you need to install Bio::SeqIO::entrezgene (entrezgene.pm) and 
Bio::ASN1::EntrezGene. You will have to update Bio::Annotation::DBLink. 
Make sure you also have Bio::Cluster::SequenceFamily and all modules in 
Bio::SeqFeature::Gene. These may also need updating.
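
A quick way to confirm that the modules named above are visible to your perl is to try loading each one. This is only a sketch: the helper name missing_modules is made up for illustration, and a module that loads is not necessarily the updated version.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Return the names of the modules that fail to load.
# (missing_modules is a hypothetical helper, not part of BioPerl.)
sub missing_modules {
    my @names = @_;
    my @missing;
    for my $module (@names) {
        # Turn Foo::Bar into Foo/Bar.pm so require can take it as a path.
        (my $path = "$module.pm") =~ s{::}{/}g;
        push @missing, $module unless eval { require $path; 1 };
    }
    return @missing;
}

# Modules mentioned in the install notes above.
my @needed = qw(
    Bio::SeqIO::entrezgene
    Bio::ASN1::EntrezGene
    Bio::Annotation::DBLink
    Bio::Cluster::SequenceFamily
);
my @missing = missing_modules(@needed);
print @missing ? "Missing: @missing\n" : "All modules load.\n";
```

If anything is reported missing, check that the directory you copied the .pm files into is on @INC (perl -V prints it).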
When you do
  my $seqio = Bio::SeqIO->new(-format => 'entrezgene',
                              -file   => $file,
                              -debug  => 'on');
  my ($gene,$genestructure,$uncaptured) = $seqio->next_seq;
$gene->accession_number will give you the Entrez Gene id
and $gene->id will give you the gene symbol. Most of the data is in the 
annotation collection, which you can go through like this:
my $ann=$gene->annotation;

my @dblinks= $ann->get_Annotations('DBLink');
foreach my $dblink (@dblinks) {
    print 'Unigene id for this gene is '.$dblink->primary_id."\n" if (lc($dblink->database) eq 'unigene');
}

my @nameann=$ann->get_Annotations('Official Full Name');
print 'Gene name is ',$nameann[0]->as_text,"\n";

my $gid=$gene->accession_number; #Entrez Gene id (assuming that is what $gid holds)
my @go=$ann->get_Annotations('OntologyTerm');
foreach my $go (@go) {
    next if ($go->authority eq 'STS marker'); #Unless you want STS markers...
    my @refs=$go->term->references; #you should get just one
    print join(',',$gid,$go->ontology->name,$go->identifier,$go->name,'GO:'.$go->identifier,$refs[0]->medline),"\n";
}
my @associated_seq=$genestructure->get_members(); #second value returned by next_seq

foreach my $contig (@associated_seq) {
    if ($contig->namespace eq 'refseq') { #Only refseq, there is also mrna, product and genomic as options...
        my @prod=$contig->annotation->get_Annotations('product');
        my @transvar=$contig->annotation->get_Annotations('simple');
        my $transvar='';
        foreach my $sv (@transvar) {
            $transvar=$sv->value if ($sv->tagname eq 'Transcriptional Variant');
        }
        my $assembly='';
        foreach my $ann ($contig->annotation->get_Annotations('dblink')) {
            $assembly.=$ann->primary_id . '-' if ($ann->optional_id eq 'Source Sequence');
        }
        chop $assembly; #drop the trailing '-'
        my $prod;
        foreach my $p (@prod) {
            if ($p) {

Hope this helps and will at least get you started. Let me know if you 
have more questions.
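
Until there are proper docs, one way to see what the parser captured for each record is to list the annotation keys and count what is stored under each. The sketch below assumes the file name is passed on the command line and that next_seq returns nothing (or undef) at the end of the file; the helper annotation_summary is made up for illustration and relies only on the get_all_annotation_keys and get_Annotations methods of the annotation collection.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Count the annotations stored under each key. Works on any object that
# offers get_all_annotation_keys and get_Annotations, such as the
# collection returned by $gene->annotation.
# (annotation_summary is a hypothetical helper, not part of BioPerl.)
sub annotation_summary {
    my ($collection) = @_;
    my %count;
    for my $key ($collection->get_all_annotation_keys) {
        my @values = $collection->get_Annotations($key);
        $count{$key} = scalar @values;
    }
    return \%count;
}

# Hypothetical usage: perl list_keys.pl your_entrezgene_file
if (@ARGV) {
    require Bio::SeqIO;
    my $seqio = Bio::SeqIO->new(-format => 'entrezgene',
                                -file   => $ARGV[0]);
    while (1) {
        my ($gene, $genestructure, $uncaptured) = $seqio->next_seq;
        last unless defined $gene;    # assumed end-of-file behaviour
        my $summary = annotation_summary($gene->annotation);
        print $gene->accession_number, ' (', $gene->id, ")\n";
        printf "  %-30s %d\n", $_, $summary->{$_} for sort keys %$summary;
    }
}
```

The keys it prints are the strings you can pass to get_Annotations, as in the snippets above.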

Law, Annie wrote:

>I would appreciate help with the following.  First of all, I would like to say it's great that an Entrez gene parser was written. I just have some questions to get me started.  
>A) I already have bioperl 1.4 installed.  I would like to know if I can install the Entrez gene parser and 
>Stefan Kirov's entrezgene.pm without reinstalling bioperl.  If this is not advised then I would reinstall bioperl.
>I was going to 1. install the Bio::ASN1::EntrezGene module (perl Makefile.PL, make, make test, make install, etc...)
>		   2. just take only the entrezgene.pm and util.pm file and put it in the appropriate directory
>               (which util.pm should I use??  I searched CPAN and got many results for util.pm
>               is it Biblio::Util, Boulder::Util, or something else??)
>B) For each Entrez gene id how do I access the associated Unigene id, accession numbers, gene symbol, gene name,
>and GO IDs?  
>Also, I have been looking at the code and the POD within it.  Are there some other places that I can look for documentation?
>Thanks so much!!
