[Bioperl-l] Entrez Gene parser questions

Law, Annie Annie.Law at nrc-cnrc.gc.ca
Tue Jun 7 16:25:56 EDT 2005


Thanks for all of the replies.
I am using bioperl 1.4 and I have done the following:
1. installed Bio::ASN1::EntrezGene (using perl Makefile.PL, make, make test, make install)
2. got a copy of the entrezgene.pm from bioperl-live and put it in the Bio/SeqIO directory
3. copied the Bio::Annotation::DBLink module (DBLink.pm) and put it in the corresponding directory
4. the Bio::Cluster::SequenceFamily file was already up to date
5. I also have all the most recent bioperl-live Bio::SeqFeature::Gene modules
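Because these modules come from two sources (the installed 1.4 tree and files copied from bioperl-live), one quick sanity check is to ask Perl which copy of each module it actually loads and whether that copy provides the methods the parser calls. The sketch below uses only core Perl. The module list mirrors the steps above. The one method checked, add_tag_value, is the method named in the last error message. The helper name module_report is hypothetical, not part of BioPerl.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: report whether a module loads, which file Perl
# picked it up from, its $VERSION (if it sets one), and which of the
# requested methods it does NOT provide.
sub module_report {
    my ($module, @methods) = @_;
    (my $file = "$module.pm") =~ s{::}{/}g;    # Bio::SeqIO -> Bio/SeqIO.pm
    my %report = (module => $module, loaded => 0, path => undef,
                  version => undef, missing => []);
    if (eval { require $file; 1 }) {
        $report{loaded}  = 1;
        $report{path}    = $INC{$file};         # the copy that was loaded
        $report{version} = $module->VERSION;    # undef if the module sets none
        $report{missing} = [ grep { !$module->can($_) } @methods ];
    }
    else {
        ($report{error}) = split /\n/, $@;      # first line of the load error
    }
    return \%report;
}

# Modules from the installation steps; add_tag_value is the method
# named in the "Can't locate object method" error.
my @checks = (
    [ 'Bio::ASN1::EntrezGene' ],
    [ 'Bio::SeqIO::entrezgene' ],
    [ 'Bio::Annotation::DBLink' ],
    [ 'Bio::Cluster::SequenceFamily' ],
    [ 'Bio::SeqFeature::Generic', 'add_tag_value' ],
);

for my $check (@checks) {
    my $r = module_report(@$check);
    if (!$r->{loaded}) {
        print "$r->{module}: NOT LOADED ($r->{error})\n";
        next;
    }
    printf "%s: %s (version %s)\n", $r->{module}, $r->{path},
        defined $r->{version} ? $r->{version} : 'unset';
    print "  missing method: $_\n" for @{ $r->{missing} };
}
```

Run it with the same perl (and the same PERL5LIB, if any) as the failing script. A module that resolves to an unexpected location, or a method reported as missing, is the first place to look.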

I grabbed the ASN file from ftp://ftp.ncbi.nlm.nih.gov/gene/DATA/ASN/Mammalia/

I then wrote a simple perl script which includes:

#!/usr/bin/perl -w

use strict;
use Bio::ASN1::EntrezGene;
use Bio::SeqIO;
use Bio::Annotation::DBLink;
use Bio::Cluster::SequenceFamily;
my $file = '/var/lib/mysql/Homo_sapiens';

my $seqio = Bio::SeqIO->new(-format => 'entrezgene',
                               -file => $file,
                               -debug => 'on');

my ($gene,$genestructure,$uncaptured) = $seqio->next_seq;

After I run this script I get the following errors:

Useless use of hash element in void context at /usr/lib/perl5/site_perl/5.8.0/Bio/SeqIO/entrezgene.pm line 317.

Argument "-trimopt" isn't numeric in numeric eq (==) at /usr/lib/perl5/site_perl/5.8.0/Bio/ASN1/EntrezGene.pm line 450.

Pseudo-hashes are deprecated at /usr/lib/perl5/site_perl/5.8.0/Bio/SeqIO/entrezgene.pm line 148.

Use of uninitialized value in string ne at /usr/lib/perl5/site_perl/5.8.0/Bio/SeqIO/entrezgene.pm line 150.

Pseudo-hashes are deprecated at /usr/lib/perl5/site_perl/5.8.0/Bio/SeqIO/entrezgene.pm line 416.

Can't locate object method "add_tag_value" via package "Bio::SeqFeature::Generic" at /usr/lib/perl5/site_perl/5.8.0/Bio/SeqFeature/Generic.pm line 234.

I'm not sure what I am missing.

Thanks so much,

-----Original Message-----
From: Stefan Kirov [mailto:skirov at utk.edu] 
Sent: Thursday, June 02, 2005 1:39 PM
To: Law, Annie
Cc: 'bioperl-l at bioperl.org'
Subject: Re: [Bioperl-l] Entrez Gene parser questions

I am sorry to say there is no good documentation yet. I am still
evaluating and debugging the code and I have a bunch of other things to deal
with right now, so apologies, but there will not be comprehensive
docs for at least another 2 weeks.
First, installation:
I think you need to install Bio::SeqIO::entrezgene and
Bio::ASN1::EntrezGene. You will have to update Bio::Annotation::DBLink.
Make sure you also have Bio::Cluster::SequenceFamily and all modules in
Bio::SeqFeature::Gene. These may also need updating.
When you do

  my $seqio = Bio::SeqIO->new(-format => 'entrezgene',
                              -file   => $file,
                              -debug  => 'on');
  my ($gene, $genestructure, $uncaptured) = $seqio->next_seq;

$gene->accession_number will give you the Entrez Gene id and $gene->id will
give you the gene symbol. Most of the data is in the annotation, so go
through that:

  my $ann = $gene->annotation;          # where most of the data is
  my $gid = $gene->accession_number;    # Entrez Gene id, used in the GO output

  # UniGene cross-references
  my @dblinks = $ann->get_Annotations('DBLink');
  foreach my $dblink (@dblinks) {
      print 'Unigene id for this gene is ', $dblink->primary_id, "\n"
          if lc($dblink->database) eq 'unigene';
  }

  # Official full name
  my @nameann = $ann->get_Annotations('Official Full Name');
  print 'Gene name is ', $nameann[0]->as_text, "\n" if @nameann;

  # GO terms
  my @go = $ann->get_Annotations('OntologyTerm');
  foreach my $go (@go) {
      next if $go->authority eq 'STS marker';   # unless you want STS markers...
      my @refs = $go->term->references;         # you should get just one
      print join(',', $gid, $go->ontology->name, $go->identifier, $go->name,
                 'GO:' . $go->identifier,
                 @refs ? $refs[0]->medline : ''), "\n";
  }

  # Sequences associated with the gene ($genestructure from next_seq)
  my @associated_seq = $genestructure->get_members();
  foreach my $contig (@associated_seq) {
      # Only refseq; there are also mrna, product and genomic as options...
      next unless $contig->namespace eq 'refseq';
      my @prod     = $contig->annotation->get_Annotations('product');
      my $transvar = '';
      foreach my $sv ($contig->annotation->get_Annotations('simple')) {
          $transvar = $sv->value if $sv->tagname eq 'Transcriptional Variant';
      }
      # Source sequence ids joined with '-'
      my $assembly = join '-',
          map  { $_->primary_id }
          grep { $_->optional_id eq 'Source Sequence' }
          $contig->annotation->get_Annotations('dblink');
      my $prod;
      foreach my $p (@prod) {
          if ($p) {
              # ... (the original example is truncated here)
          }
      }
  }

Hope this helps and gets you started, at least. Let me know if you
have more questions.

Law, Annie wrote:

>I would appreciate help with the following. First of all, I would like
>to say it's great that an Entrez Gene parser was written. I just have some questions to get me started.
>A) I already have bioperl 1.4 installed. I would like to know if I can
>install the Entrez Gene parser and
>Stefan Kirov's entrezgene.pm without reinstalling bioperl. If this is not advised, then I would reinstall bioperl.
>I was going to 1. install the Bio::ASN1::EntrezGene module (perl Makefile.PL, make, make test, make install, etc.)
>               2. just take the entrezgene.pm and util.pm files and put them in the appropriate directory
>                  (which util.pm should I use? I searched CPAN and got many results for util.pm;
>                  is it Biblio::Util, Boulder::Util, or something else?)
>B) For each Entrez Gene id, how do I access the associated Unigene id,
>accession numbers, gene symbol, gene name, and GO IDs?
>Also, I have been looking at the code and the POD within it. Are there
>some other places that I can look for documentation?
>Thanks so much!!
>Bioperl-l mailing list
>Bioperl-l at portal.open-bio.org 
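On question A (using the new parser without reinstalling BioPerl): besides copying files into the installed tree, Perl can be pointed at a separate bioperl-live checkout with the core `lib` pragma. `use lib` prepends the directory to @INC, so modules found there take precedence over same-named modules in the installed 1.4 tree. This is a minimal sketch. The checkout path is hypothetical and must be adjusted. Modules loaded this way still mix with the remaining 1.4 modules, so the same compatibility concerns as copying files apply.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical location of a bioperl-live checkout; adjust to your system.
# 'use lib' prepends it to @INC at compile time, so modules found there
# are loaded in preference to the installed copies.
use lib '/home/user/src/bioperl-live';

# Show the search order Perl will use for the rest of this script.
print "$_\n" for @INC;
```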
