[Bioperl-l] Entrez Gene parser questions

Stefan Kirov skirov at utk.edu
Tue Jun 7 16:47:11 EDT 2005

Hi Annie,
A few more things to update:
Bio::AnnotatableI and Bio::SeqFeatureI. I believe this is the reason 
your script does not work. Also update Bio::SeqIO::entrezgene; I added 
some fixes so you will not see those nasty warnings when using strict 
(nothing critical, though).
I am not sure we have the same version of Bio::ASN1::EntrezGene. I 
have 1.0.7. Where did you get it from (maybe mine is older)?
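The errors in Annie's message below (the missing add_tag_value method, the "-trimopt" warning) point to mismatched module versions, so it can help to check which copy of each module Perl actually loads. The following is a minimal sketch, not part of BioPerl: the module list and the hypothetical module_info helper are assumptions chosen for this thread. It prints each module's $VERSION and the file it was loaded from (via %INC). Many BioPerl modules may not define their own $VERSION, in which case the script prints "no $VERSION". The path column shows whether an old copy earlier in @INC shadows the file you updated.

```perl
#!/usr/bin/perl -w
use strict;

# Modules discussed in this thread (assumed list; adjust as needed).
my @modules = qw(
    Bio::ASN1::EntrezGene
    Bio::SeqIO::entrezgene
    Bio::Annotation::DBLink
    Bio::Cluster::SequenceFamily
    Bio::AnnotatableI
    Bio::SeqFeatureI
    Bio::SeqFeature::Generic
);

# Hypothetical helper: try to load a module and return (version, path),
# or an empty list if it cannot be loaded.
sub module_info {
    my ($module) = @_;
    (my $file = $module) =~ s{::}{/}g;
    $file .= '.pm';
    return () unless eval { require $file; 1 };
    my $version = $module->VERSION;    # UNIVERSAL::VERSION reads $VERSION
    return (defined $version ? $version : 'no $VERSION', $INC{$file});
}

foreach my $module (@modules) {
    my ($version, $path) = module_info($module);
    if (defined $version) {
        printf "%-30s %-12s %s\n", $module, $version, $path;
    } else {
        printf "%-30s not installed (or failed to load)\n", $module;
    }
}

# The add_tag_value error suggests an older Bio::SeqFeature::Generic;
# check whether the copy Perl actually loads provides that method.
if (my ($gv) = module_info('Bio::SeqFeature::Generic')) {
    printf "Bio::SeqFeature::Generic->can('add_tag_value'): %s\n",
        Bio::SeqFeature::Generic->can('add_tag_value') ? 'yes' : 'no';
}
```

Running this on both machines and comparing the output would show whether the two installs really differ.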

Law, Annie wrote:

>Thanks for all of the replies.
>I am using bioperl 1.4 and I have done the following:
>1. installed Bio::ASN1::EntrezGene (using perl Makefile.PL, make, make test, make install)
>2. got a copy of entrezgene.pm from bioperl-live and put it in the Bio/SeqIO directory
>3. copied DBLink.pm (Bio::Annotation::DBLink) into the corresponding directory
>4. the Bio::Cluster::SequenceFamily file was already up to date
>5. also have all the most recent bioperl-live Bio::SeqFeature::Gene modules
>I grabbed the ASN file from ftp://ftp.ncbi.nlm.nih.gov/gene/DATA/ASN/Mammalia/
>I then wrote a simple perl script which includes:
>#!/usr/bin/perl -w
>use strict;
>use Bio::ASN1::EntrezGene;
>use Bio::SeqIO;
>use Bio::Annotation::DBLink;
>use Bio::Cluster::SequenceFamily;
>my $file = '/var/lib/mysql/Homo_sapiens';
>my $seqio = Bio::SeqIO->new(-format => 'entrezgene',
>                               -file => $file,
>                               -debug => 'on');
>my ($gene,$genestructure,$uncaptured) = $seqio->next_seq;
>After I run this script I get the following errors:
>Useless use of hash element in void context at /usr/lib/perl5/site_perl/5.8.0/Bio/SeqIO/entrezgene.pm line 317.
>Argument "-trimopt" isn't numeric in numeric eq (==) at /usr/lib/perl5/site_perl/5.8.0/Bio/ASN1/EntrezGene.pm line 450.
>Pseudo-hashes are deprecated at /usr/lib/perl5/site_perl/5.8.0/Bio/SeqIO/entrezgene.pm line 148.
>Use of uninitialized value in string ne at /usr/lib/perl5/site_perl/5.8.0/Bio/SeqIO/entrezgene.pm line 150.
>Pseudo-hashes are deprecated at /usr/lib/perl5/site_perl/5.8.0/Bio/SeqIO/entrezgene.pm line 416.
>Can't locate object method "add_tag_value" via package "Bio::SeqFeature::Generic" at /usr/lib/perl5/site_perl/5.8.0/Bio/SeqFeature/Generic.pm line 234.
>I'm not sure what I am missing.
>Thanks so much,
>-----Original Message-----
>From: Stefan Kirov [mailto:skirov at utk.edu] 
>Sent: Thursday, June 02, 2005 1:39 PM
>To: Law, Annie
>Cc: 'bioperl-l at bioperl.org'
>Subject: Re: [Bioperl-l] Entrez Gene parser questions
>I am sorry to say there is no good documentation yet, since I am still 
>evaluating and debugging the code and have a bunch of other things to deal 
>with right now. Apologies, but there will not be comprehensive 
>docs for at least another two weeks.
>First, install:
>I think you need to install Bio::SeqIO::entrezgene and 
>Bio::ASN1::EntrezGene. You will have to update Bio::Annotation::DBLink. 
>Make sure you also have Bio::Cluster::SequenceFamily and all modules in 
>Bio::SeqFeature::Gene. These may also need updating.
>When you do
>
>  my $seqio = Bio::SeqIO->new(-format => 'entrezgene',
>                              -file   => $file,
>                              -debug  => 'on');
>  my ($gene, $genestructure, $uncaptured) = $seqio->next_seq;
>
>$gene->accession_number will give you the Entrez Gene ID and $gene->id 
>will give you the gene symbol. Most of the data is in the annotation 
>collection, which you can walk through like this:
>
>  my $gid = $gene->accession_number;    # Entrez Gene ID
>  my $ann = $gene->annotation;          # where most of the data is
>
>  # UniGene cross-references
>  foreach my $dblink ($ann->get_Annotations('DBLink')) {
>      print 'Unigene id for this gene is ', $dblink->id, "\n"
>          if lc($dblink->database) eq 'unigene';
>  }
>
>  # Official full name
>  my @nameann = $ann->get_Annotations('Official Full Name');
>  print 'Gene name is ', $nameann[0]->as_text, "\n" if @nameann;
>
>  # GO terms
>  foreach my $go ($ann->get_Annotations('OntologyTerm')) {
>      next if $go->authority eq 'STS marker';   # unless you want STS markers...
>      my @refs = $go->term->references;         # you should get just one
>      print join(',', $gid, $go->ontology->name, $go->identifier, $go->name,
>                 'GO:' . $go->identifier, $refs[0]->medline), "\n";
>  }
>
>  # Associated sequences: $genestructure is a Bio::Cluster::SequenceFamily
>  foreach my $contig ($genestructure->get_members()) {
>      # Only RefSeq; there are also mrna, product and genomic as options...
>      next unless $contig->namespace eq 'refseq';
>      my @prod     = $contig->annotation->get_Annotations('product');
>      my @transvar = $contig->annotation->get_Annotations('simple');
>      my $transvar = '';
>      foreach my $sv (@transvar) {
>          $transvar = $sv->value if $sv->tagname eq 'Transcriptional Variant';
>      }
>      my $assembly = '';
>      foreach my $dbl ($contig->annotation->get_Annotations('dblink')) {
>          $assembly .= $dbl->primary_id . '-' if $dbl->optional_id eq 'Source Sequence';
>      }
>      chop $assembly;    # drop the trailing '-'
>      my $prod;
>      foreach my $p (@prod) {
>          if ($p) {
>              $prod = $p->value;
>              last;
>          }
>      }
>      # ... use $transvar, $assembly and $prod here
>  }
>Hope this helps and will at least get you started. Let me know if you 
>have more questions.
>Law, Annie wrote:
>>I would appreciate help with the following. First of all, I would like 
>>to say it's great that an Entrez Gene parser was written. I just have some questions to get me started.
>>A) I already have bioperl 1.4 installed. I would like to know if I can 
>>install the Entrez Gene parser and Stefan Kirov's entrezgene.pm without 
>>reinstalling bioperl. If this is not advised, I will reinstall bioperl.
>>I was going to 1. install the Bio::ASN1::EntrezGene module (perl Makefile.PL, make, make test, make install, etc.)
>>		   2. take only the entrezgene.pm and util.pm files and put them in the appropriate directory
>>              (which util.pm should I use? I searched CPAN and got many results for util.pm;
>>              is it Biblio::Util, Boulder::Util, or something else?)
>>B) For each Entrez Gene ID, how do I access the associated UniGene ID, 
>>accession numbers, gene symbol, gene name, and GO IDs?
>>Also, I have been looking at the code and the POD within it.  Are there 
>>some other places that I can look for documentation?
>>Thanks so much!!
>>Bioperl-l mailing list
>>Bioperl-l at portal.open-bio.org 

Stefan Kirov, Ph.D.
University of Tennessee/Oak Ridge National Laboratory
5700 bldg, PO BOX 2008 MS6164
Oak Ridge TN 37831-6164
tel +865 576 5120
fax +865-576-5332
e-mail: skirov at utk.edu
sao at ornl.gov

"And the wars go on with brainwashed pride
For the love of God and our human rights
And all these things are swept aside"
