[Bioperl-l] Entrez Gene parser questions

Stefan Kirov skirov at utk.edu
Tue Jun 7 16:47:11 EDT 2005

Hi Annie,
A few more things to update:
Bio::AnnotatableI and Bio::SeqFeatureI.  I believe this is the reason 
your script does not work. Also update Bio::SeqIO::entrezgene; I added 
some fixes so you will not see those nasty warnings when using strict 
(nothing critical though).
I am not sure we have the same versions for Bio::ASN1::EntrezGene. I 
have 1.0.7. Where did you get it from (maybe mine is older)?
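
One way to compare what each of us has installed is a small check like the sketch below. It uses only core Perl; module_info is a hypothetical helper name chosen for illustration, not part of BioPerl. It prints the declared version and the file each module was loaded from, and then reports whether Bio::SeqFeature::Generic provides the add_tag_value method that the last error in your output complains about.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: report whether a module loads, which version it
# declares, and which file on disk it was loaded from.
sub module_info {
    my ($module) = @_;
    (my $file = "$module.pm") =~ s{::}{/}g;    # Bio::SeqIO -> Bio/SeqIO.pm
    my $loaded = eval { require $file; 1 };
    return { module => $module, loaded => 0 } unless $loaded;
    my $version = eval { $module->VERSION };
    return {
        module  => $module,
        loaded  => 1,
        version => defined $version ? $version : 'unknown',
        path    => $INC{$file},
    };
}

# Modules mentioned in this thread.
my @modules = qw(
    Bio::ASN1::EntrezGene
    Bio::SeqIO::entrezgene
    Bio::AnnotatableI
    Bio::SeqFeatureI
    Bio::SeqFeature::Generic
);

for my $info (map { module_info($_) } @modules) {
    if ($info->{loaded}) {
        printf "%-28s %-10s %s\n", @{$info}{qw(module version path)};
    }
    else {
        printf "%-28s not loadable\n", $info->{module};
    }
}

# Check for the method named in the "Can't locate object method" error.
if (module_info('Bio::SeqFeature::Generic')->{loaded}) {
    my $has = 'Bio::SeqFeature::Generic'->can('add_tag_value');
    print "add_tag_value is ", ($has ? "available" : "missing"), "\n";
}
```

If the paths show a mix of old 1.4 files and newer bioperl-live files under the same site_perl tree, that would be consistent with the missing-method error.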

Law, Annie wrote:

>Thanks for all of the replies.
>I am using bioperl 1.4 and I have done the following:
>1. installed Bio::ASN1::EntrezGene (using perl Makefile.PL, make, make test, make install)
>2. got a copy of the entrezgene.pm from bioperl-live and put it in the Bio/SeqIO directory
>3. copied the Bio::Annotation::DBLink DBLink.pm file and put it in the corresponding directory
>4. the Bio::Cluster::SequenceFamily file was already up to date
>5. also have all the most recent bioperl-live Bio::SeqFeature::Gene modules
>I grabbed the ASN file from ftp://ftp.ncbi.nlm.nih.gov/gene/DATA/ASN/Mammalia/
>I then wrote a simple perl script which includes:
>#!/usr/bin/perl -w
>use strict;
>use Bio::ASN1::EntrezGene;
>use Bio::SeqIO;
>use Bio::Annotation::DBLink;
>use Bio::Cluster::SequenceFamily;
>my $file = '/var/lib/mysql/Homo_sapiens';
>my $seqio = Bio::SeqIO->new(-format => 'entrezgene',
>                               -file => $file,
>                               -debug => 'on');
>my ($gene,$genestructure,$uncaptured) = $seqio->next_seq;
>After I run this script I get the following errors:
>Useless use of hash element in void context at /usr/lib/perl5/site_perl/5.8.0/Bio/SeqIO/entrezgene.pm line 317.
>Argument "-trimopt" isn't numeric in numeric eq (==) at /usr/lib/perl5/site_perl/5.8.0/Bio/ASN1/EntrezGene.pm line 450.
>Pseudo-hashes are deprecated at /usr/lib/perl5/site_perl/5.8.0/Bio/SeqIO/entrezgene.pm line 148.
>Use of uninitialized value in string ne at /usr/lib/perl5/site_perl/5.8.0/Bio/SeqIO/entrezgene.pm line 150.
>Pseudo-hashes are deprecated at /usr/lib/perl5/site_perl/5.8.0/Bio/SeqIO/entrezgene.pm line 416.
>Can't locate object method "add_tag_value" via package "Bio::SeqFeature::Generic" at /usr/lib/perl5/site_perl/5.8.0/Bio/SeqFeature/Generic.pm line 234.
>I'm not sure what I am missing?
>Thanks so much,
>-----Original Message-----
>From: Stefan Kirov [mailto:skirov at utk.edu] 
>Sent: Thursday, June 02, 2005 1:39 PM
>To: Law, Annie
>Cc: 'bioperl-l at bioperl.org'
>Subject: Re: [Bioperl-l] Entrez Gene parser questions
>I am sorry to say there is no good documentation yet, since I am still 
>evaluating and debugging the code and I have a bunch of other stuff to deal 
>with right now. My apologies, but there will not be comprehensive 
>docs for at least another 2 weeks.
>First install:
>I think you need to install Bio::SeqIO::entrezgene.pm and 
>Bio::ASN1::EntrezGene. You will have to update Bio::Annotation::DBLink. 
>Make sure you also have Bio::Cluster::SequenceFamily and all modules in 
>Bio::SeqFeature::Gene. These also may need updating.
>When you do
>  my $seqio = Bio::SeqIO->new(-format => 'entrezgene',
>                               -file => $file,
>                               -debug => 'on');
>   my ($gene,$genestructure,$uncaptured) = $seqio->next_seq;
>$gene->accession_number will give you the Entrez Gene id and $gene->id will give you the gene symbol. Most of the data is in the annotation, so go through that:
>my $ann=$gene->annotation;
>my @dblinks=$ann->get_Annotations('DBLink');
>foreach my $dblink (@dblinks) {
>    print 'Unigene id for this gene is '.$dblink->id if (lc($dblink->database) eq 'unigene');
>}
>my @nameann=$ann->get_Annotations('Official Full Name');
>print 'Gene name is ',$nameann[0]->as_text,"\n";
>my @go=$ann->get_Annotations('OntologyTerm');
>my $gid=$gene->accession_number; #assumed: $gid is the Entrez Gene id
>foreach my $go (@go) {
>    next if ($go->authority eq 'STS marker'); #Unless you want STS markers...
>    my @refs=$go->term->references; #you should get just one
>    print join(',',$gid,$go->ontology->name,$go->identifier,$go->name,'GO:'.$go->identifier,$refs[0]->medline);
>}
>my @associated_seq=$genestructure->get_members();
>foreach my $contig (@associated_seq) {
>    if ($contig->namespace eq 'refseq') {#Only refseq, there is also mrna, product and genomic as options...
>        my @prod=$contig->annotation->get_Annotations('product');
>        my @transvar=$contig->annotation->get_Annotations('simple');
>        my $transvar='';
>        foreach my $sv (@transvar) {
>            $transvar=$sv->value if ($sv->tagname eq 'Transcriptional Variant');
>        }
>        my $assembly;
>        foreach my $ann ($contig->annotation->get_Annotations('dblink')) {
>            $assembly.=$ann->primary_id . '-' if ($ann->optional_id eq 'Source Sequence');
>        }
>        chop $assembly if defined $assembly; #drop the trailing '-'
>        my $prod;
>        foreach my $p (@prod) {
>            if ($p) {
>                $prod=$p->value;
>                last;
>            }
>        }
>    }
>}
>Hope this helps and will get you started at least. Let me know if you 
>have more questions.
>Law, Annie wrote:
>>I would appreciate help with the following.  First of all, I would like 
>>to say it's great that an Entrez Gene parser was written. I just have some questions to get me started.
>>A) I already have bioperl 1.4 installed.  I would like to know if I can 
>>install the Entrez gene parser and
>>Stefan Kirov's entrezgene.pm without reinstalling bioperl.  If this is not advised then I would reinstall bioperl.
>>I was going to
>>  1. install the Bio::ASN1::EntrezGene module (perl Makefile.PL, make, make test, make install, etc...)
>>  2. take only the entrezgene.pm and util.pm files and put them in the appropriate directory
>>     (which util.pm should I use?  I searched CPAN and got many results for util.pm;
>>     is it Biblio::Util, Boulder::Util, or something else?)
>>B) For each Entrez Gene id, how do I access the associated Unigene id, 
>>accession numbers, gene symbol, gene name, and GO IDs?
>>Also, I have been looking at the code and the POD within it.  Are there 
>>some other places that I can look for documentation?
>>Thanks so much!!
Stefan Kirov, Ph.D.
University of Tennessee/Oak Ridge National Laboratory
5700 bldg, PO BOX 2008 MS6164
Oak Ridge TN 37831-6164
tel +865 576 5120
fax +865-576-5332
e-mail: skirov at utk.edu
sao at ornl.gov

