[Bioperl-l] how to retrieve organism name from accession number?

Mark A. Jensen maj at fortinbras.us
Wed Jan 27 10:14:22 EST 2010


Precisely the MO behind SoapEU...get the jump on 'em.
----- Original Message ----- 
From: "Chris Fields" <cjfields at illinois.edu>
To: "Smithies, Russell" <Russell.Smithies at agresearch.co.nz>
Cc: <bioperl-l at lists.open-bio.org>; "'Mark A. Jensen'" <maj at fortinbras.us>
Sent: Tuesday, January 26, 2010 9:42 PM
Subject: Re: [Bioperl-l] how to retrieve organism name from accession number?


> Makes me wonder if they're pushing more users towards the SOAP-based services 
> and away from eutils.
>
> chris
>
> On Jan 26, 2010, at 7:59 PM, Smithies, Russell wrote:
>
>> I've had a wide selection of errors lately:
>>
>> ------------- EXCEPTION: Bio::Root::Exception -------------
>> MSG: NCBI esearch fatal error: Search Backend failed: Error 11 (Resource 
>> temporarily unavailable)
>> STACK: Error::throw
>> STACK: Bio::Root::Root::throw 
>> /usr/lib/perl5/site_perl/5.8.8/Bio/Root/Root.pm:357
>> STACK: Bio::Tools::EUtilities::parse_data 
>> /usr/lib/perl5/site_perl/5.8.8/Bio/Tools/EUtilities.pm:332
>> STACK: Bio::Tools::EUtilities::get_ids 
>> /usr/lib/perl5/site_perl/5.8.8/Bio/Tools/EUtilities.pm:441
>> STACK: Bio::DB::EUtilities::get_ids 
>> /usr/lib/perl5/site_perl/5.8.8/Bio/DB/EUtilities.pm:363
>> STACK: get_desc.pl:32
>> -----------------------------------------------------------
>>
>> And I never get a good explanation from NCBI or suggestions on how to avoid 
>> it.
>>
>>
>> --Russell
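
Since these exceptions are transient ("Resource temporarily unavailable", "Connection refused"), one common workaround is to wrap the failing call in a retry loop with an increasing pause. The following is only a sketch under that assumption, not anything NCBI or BioPerl prescribes. The name with_retries and its arguments are made up for illustration; BioPerl exceptions are thrown with die, so a plain eval can catch them.

```perl
use strict;
use warnings;

# Call $action (a code ref) up to $tries times. After each failure, sleep
# $delay seconds and double the delay. If every attempt fails, re-throw the
# last error. (Hypothetical helper, not part of BioPerl.)
sub with_retries {
    my ($action, $tries, $delay) = @_;
    $delay = 0 unless defined $delay;
    my $error;
    for my $attempt (1 .. $tries) {
        my @result;
        my $ok = eval { @result = $action->(); 1 };
        return @result if $ok;
        $error = $@ || 'unknown error';
        last if $attempt == $tries;
        warn "attempt $attempt failed: $error";
        sleep $delay if $delay;
        $delay *= 2;
    }
    die $error;
}
```

In a script like get_desc.pl this might be used as with_retries(sub { $factory->get_ids }, 3, 5), assuming $factory is the Bio::DB::EUtilities object from the stack trace. Whether the same factory object can safely be reused after a failure is not something this thread establishes, so rebuilding it inside the code ref may be the safer choice.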
>>
>>
>>> -----Original Message-----
>>> From: Chris Fields [mailto:cjfields at illinois.edu]
>>> Sent: Wednesday, 27 January 2010 2:46 p.m.
>>> To: Smithies, Russell
>>> Cc: 'Mark A. Jensen'; 'bioperl-l at lists.open-bio.org'
>>> Subject: Re: [Bioperl-l] how to retrieve organism name from accession
>>> number?
>>>
>>> It's unfortunate but I have heard this problem popping up quite a bit more
>>> frequently lately.  Not to push too many buttons but NCBI isn't very
>>> forthcoming with help these days; they have become quite insular.  Not
>>> sure if they're short-staffed due to budget or if there are other issues.
>>>
>>> chris
>>>
>>> On Jan 26, 2010, at 7:40 PM, Smithies, Russell wrote:
>>>
>>>> Grrrrrr, I hate eutils!!!!
>>>>
>>>> ------------- EXCEPTION: Bio::Root::Exception -------------
>>>> MSG: NCBI esearch fatal error: Search Backend failed: Error 111
>>>> (Connection refused)
>>>> STACK: Error::throw
>>>> STACK: Bio::Root::Root::throw
>>>> /usr/lib/perl5/site_perl/5.8.8/Bio/Root/Root.pm:357
>>>> STACK: Bio::Tools::EUtilities::parse_data
>>>> /usr/lib/perl5/site_perl/5.8.8/Bio/Tools/EUtilities.pm:332
>>>> STACK: Bio::Tools::EUtilities::get_ids
>>>> /usr/lib/perl5/site_perl/5.8.8/Bio/Tools/EUtilities.pm:441
>>>> STACK: Bio::DB::EUtilities::get_ids
>>>> /usr/lib/perl5/site_perl/5.8.8/Bio/DB/EUtilities.pm:363
>>>> STACK: get_desc.pl:32
>>>> -----------------------------------------------------------
>>>>
>>>>
>>>> Nice error message though :-)
>>>>
>>>>
>>>> --Russell
>>>>
>>>>> -----Original Message-----
>>>>> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
>>>>> bounces at lists.open-bio.org] On Behalf Of Smithies, Russell
>>>>> Sent: Monday, 11 January 2010 10:05 a.m.
>>>>> To: 'Chris Fields'
>>>>> Cc: 'Bhakti Dwivedi'; 'Mark A. Jensen'; 'bioperl-l at lists.open-bio.org'
>>>>> Subject: Re: [Bioperl-l] how to retrieve organism name from accession
>>>>> number?
>>>>>
>>>>> I've started to go off eUtils recently (not BioPerl's fault) as I've often
>>>>> been finding that with large queries, chunks of the resulting data are
>>>>> missing.
>>>>> For example, before Xmas I was creating species-specific databases by
>>>>> using eUtils to get a list of GI numbers back for a taxid, then retrieving
>>>>> the fasta sequences in chunks of 500.
>>>>> Very regularly, in the middle of the fasta there would be a message about
>>>>> resource unavailable, e.g.
>>>>> >test_sequence_1
>>>>> TACGATCATCGCTResource UnavailableTACGACTCTGCT
>>>>> >test_sequence_2
>>>>> TACGTACTACGATCGATCATCACTATCGTCATACTACTACTGACT
>>>>>
>>>>> Often this wasn't detected until formatdb complained about invalid
>>>>> characters.
>>>>> Inquiries to NCBI as to why this was happening and what to do about it
>>>>> returned stupid answers ("do each sequence manually thru the web
>>>>> interface", or "use eUtils").
>>>>> As we have a nice fast network connection, I now prefer to download very
>>>>> large gzip files (i.e. all of refseq) and extract what I need.
>>>>>
>>>>> I can't help but think that NCBI could solve a lot of problems if they
>>>>> gzipped the output from eUtils queries - it's something I've requested
>>>>> regularly for the last 5 years or so!!
>>>>>
>>>>> --Russell
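
The corrupted records described above could be caught before formatdb by scanning each downloaded chunk for characters outside the expected alphabet. This is a rough sketch assuming nucleotide FASTA and the IUPAC nucleotide codes plus '-' as the allowed set; the function name find_bad_fasta_lines is invented for illustration and a protein download would need a different pattern.

```perl
use strict;
use warnings;

# Anything that is not an IUPAC nucleotide code or a gap, case-insensitive.
my $NON_IUPAC_NT = qr/[^ACGTURYSWKMBDHVN-]+/i;

# Scan FASTA text from $fh and return one [record_id, line_number, text]
# entry per sequence line containing characters matched by $bad_re.
# Only the first offending run on each line is reported.
sub find_bad_fasta_lines {
    my ($fh, $bad_re) = @_;
    $bad_re = $NON_IUPAC_NT unless defined $bad_re;
    my @bad;
    my $id = '';
    while (my $line = <$fh>) {
        chomp $line;
        next if $line eq '';
        if ($line =~ /^>(\S*)/) {      # header line: remember the record id
            $id = $1;
            next;
        }
        push @bad, [ $id, $., $1 ] if $line =~ /($bad_re)/;
    }
    return @bad;
}
```

A chunk that yields any entries could then be re-fetched rather than appended to the database file. Note that a message made up purely of valid IUPAC letters would slip through, so this is a cheap sanity check and not a guarantee.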
>>>>>
>>>>>
>>>>>> -----Original Message-----
>>>>>> From: Chris Fields [mailto:cjfields at illinois.edu]
>>>>>> Sent: Monday, 11 January 2010 9:50 a.m.
>>>>>> To: Smithies, Russell
>>>>>> Cc: 'Mark A. Jensen'; 'Bhakti Dwivedi'; 'bioperl-l at lists.open-bio.org'
>>>>>> Subject: Re: [Bioperl-l] how to retrieve organism name from accession
>>>>>> number?
>>>>>>
>>>>>> One could also use Bio::DB::Taxonomy, which indexes the same files or
>>>>>> (alternatively) makes the eutil calls (see Bio::DB::Taxonomy POD for the
>>>>>> details).
>>>>>>
>>>>>> chris
>>>>>>
>>>>>> On Jan 10, 2010, at 2:34 PM, Smithies, Russell wrote:
>>>>>>
>>>>>>> An alternate non-BioPerly way (that may be faster given NCBI's flakiness
>>>>>>> lately) would be to download the gi_taxid_nucl.zip or gi_taxid_prot.zip
>>>>>>> files from ftp://ftp.ncbi.nih.gov/pub/taxonomy/, load them into a hash and
>>>>>>> do lookups.
>>>>>>> In that same dir, taxdump.tar.gz contains a file called names.dmp which
>>>>>>> lists taxids and descriptions (and synonyms).
>>>>>>>
>>>>>>> If it was me, I'd split gi_taxid_nucl and names.dmp into hashes so I
>>>>>>> could do this:
>>>>>>>
>>>>>>> my $taxid    = $gi_taxid_nucl{$gi};   # these files are keyed by GI number
>>>>>>> my $org_name = $names{$taxid};
>>>>>>>
>>>>>>> --Russell
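
A sketch of how the two hashes above might be built, using only core Perl. It assumes the usual dump layouts: names.dmp fields separated by tab-pipe-tab with a trailing tab-pipe, and gi_taxid_*.dmp as two tab-separated columns (GI, taxid). The function names load_names and load_gi_taxid are invented for illustration. Since the gi_taxid tables are keyed by GI number, accessions would first have to be mapped to GIs.

```perl
use strict;
use warnings;

# Build taxid => scientific name from names.dmp-style lines.
# names.dmp has several rows per taxid (synonyms, common names, ...);
# only rows whose name class is 'scientific name' are kept.
sub load_names {
    my ($fh) = @_;
    my %names;
    while (my $line = <$fh>) {
        chomp $line;
        $line =~ s/\t\|$//;    # strip the row terminator
        my ($taxid, $name, undef, $class) = split /\t\|\t/, $line;
        next unless defined $class && $class eq 'scientific name';
        $names{$taxid} = $name;
    }
    return \%names;
}

# Build gi => taxid from gi_taxid_*.dmp-style lines.
# $wanted is an optional hash ref of GIs to keep, so the whole table
# does not have to be held in memory.
sub load_gi_taxid {
    my ($fh, $wanted) = @_;
    my %gi_taxid;
    while (my $line = <$fh>) {
        chomp $line;
        my ($gi, $taxid) = split /\t/, $line;
        next if $wanted && !exists $wanted->{$gi};
        $gi_taxid{$gi} = $taxid;
    }
    return \%gi_taxid;
}
```

With both loaded, the lookup is $names->{ $gi_taxid->{$gi} }. Passing the set of wanted GIs keeps memory use proportional to the query list rather than to the full dump, which matters because the complete GI table is large.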
>>>>>>>
>>>>>>>
>>>>>>>> -----Original Message-----
>>>>>>>> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
>>>>>>>> bounces at lists.open-bio.org] On Behalf Of Mark A. Jensen
>>>>>>>> Sent: Saturday, 26 December 2009 4:52 p.m.
>>>>>>>> To: Bhakti Dwivedi; bioperl-l at lists.open-bio.org
>>>>>>>> Subject: Re: [Bioperl-l] how to retrieve organism name from accession
>>>>>>>> number?
>>>>>>>>
>>>>>>>> Bhakti,
>>>>>>>> The following example (using EUtilities) may serve your purpose:
>>>>>>>>
>>>>>>>> use Bio::DB::EUtilities;
>>>>>>>>
>>>>>>>> my (%taxa, @taxa);
>>>>>>>> my (%names, %idmap);
>>>>>>>>
>>>>>>>> # these are protein ids; nuc ids will (probably) work by changing
>>>>>>>> # -dbfrom to 'nucleotide'
>>>>>>>>
>>>>>>>> my @ids = qw(1621261 89318838 68536103 20807972 730439);
>>>>>>>>
>>>>>>>> my $factory = Bio::DB::EUtilities->new(-eutil => 'elink',
>>>>>>>>                                     -db => 'taxonomy',
>>>>>>>>                                     -dbfrom => 'protein',
>>>>>>>>                                     -correspondence => 1,
>>>>>>>>                                     -id => \@ids);
>>>>>>>>
>>>>>>>> # iterate through the LinkSet objects
>>>>>>>> while (my $ds = $factory->next_LinkSet) {
>>>>>>>>  $taxa{($ds->get_submitted_ids)[0]} = ($ds->get_ids)[0]
>>>>>>>> }
>>>>>>>>
>>>>>>>> @taxa = grep { defined } @taxa{@ids};  # skip ids with no taxonomy link
>>>>>>>>
>>>>>>>> $factory = Bio::DB::EUtilities->new(-eutil => 'esummary',
>>>>>>>>      -db    => 'taxonomy',
>>>>>>>>      -id    => \@taxa );
>>>>>>>>
>>>>>>>> while (local $_ = $factory->next_DocSum) {
>>>>>>>>  $names{($_->get_contents_by_name('TaxId'))[0]} =
>>>>>>>> ($_->get_contents_by_name('ScientificName'))[0];
>>>>>>>> }
>>>>>>>>
>>>>>>>> foreach (@ids) {
>>>>>>>>  $idmap{$_} = $names{$taxa{$_}};
>>>>>>>> }
>>>>>>>>
>>>>>>>> # %idmap is
>>>>>>>> #    1621261 => 'Mycobacterium tuberculosis H37Rv'
>>>>>>>> #    20807972 => 'Thermoanaerobacter tengcongensis MB4'
>>>>>>>> #    68536103 => 'Corynebacterium jeikeium K411'
>>>>>>>> #    730439 => 'Bacillus caldolyticus'
>>>>>>>> #    89318838 => undef    (this record has been removed from the db)
>>>>>>>>
>>>>>>>> 1;
>>>>>>>>
>>>>>>>> You probably will need to break up your 30000 into chunks
>>>>>>>> (say, 1000-3000 each), and do the above on each chunk with a
>>>>>>>>
>>>>>>>> sleep 3;
>>>>>>>>
>>>>>>>> or so separating the queries.
>>>>>>>> MAJ
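
The chunk-and-sleep advice above could look roughly like this. It is a sketch in core Perl only: the names chunk_ids and process_in_chunks are invented for illustration, and the code ref stands in for the elink/esummary steps from the example, returning an id => name hash for one batch.

```perl
use strict;
use warnings;

# Split a list of ids into batches of at most $size elements.
sub chunk_ids {
    my ($size, @ids) = @_;
    die "chunk size must be positive\n" unless $size > 0;
    my @chunks;
    push @chunks, [ splice(@ids, 0, $size) ] while @ids;
    return @chunks;
}

# Run $query (a code ref taking a list of ids and returning a hash) on each
# batch, pausing $pause seconds between batches, and merge the results.
sub process_in_chunks {
    my ($query, $size, $pause, @ids) = @_;
    my %result;
    my @chunks = chunk_ids($size, @ids);
    for my $i (0 .. $#chunks) {
        my %partial = $query->(@{ $chunks[$i] });
        @result{ keys %partial } = values %partial;
        sleep $pause if $pause && $i < $#chunks;   # no pause after the last batch
    }
    return %result;
}
```

For the 30,000 ids in question, something like process_in_chunks(\&lookup_names, 1000, 3, @ids) would issue 30 batches about three seconds apart, where lookup_names is a hypothetical sub wrapping the code above.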
>>>>>>>> ----- Original Message -----
>>>>>>>> From: "Bhakti Dwivedi" <bhakti.dwivedi at gmail.com>
>>>>>>>> To: <bioperl-l at lists.open-bio.org>
>>>>>>>> Sent: Friday, December 25, 2009 9:46 PM
>>>>>>>> Subject: [Bioperl-l] how to retrieve organism name from accession
>>>>>>>> number?
>>>>>>>>
>>>>>>>>
>>>>>>>>> Hi,
>>>>>>>>>
>>>>>>>>> Does anyone know how to retrieve the "Source" or the "Species name" given
>>>>>>>>> the accession number using Bioperl?  I have these 30,000 accession numbers
>>>>>>>>> for which I need to get the source organisms.  Any kind of help will be
>>>>>>>>> appreciated.
>>>>>>>>>
>>>>>>>>> Thanks
>>>>>>>>>
>>>>>>>>> BD
>>>>>>>>> _______________________________________________
>>>>>>>>> Bioperl-l mailing list
>>>>>>>>> Bioperl-l at lists.open-bio.org
>>>>>>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>>


