[Bioperl-l] Query gene identifiers from mouse and use homologene to return human orthologues

Fields, Christopher J cjfields at illinois.edu
Tue Oct 11 16:16:00 EDT 2011


There are some odd things going on with the HomoloGene data, but I wanted to address a few points first. The code has a number of issues:

* Missing email (warnings pop up). This will be required in future releases of Bio::DB::EUtilities, and it's already technically required by NCBI.
* 'perldoc Bio::Tools::EUtilities::Summary::DocSum' indicates the method name is 'get_contents_by_name' (note the lower case and the plural). As the name implies, it returns a list of raw data.
* Missing the elink database. The history captures the originally queried database, but not the database being linked to; when -db is not set, NCBI defaults to pubmed, which leads to...
* The last EUtilities call goes to 'pubmed' via the history. If you dump the docsum output (add a 'print $docsum->to_string'), this is apparent.
* The content key is wrong. Dumping the raw data first is a good idea for debugging, to make sure you are capturing the correct key. In this case the key is 'TaxId' (note the case).
* Calls to get_Response() are unnecessary; it just returns the HTTP::Response object from the user agent, in case you need the raw output. The EUtilities-specific methods lazily fetch and parse incoming data as needed, so get_ids() alone works as you would expect.
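Because 'get_contents_by_name' returns a list, the code below takes the first value with a list slice, '(...)[0]'. Here is a minimal plain-Perl sketch of why the slice is the safe choice. 'fake_contents_by_name' is a hypothetical stand-in, not part of BioPerl; it only illustrates general Perl context rules, not the exact scalar-context behaviour of the real method.

```perl
use strict;
use warnings;

# Hypothetical stand-in for get_contents_by_name(): like that method,
# it hands back a list of values rather than a single scalar.
sub fake_contents_by_name {
    my @values = ('10090', '9606');
    return @values;
}

my @all    = fake_contents_by_name();       # list context: every value
my $count  = fake_contents_by_name();       # scalar context on an array: element count
my ($head) = fake_contents_by_name();       # list assignment: first value
my $first  = (fake_contents_by_name())[0];  # list slice: first value, usable inline

print "all:   @all\n";
print "count: $count\n";
print "head:  $head\n";
print "first: $first\n";
```

The slice form is handy inside a print/concatenation, where a plain assignment to a scalar would silently give you the count instead of the data.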

Note that the HomoloGene data is hierarchical: there is more than one gene per linked ID, so the item data is nested. The tools are meant to be fairly generic, so you could just retrieve all the TaxId information, but it is probably more useful to get the GeneIDs along with them for context. The following works for me:



use strict;
use warnings;
use Bio::DB::EUtilities;

# NCBI expects a contact address; replace this placeholder with your own.
my $email = 'your.name@example.com';

# Search HomoloGene for the mouse gene, keeping the results on NCBI's side
my $eutil = Bio::DB::EUtilities->new(-eutil      => 'esearch',
                                     -db         => 'homologene',
                                     -term       => 'Copg AND mouse',
                                     -email      => $email,
                                     -usehistory => 'y');
my @h_genes = $eutil->get_ids;
print "@h_genes\n";

# Link from the search history; set -db explicitly, or NCBI defaults to pubmed
my $history = $eutil->next_History || die "esearch failed";
$eutil->reset_parameters(-eutil   => 'elink',
                         -history => $history,
                         -email   => $email,
                         -db      => 'homologene',
                         -cmd     => 'neighbor_history');

$history = $eutil->next_History || die "elink failed";

# Fetch document summaries for the linked HomoloGene records
$eutil->reset_parameters(-eutil   => 'esummary',
                         -email   => $email,
                         -history => $history);
while (my $docsum = $eutil->next_DocSum) {
    print "HomoloGene ID: ".$docsum->get_id."\n";
    print "Label: ".($docsum->get_contents_by_name('Caption'))[0]."\n";
    while (my $item = $docsum->next_Item) {       # HomoloGeneDataList
        while (my $sub = $item->next_subItem) {   # HomoloGeneData (one per gene)
            print "\tGeneId: ".($sub->get_contents_by_name('GeneID'))[0]."\n";
            print "\tTaxId: ".($sub->get_contents_by_name('TaxId'))[0]."\n";
        }
    }
}

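Since the original goal was human orthologues, one option is to keep only the sub-items whose TaxId is 9606 (the NCBI Taxonomy ID for Homo sapiens; mouse is 10090). A minimal sketch of just that filtering step, assuming you first collect each DocSum's GeneID/TaxId pairs into hashrefs inside the inner loop above. 'human_gene_ids' and the record layout are hypothetical, and the sample records are made up for illustration, not real HomoloGene data.

```perl
use strict;
use warnings;

# NCBI Taxonomy ID for Homo sapiens
use constant HUMAN_TAXID => '9606';

# Hypothetical helper: given the GeneID/TaxId pairs gathered from the
# HomoloGeneData sub-items of one DocSum, return only the human GeneIDs.
sub human_gene_ids {
    my @records = @_;    # each record: { GeneID => ..., TaxId => ... }
    return map  { $_->{GeneID} }
           grep { defined $_->{TaxId} && $_->{TaxId} eq HUMAN_TAXID } @records;
}

# Illustrative records only. Inside the inner loop you could push
# { GeneID => ($sub->get_contents_by_name('GeneID'))[0],
#   TaxId  => ($sub->get_contents_by_name('TaxId'))[0] }
# and call human_gene_ids() once per DocSum.
my @records = (
    { GeneID => 'G1', TaxId => '10090' },    # mouse
    { GeneID => 'G2', TaxId => '9606'  },    # human
    { GeneID => 'G3', TaxId => '10116' },    # rat
);
my @human = human_gene_ids(@records);
print "Human GeneIDs: @human\n";
```

Comparing TaxId as a string with 'eq' avoids numeric warnings if a record ever comes back empty; records without a TaxId are simply skipped.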
On Oct 11, 2011, at 9:44 AM, Mark Aquino wrote:

> Hi,
> I'm trying to run a perl script that will, as the subject says, convert a list of mouse gene ids to their human orthologues.  However, my code crashes and I've been unable to see if I'm even on the right track in going about doing this so any help would be appreciated.
> Code:
> #!/usr/bin/perl
> use strict;
> use warnings;
> use Bio::DB::EUtilities;
> my $esearch = Bio::DB::EUtilities->new(-eutil => 'esearch',
>                                        -db => 'homologene',
>                                        -term => 'Copg AND mouse',
>                                        -usehistory => 'y');
> $esearch->get_Response || die;
> my @h_genes = $esearch->get_ids;
> print "@h_genes\n";
> my $history = $esearch->next_History || die "elink failed";
> my $elink = Bio::DB::EUtilities->new(-eutil => 'elink',
>                                        -history => $history,
>                                        -cmd => 'neighbor_history');
> $elink->get_Response;
> my $hist1 = $elink->next_History;
> my $esum = Bio::DB::EUtilities->new(-eutil => 'esummary',
>                                        -history => $hist1,
>                                        -cookie => $elink->next_cookie);
> $esum->get_Response || die "esum failed";
> while (my $docsum = $esum->next_DocSum){
>        print $docsum->get_id,"\n";
>        print "TaxID: ", $docsum->get_Content_by_name('TaxID'),"\n";
> }
> [According to what I can find, the method get_Content_by_name should work but I am getting the error:
> Throws: Can't locate object method "get_Content_by_name" via package "Bio::Tools::EUtilities::Summary::DocSum" at testeutil.pl line 67.]
