[Bioperl-l] Taxonomy hierarchy extraction

George Heller george.heller at yahoo.com
Mon Jun 18 21:16:10 EDT 2007


Works perfectly. Thanks so much Jason, Hilmar, Chris. You've been a great help!
   
  Thanks.
  George

Jason Stajich <jason at bioperl.org> wrote:
  The files are indexes: because you are reading a flatfile, the module indexes it to speed up lookups, so the second time you run the script it doesn't have to index again. You don't need to look at the files; they won't make sense to a human!

  The reason it isn't printing anything is that someone didn't write the implementation quite right. This code was overhauled by Sendu before the last release, and I guess something didn't quite get connected.

  I checked in code that has Bio::Taxon delegating to a DB handle for the each_Descendent call.
  You can either patch your code or just use the code listed here:
     http://bioperl.org/wiki/Module:Bio::DB::Taxonomy
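
To make "delegating to a DB handle" concrete, here is a minimal sketch in plain Perl. It is not the BioPerl code: ToyTaxonomyDB and ToyTaxon are hypothetical stand-ins invented for this illustration, and the ids are made up. The only point is that the taxon object keeps a reference to the database it came from and asks that database for its children.

```perl
use strict;
use warnings;

# Hypothetical toy database: maps a parent id to a list of child ids.
package ToyTaxonomyDB;
sub new {
    my ($class, %children_of) = @_;
    return bless { children_of => { %children_of } }, $class;
}
# Return toy taxon objects for the direct children of $id.
sub each_Descendent {
    my ($self, $id) = @_;
    return map { ToyTaxon->new($_, $self) } @{ $self->{children_of}{$id} || [] };
}

# Hypothetical toy taxon: knows its id and the database it came from.
package ToyTaxon;
sub new {
    my ($class, $id, $db) = @_;
    return bless { id => $id, db => $db }, $class;
}
sub id { $_[0]{id} }
# Delegate the child lookup to the stored database handle.
sub each_Descendent {
    my ($self) = @_;
    return $self->{db}->each_Descendent( $self->id );
}

package main;

# Build a tiny made-up tree and list the direct children of node 1.
sub demo_child_ids {
    my $db   = ToyTaxonomyDB->new( 1 => [ 2, 3 ], 2 => [ 4 ] );
    my $root = ToyTaxon->new( 1, $db );
    return map { $_->id } $root->each_Descendent;
}

print join( ",", demo_child_ids() ), "\n";
```

If the taxon object has no database handle, or its each_Descendent never consults it, a call such as get_all_Descendents has nothing to recurse over, which would be consistent with the empty output described in this thread.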

  
    On Jun 18, 2007, at 5:29 PM, George Heller wrote:

    But the problem is that I don't really get any output on the screen. In the /tmp directory I get four files, namely parents, nodes, id2names and names2id, but I don't know what to make of them. This is what my script looks like:

    #!/usr/bin/perl
    use strict;
    #use warnings;
    use DBI;
    use Bio::Tree::Node;
    use Bio::DB::Taxonomy;
    use Bio::DB::Taxonomy::flatfile;

    # Directory where the flatfile index files are written
    my $idx_dir = '/tmp';
    my ($nodefile, $namesfile) = ('nodes.dmp', 'names.dmp');

    my $db = Bio::DB::Taxonomy->new(-source    => 'flatfile',
                                    -nodesfile => $nodefile,
                                    -namesfile => $namesfile,
                                    -directory => $idx_dir);
    my $node = $db->get_Taxonomy_Node(-taxonid => '33090');

    # Keep only the leaves among all descendants of the node
    my @extant_children = grep { $_->is_Leaf } $node->get_all_Descendents;

    for my $child ( @extant_children ) {
        print "id is ", $child->id, "\n";                           # NCBI taxon id
        print "rank is ", $child->rank, "\n";                       # e.g. species
        print "scientific name is ", $child->scientific_name, "\n"; # scientific name
    }
  

  Thanks.
    George
  

  Jason Stajich <jason at bioperl.org> wrote:
    All the children are in this array.

    You get to decide what you want to do with them. In the following example I print the id, rank, and scientific name to the screen.
    Because this is a taxonomy db query you are getting back Bio::Taxonomy::Taxon objects, so read the documentation for this module to see what you can do with the object.
    I would also suggest spending a little time with the Getting Started and HOWTO:Trees documentation on the website to get familiar with the objects and nomenclature.

    my @extant_children = grep { $_->is_Leaf } $node->get_all_Descendents;

    for my $child ( @extant_children ) {
        print "id is ", $child->id, "\n"; # NCBI taxon id
        print "rank is ", $child->rank, "\n"; # e.g. species
        print "scientific name is ", $child->scientific_name, "\n"; # scientific name
    }

      On Jun 18, 2007, at 5:04 PM, George Heller wrote:

      OK, I installed the latest Scalar::Util and the script seems to be working. But I am confused about where exactly I need to look for the descendant taxon ids once the script has run. I did look in the /tmp/ directory, but I couldn't understand much.

      Sorry to bother you; I really appreciate your patience.

      Thanks.
      George

    Jason Stajich <jason at bioperl.org> wrote:
      Try installing the latest Scalar::Util  
        On Jun 18, 2007, at 4:05 PM, George Heller wrote:

        This is the output of /usr/bin/perl -V:

      Summary of my perl5 (revision 5 version 8 subversion 5) configuration:
        Platform:
          osname=linux, osvers=2.6.9-22.18.bz155725.elsmp, archname=i386-linux-thread-multi
          uname='linux hs20-bc1-4.build.redhat.com 2.6.9-22.18.bz155725.elsmp #1 smp thu nov 17 15:34:08 est 2005 i686 i686 i386 gnulinux '
          config_args='-des -Doptimize=-O2 -g -pipe -m32 -march=i386 -mtune=pentium4 -Dversion=5.8.5 -Dmyhostname=localhost -Dperladmin=root at localhost -Dcc=gcc -Dcf_by=Red Hat, Inc. -Dinstallprefix=/usr -Dprefix=/usr -Darchname=i386-linux -Dvendorprefix=/usr -Dsiteprefix=/usr -Duseshrplib -Dusethreads -Duseithreads -Duselargefiles -Dd_dosuid -Dd_semctl_semun -Di_db -Ui_ndbm -Di_gdbm -Di_shadow -Di_syslog -Dman3ext=3pm -Duseperlio -Dinstallusrbinperl -Ubincompat5005 -Uversiononly -Dpager=/usr/bin/less -isr -Dinc_version_list=5.8.4 5.8.3 5.8.2 5.8.1 5.8.0'
          hint=recommended, useposix=true, d_sigaction=define
          usethreads=define use5005threads=undef useithreads=define usemultiplicity=define
          useperlio=define d_sfio=undef uselargefiles=define usesocks=undef
          use64bitint=undef use64bitall=undef uselongdouble=undef
          usemymalloc=n, bincompat5005=undef
        Compiler:
          cc='gcc', ccflags ='-D_REENTRANT -D_GNU_SOURCE -DDEBUGGING -fno-strict-aliasing -pipe -I/usr/local/include -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64 -I/usr/include/gdbm',
          optimize='-O2 -g -pipe -m32 -march=i386 -mtune=pentium4',
          cppflags='-D_REENTRANT -D_GNU_SOURCE -DDEBUGGING -fno-strict-aliasing -pipe -I/usr/local/include -I/usr/include/gdbm'
          ccversion='', gccversion='3.4.6 20060404 (Red Hat 3.4.6-2)', gccosandvers=''
          intsize=4, longsize=4, ptrsize=4, doublesize=8, byteorder=1234
          d_longlong=define, longlongsize=8, d_longdbl=define, longdblsize=12
          ivtype='long', ivsize=4, nvtype='double', nvsize=8, Off_t='off_t', lseeksize=8
          alignbytes=4, prototype=define
        Linker and Libraries:
          ld='gcc', ldflags =' -L/usr/local/lib'
          libpth=/usr/local/lib /lib /usr/lib
          libs=-lresolv -lnsl -lgdbm -ldb -ldl -lm -lcrypt -lutil -lpthread -lc
          perllibs=-lresolv -lnsl -ldl -lm -lcrypt -lutil -lpthread -lc
          libc=/lib/libc-2.3.4.so, so=so, useshrplib=true, libperl=libperl.so
          gnulibc_version='2.3.4'
        Dynamic Linking:
          dlsrc=dl_dlopen.xs, dlext=so, d_dlsymun=undef, ccdlflags='-Wl,-E -Wl,-rpath,/usr/lib/perl5/5.8.5/i386-linux-thread-multi/CORE'
          cccdlflags='-fPIC', lddlflags='-shared -L/usr/local/lib'

      Characteristics of this binary (from libperl):
        Compile-time options: DEBUGGING MULTIPLICITY USE_ITHREADS USE_LARGE_FILES PERL_IMPLICIT_CONTEXT
        Built under linux
        Compiled at Jul 24 2006 18:28:10
        @INC:
          /usr/lib/perl5/5.8.5/i386-linux-thread-multi
          /usr/lib/perl5/5.8.5
          /usr/lib/perl5/site_perl/5.8.5/i386-linux-thread-multi
          /usr/lib/perl5/site_perl/5.8.4/i386-linux-thread-multi
          /usr/lib/perl5/site_perl/5.8.3/i386-linux-thread-multi
          /usr/lib/perl5/site_perl/5.8.2/i386-linux-thread-multi
          /usr/lib/perl5/site_perl/5.8.1/i386-linux-thread-multi
          /usr/lib/perl5/site_perl/5.8.0/i386-linux-thread-multi
          /usr/lib/perl5/site_perl/5.8.5
          /usr/lib/perl5/site_perl/5.8.4
          /usr/lib/perl5/site_perl/5.8.3
          /usr/lib/perl5/site_perl/5.8.2
          /usr/lib/perl5/site_perl/5.8.1
          /usr/lib/perl5/site_perl/5.8.0
          /usr/lib/perl5/site_perl
          /usr/lib/perl5/vendor_perl/5.8.5/i386-linux-thread-multi
          /usr/lib/perl5/vendor_perl/5.8.4/i386-linux-thread-multi
          /usr/lib/perl5/vendor_perl/5.8.3/i386-linux-thread-multi
          /usr/lib/perl5/vendor_perl/5.8.2/i386-linux-thread-multi
          /usr/lib/perl5/vendor_perl/5.8.1/i386-linux-thread-multi
          /usr/lib/perl5/vendor_perl/5.8.0/i386-linux-thread-multi
          /usr/lib/perl5/vendor_perl/5.8.5
          /usr/lib/perl5/vendor_perl/5.8.4
          /usr/lib/perl5/vendor_perl/5.8.3
          /usr/lib/perl5/vendor_perl/5.8.2
          /usr/lib/perl5/vendor_perl/5.8.1
          /usr/lib/perl5/vendor_perl/5.8.0
          /usr/lib/perl5/vendor_perl

        Thanks.
        George

      Hilmar Lapp <hlapp at gmx.net> wrote:
      The perl version appears to be 5.8.5 though, so something strange
      appears to be going on too.

      George, can you please post the output of

      $ /usr/bin/perl -V

      -hilmar

      On Jun 18, 2007, at 6:33 PM, Chris Fields wrote:

      As the error implies, your local version of perl doesn't seem to support
      weak references, which means it doesn't have Scalar::Util (which was
      added to core after perl 5.6.1, I think). Try installing
      Scalar::Util to see what happens.
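
A quick way to check whether a given perl has working weak references is to import weaken and isweak from Scalar::Util and use them on a small parent/child cycle, which is the kind of structure a tree node needs them for. The sketch below uses only core Perl; the hash layout is made up for illustration and is not how Bio::Tree::Node stores its data. As far as I recall, an installation without weak reference support fails at the `use Scalar::Util` line with the same "Weak references are not implemented" message quoted in this thread, but treat that as an assumption.

```perl
use strict;
use warnings;
use Scalar::Util qw(weaken isweak);

# Build a tiny parent/child pair that point at each other.
sub make_pair {
    my $parent = { name => 'parent', children => [] };
    my $child  = { name => 'child',  parent   => $parent };
    push @{ $parent->{children} }, $child;
    # Weaken the back-reference so the cycle does not keep both hashes alive.
    weaken( $child->{parent} );
    return ( $parent, $child );
}

my ( $parent, $child ) = make_pair();
print "Scalar::Util version: $Scalar::Util::VERSION\n";
print "back-reference is weak: ", ( isweak( $child->{parent} ) ? "yes" : "no" ), "\n";
```

Running this with the same /usr/bin/perl that runs the failing script shows which Scalar::Util that interpreter actually picks up.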
  

  

  

  

      chris

      On Jun 18, 2007, at 5:18 PM, George Heller wrote:

      I tried running the script mentioned below and I seem to be getting
      the following error:

      Weak references are not implemented in the version of perl at /usr/lib/perl5/site_perl/5.8.5/Bio/Tree/Node.pm line 76
      BEGIN failed--compilation aborted at /usr/lib/perl5/site_perl/5.8.5/Bio/Tree/Node.pm line 76.
      Compilation failed in require at my.pl line 7.
      BEGIN failed--compilation aborted at my.pl line 7.

      My script looks something like:

      #!/usr/bin/perl
      use strict;
      #use warnings;
      use DBI;
      use Bio::Tree::Node;
      use Bio::DB::Taxonomy;
      use Bio::DB::Taxonomy::flatfile;
      my $idx_dir = '/tmp';

      my ($nodefile,$namesfile) = ('nodes.dmp','names.dmp');
      my $db = new Bio::DB::Taxonomy(-source => 'flatfile',
                                     -nodesfile => $nodefile,
                                     -namesfile => $namesfile,
                                     -directory => $idx_dir);
      my $node = $db->get_Taxonomy_Node(-taxonid => '33090');
      my @extant_children = grep { $_->is_Leaf } $node->get_all_Descendents;

      foreach my $field (@extant_children) {
          print "$field";
          print "|";
          print "\n";
      }

      And I am running the script using the command:

      perl myscript.pl -v --names names.dmp --nodes nodes.dmp

      and I have the nodes.dmp and names.dmp files in the current
      directory.

      Thanks,
      George

      Jason Stajich wrote:
      It is implemented in the implementing class - DB::Taxonomy is
      just the base class. For example, see the flatfile implementation
      Bio::DB::Taxonomy::flatfile.

      See scripts/taxa/local_taxonomydb_query.PLS for an example using
      it; nodes and names are from the NCBI taxonomy database.

      Here is an un-debugged copy+paste for your question that *should*
      work.

      use Bio::DB::Taxonomy;
      my $idx_dir = '/tmp';

      my ($nodefile,$namesfile) = ('nodes.dmp','names.dmp');
      my $db = new Bio::DB::Taxonomy(-source => 'flatfile',
                                     -nodesfile => $nodefile,
                                     -namesfile => $namesfile,
                                     -directory => $idx_dir);
      my $node = $db->get_Taxonomy_Node(-taxonid => '33090');
      my @extant_children = grep { $_->is_Leaf } $node->get_all_Descendents;

      -jason

      On Jun 18, 2007, at 10:07 AM, George Heller wrote:

      What exactly is the "node n" in the query below? When I issue
      this query, it says:

      relation "node" does not exist.

      I tried to use the get_all_Descendents method, but it looks like
      in order to do a recursive call it calls the method
      each_Descendent. This method is not implemented in
      Bio::DB::Taxonomy. It just has a single line:

      shift->throw_not_implemented();

      Thanks.
      George.

      Hilmar Lapp wrote:
      I'm a bit confused - it sounds like you have set up a local BioSQL
      database and loaded the NCBI taxonomy into the database. You can now
      use simple SQL to retrieve all descendants of a node in the tree
      given its NCBI taxonID, such as

      SELECT tn.*, tnm.name FROM taxon tn, taxon_name tnm, node n
      WHERE
      n.ncbi_taxon_id = :taxonID
      AND tn.left_value > n.left_value
      AND tn.right_value < n.right_value
      AND tn.taxon_id = tnm.taxon_id
      AND tn.name_class = 'scientific_name'
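
The query above relies on nested-set numbering: every node carries a left_value and a right_value, and a node's descendants are exactly the rows whose interval lies strictly inside its own. A minimal sketch of that comparison in plain Perl follows. The rows and their numbering are made up for illustration (they are not NCBI data), and the sketch assumes that the row being looked up lives in the same table as the candidate descendants.

```perl
use strict;
use warnings;

# Made-up rows with hypothetical nested-set numbering.
# Tree shape: 1 -> (2 -> (3, 4), 5)
sub example_rows {
    return (
        { taxon_id => 1, left_value => 1, right_value => 10 },
        { taxon_id => 2, left_value => 2, right_value => 7 },
        { taxon_id => 3, left_value => 3, right_value => 4 },
        { taxon_id => 4, left_value => 5, right_value => 6 },
        { taxon_id => 5, left_value => 8, right_value => 9 },
    );
}

# Return the ids of all rows nested strictly inside the row with id $id.
sub nested_set_descendants {
    my ( $rows, $id ) = @_;
    my ($n) = grep { $_->{taxon_id} == $id } @$rows;
    return () unless $n;
    return map  { $_->{taxon_id} }
           grep { $_->{left_value}  > $n->{left_value}
               && $_->{right_value} < $n->{right_value} } @$rows;
}

my @rows = example_rows();
print join( ",", nested_set_descendants( \@rows, 2 ) ), "\n";
```

No recursion is needed with this numbering, which is why a single SQL statement can return a whole subtree.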
  

  

  

  

  

  

  

  

      BioPerl doesn't have a Taxonomy::biosql module yet (though this
      would seem like a worthwhile thing to add), so you can't use the
      Bio::DB::Taxonomy interface to do this against a BioSQL instance.

      However, BioPerl does support the flat-file download of the
      NCBI taxonomy database and indexes it, so you can simply use
      Taxonomy::{get_taxon,get_all_Descendents} on the flatfile
      download to achieve what you want in fewer than 5 lines of perl.

      Although the recursive implementation of
      Taxonomy::get_all_Descendents() won't be lightning fast, it may
      still be perfectly fine for your application - are you sure it is not?

      -hilmar

      On Jun 18, 2007, at 12:21 AM, George Heller wrote:

      Thanks. And how can I assign $node in the code below so that it
      refers to a particular taxon id record? I want to
      retrieve all the descendants from the taxonomy hierarchy, given a
      particular taxon id.

      I have a local db set up, into which I have loaded data using the
      load_ncbi_taxonomy.pl script.

      Thanks.
      George

      Jason Stajich wrote:
      I assume you already figured out how to set up a local taxonomydb?

      You just want the extant species/leaves of the tree:

      my @extant_children = grep { $_->is_Leaf } $node->get_all_Descendents;

      -jason

      On Jun 17, 2007, at 11:41 AM, George Heller wrote:

      Hi all,

      Can anyone point me to an example that uses the
      get_all_Descendents method from Bio::DB::Taxonomy? I am a newbie at
      this, and I am not quite sure how to implement it.

      Thanks.
      George

      Sendu Bala wrote:
      George Heller wrote:
      Hi all,

      I am looking at extracting the taxonomy hierarchy for some taxon
      ids. What I plan to do is, for a given taxon id, say 33090, I want to
      extract all taxon ids that are children of this species. I do not
      just want the immediate children, but the children's children
      and so on.

      Any ideas on the way I can go about doing this?

      Well, you'll use Bio::DB::Taxonomy presumably, and
      each_Descendent in
      some kind of looping structure. Most easily a recursing sub.
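
A recursing sub of the kind suggested here could look like the following sketch. It walks a plain parent-to-children hash instead of real taxon objects so that it runs without BioPerl; the ids and the tree are made up, and all_descendants and example_tree are names invented for this illustration. With BioPerl objects, the inner loop would iterate over each_Descendent of the node instead of a hash lookup.

```perl
use strict;
use warnings;

# Collect every descendant id of $id, depth first, from a
# parent => [children] table.
sub all_descendants {
    my ( $children_of, $id ) = @_;
    my @found;
    for my $child ( @{ $children_of->{$id} || [] } ) {
        # Record the child, then everything below it.
        push @found, $child, all_descendants( $children_of, $child );
    }
    return @found;
}

# Made-up tree: 1 -> (2 -> (3, 4), 5 -> (6))
sub example_tree {
    return { 1 => [ 2, 5 ], 2 => [ 3, 4 ], 5 => [ 6 ] };
}

my $tree   = example_tree();
my @all    = all_descendants( $tree, 1 );
# Leaves are the descendants that have no children of their own.
my @leaves = grep { !@{ $tree->{$_} || [] } } @all;
print "all: @all\n";
print "leaves: @leaves\n";
```

The grep on the last lines plays the same role as the is_Leaf filter used elsewhere in this thread.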
  

  

  

  

  

  

  

  

  

  

  

  

  

  

  

  

      If you happen to code up something neat and efficient, why not
      share it with us and we could add it to the Taxonomy module(s).

      _______________________________________________
      Bioperl-l mailing list
      Bioperl-l at lists.open-bio.org
      http://lists.open-bio.org/mailman/listinfo/bioperl-l

      --
      Jason Stajich
      jason at bioperl.org
      http://jason.open-bio.org/

      --
      ===========================================================
      : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net :
      ===========================================================

      Christopher Fields
      Postdoctoral Researcher
      Lab of Dr. Robert Switzer
      Dept of Biochemistry
      University of Illinois Urbana-Champaign

More information about the Bioperl-l mailing list