[Bioperl-guts-l] bioperl commit

Brian Osborne bosborne at pub.open-bio.org
Sat Aug 28 00:39:02 EDT 2004


bosborne
Sat Aug 28 00:39:02 EDT 2004
Update of /home/repository/bioperl/bioperl-live/doc/howto/sgml
In directory pub.open-bio.org:/tmp/cvs-serv10180/doc/howto/sgml

Modified Files:
	Trees.sgml 
Log Message:
Docbook format corrections

bioperl-live/doc/howto/sgml Trees.sgml,1.7,1.8
===================================================================
RCS file: /home/repository/bioperl/bioperl-live/doc/howto/sgml/Trees.sgml,v
retrieving revision 1.7
retrieving revision 1.8
diff -u -r1.7 -r1.8
--- /home/repository/bioperl/bioperl-live/doc/howto/sgml/Trees.sgml	2004/01/24 15:03:06	1.7
+++ /home/repository/bioperl/bioperl-live/doc/howto/sgml/Trees.sgml	2004/08/28 04:39:02	1.8
@@ -1,35 +1,35 @@
 <!DOCTYPE article  PUBLIC "-//OASIS//DTD DocBook V4.1//EN">
   <article lang="en">
-    <articleinfo>
-      <title>Phylogenetic Tree HOWTO</title>
-      <author>
-	<firstname>Jason</firstname>
-	<surname>Stajich</surname>
-	<authorblurb>
-	  <para>Bioperl Core Developer</para>
-	</authorblurb>
-	<affiliation>
-	  <orgname>Dept Molecular Genetics and Microbiology,
+  <articleinfo>
+    <title>Phylogenetic Tree HOWTO</title>
+    <author>
+      <firstname>Jason</firstname>
+      <surname>Stajich</surname>
+      <authorblurb>
+	<para>Bioperl Core Developer</para>
+      </authorblurb>
+      <affiliation>
+	<orgname>Dept Molecular Genetics and Microbiology,
 	  Duke University</orgname>
-	  <address><email>jason AT bioperl.org</email></address>
-	</affiliation>
-      </author>
-
-      <pubdate>2003-12-01</pubdate>
-      <revhistory>
-	<revision>
-		<revnumber>0.1</revnumber>
-		<date>2003-12-01</date>
-		<authorinitials>JES</authorinitials>
-		<revremark>First version</revremark>
-	</revision>
-       </revhistory>
-      <legalnotice>	
-	<para>This document is copyright Jason Stajich, 2003.  It can
+	<address><email>jason AT bioperl.org</email></address>
+      </affiliation>
+    </author>
+
+    <pubdate>2003-12-01</pubdate>
+    <revhistory>
+      <revision>
+	<revnumber>0.1</revnumber>
+	<date>2003-12-01</date>
+	<authorinitials>JES</authorinitials>
+	<revremark>First version</revremark>
+      </revision>
+    </revhistory>
+    <legalnotice>	
+      <para>This document is copyright Jason Stajich, 2003.  It can
         be copied and distributed under the terms of the Perl
         Artistic License.
-	</para>
-      </legalnotice>
+      </para>
+    </legalnotice>
 
     <abstract>
       <para>
@@ -52,9 +52,9 @@
       and active research.  This HOWTO and the modules described
       within are focused on querying and manipulating trees once they
       have been created.      
-      </para>
+    </para>
 
-      <para>
+    <para>
       The data we intend to capture with these objects concerns the notion
       of Trees and their Nodes.  A Tree is made up of Nodes and the
       relationships which connect these nodes.  The basic
@@ -65,9 +65,8 @@
       need not be a strictly bifurcating tree (or binary trees to the
       CS types), and a parent node can give rise to 1 or many child
       nodes.  
-      </para> 
-      <para>
-
+    </para> 
+    <para>
       In practice there are just a few main
       objects, or modules, you need to know about.  There is the main
       Tree object <classname>Bio::Tree::Tree</classname> which is the main
@@ -87,19 +86,20 @@
       <classname>Bio::Tree::Tree</classname> object is just a 
       container for some summary information about the tree and a
       description of the tree's root node.
-      </para>
-      
-   </section>
-   <section id="simple_usage">
-     <title>Simple Usage</title>
-      <para>
+    </para>  
+  </section>
+
+  <section id="simple_usage">
+    <title>Simple Usage</title>
+    <para>
       Trees are used to represent the history of a collection of taxa,
       sequences, or populations. 
-      </para>
-   </section>
-   <section id="IO">
-   <title>Reading and Writing Trees</title>
-   <para>
+    </para>
+  </section>
+
+  <section id="IO">
+    <title>Reading and Writing Trees</title>
+    <para>
       Using <classname>Bio::TreeIO</classname> one can read trees
       from files or datastreams and
       create <classname>Bio::Tree::Tree</classname> objects.  This is
@@ -109,8 +109,8 @@
       write <classname>Bio::Tree::Tree</classname> objects out to string
       representations like the Newick or New Hampshire format which can
       be printed to a file or a datastream, stored in a database, etc.
-   </para>
-   <para>
+    </para>
+    <para>
       The main module for reading and writing trees is the
       <classname>Bio::TreeIO</classname> factory module which has
       several driver modules which plug into it.  These drivers
@@ -124,19 +124,20 @@
       only supports parsing, not writing, of Nexus format tree files.
    </para>
    </section>
-   <section>
-     <title>Example Code</title>
-      <para>
+
+  <section>
+    <title>Example Code</title>
+    <para>
       Here is some code which will read in a Tree from a file called
       "tree.tre" and produce a Bio::Tree::Tree object which is stored in 
       the variable <programlisting>$tree</programlisting>.
-      </para>
-      <para>
+    </para>
+    <para>
       Like most modules which do input/output you can also specify
       the argument -fh in place of -file to provide a glob or filehandle in
       place of the filename.  
-      </para>
-      <para>
+    </para>
+    <para>
       <programlisting>
       use Bio::TreeIO;
       # parse in newick/new hampshire format
@@ -144,114 +145,118 @@
                                   -format => "newick");
       my $tree = $input->next_tree;			       
       </programlisting>
-      </para>
-      <para>
+    </para>
+    <para>
       Once you have a Tree object you can do a number of things with it.
       These are all methods required in <classname>Bio::Tree::TreeI</classname>.
-      </para>
-      </section>
+    </para>
+  </section>
 
-      <section id="TreeI">
-       <title>Bio::Tree::TreeI methods</title>
-        <para>
-	Request the taxa (leaves of the tree).
-	<programlisting>my @taxa = $tree->get_leaf_nodes;</programlisting>
-	</para>
-
-	<para>
-	Get the root node.	
-	<programlisting>my $root = $tree->get_root_node; </programlisting>
-	</para>
-	<para>
-	Get the total length of the tree (sum of all the branch lengths),
-	which is only useful if the nodes actually have the branch
-	length stored, of course.
-	<programlisting>my $total_length = $tree->total_branch_length;</programlisting>
-	</para>
-	
-   </section>
+  <section id="TreeI">
+    <title>Bio::Tree::TreeI methods</title>
+    <para>
+      Request the taxa (leaves of the tree).
+      <programlisting>my @taxa = $tree->get_leaf_nodes;</programlisting>
+    </para>
+
+    <para>
+      Get the root node.	
+      <programlisting>my $root = $tree->get_root_node; </programlisting>
+    </para>
+    <para>
+      Get the total length of the tree (sum of all the branch lengths),
+      which is only useful if the nodes actually have the branch
+      length stored, of course.
+      <programlisting>
+my $total_length = $tree->total_branch_length;
+      </programlisting>
+    </para>
+  </section>
    
-   <section id="TreeFunctionsI">
+  <section id="TreeFunctionsI">
     <title>Bio::Tree::TreeFunctionsI</title>
-     <para>
-     An additional interface was written which implements 
-     utility functions which are useful for manipulating a Tree.
-     </para>
+    <para>
+      An additional interface was written which implements 
+      utility functions which are useful for manipulating a Tree.
+    </para>
      
-     <para>
-     Find a particular node, either by name or by some other field that is
-     stored in a Node.  The field type should be the function name we
-     can call on all of the Nodes in the Tree.
-     <programlisting>
+    <para>
+      Find a particular node, either by name or by some other field that is
+      stored in a Node.  The field type should be the function name we
+      can call on all of the Nodes in the Tree.
+      <programlisting>
    # find all the nodes named 'node1' (there should be only one) 
    my @nodes = $tree->find_node(-id => 'node1');
    # find all the nodes which have description 'BMP'
    my @nodes = $tree->find_node(-description => 'BMP');
    # find all the nodes with bootstrap value of 70
    my @nodes = $tree->find_node(-bootstrap => 70);
-   </programlisting>
-   
-     If you would like to do more sophisticated searches, like "find all
+      </programlisting>
+    </para>
+    <para>
+      If you would like to do more sophisticated searches, like "find all
      the nodes with bootstrap values better than 70", you can easily
      implement this yourself.
-     <programlisting>
+      <programlisting>
      my @nodes = grep { $_->bootstrap > 70 } $tree->get_nodes;
-     </programlisting>
-
-     Remove a Node from the Tree and update the children/ancestor links
-     where the Node is an intervening one.
-     <programlisting>
+      </programlisting>
+      Remove a Node from the Tree and update the children/ancestor links
+      where the Node is an intervening one.
+    </para>
+    <para>
+      <programlisting>
    # provide the node object to remove from the Tree
    $tree->remove_Node($node);
    # or specify the node Name to remove
    $tree->remove_Node('Node12');
-   </programlisting>
- 
-    Get the lowest common ancestor for a set of Nodes.  This method is
-    used to find an internal Node of the Tree which can be traced,
-    through its children, to the requested set of Nodes. It is used in
-    the calculations of monophyly and paraphyly and in determining the
-    distance between two nodes.
- 
-   <programlisting>
+      </programlisting>
+    </para>
+    <para>
+      Get the lowest common ancestor for a set of Nodes.  This method is
+      used to find an internal Node of the Tree which can be traced,
+      through its children, to the requested set of Nodes. It is used in
+      the calculations of monophyly and paraphyly and in determining the
+      distance between two nodes.
+      <programlisting>
    # Provide a list of Nodes that are in this tree
    my $lca = $tree->get_lca(-nodes => \@nodes);
-   </programlisting>
-
-   Get the distance between two nodes by adding up the branch lengths
-   of all the connecting edges between two nodes.
-
-   <programlisting>
+      </programlisting>
+    </para>
+    <para>
+      Get the distance between two nodes by adding up the branch lengths
+      of all the connecting edges between two nodes.
+      <programlisting>
    my $distances = $tree->distance(-nodes => [$node1,$node2]);
-   </programlisting>
-
+      </programlisting>
+    </para>
+    <para>
    Perform a test of monophyly for a set of nodes and a given outgroup
    node.  This means the common ancestor for the members of the
    internal_nodes group is more recent than the common ancestor that any of them
    share with the outgroup node.
-
    <programlisting>
    if( $tree->is_monophyletic(-nodes    => \@internal_nodes,
                               -outgroup => $outgroup) ) {
      print "these nodes are monophyletic: ",
           join(",",map { $_->id } @internal_nodes ), "\n";
    }
-   </programlisting>
-   
-   Perform a test of paraphyly for a set of nodes and a given outgroup
-   node.  This means that a common ancestor 'A' for the members of the
-   ingroup is more recent than a common ancestor 'B' that they share with
-   the outgroup node <emphasis>and</emphasis> that there are no other 
-   nodes in the tree which have 'A' as a common ancestor before 'B'.
-   
-   <programlisting>
+      </programlisting>
+    </para>
+    <para>
+      Perform a test of paraphyly for a set of nodes and a given outgroup
+      node.  This means that a common ancestor 'A' for the members of the
+      ingroup is more recent than a common ancestor 'B' that they share with
+      the outgroup node <emphasis>and</emphasis> that there are no other 
+      nodes in the tree which have 'A' as a common ancestor before 'B'.
+      <programlisting>
    if( $tree->is_paraphyletic(-nodes    => \@internal_nodes,
                               -outgroup => $outgroup) > 0 ) {
     print "these nodes are paraphyletic: ",
           join(",",map { $_->id } @internal_nodes ), "\n";
    }   
-   </programlisting>   
-
+      </programlisting>   
+    </para>
+    <para>
    Reroot a tree, specifying a different node as the root (and a
    different node as the outgroup).
    <programlisting>
@@ -260,57 +265,54 @@
    # or it can be an internal node which will become the new
    # root of the Tree
    $tree->reroot($node);
-   </programlisting>
-   </para>
+      </programlisting>
+    </para>
    </section>
-   <section id="tree_building">
-   <title>Constructing Trees</title>
-   <para>
-
-    Pairwise distances for all sequences in an alignment can be
-    computed with <classname>Bio::Align::DNAStatistics</classname>
-    (for DNA)
-    and <classname>Bio::Align::ProteinStatistics</classname> (for
-    protein).  Several methods are implemented.  For DNA alignments,
-    Jukes-Cantor (1969), Jukes-Cantor uncorrected, Kimura 2-parameter
-    (1980), Felsenstein (1981), Tajima-Nei (1984), and Tamura (1992)
-    are currently implemented.  In addition, for coding sequences,
-    synonymous and non-synonymous counts can be computed with
-    the <function>calc_KaKs_pair</function> method.  For protein
-    sequence alignments only Kimura (1983) is currently supported,
-    although other methods will be added. 
-   </para>
-   <para>
-
-   To use these methods simply initialize a statistics module, and
-   pass in an alignment object
-   (<classname>Bio::SimpleAlign</classname>) and the type of distance
-   method to use and the module will return
-   a <classname>Bio::Matrix::PhylipDist</classname> matrix object of
-   pairwise distances.  The code example below shows how this should
-   be done.
-   </para>
-
-   <para>
 
-   Given the matrix of pairwise distances one can build a phylogenetic
-   tree using two simple methods provided in
-   <classname>Bio::Tree::DistanceFactory</classname>.  Simply
-   request either Neighbor-Joining (NJ) trees or Unweighted Pair Group
-   Method with Arithmetic Mean (UPGMA) clusters.  These methods have
-   caveats, notably whether or not the distances are additive.
-   The method <function>check_additivity</function> 
-   in <classname>Bio::Tree::DistanceFactory</classname>
-   is provided to calculate whether or not additivity holds for the
-   data.
-
-   </para>
-   <para> 
-   The following is a basic code snippet which describes how to use
-   the pairwise distance and tree building modules in Bioperl. 
-   </para>
-   <para>
-   <programlisting>
+  <section id="tree_building">
+    <title>Constructing Trees</title>
+    <para>
+      Pairwise distances for all sequences in an alignment can be
+      computed with <classname>Bio::Align::DNAStatistics</classname>
+      (for DNA)
+      and <classname>Bio::Align::ProteinStatistics</classname> (for
+      protein).  Several methods are implemented.  For DNA alignments,
+      Jukes-Cantor (1969), Jukes-Cantor uncorrected, Kimura 2-parameter
+      (1980), Felsenstein (1981), Tajima-Nei (1984), and Tamura (1992)
+      are currently implemented.  In addition, for coding sequences,
+      synonymous and non-synonymous counts can be computed with
+      the <function>calc_KaKs_pair</function> method.  For protein
+      sequence alignments only Kimura (1983) is currently supported,
+      although other methods will be added. 
+    </para>
+    <para>
+      To use these methods simply initialize a statistics module, and
+      pass in an alignment object
+      (<classname>Bio::SimpleAlign</classname>) and the type of distance
+      method to use and the module will return
+      a <classname>Bio::Matrix::PhylipDist</classname> matrix object of
+      pairwise distances.  The code example below shows how this should
+      be done.
+    </para>
+    
+    <para>
+      Given the matrix of pairwise distances one can build a phylogenetic
+      tree using two simple methods provided in
+      <classname>Bio::Tree::DistanceFactory</classname>.  Simply
+      request either Neighbor-Joining (NJ) trees or Unweighted Pair Group
+      Method with Arithmetic Mean (UPGMA) clusters.  These methods have
+      caveats, notably whether or not the distances are additive.
+      The method <function>check_additivity</function> 
+      in <classname>Bio::Tree::DistanceFactory</classname>
+      is provided to calculate whether or not additivity holds for the
+      data.
+    </para>
+    <para> 
+      The following is a basic code snippet which describes how to use
+      the pairwise distance and tree building modules in Bioperl. 
+    </para>
+    <para>
+      <programlisting>
    use Bio::AlignIO;
    use Bio::Align::DNAStatistics;
    use Bio::Tree::DistanceFactory;
@@ -322,45 +324,44 @@
 			      -align  => $aln);
    my $dfactory = Bio::Tree::DistanceFactory->new(-method => 'NJ');
    my $tree = $dfactory->make_tree($mat);
-   </programlisting>
-   </para>
-   <para>
-   TODO: Using external programs: phylip,MrBayes,paup,puzzle,protml
-   </para>
-   <para>
-
-   Non-parametric bootstrapping is one method to test the consistency
-   of the data with the optimal tree.  A set of subreplicates are
-   generated from the alignment using the method
-   from <classname>Bio::Align::Utilities</classname> called
-   <function>bootstrap_replicates</function>.  One passes in an
-   alignment object and the count of the number of replicates to generate.
-
-   </para>
-   <para>
-   <programlisting>
+      </programlisting>
+    </para>
+    <para>
+      TODO: Using external programs: phylip,MrBayes,paup,puzzle,protml
+    </para>
+    <para>
+      Non-parametric bootstrapping is one method to test the consistency
+      of the data with the optimal tree.  A set of subreplicates are
+      generated from the alignment using the method
+      from <classname>Bio::Align::Utilities</classname> called
+      <function>bootstrap_replicates</function>.  One passes in an
+      alignment object and the count of the number of replicates to generate.
+    </para>
+    <para>
+      <programlisting>
    use Bio::Align::Utilities qw(:all); 
    my $replicates = bootstrap_replicates($aln,$count);
-   </programlisting>
-   </para>
-   </section>
-   <section id="advanced_topics">
-   <title>Advanced Topics</title>
-   <para>
-   It is possible to generate random tree topologies with a Bioperl
-   object called <classname>Bio::Tree::RandomFactory</classname>.  The
-   factory only requires the specification of the total number of taxa
-   in order to simulate a history.  One can request different methods for
-   generating the random phylogeny. At present, however, only the
-   simple Yule backward is implemented and is the default. 
-   </para>
-   <para>
-   The trees can be generated with the following code.  You can either
-   specify the names of taxa or just a count of total number of taxa
-   in the simulation.
-   </para>
-   <para>
-   <programlisting>
+      </programlisting>
+    </para>
+  </section>
+  
+  <section id="advanced_topics">
+    <title>Advanced Topics</title>
+    <para>
+      It is possible to generate random tree topologies with a Bioperl
+      object called <classname>Bio::Tree::RandomFactory</classname>.  The
+      factory only requires the specification of the total number of taxa
+      in order to simulate a history.  One can request different methods for
+      generating the random phylogeny. At present, however, only the
+      simple Yule backward is implemented and is the default. 
+    </para>
+    <para>
+      The trees can be generated with the following code.  You can either
+      specify the names of taxa or just a count of total number of taxa
+      in the simulation.
+    </para>
+    <para>
+      <programlisting>
    use Bio::TreeIO;
    use Bio::Tree::RandomFactory;
    # initialize a TreeIO writer to output the trees as we create them
@@ -382,39 +383,39 @@
    while( my $tree = $factory->next_tree) {
      $out->write_tree($tree);
    }
-   </programlisting>
-   </para>			   	      
-   <para>
-   There are more sophisticated operations that you may wish to pursue
-   with these objects.  We have tried to create a framework for this type
-   of data, but by no means should this be looked at as the final
-   product.  If you have a particular statistic or function that
-   applies to trees that you would like to see included in the
-   toolkit we encourage you to send details to the Bioperl list.
-   </para>
-   
-   </section>
-   <section id="References">
-   <!-- do bibliography here -->
-   <title>References and More Reading</title>
-   <para>
-   For more reading and some references for the techniques above see
-   these titles.   
-   </para>   
-   <para>
-   <simplelist>
-   <member>
-   J. Felsenstein, "Inferring Phylogenies" 2003. Sinauer Associates.
-   </member>
-   <member>
-   D. Swofford, Olsen, Waddell and D. Hillis, "Phylogenetic Inference"
-   1996. In Molecular Systematics, 2nd ed., Ch 11.
-   </member>
-   <member>
-   Eddy SR, Durbin R, Krogh A, Mitchison G, "Biological Sequence
-   Analysis" 1998. Cambridge Univ Press, Cambridge, UK.
-   </member>
-   </simplelist>
-   </para>
+      </programlisting>
+    </para>			   	      
+    <para>
+      There are more sophisticated operations that you may wish to pursue
+      with these objects.  We have tried to create a framework for this type
+      of data, but by no means should this be looked at as the final
+      product.  If you have a particular statistic or function that
+      applies to trees that you would like to see included in the
+      toolkit we encourage you to send details to the Bioperl list.
+    </para>
    </section>
+
+  <section id="References">
+    <!-- do bibliography here -->
+    <title>References and More Reading</title>
+    <para>
+      For more reading and some references for the techniques above see
+      these titles.   
+    </para>   
+    <para>
+      <simplelist>
+	<member>
+	  J. Felsenstein, "Inferring Phylogenies" 2003. Sinauer Associates.
+	</member>
+	<member>
+	  D. Swofford, Olsen, Waddell and D. Hillis, "Phylogenetic Inference"
+	  1996. In Molecular Systematics, 2nd ed., Ch 11.
+	</member>
+	<member>
+	  Eddy SR, Durbin R, Krogh A, Mitchison G, "Biological Sequence
+	  Analysis" 1998. Cambridge Univ Press, Cambridge, UK.
+	</member>
+      </simplelist>
+    </para>
+  </section>
  </article> 
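
Note on the get_lca and distance descriptions in the HOWTO text above: the
following is a minimal plain-Perl sketch of what those two operations compute,
assuming a toy tree stored as a hash of parent links. It is not the Bioperl
implementation and does not use any Bioperl module; the %parent_of hash, the
node names, and the helper subs (path_to_root, lca, length_up_to, distance)
are all hypothetical and exist only for illustration.

```perl
use strict;
use warnings;

# Hypothetical toy tree, stored as child => [parent, branch_length].
# Newick equivalent: ((A:1,B:2)n1:0.5,C:3)root;
my %parent_of = (
    A  => [ 'n1',   1   ],
    B  => [ 'n1',   2   ],
    n1 => [ 'root', 0.5 ],
    C  => [ 'root', 3   ],
);

# Return the path from a node up to the root, the node itself first.
sub path_to_root {
    my ($node) = @_;
    my @path = ($node);
    while ( my $link = $parent_of{$node} ) {
        $node = $link->[0];
        push @path, $node;
    }
    return @path;
}

# Lowest common ancestor: the first node on the first node's path
# that also lies on every other node's path to the root.
sub lca {
    my ( $first, @rest ) = @_;
    my @sets;
    for my $n (@rest) {
        my %on_path = map { $_ => 1 } path_to_root($n);
        push @sets, \%on_path;
    }
    for my $candidate ( path_to_root($first) ) {
        return $candidate unless grep { !$_->{$candidate} } @sets;
    }
    return;    # not reached when all nodes share one root
}

# Sum of branch lengths from a node up to (not including) an ancestor.
sub length_up_to {
    my ( $node, $ancestor ) = @_;
    my $sum = 0;
    while ( $node ne $ancestor ) {
        my $link = $parent_of{$node} or die "$ancestor is not an ancestor\n";
        $sum += $link->[1];
        $node = $link->[0];
    }
    return $sum;
}

# Distance between two nodes: both paths up to their common ancestor.
sub distance {
    my ( $x, $y ) = @_;
    my $anc = lca( $x, $y );
    return length_up_to( $x, $anc ) + length_up_to( $y, $anc );
}

print "LCA of A and B: ", lca( 'A', 'B' ), "\n";
print "distance A-C:   ", distance( 'A', 'C' ), "\n";
```

In this sketch the distance is simply the two branch-length sums from each node
up to their common ancestor, which matches the "adding up the branch lengths of
all the connecting edges" wording in the HOWTO. Real trees parsed with
Bio::TreeIO carry this information on the node objects instead of in a hash.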


