[Bioperl-guts-l] bioperl commit
Brian Osborne
bosborne at pub.open-bio.org
Sat Aug 28 00:39:02 EDT 2004
Update of /home/repository/bioperl/bioperl-live/doc/howto/sgml
In directory pub.open-bio.org:/tmp/cvs-serv10180/doc/howto/sgml
Modified Files:
Trees.sgml
Log Message:
Docbook format corrections
bioperl-live/doc/howto/sgml Trees.sgml,1.7,1.8
===================================================================
RCS file: /home/repository/bioperl/bioperl-live/doc/howto/sgml/Trees.sgml,v
retrieving revision 1.7
retrieving revision 1.8
diff -u -r1.7 -r1.8
--- /home/repository/bioperl/bioperl-live/doc/howto/sgml/Trees.sgml 2004/01/24 15:03:06 1.7
+++ /home/repository/bioperl/bioperl-live/doc/howto/sgml/Trees.sgml 2004/08/28 04:39:02 1.8
@@ -1,35 +1,35 @@
<!DOCTYPE article PUBLIC "-//OASIS//DTD DocBook V4.1//EN">
<article lang="en">
- <articleinfo>
- <title>Phylogenetic Tree HOWTO</title>
- <author>
- <firstname>Jason</firstname>
- <surname>Stajich</surname>
- <authorblurb>
- <para>Bioperl Core Developer</para>
- </authorblurb>
- <affiliation>
- <orgname>Dept Molecular Genetics and Microbiology,
+ <articleinfo>
+ <title>Phylogenetic Tree HOWTO</title>
+ <author>
+ <firstname>Jason</firstname>
+ <surname>Stajich</surname>
+ <authorblurb>
+ <para>Bioperl Core Developer</para>
+ </authorblurb>
+ <affiliation>
+ <orgname>Dept Molecular Genetics and Microbiology,
Duke University</orgname>
- <address><email>jason AT bioperl.org</email></address>
- </affiliation>
- </author>
-
- <pubdate>2003-12-01</pubdate>
- <revhistory>
- <revision>
- <revnumber>0.1</revnumber>
- <date>2003-12-01</date>
- <authorinitials>JES</authorinitials>
- <revremark>First version</revremark>
- </revision>
- </revhistory>
- <legalnotice>
- <para>This document is copyright Jason Stajich, 2003. It can
+ <address><email>jason AT bioperl.org</email></address>
+ </affiliation>
+ </author>
+
+ <pubdate>2003-12-01</pubdate>
+ <revhistory>
+ <revision>
+ <revnumber>0.1</revnumber>
+ <date>2003-12-01</date>
+ <authorinitials>JES</authorinitials>
+ <revremark>First version</revremark>
+ </revision>
+ </revhistory>
+ <legalnotice>
+ <para>This document is copyright Jason Stajich, 2003. It can
be copied and distributed under the terms of the Perl
Artistic License.
- </para>
- </legalnotice>
+ </para>
+ </legalnotice>
<abstract>
<para>
@@ -52,9 +52,9 @@
and active research. This HOWTO and the modules described
within are focused on querying and manipulating trees once they
have been created.
- </para>
+ </para>
- <para>
+ <para>
The data we intend to capture with these objects concerns the notion
of Trees and their Nodes. A Tree is made up of Nodes and the
relationships which connect these nodes. The basic
@@ -65,9 +65,8 @@
need not be a strictly bifurcating tree (a binary tree, in
computer science terms), and a parent node can give rise to one or more child
nodes.
- </para>
- <para>
-
+ </para>
+ <para>
In practice there are just a few main
objects, or modules, you need to know about. There is the main
Tree object <classname>Bio::Tree::Tree</classname> which is the main
@@ -87,19 +86,20 @@
<classname>Bio::Tree::Tree</classname> object is just a
container for some summary information about the tree and a
description of the tree's root node.
- </para>
-
- </section>
- <section id="simple_usage">
- <title>Simple Usage</title>
- <para>
+ </para>
+ </section>
+
+ <section id="simple_usage">
+ <title>Simple Usage</title>
+ <para>
Trees are used to represent the history of a collection of taxa,
sequences, or populations.
- </para>
- </section>
- <section id="IO">
- <title>Reading and Writing Trees</title>
- <para>
+ </para>
+ </section>
+
+ <section id="IO">
+ <title>Reading and Writing Trees</title>
+ <para>
Using <classname>Bio::TreeIO</classname> one can read trees
from files or datastreams and
create <classname>Bio::Tree::Tree</classname> objects. This is
@@ -109,8 +109,8 @@
write <classname>Bio::Tree::Tree</classname> objects out to string
representations like the Newick or New Hampshire format which can
be printed to a file or a datastream, stored in a database, etc.
- </para>
- <para>
+ </para>
+ <para>
The main module for reading and writing trees is the
<classname>Bio::TreeIO</classname> factory module which has
several driver modules which plug into it. These drivers
@@ -124,19 +124,20 @@
only supports parsing, not writing, of Nexus format tree files.
</para>
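As a rough illustration of what the Newick (New Hampshire) string representation looks like, here is a minimal pure-Perl sketch that serializes a small hand-built tree. The hash-based node layout and the to_newick helper are hypothetical and are not part of Bio::TreeIO; in practice the newick driver does this for you.

```perl
use strict;
use warnings;

# Hypothetical node layout (not Bio::Tree::Node): a hash with an optional
# 'id', an optional 'branch_length' and an optional array ref of 'children'.
sub to_newick {
    my ($node) = @_;
    my $str  = '';
    my @kids = @{ $node->{children} || [] };
    # Internal node: recurse into the children and wrap them in parentheses
    $str .= '(' . join(',', map { to_newick($_) } @kids) . ')' if @kids;
    $str .= $node->{id} if defined $node->{id};
    # The branch length follows a colon, e.g. A:0.1
    $str .= ':' . $node->{branch_length} if defined $node->{branch_length};
    return $str;
}

my $tree = {
    children => [
        { id => 'A', branch_length => 0.1 },
        { id => 'B', branch_length => 0.2 },
        { children => [ { id => 'C', branch_length => 0.3 },
                        { id => 'D', branch_length => 0.4 } ],
          branch_length => 0.5 },
    ],
};

print to_newick($tree), ";\n";   # (A:0.1,B:0.2,(C:0.3,D:0.4):0.5);
```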
</section>
- <section>
- <title>Example Code</title>
- <para>
+
+ <section>
+ <title>Example Code</title>
+ <para>
Here is some code which will read in a tree from a file called
<filename>tree.tre</filename> and produce a <classname>Bio::Tree::Tree</classname> object, which is stored in
the variable <varname>$tree</varname>.
- </para>
- <para>
+ </para>
+ <para>
Like most modules that do input/output, Bio::TreeIO also accepts
the argument -fh in place of -file, so you can provide a glob or
filehandle instead of a filename.
- </para>
- <para>
+ </para>
+ <para>
<programlisting>
use Bio::TreeIO;
# parse in newick/new hampshire format
@@ -144,114 +145,118 @@
-format => "newick");
my $tree = $input->next_tree;
</programlisting>
- </para>
- <para>
+ </para>
+ <para>
Once you have a Tree object you can do a number of things with it.
These are all methods required in <classname>Bio::Tree::TreeI</classname>.
- </para>
- </section>
+ </para>
+ </section>
- <section id="TreeI">
- <title>Bio::Tree::TreeI methods</title>
- <para>
- Request the taxa (leaves of the tree).
- <programlisting>my @taxa = $tree->get_leaf_nodes;</programlisting>
- </para>
-
- <para>
- Get the root node.
- <programlisting>my $root = $tree->get_root_node; </programlisting>
- </para>
- <para>
- Get the total length of the tree (sum of all the branch lengths),
- which is only useful if the nodes actually have the branch
- length stored, of course.
- <programlisting>my $total_length = $tree->total_branch_length;</programlisting>
- </para>
-
- </section>
+ <section id="TreeI">
+ <title>Bio::Tree::TreeI methods</title>
+ <para>
+ Request the taxa (leaves of the tree).
+ <programlisting>my @taxa = $tree->get_leaf_nodes;</programlisting>
+ </para>
+
+ <para>
+ Get the root node.
+ <programlisting>my $root = $tree->get_root_node; </programlisting>
+ </para>
+ <para>
+ Get the total length of the tree (sum of all the branch lengths),
+ which is only useful if the nodes actually have the branch
+ length stored, of course.
+ <programlisting>
+my $total_length = $tree->total_branch_length;
+ </programlisting>
+ </para>
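To make the leaf and total-length queries concrete, the following pure-Perl sketch computes both on a hand-built tree. The nested-hash node layout and the leaf_nodes and total_branch_length helpers are hypothetical stand-ins, not Bio::Tree::Node or the Bio::Tree::TreeI implementation; a missing branch length is assumed to count as zero.

```perl
use strict;
use warnings;

# Hypothetical plain-Perl tree: each node is a hash with an 'id', an optional
# 'branch_length' (edge to its parent) and an optional array ref of 'children'.
sub leaf_nodes {
    my ($node) = @_;
    my @kids = @{ $node->{children} || [] };
    return ($node) unless @kids;          # no children: this is a leaf (taxon)
    return map { leaf_nodes($_) } @kids;  # otherwise collect leaves below it
}

sub total_branch_length {
    my ($node) = @_;
    my $sum = $node->{branch_length} || 0;   # missing length counts as 0
    $sum += total_branch_length($_) for @{ $node->{children} || [] };
    return $sum;
}

# ((A:1,B:2):0.5,C:3); built by hand
my $root = {
    id       => 'root',
    children => [
        { id => 'AB', branch_length => 0.5,
          children => [ { id => 'A', branch_length => 1 },
                        { id => 'B', branch_length => 2 } ] },
        { id => 'C', branch_length => 3 },
    ],
};

my @taxa = map { $_->{id} } leaf_nodes($root);
print "taxa: @taxa\n";                                      # taxa: A B C
print "total length: ", total_branch_length($root), "\n";  # total length: 6.5
```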
+ </section>
- <section id="TreeFunctionsI">
+ <section id="TreeFunctionsI">
<title>Bio::Tree::TreeFunctionsI</title>
- <para>
- An additional interface, <classname>Bio::Tree::TreeFunctionsI</classname>,
- provides utility functions that are useful for manipulating a Tree.
- </para>
+ <para>
+ An additional interface, <classname>Bio::Tree::TreeFunctionsI</classname>,
+ provides utility functions that are useful for manipulating a Tree.
+ </para>
- <para>
- Find a particular node, either by name or by some other field that is
- stored in a Node. The field name should be the name of a method that
- can be called on every Node in the Tree.
- <programlisting>
+ <para>
+ Find a particular node, either by name or by some other field that is
+ stored in a Node. The field name should be the name of a method that
+ can be called on every Node in the Tree.
+ <programlisting>
# find all the nodes named 'node1' (there should be only one)
my @named = $tree->find_node(-id => 'node1');
# find all the nodes which have description 'BMP'
my @bmp_nodes = $tree->find_node(-description => 'BMP');
# find all the nodes with bootstrap value of 70
my @bs70_nodes = $tree->find_node(-bootstrap => 70);
- </programlisting>
-
- If you would like to do more sophisticated searches, like "find all
+ </programlisting>
+ </para>
+ <para>
+ If you would like to do more sophisticated searches, like "find all
the nodes with bootstrap values better than 70", you can easily
implement this yourself.
- <programlisting>
+ <programlisting>
my @nodes = grep { defined $_->bootstrap and $_->bootstrap > 70 } $tree->get_nodes;
- </programlisting>
-
- Remove a Node from the Tree and update the children/ancestor links
- where the Node is an intervening one.
- <programlisting>
+ </programlisting>
+ Remove a Node from the Tree and update the children/ancestor links
+ where the Node is an intervening one.
+ </para>
+ <para>
+ <programlisting>
# provide the node object to remove from the Tree
$tree->remove_Node($node);
# or specify the node Name to remove
$tree->remove_Node('Node12');
- </programlisting>
-
- Get the lowest common ancestor for a set of Nodes. This method is
- used to find an internal Node of the Tree which can be traced,
- through its children, to the requested set of Nodes. It is used in
- the calculations of monophyly and paraphyly and in determining the
- distance between two nodes.
-
- <programlisting>
+ </programlisting>
+ </para>
+ <para>
+ Get the lowest common ancestor for a set of Nodes. This method is
+ used to find an internal Node of the Tree which can be traced,
+ through its children, to the requested set of Nodes. It is used in
+ the calculations of monophyly and paraphyly and in determining the
+ distance between two nodes.
+ <programlisting>
# Provide a list of Nodes that are in this tree
my $lca = $tree->get_lca(-nodes => \@nodes);
- </programlisting>
-
- Get the distance between two nodes by adding up the branch lengths
- of all the connecting edges between two nodes.
-
- <programlisting>
+ </programlisting>
+ </para>
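The lowest common ancestor can be sketched by walking from each node up to the root and keeping the first ancestor shared by all of them. This is a minimal illustration, assuming a hypothetical tree stored as a hash of child-to-parent links; it is not how get_lca is implemented in Bioperl.

```perl
use strict;
use warnings;

# Hypothetical tree stored as child => parent links (the root has no entry):
#        root
#       /    \
#      X      C
#     / \
#    A   B
my %parent = (X => 'root', C => 'root', A => 'X', B => 'X');

# The node itself followed by all of its ancestors up to the root
sub path_to_root {
    my ($node) = @_;
    my @path = ($node);
    push @path, $parent{ $path[-1] } while exists $parent{ $path[-1] };
    return @path;
}

# Keep the first node's ancestors that are also ancestors (or self) of every
# other node; the first survivor, counting from the leaves, is the LCA.
sub lca {
    my ($first, @rest) = @_;
    my @candidates = path_to_root($first);
    for my $other (@rest) {
        my %on_path = map { $_ => 1 } path_to_root($other);
        @candidates = grep { $on_path{$_} } @candidates;
    }
    return $candidates[0];
}

print lca('A', 'B'), "\n";        # X
print lca('A', 'B', 'C'), "\n";   # root
```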
+ <para>
+ Get the distance between two nodes by adding up the branch lengths
+ of all the connecting edges between two nodes.
+ <programlisting>
my $distance = $tree->distance(-nodes => [$node1,$node2]);
- </programlisting>
-
+ </programlisting>
+ </para>
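A sketch of the path-length idea, assuming a hypothetical hash that maps each child to its parent and the length of the branch between them: walk up from the first node until reaching an ancestor of the second (their lowest common ancestor), then add the second node's distance to that ancestor. This only illustrates the arithmetic; it is not Bioperl's distance method.

```perl
use strict;
use warnings;

# Hypothetical tree: child => [parent, length of the branch to that parent]
# ((A:1,B:2)X:0.5,C:3)root;
my %up = (
    X => ['root', 0.5],
    C => ['root', 3],
    A => ['X', 1],
    B => ['X', 2],
);

# Map the node itself and each of its ancestors to its distance from the node
sub dist_to_ancestors {
    my ($node) = @_;
    my %dist = ($node => 0);
    my $d = 0;
    while (my $link = $up{$node}) {
        my ($parent, $len) = @$link;
        $d += $len;
        $dist{$parent} = $d;
        $node = $parent;
    }
    return %dist;
}

# Sum of branch lengths on the path connecting two nodes
sub distance {
    my ($n1, $n2) = @_;
    my %from2 = dist_to_ancestors($n2);
    my $d = 0;
    my $node = $n1;
    until (exists $from2{$node}) {      # stop at the lowest common ancestor
        my ($parent, $len) = @{ $up{$node} };
        $d += $len;
        $node = $parent;
    }
    return $d + $from2{$node};
}

print distance('A', 'B'), "\n";   # 3   (1 + 2)
print distance('A', 'C'), "\n";   # 4.5 (1 + 0.5 + 3)
```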
+ <para>
Perform a test of monophyly for a set of nodes and a given outgroup
node. This means the common ancestor for the members of the
internal_nodes group is more recent than the common ancestor that any of them
share with the outgroup node.
-
<programlisting>
if( $tree->is_monophyletic(-nodes => \@internal_nodes,
-outgroup => $outgroup) ) {
print "these nodes are monophyletic: ",
join(",",map { $_->id } @internal_nodes ), "\n";
}
- </programlisting>
-
- Perform a test of paraphyly for a set of nodes and a given outgroup
- node. This means that a common ancestor 'A' for the members of the
- ingroup is more recent than a common ancestor 'B' that they share with
- the outgroup node <emphasis>and</emphasis> that there are no other
- nodes in the tree which have 'A' as a common ancestor before 'B'.
-
- <programlisting>
+ </programlisting>
+ </para>
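The outgroup-based test described above can be sketched as: find the lowest common ancestor of the ingroup, then check whether the outgroup descends from it. The child-to-parent hash, the helper names and the 1/0 return convention are hypothetical, and Bioperl's is_monophyletic may differ in its details. Following the definition above, the sketch only compares against the given outgroup; it does not check that the ingroup contains every descendant of its common ancestor.

```perl
use strict;
use warnings;

# Hypothetical tree stored as child => parent links:
#            root
#           /    \
#          Y      D
#         / \
#        X   C
#       / \
#      A   B
my %parent = (Y => 'root', D => 'root', X => 'Y', C => 'Y', A => 'X', B => 'X');

# The node itself followed by all of its ancestors up to the root
sub ancestors_of {
    my ($node) = @_;
    my @path = ($node);
    push @path, $parent{ $path[-1] } while exists $parent{ $path[-1] };
    return @path;
}

# Lowest common ancestor of a list of nodes
sub lca {
    my ($first, @rest) = @_;
    my @cand = ancestors_of($first);
    for my $other (@rest) {
        my %seen = map { $_ => 1 } ancestors_of($other);
        @cand = grep { $seen{$_} } @cand;
    }
    return $cand[0];
}

# 1 when the outgroup does NOT descend from the ingroup's LCA, i.e. the
# ingroup's common ancestor is more recent than the one shared with the outgroup
sub is_monophyletic {
    my ($ingroup, $outgroup) = @_;
    my $in_lca = lca(@$ingroup);
    my %outgroup_ancestors = map { $_ => 1 } ancestors_of($outgroup);
    return $outgroup_ancestors{$in_lca} ? 0 : 1;
}

print is_monophyletic([qw(A B)], 'D'), "\n";   # 1
print is_monophyletic([qw(A C)], 'B'), "\n";   # 0: B sits inside the A+C clade
```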
+ <para>
+ Perform a test of paraphyly for a set of nodes and a given outgroup
+ node. This means that a common ancestor 'A' for the members of the
+ ingroup is more recent than a common ancestor 'B' that they share with
+ the outgroup node <emphasis>and</emphasis> that there are no other
+ nodes in the tree which have 'A' as a common ancestor before 'B'.
+ <programlisting>
if( $tree->is_paraphyletic(-nodes => \@internal_nodes,
-outgroup => $outgroup) > 0 ) {
print "these nodes are paraphyletic: ",
join(",",map { $_->id } @internal_nodes ), "\n";
}
- </programlisting>
-
+ </programlisting>
+ </para>
+ <para>
Reroot a tree, specifying a different node as the root (and a
different node as the outgroup).
<programlisting>
@@ -260,57 +265,54 @@
# or it can be an internal node which will become the new
# root of the Tree
$tree->reroot($node);
- </programlisting>
- </para>
+ </programlisting>
+ </para>
</section>
- <section id="tree_building">
- <title>Constructing Trees</title>
- <para>
-
- Pairwise distances for all sequences in an alignment can be
- computed with <classname>Bio::Align::DNAStatistics</classname>
- (for DNA) and <classname>Bio::Align::ProteinStatistics</classname>
- (for protein). There
- are several different methods implemented. For DNA alignments,
- Jukes-Cantor (1969), Jukes-Cantor uncorrected, Kimura 2-parameter
- (1980), Felsenstein (1981), Tajima-Nei (1984), and Tamura (1992)
- are currently implemented. In addition, for coding sequences,
- synonymous and non-synonymous counts can be computed with
- the <function>calc_KaKs_pair</function> method. For protein sequence
- alignments only Kimura (1983) is currently supported, although
- other methods will be added.
- </para>
- <para>
-
- To use these methods, simply initialize a statistics module and
- pass in an alignment object
- (<classname>Bio::SimpleAlign</classname>) along with the name of the
- distance method to use; the module will return
- a <classname>Bio::Matrix::PhylipDist</classname> matrix object of
- pairwise distances. The code example below shows how this is
- done.
- </para>
-
- <para>
- Given the matrix of pairwise distances one can build a phylogenetic
- tree using two simple methods provided in
- the <classname>Bio::Tree::DistanceFactory</classname> module. Simply
- request either Neighbor-Joining (NJ) trees or Unweighted Pair Group
- Method with Arithmetic Mean (UPGMA) clusters. Both methods have
- caveats that depend on whether or not the distances are additive:
- NJ is only guaranteed to recover the true tree when the distances
- are additive, and UPGMA additionally assumes ultrametric distances
- (a constant rate of evolution).
- The method <function>check_additivity</function>
- in <classname>Bio::Tree::DistanceFactory</classname>
- is provided to calculate whether or not additivity holds for the
- data.
-
- </para>
- <para>
- The following is a basic code snippet which describes how to use
- the pairwise distance and tree building modules in Bioperl.
- </para>
- <para>
- <programlisting>
+ <section id="tree_building">
+ <title>Constructing Trees</title>
+ <para>
+ Pairwise distances for all sequences in an alignment can be
+ computed with <classname>Bio::Align::DNAStatistics</classname>
+ (for DNA) and <classname>Bio::Align::ProteinStatistics</classname>
+ (for protein). There
+ are several different methods implemented. For DNA alignments,
+ Jukes-Cantor (1969), Jukes-Cantor uncorrected, Kimura 2-parameter
+ (1980), Felsenstein (1981), Tajima-Nei (1984), and Tamura (1992)
+ are currently implemented. In addition, for coding sequences,
+ synonymous and non-synonymous counts can be computed with
+ the <function>calc_KaKs_pair</function> method. For protein sequence
+ alignments only Kimura (1983) is currently supported, although
+ other methods will be added.
+ </para>
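The Jukes-Cantor correction itself is a one-line formula, d = -3/4 ln(1 - 4p/3), where p is the uncorrected proportion of differing sites. The sketch below computes it for two aligned DNA strings in plain Perl; the jukes_cantor helper and its gap handling (skipping any site with a gap) are assumptions for illustration, not the Bio::Align::DNAStatistics code.

```perl
use strict;
use warnings;

# Jukes-Cantor (1969) distance between two aligned DNA sequences:
#   p = proportion of compared sites that differ (the uncorrected distance)
#   d = -3/4 * ln(1 - 4p/3)
# Sites with a gap ('-') in either sequence are skipped; this gap handling
# is an assumption of the sketch, not necessarily what Bioperl does.
sub jukes_cantor {
    my ($s1, $s2) = @_;
    die "sequences must be aligned (same length)\n"
        unless length($s1) == length($s2);
    my ($sites, $diffs) = (0, 0);
    for my $i (0 .. length($s1) - 1) {
        my $c1 = uc substr($s1, $i, 1);
        my $c2 = uc substr($s2, $i, 1);
        next if $c1 eq '-' or $c2 eq '-';
        $sites++;
        $diffs++ if $c1 ne $c2;
    }
    die "no comparable sites\n" unless $sites;
    my $p = $diffs / $sites;
    my $x = 1 - 4 * $p / 3;
    die "p >= 0.75: Jukes-Cantor distance is undefined\n" if $x <= 0;
    return -0.75 * log($x);
}

printf "%.4f\n", jukes_cantor('ACGTACGTAC', 'ACGTTCGTAA');   # 0.2326 (p = 0.2)
```

With 2 differences over 10 sites, p = 0.2 and the corrected distance is slightly larger than p, because the correction accounts for multiple substitutions at the same site.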
+ <para>
+ To use these methods, simply initialize a statistics module and
+ pass in an alignment object
+ (<classname>Bio::SimpleAlign</classname>) along with the name of the
+ distance method to use; the module will return
+ a <classname>Bio::Matrix::PhylipDist</classname> matrix object of
+ pairwise distances. The code example below shows how this is
+ done.
+ </para>
+
+ <para>
+ Given the matrix of pairwise distances one can build a phylogenetic
+ tree using two simple methods provided in
+ the <classname>Bio::Tree::DistanceFactory</classname> module. Simply
+ request either Neighbor-Joining (NJ) trees or Unweighted Pair Group
+ Method with Arithmetic Mean (UPGMA) clusters. Both methods have
+ caveats that depend on whether or not the distances are additive:
+ NJ is only guaranteed to recover the true tree when the distances
+ are additive, and UPGMA additionally assumes ultrametric distances
+ (a constant rate of evolution).
+ The method <function>check_additivity</function>
+ in <classname>Bio::Tree::DistanceFactory</classname>
+ is provided to calculate whether or not additivity holds for the
+ data.
+ </para>
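For intuition about what a distance-based builder does, here is a minimal UPGMA sketch in plain Perl that turns a small distance matrix into a Newick string. It is not the Bio::Tree::DistanceFactory implementation; the upgma helper, the "x|y" keyed distance hash and the generated internal names (U0, U1, ...) are hypothetical. Because each merged node is placed at half the distance between the clusters it joins, the branch lengths are only meaningful when the distances are ultrametric, as in the example matrix.

```perl
use strict;
use warnings;

# UPGMA sketch: cluster taxa from a distance matrix into a rooted tree and
# return it as a Newick string. $dist is a hash ref keyed "x|y" (both orders).
sub upgma {
    my ($names, $dist) = @_;
    # Each cluster keeps its Newick text, number of taxa and node height
    my %cl = map { $_ => { nwk => $_, size => 1, height => 0 } } @$names;
    my %d  = %$dist;             # work on a copy of the distances
    my $next = 0;
    while (keys(%cl) > 1) {
        # 1. find the closest pair of clusters (sorted for repeatable ties)
        my @ids = sort keys %cl;
        my ($best, $bi, $bj);
        for my $i (0 .. $#ids - 1) {
            for my $j ($i + 1 .. $#ids) {
                my $dij = $d{"$ids[$i]|$ids[$j]"};
                ($best, $bi, $bj) = ($dij, $ids[$i], $ids[$j])
                    if !defined($best) || $dij < $best;
            }
        }
        # 2. join them under a new node at height d/2
        my $h = $best / 2;
        my ($ci, $cj) = @cl{$bi, $bj};
        my $new = 'U' . $next++;
        my $nwk = sprintf '(%s:%g,%s:%g)',
            $ci->{nwk}, $h - $ci->{height}, $cj->{nwk}, $h - $cj->{height};
        # 3. distance to the new cluster = size-weighted average
        for my $k (keys %cl) {
            next if $k eq $bi || $k eq $bj;
            my $dk = ($ci->{size} * $d{"$bi|$k"} + $cj->{size} * $d{"$bj|$k"})
                   / ($ci->{size} + $cj->{size});
            $d{"$new|$k"} = $d{"$k|$new"} = $dk;
        }
        delete @cl{$bi, $bj};
        $cl{$new} = { nwk => $nwk, size => $ci->{size} + $cj->{size}, height => $h };
    }
    my ($root) = values %cl;
    return $root->{nwk} . ';';
}

# Small ultrametric example matrix
my @taxa  = qw(A B C D);
my %pairs = ('A|B' => 2, 'A|C' => 4, 'A|D' => 6, 'B|C' => 4, 'B|D' => 6, 'C|D' => 6);
my %dist;
for my $pair (keys %pairs) {
    my ($x, $y) = split /\|/, $pair;
    $dist{"$x|$y"} = $dist{"$y|$x"} = $pairs{$pair};
}
print upgma(\@taxa, \%dist), "\n";   # (D:3,(C:2,(A:1,B:1):1):1);
```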
+ <para>
+ The following is a basic code snippet which describes how to use
+ the pairwise distance and tree building modules in Bioperl.
+ </para>
+ <para>
+ <programlisting>
use Bio::AlignIO;
use Bio::Align::DNAStatistics;
use Bio::Tree::DistanceFactory;
@@ -322,45 +324,44 @@
-align => $aln);
my $dfactory = Bio::Tree::DistanceFactory->new(-method => 'NJ');
my $tree = $dfactory->make_tree($mat);
- </programlisting>
- </para>
- <para>
- TODO: Using external programs: phylip,MrBayes,paup,puzzle,protml
- </para>
- <para>
-
- Non-parametric bootstrapping is one method to test the consistency
- of the data with the optimal tree. A set of replicate alignments is
- generated from the original alignment using the
- <function>bootstrap_replicates</function> function from
- <classname>Bio::Align::Utilities</classname>. One passes in an
- alignment object and the number of replicates to generate.
-
- </para>
- <para>
- <programlisting>
+ </programlisting>
+ </para>
+ <para>
+ TODO: Using external programs: phylip,MrBayes,paup,puzzle,protml
+ </para>
+ <para>
+ Non-parametric bootstrapping is one method to test the consistency
+ of the data with the optimal tree. A set of replicate alignments is
+ generated from the original alignment using the
+ <function>bootstrap_replicates</function> function from
+ <classname>Bio::Align::Utilities</classname>. One passes in an
+ alignment object and the number of replicates to generate.
+ </para>
+ <para>
+ <programlisting>
use Bio::Align::Utilities qw(:all);
my $replicates = bootstrap_replicates($aln,$count);
- </programlisting>
- </para>
- </section>
- <section id="advanced_topics">
- <title>Advanced Topics</title>
- <para>
- It is possible to generate random tree topologies with a Bioperl
- object called <classname>Bio::Tree::RandomFactory</classname>. The
- factory only requires the total number of taxa
- in order to simulate a history. One can request different methods for
- generating the random phylogeny. At present, however, only the
- simple backward Yule method is implemented, and it is the default.
- </para>
- <para>
- The trees can be generated with the following code. You can either
- specify the names of taxa or just a count of total number of taxa
- in the simulation.
- </para>
- <para>
- <programlisting>
+ </programlisting>
+ </para>
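The resampling step itself is simple: draw alignment columns at random, with replacement, and apply the same column choices to every sequence so that site patterns stay intact. The sketch below assumes a hypothetical alignment stored as a hash of id to aligned string; the bootstrap_alignments helper is not Bio::Align::Utilities::bootstrap_replicates, which works on Bio::SimpleAlign objects.

```perl
use strict;
use warnings;

# Hypothetical alignment: id => aligned sequence (all the same length).
# Returns an array ref of replicate alignments in the same layout.
sub bootstrap_alignments {
    my ($aln, $count) = @_;
    my @ids = sort keys %$aln;
    my $len = length $aln->{ $ids[0] };
    my @replicates;
    for (1 .. $count) {
        # Draw $len column indices uniformly at random, with replacement
        my @cols = map { int rand $len } 1 .. $len;
        my %rep;
        # Apply the same columns to every sequence to keep site patterns intact
        for my $id (@ids) {
            $rep{$id} = join '', map { substr($aln->{$id}, $_, 1) } @cols;
        }
        push @replicates, \%rep;
    }
    return \@replicates;
}

srand 42;   # fixed seed only to make the run repeatable
my %aln = (seq1 => 'ACGTACGTAC', seq2 => 'ACGTTCGTAA', seq3 => 'ACCTACGAAC');
my $reps = bootstrap_alignments(\%aln, 3);
for my $rep (@$reps) {
    print join(' ', map { $rep->{$_} } sort keys %$rep), "\n";
}
```

Each replicate would then be turned into a distance matrix and a tree; the fraction of replicate trees containing a given clade is that clade's bootstrap support.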
+ </section>
+
+ <section id="advanced_topics">
+ <title>Advanced Topics</title>
+ <para>
+ It is possible to generate random tree topologies with a Bioperl
+ object called <classname>Bio::Tree::RandomFactory</classname>. The
+ factory only requires the total number of taxa
+ in order to simulate a history. One can request different methods for
+ generating the random phylogeny. At present, however, only the
+ simple backward Yule method is implemented, and it is the default.
+ </para>
+ <para>
+ The trees can be generated with the following code. You can either
+ specify the names of taxa or just a count of total number of taxa
+ in the simulation.
+ </para>
+ <para>
+ <programlisting>
use Bio::TreeIO;
use Bio::Tree::RandomFactory;
# initialize a TreeIO writer to output the trees as we create them
@@ -382,39 +383,39 @@
while( my $tree = $factory->next_tree) {
$out->write_tree($tree);
}
- </programlisting>
- </para>
- <para>
- There are more sophisticated operations that you may wish to pursue
- with these objects. We have tried to create a framework for this type
- of data, but by no means should this be looked at as the final
- product. If you have a particular statistic or function that
- applies to trees that you would like to see included in the
- toolkit we encourage you to send details to the Bioperl list.
- </para>
-
- </section>
- <section id="References">
- <!-- do bibliography here -->
- <title>References and More Reading</title>
- <para>
- For more reading and some references for the techniques above see
- these titles.
- </para>
- <para>
- <simplelist>
- <member>
- J. Felsenstein, "Inferring Phylogenies", 2003. Sinauer Associates.
- </member>
- <member>
- D. L. Swofford, G. J. Olsen, P. J. Waddell and D. M. Hillis, "Phylogenetic
- Inference", 1996. In Molecular Systematics, 2nd ed., Ch. 11.
- </member>
- <member>
- R. Durbin, S. R. Eddy, A. Krogh and G. Mitchison, "Biological Sequence
- Analysis", 1998. Cambridge University Press, Cambridge, UK.
- </member>
- </simplelist>
- </para>
+ </programlisting>
+ </para>
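One way to build a random topology backward in time can be sketched in plain Perl: start with one lineage per taxon and repeatedly join two randomly chosen lineages until a single root remains. The random_topology helper is a hypothetical illustration that produces a topology only (no branch lengths); it is not the Bio::Tree::RandomFactory code and does not attempt to reproduce its exact model.

```perl
use strict;
use warnings;

# Build a random rooted binary topology "backward": start with one lineage
# per taxon and repeatedly join two lineages picked at random until one is
# left. Returns a Newick string without branch lengths.
sub random_topology {
    my (@lineages) = @_;
    die "need at least two taxa\n" if @lineages < 2;
    while (@lineages > 1) {
        # remove two distinct lineages chosen uniformly at random
        my $first  = splice @lineages, int(rand @lineages), 1;
        my $second = splice @lineages, int(rand @lineages), 1;
        push @lineages, "($first,$second)";   # their new common ancestor
    }
    return "$lineages[0];";
}

# Either name the taxa or just generate placeholder names from a count
my @taxa = map { "T$_" } 1 .. 6;
srand 7;   # fixed seed only to make the run repeatable
print random_topology(@taxa), "\n" for 1 .. 3;
```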
+ <para>
+ There are more sophisticated operations that you may wish to pursue
+ with these objects. We have tried to create a framework for this type
+ of data, but by no means should this be looked at as the final
+ product. If you have a particular statistic or function that
+ applies to trees that you would like to see included in the
+ toolkit we encourage you to send details to the Bioperl list.
+ </para>
</section>
+
+ <section id="References">
+ <!-- do bibliography here -->
+ <title>References and More Reading</title>
+ <para>
+ For more reading and some references for the techniques above see
+ these titles.
+ </para>
+ <para>
+ <simplelist>
+ <member>
+ J. Felsenstein, "Inferring Phylogenies", 2003. Sinauer Associates.
+ </member>
+ <member>
+ D. L. Swofford, G. J. Olsen, P. J. Waddell and D. M. Hillis, "Phylogenetic
+ Inference", 1996. In Molecular Systematics, 2nd ed., Ch. 11.
+ </member>
+ <member>
+ R. Durbin, S. R. Eddy, A. Krogh and G. Mitchison, "Biological Sequence
+ Analysis", 1998. Cambridge University Press, Cambridge, UK.
+ </member>
+ </simplelist>
+ </para>
+ </section>
</article>