skbio.tree.TreeNode.cache_attr

TreeNode.cache_attr(func, cache_attrname, cache_type=list, register=True)

Cache attributes on nodes of the tree through a postorder traversal.

Parameters:
func : callable

Function to calculate the attribute of the current node. The result will be combined with the attributes of the node's children, if applicable.

cache_attrname : str

Name of the attribute to be attached to each node.

cache_type : {list, tuple, set, frozenset}, callable, or None

The type of the cache. It can be one of four iterable types: list (default), tuple, set, or frozenset. In these cases, the attributes of the node’s children and of the node itself are combined automatically.

Alternatively, it can be a custom function that takes two arguments, the list of attributes of the node’s children and the attribute calculated for the node itself by func, and returns the combined attribute of the node.

It can also be None, in which case the attributes of the children and the node itself are not combined, unless func does so explicitly.

Changed in version 0.6.3: Tuple, custom function and None were added to the options.

register : bool, optional

Whether to register the attribute name as a cache of the tree, such that the attributes will be deleted from all nodes when the tree is manipulated or the clear_caches method is explicitly invoked. Default is True.

Added in version 0.6.3.

Raises:
TypeError

If cache_type is invalid.
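
The cache_type options described above can be illustrated with a minimal sketch. It is not the library's implementation: the Node class and the cache_attr_sketch function are hypothetical stand-ins written for this illustration, and the combination rules (concatenation for list and tuple, union for set and frozenset) are an assumption about how the automated combination behaves.

```python
from functools import reduce


class Node:
    """Hypothetical minimal tree node, used only for this sketch."""

    def __init__(self, name, children=()):
        self.name = name
        self.children = list(children)

    def is_tip(self):
        return not self.children

    def postorder(self):
        # Yield every descendant before the node itself.
        for child in self.children:
            yield from child.postorder()
        yield self


def cache_attr_sketch(root, func, attrname, cache_type=list):
    """Illustrative stand-in for cache_attr (assumed behavior)."""
    if cache_type in (list, tuple):
        def combine(kids, own):
            # Concatenate children's caches, then the node's own value.
            return reduce(lambda a, b: a + b, kids + [cache_type(own)])
    elif cache_type in (set, frozenset):
        def combine(kids, own):
            # Union of children's caches and the node's own value.
            return reduce(lambda a, b: a | b, kids + [cache_type(own)])
    elif cache_type is None:
        def combine(kids, own):
            # No combination: keep only what func returned.
            return own
    elif callable(cache_type):
        # Custom function: (list of children's attributes, own attribute).
        combine = cache_type
    else:
        raise TypeError("cache_type is invalid.")

    for node in root.postorder():
        kids = [getattr(child, attrname) for child in node.children]
        setattr(node, attrname, combine(kids, func(node)))


# Same topology as ((a,b)c,(d,e)f)g.
tree = Node("g", [Node("c", [Node("a"), Node("b")]),
                  Node("f", [Node("d"), Node("e")])])


def tip(node):
    return [node.name] if node.is_tip() else []


cache_attr_sketch(tree, tip, "tip_names")                # list (default)
cache_attr_sketch(tree, tip, "tip_set", frozenset)       # frozenset
cache_attr_sketch(tree, lambda n: 1, "node_count", sum)  # custom callable
cache_attr_sketch(tree, lambda n: n.name, "label", None) # no combination
```

Passing sum works as a custom function because sum(children_values, own_value) treats the node's own value as the starting total.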

Notes

This method provides an efficient interface to assign a custom attribute to every node of a tree through one postorder traversal. It is particularly useful if one needs to frequently look up attributes that would normally require one traversal of the tree per lookup. The assigned attributes may be automatically deleted when the tree is manipulated.
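
The efficiency argument can be made concrete with a small sketch that counts node visits. It does not use the library: the Node class and the helper functions are hypothetical, and the visit counters exist only to compare one traversal per lookup against a single postorder pass.

```python
class Node:
    """Hypothetical minimal tree node, used only for this sketch."""

    def __init__(self, name, children=()):
        self.name = name
        self.children = list(children)


def all_nodes(node):
    # Preorder walk over the whole clade.
    yield node
    for child in node.children:
        yield from all_nodes(child)


def count_naive(node, visits):
    """Count nodes in a clade by walking it; visits[0] tallies node visits."""
    visits[0] += 1
    return 1 + sum(count_naive(child, visits) for child in node.children)


def cache_counts(node, visits):
    """One postorder pass that stores the clade size on every node."""
    visits[0] += 1
    node.node_count = 1 + sum(cache_counts(c, visits) for c in node.children)
    return node.node_count


# Same topology as ((a,b)c,(d,e)f)g.
tree = Node("g", [Node("c", [Node("a"), Node("b")]),
                  Node("f", [Node("d"), Node("e")])])

# One traversal per lookup: every node re-walks its own clade.
naive_visits = [0]
naive = {n.name: count_naive(n, naive_visits) for n in all_nodes(tree)}

# One traversal in total: each node is visited exactly once.
cached_visits = [0]
cache_counts(tree, cached_visits)
cached = {n.name: n.node_count for n in all_nodes(tree)}
```

On this seven-node tree the per-lookup approach visits 17 nodes in total, while the single pass visits 7 and yields the same counts. The gap widens as trees grow deeper.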

Examples

This method facilitates the evaluation of various useful node properties. Some representative examples are provided below.

>>> from skbio import TreeNode
>>> tree = TreeNode.read(["((a:1.2,b:1.6)c:0.3,(d:0.8,e:1.0)f:0.6)g;"])
>>> print(tree.ascii_art())
                    /-a
          /c-------|
         |          \-b
-g-------|
         |          /-d
          \f-------|
                    \-e

Cache a list of all descending tip names on each node. This facilitates the assignment of the taxon set under each clade in the tree. It resembles but is more efficient than calling subset multiple times.

>>> f = lambda n: [n.name] if n.is_tip() else []
>>> tree.cache_attr(f, 'tip_names')
>>> for node in tree.traverse(include_self=True):
...     print(f"Node: {node.name}, tips: {node.tip_names}")
Node: g, tips: ['a', 'b', 'd', 'e']
Node: c, tips: ['a', 'b']
Node: a, tips: ['a']
Node: b, tips: ['b']
Node: f, tips: ['d', 'e']
Node: d, tips: ['d']
Node: e, tips: ['e']

Cache the number of nodes per clade. The function sum is passed as cache_type so that the count is accumulated across children. This resembles but is more efficient than calling count multiple times.

>>> f = lambda n: 1
>>> tree.cache_attr(f, 'node_count', sum)
>>> tree.node_count
7

Cache the sum of branch lengths per clade. This resembles but is more efficient than calling total_length multiple times.

>>> f = lambda n: n.length or 0.0
>>> tree.cache_attr(f, 'clade_size', sum)
>>> tree.clade_size
5.5

Cache the cumulative distances from all tips to the common ancestor of each clade. This is more efficient than calling depth multiple times. One can further apply calculations like mean and standard deviation to the results.

>>> import numpy as np
>>> dist_f = lambda n: np.array(n.length or 0.0, ndmin=1)
>>> comb_f = lambda prev, curr: np.concatenate(prev) + curr if prev else curr
>>> tree.cache_attr(dist_f, 'accu_dists', comb_f)
>>> tree.accu_dists
array([ 1.5,  1.9,  1.4,  1.6])
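
As a follow-up to the last example, the summary statistics mentioned above can be sketched as follows. To keep the snippet self-contained, the four distances are copied by hand from the output printed above rather than read from the tree, and the standard library's statistics module is used. The variable names are illustrative, and the population form of the standard deviation is an arbitrary choice for this sketch.

```python
import statistics

# Tip-to-root distances copied from the accu_dists output above (a, b, d, e).
accu_dists = [1.5, 1.9, 1.4, 1.6]

# Mean tip-to-root distance, approximately 1.6.
mean_dist = statistics.fmean(accu_dists)

# Population standard deviation of the distances.
std_dist = statistics.pstdev(accu_dists)

print(f"mean: {mean_dist:.3f}, std: {std_dist:.3f}")
```

With the cached NumPy array itself, the equivalent calls would be tree.accu_dists.mean() and tree.accu_dists.std().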