skbio.tree.TreeNode.from_taxdump

classmethod TreeNode.from_taxdump(nodes, names=None)

Construct a tree from the NCBI taxonomy database.

Parameters:
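nodes : pd.DataFrame

Taxon hierarchy, indexed by taxonomy ID (tax_id), with a parent_tax_id column giving each taxon's parent (as in the example below).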

names : pd.DataFrame or dict, optional

Taxon names, either as a data frame read from "names.dmp" or as a dictionary mapping taxonomy IDs to names.

Returns:
TreeNode

The constructed tree.

Raises:
ValueError

If there is no top-level node.

ValueError

If there is more than one top-level node.
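To make the two error conditions concrete, here is a minimal pure-Python sketch of how a single top-level node could be located. It assumes, as with tax_id 1 in the example, that a top-level node is a taxon listed as its own parent, and it takes a plain dictionary mapping tax_id to parent_tax_id rather than a data frame. The helper find_top_level is hypothetical and is not part of scikit-bio.

```python
def find_top_level(parents):
    """Return the single taxon that is listed as its own parent.

    Hypothetical helper: `parents` maps each tax_id to its parent_tax_id.
    """
    # Assumption: a top-level node is one whose parent is itself,
    # like tax_id 1 ('root') in the example.
    tops = [tax_id for tax_id, parent in parents.items() if tax_id == parent]
    if not tops:
        raise ValueError("There is no top-level node.")
    if len(tops) > 1:
        raise ValueError("There is more than one top-level node.")
    return tops[0]


# Same hierarchy as the example: 1 is root, 2 and 3 are domains, 4 and 5 phyla.
example = {1: 1, 2: 1, 3: 1, 4: 2, 5: 2}
print(find_top_level(example))  # 1

# Making taxon 3 its own parent creates a second top-level node.
try:
    find_top_level({1: 1, 2: 1, 3: 3})
except ValueError as err:
    print(err)  # There is more than one top-level node.
```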

Notes

nodes and names correspond to “nodes.dmp” and “names.dmp” of the NCBI taxonomy database. They should be read into data frames using skbio.io.read before calling this method. Alternatively, names may be provided as a dictionary mapping taxonomy IDs to taxon names. If names is omitted, taxonomy IDs will be used as taxon names.
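The sketch below illustrates the overall construction under the same assumptions: children are grouped under their parents, the self-parented taxon becomes the root, and taxonomy IDs stand in for names when none are given. It is a hypothetical stand-in, not scikit-bio's implementation, and it returns nested (name, children) tuples instead of a TreeNode. Given a nodes frame like the one in the example, a suitable parents dictionary could be obtained with nodes['parent_tax_id'].to_dict().

```python
def build_tree(parents, names=None):
    """Build a nested (name, children) structure from a parent map.

    Hypothetical sketch: `parents` maps tax_id -> parent_tax_id and
    `names` maps tax_id -> taxon name.
    """
    # Fall back to the taxonomy IDs themselves (as strings) when no names are given.
    if names is None:
        names = {tax_id: str(tax_id) for tax_id in parents}

    # Group each taxon under its parent; the self-parented taxon is the root.
    children = {tax_id: [] for tax_id in parents}
    roots = []
    for tax_id, parent in parents.items():
        if tax_id == parent:
            roots.append(tax_id)
        else:
            children[parent].append(tax_id)
    if len(roots) != 1:
        raise ValueError("Expected exactly one top-level node.")

    def expand(tax_id):
        # Recursively convert a taxon and all of its descendants.
        return (names[tax_id], [expand(child) for child in children[tax_id]])

    return expand(roots[0])


parents = {1: 1, 2: 1, 3: 1, 4: 2, 5: 2}
names = {1: 'root', 2: 'Bacteria', 3: 'Archaea',
         4: 'Firmicutes', 5: 'Bacteroidetes'}
print(build_tree(parents, names))
# ('root', [('Bacteria', [('Firmicutes', []), ('Bacteroidetes', [])]), ('Archaea', [])])
```

Calling build_tree(parents) without names yields ('1', [('2', [('4', []), ('5', [])]), ('3', [])]), mirroring the fallback described above.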

Examples

>>> import pandas as pd
>>> from skbio.tree import TreeNode
>>> nodes = pd.DataFrame([
...             [1, 1, 'no rank'],
...             [2, 1, 'domain'],
...             [3, 1, 'domain'],
...             [4, 2, 'phylum'],
...             [5, 2, 'phylum']], columns=[
...     'tax_id', 'parent_tax_id', 'rank']).set_index('tax_id')
>>> names = {1: 'root', 2: 'Bacteria', 3: 'Archaea',
...          4: 'Firmicutes', 5: 'Bacteroidetes'}
>>> tree = TreeNode.from_taxdump(nodes, names)
>>> print(tree.ascii_art())
                    /-Firmicutes
          /Bacteria|
-root----|          \-Bacteroidetes
         |
          \-Archaea