skbio.tree.TreeNode.from_taxdump

classmethod TreeNode.from_taxdump(nodes, names=None)

Construct a tree from the NCBI taxonomy database.

Parameters:
nodes : pd.DataFrame

Taxon hierarchy, indexed by tax_id, with parent_tax_id and rank columns (see Examples)

names : pd.DataFrame or dict, optional

Taxon names: either a data frame read from names.dmp, or a dict mapping tax_id to name

Returns:
TreeNode

The constructed tree

Raises:
ValueError

If there is no top-level node

ValueError

If there is more than one top-level node
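Both errors concern how the top-level node is identified. In nodes.dmp the root is recorded as its own parent (in the Examples section, tax_id 1 has parent_tax_id 1). The following is a minimal sketch of the check. It assumes the hierarchy is held in a plain dict that maps each tax_id to its parent_tax_id. The helper find_root is hypothetical and only illustrates the idea; it is not scikit-bio's implementation.

```python
def find_root(parents):
    """Return the single top-level tax_id in a {tax_id: parent_tax_id} map.

    Hypothetical helper (not scikit-bio API). A node counts as top-level
    when it is its own parent, which is how NCBI encodes the root.
    """
    tops = [tax_id for tax_id, parent in parents.items() if tax_id == parent]
    if not tops:
        raise ValueError("There is no top-level node.")
    if len(tops) > 1:
        raise ValueError("There is more than one top-level node.")
    return tops[0]


# Same hierarchy as in the Examples section.
parents = {1: 1, 2: 1, 3: 1, 4: 2, 5: 2}
print(find_root(parents))  # prints 1

# Two self-parented nodes trigger the second error.
try:
    find_root({1: 1, 2: 2})
except ValueError as err:
    print(err)  # prints "There is more than one top-level node."
```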

Notes

nodes and names correspond to “nodes.dmp” and “names.dmp” of the NCBI taxonomy database. They should be read into data frames using skbio.io.read prior to this operation. Alternatively, names may be provided as a dictionary. If names is omitted, the taxonomy IDs will be used as taxon names.
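When names is given as a dictionary, it maps each tax_id to a single name. The hypothetical helper names_from_dmp shows one rough way such a dictionary could be built: it parses raw names.dmp lines with the standard library alone. It makes the following assumptions:

- The file follows NCBI's dump layout: fields are separated by a tab, a pipe and a tab, and each line ends with a tab and a pipe.
- The columns are tax_id, name_txt, unique name and name class.
- Only rows whose name class is "scientific name" are kept. This filtering rule is a choice made for the example.

The sample lines are hand-written for illustration, not taken from a real dump. Reading the file with skbio.io.read, as described above, remains the documented route.

```python
def names_from_dmp(lines):
    """Build a {tax_id: name} dict from names.dmp-formatted lines.

    Hypothetical helper, not scikit-bio API. Only rows whose name class
    is "scientific name" are kept, so each tax_id maps to one name.
    """
    names = {}
    for line in lines:
        line = line.rstrip("\n")
        # Each NCBI dump line ends with a tab and a pipe; drop that terminator.
        if line.endswith("\t|"):
            line = line[:-2]
        # Fields are separated by tab, pipe, tab.
        fields = line.split("\t|\t")
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        tax_id, name_txt, _unique_name, name_class = fields[:4]
        if name_class == "scientific name":
            names[int(tax_id)] = name_txt
    return names


# Hand-written sample lines in the names.dmp layout (illustrative only).
sample = [
    "1\t|\troot\t|\t\t|\tscientific name\t|\n",
    "2\t|\tBacteria\t|\t\t|\tscientific name\t|\n",
    "2\t|\tEubacteria\t|\t\t|\tsynonym\t|\n",
]
print(names_from_dmp(sample))  # prints {1: 'root', 2: 'Bacteria'}
```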

Examples

>>> import pandas as pd
>>> from skbio.tree import TreeNode
>>> nodes = pd.DataFrame([
...             [1, 1, 'no rank'],
...             [2, 1, 'domain'],
...             [3, 1, 'domain'],
...             [4, 2, 'phylum'],
...             [5, 2, 'phylum']], columns=[
...     'tax_id', 'parent_tax_id', 'rank']).set_index('tax_id')
>>> names = {1: 'root', 2: 'Bacteria', 3: 'Archaea',
...          4: 'Firmicutes', 5: 'Bacteroidetes'}
>>> tree = TreeNode.from_taxdump(nodes, names)
>>> print(tree.ascii_art())
                    /-Firmicutes
          /Bacteria|
-root----|          \-Bacteroidetes
         |
          \-Archaea
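The construction can be pictured in two steps. First, the parent_tax_id column is inverted into a parent-to-children map; then the tree is grown downward from the root. The stdlib-only sketch below does this for the hierarchy in the example and returns an indented outline rather than a TreeNode. The helper build_outline is hypothetical. Its fallback to str(tax_id) when no name is available mirrors the documented behavior when names is omitted, but the code is an illustration, not scikit-bio's implementation.

```python
from collections import defaultdict


def build_outline(parents, names=None):
    """Return indented lines for the hierarchy in {tax_id: parent_tax_id}.

    Hypothetical helper, not scikit-bio API. The root is the node that is
    its own parent; taxa without a name fall back to their tax_id.
    """
    names = names or {}
    children = defaultdict(list)
    roots = []
    for tax_id, parent in parents.items():
        if tax_id == parent:
            roots.append(tax_id)
        else:
            children[parent].append(tax_id)
    if len(roots) != 1:
        raise ValueError("Expected exactly one top-level node.")

    lines = []

    def visit(tax_id, depth):
        # Pre-order traversal: emit the node, then its children in input order.
        lines.append("  " * depth + names.get(tax_id, str(tax_id)))
        for child in children[tax_id]:
            visit(child, depth + 1)

    visit(roots[0], 0)
    return lines


parents = {1: 1, 2: 1, 3: 1, 4: 2, 5: 2}
names = {1: 'root', 2: 'Bacteria', 3: 'Archaea',
         4: 'Firmicutes', 5: 'Bacteroidetes'}
print("\n".join(build_outline(parents, names)))
# root
#   Bacteria
#     Firmicutes
#     Bacteroidetes
#   Archaea
print(build_outline(parents))  # prints ['1', '  2', '    4', '    5', '  3']
```

In this sketch, children are visited in the order their rows appear in the input. That is why Bacteria comes before Archaea here, matching the ascii_art() output of the example.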