skbio.stats.ordination.MMvecResults

class skbio.stats.ordination.MMvecResults(microbe_embeddings, metabolite_embeddings, ranks, convergence)

Results from MMvec analysis.

This class contains the learned embeddings and co-occurrence patterns from fitting an MMvec model. The key outputs enable both interpretation (which microbes co-occur with which metabolites) and prediction (expected metabolites given a microbial community).

Added in version 0.7.2.

Attributes:
microbe_embeddings : pd.DataFrame

Microbe coordinates in latent space. Shape: (n_microbes, n_components + 1) where +1 is the bias term.

Each row is a vector representation of a microbe. Microbes with similar embedding vectors tend to co-occur with similar sets of metabolites. The Euclidean distance or cosine similarity between microbe embeddings can be used to identify functionally related microbes. The final column (“bias”) captures the baseline tendency of each microbe to associate with metabolites overall.
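As a rough illustration of comparing microbes by embedding direction, the sketch below computes pairwise cosine similarity with NumPy. The DataFrame is a hand-made stand-in for `microbe_embeddings`; the microbe names and the `PC1`/`PC2` column labels are assumptions for the example, and only the `bias` column name comes from the description above.

```python
import numpy as np
import pandas as pd

# Toy stand-in for `microbe_embeddings`: 3 microbes, 2 latent components + bias.
# Index and PC column names are hypothetical.
microbe_embeddings = pd.DataFrame(
    [[1.0, 0.0, 0.2],
     [2.0, 0.0, -0.1],
     [0.0, 1.0, 0.3]],
    index=["microbe_a", "microbe_b", "microbe_c"],
    columns=["PC1", "PC2", "bias"],
)

# Compare only the latent coordinates; the bias column is a per-microbe offset.
coords = microbe_embeddings.drop(columns="bias").to_numpy()

# Cosine similarity: scale each row to unit length, then take inner products.
unit = coords / np.linalg.norm(coords, axis=1, keepdims=True)
cosine = pd.DataFrame(
    unit @ unit.T,
    index=microbe_embeddings.index,
    columns=microbe_embeddings.index,
)
print(cosine.round(3))
```

Dropping the bias column before comparing is a choice made for this sketch, not something the class requires.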

metabolite_embeddings : pd.DataFrame

Metabolite coordinates in latent space. Shape: (n_metabolites, n_components + 1) where +1 is the bias term.

Each row is a vector representation of a metabolite. Metabolites with similar embedding vectors tend to co-occur with similar sets of microbes. The first row corresponds to the reference metabolite (all zeros) used for identifiability. The distance between metabolite embeddings indicates similarity in their microbial associations. The final column (“bias”) captures the baseline abundance of each metabolite.

ranks : pd.DataFrame

Log conditional probability matrix (co-occurrence scores). Shape: (n_microbes, n_metabolites). Row-centered.

Entry (i, j) represents the log-odds of observing metabolite j given microbe i, relative to the row mean. Higher values indicate stronger positive associations. This matrix is row-centered (each row sums to zero) for identifiability. To obtain actual conditional probabilities, use the probabilities method.

The ranks matrix is the primary output for identifying microbe-metabolite associations. Sorting each row reveals which metabolites are most strongly associated with each microbe.
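A minimal sketch of both uses, on a toy row-centered matrix standing in for `ranks` (the microbe and metabolite names are made up). The softmax here is written by hand to show the idea behind converting ranks to probabilities; the `probabilities` method may differ in implementation details.

```python
import numpy as np
import pandas as pd

# Toy stand-in for `ranks`: each row sums to zero. All labels are hypothetical.
ranks = pd.DataFrame(
    [[1.0, 0.0, -1.0],
     [-0.5, 1.5, -1.0]],
    index=["microbe_a", "microbe_b"],
    columns=["met_x", "met_y", "met_z"],
)

# Top-2 metabolites per microbe, strongest association first.
top2 = {microbe: row.nlargest(2).index.tolist()
        for microbe, row in ranks.iterrows()}

# Row-wise softmax; subtracting the row maximum first keeps exp() stable
# and does not change the result.
shifted = ranks.sub(ranks.max(axis=1), axis=0)
exp_scores = np.exp(shifted)
probs = exp_scores.div(exp_scores.sum(axis=1), axis=0)

print(top2)
print(probs.round(3))
```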

convergence : pd.DataFrame

Training diagnostics with columns:

  • iteration: Iteration number (1-indexed).

  • loss: Negative log-posterior (lower is better).

Use this to diagnose training issues. The loss should generally decrease and then stabilize. If the loss is still decreasing at the final iteration, consider increasing max_iter. If the loss oscillates when training with the Adam optimizer, try reducing learning_rate.
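One way to turn these checks into numbers is sketched below, using a toy frame with the two documented columns. The helper names, the window size, and the 1% threshold are illustrative assumptions, not part of scikit-bio.

```python
import pandas as pd

# Toy stand-in for `convergence` with the documented columns.
convergence = pd.DataFrame({
    "iteration": range(1, 11),
    "loss": [100.0, 60.0, 40.0, 30.0, 25.0, 22.0, 21.0, 20.5, 20.4, 20.4],
})

def relative_improvement(conv, window=3):
    """Relative drop in loss over the last `window` iterations (hypothetical helper)."""
    tail = conv["loss"].iloc[-(window + 1):]
    return (tail.iloc[0] - tail.iloc[-1]) / abs(tail.iloc[0])

def count_increases(conv):
    """Number of iterations where the loss went up (a crude oscillation signal)."""
    return int((conv["loss"].diff() > 0).sum())

rel = relative_improvement(convergence)
still_decreasing = rel > 0.01   # threshold chosen arbitrarily for this sketch
n_up = count_increases(convergence)
print(f"recent relative improvement: {rel:.3f}, upward steps: {n_up}")
```

A large recent improvement suggests the run stopped early, while many upward steps suggest the step size is too large.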

See also

mmvec

Fit an MMvec model.

probabilities

Convert ranks to conditional probabilities.

predict

Predict metabolite distributions for new samples.

score

Evaluate predictive performance with Q².

Notes

Detecting Overfitting with Q²

Overfitting occurs when the model memorizes training data rather than learning generalizable patterns. To detect overfitting:

  1. Split your data into training and test sets before fitting.

  2. Fit the model on training data only.

  3. Use score to compute Q² on held-out test data.

Interpretation of Q² values:

  • Q² close to 1: Excellent predictive performance.

  • Q² close to 0: Model predicts no better than the mean.

  • Q² negative: Model performs worse than predicting the mean, indicating overfitting or model misspecification.
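The three cases above match the conventional definition Q² = 1 - SS_res / SS_tot, where the baseline is the mean of the held-out observations. The sketch below implements that conventional form on toy data; the exact formula used by `score` is not stated here, so treat `q2_sketch` as an assumption-laden illustration rather than a reimplementation.

```python
import numpy as np

def q2_sketch(observed, predicted):
    """Conventional Q2: 1 - SS_res / SS_tot (hypothetical helper, not skbio API)."""
    observed = np.asarray(observed, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    # Residual sum of squares between held-out values and predictions.
    ss_res = np.sum((observed - predicted) ** 2)
    # Baseline: per-metabolite mean across the held-out samples.
    baseline = observed.mean(axis=0, keepdims=True)
    ss_tot = np.sum((observed - baseline) ** 2)
    return 1.0 - ss_res / ss_tot

# Toy held-out metabolite proportions: 3 samples x 2 metabolites.
observed = np.array([[0.2, 0.8], [0.6, 0.4], [0.4, 0.6]])

q2_perfect = q2_sketch(observed, observed)                      # exact predictions
q2_mean = q2_sketch(observed, np.tile(observed.mean(axis=0), (3, 1)))  # mean-only model
q2_bad = q2_sketch(observed, observed[::-1] * 0.0 + [[0.9, 0.1]])      # poor constant guess
print(q2_perfect, q2_mean, q2_bad)
```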

If Q² is much lower than expected, try:

  • Reducing n_components (fewer latent dimensions).

  • Increasing regularization via smaller u_prior_scale and v_prior_scale values.

  • Collecting more training samples.

Embedding Interpretation

The embeddings place microbes and metabolites in the same latent space. The inner product between a microbe embedding and metabolite embedding (plus bias terms) gives the log-odds of their co-occurrence:

\[\log \frac{P(m_j | \mu_i)}{P(m_{\text{ref}} | \mu_i)} = U_i \cdot V_j + b_{U_i} + b_{V_j}\]

This means:

  • Microbes pointing in similar directions associate with similar metabolites.

  • Metabolites pointing in similar directions co-occur with similar microbes (for example, because those microbes produce or consume them).

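  • The angle between a microbe vector and a metabolite vector indicates the direction of their association: a small angle gives a positive inner product, and an angle above 90° gives a negative one. Because the score is an inner product, the vector lengths and the bias terms also contribute to its magnitude.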
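To make the inner-product form concrete, the sketch below builds scores from small hand-written arrays. The names `U`, `V`, `b_U`, and `b_V` follow the symbols in the equation above, and all values are invented. It also shows that a per-microbe constant such as `b_U` drops out once each row is centered.

```python
import numpy as np

# Toy embeddings: 2 microbes and 3 metabolites in a 2-dimensional latent space.
U = np.array([[1.0, 0.0],
              [0.0, 2.0]])
b_U = np.array([0.5, -0.5])          # microbe bias terms
V = np.array([[0.0, 0.0],            # first row: reference metabolite (all zeros)
              [1.0, 0.0],
              [0.0, 1.0]])
b_V = np.array([0.0, 0.3, -0.2])     # metabolite bias terms

# Score for every (microbe, metabolite) pair: U_i . V_j + b_U_i + b_V_j.
scores = U @ V.T + b_U[:, None] + b_V[None, :]

# Row-centering removes anything constant within a row, including b_U.
centered = scores - scores.mean(axis=1, keepdims=True)
scores_without_bu = U @ V.T + b_V[None, :]
centered_without_bu = scores_without_bu - scores_without_bu.mean(axis=1, keepdims=True)

print(np.round(centered, 3))
print(np.allclose(centered, centered_without_bu))
```

The second microbe has the longer vector, so its score with the aligned metabolite is larger than the first microbe's, which illustrates that vector length matters alongside direction.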

Methods

predict

Predict metabolite distributions given microbe abundances.

probabilities

Convert ranks to probability matrix via softmax.

score

Compute Q² (coefficient of prediction) on held-out data.

Special methods

__str__

Return string representation of MMvecResults.

Special methods (inherited)

__eq__

Return self==value.

__ge__

Return self>=value.

__getstate__

Helper for pickle.

__gt__

Return self>value.

__hash__

Return hash(self).

__le__

Return self<=value.

__lt__

Return self<value.

__ne__

Return self!=value.

Details

__str__()

Return string representation of MMvecResults.