skbio.stats.ordination.MMvecResults
- class skbio.stats.ordination.MMvecResults(microbe_embeddings, metabolite_embeddings, ranks, convergence)
Results from MMvec analysis.
This class contains the learned embeddings and co-occurrence patterns from fitting an MMvec model. The key outputs enable both interpretation (which microbes co-occur with which metabolites) and prediction (expected metabolites given a microbial community).
Added in version 0.7.2.
- Attributes:
- microbe_embeddings : pd.DataFrame
Microbe coordinates in latent space. Shape: (n_microbes, n_components + 1) where +1 is the bias term.
Each row is a vector representation of a microbe. Microbes with similar embedding vectors tend to co-occur with similar sets of metabolites. The Euclidean distance or cosine similarity between microbe embeddings can be used to identify functionally related microbes. The final column (“bias”) captures the baseline tendency of each microbe to associate with metabolites overall.
- metabolite_embeddings : pd.DataFrame
Metabolite coordinates in latent space. Shape: (n_metabolites, n_components + 1) where +1 is the bias term.
Each row is a vector representation of a metabolite. Metabolites with similar embedding vectors tend to co-occur with similar sets of microbes. The first row corresponds to the reference metabolite (all zeros) used for identifiability. The distance between metabolite embeddings indicates similarity in their microbial associations. The final column (“bias”) captures the baseline abundance of each metabolite.
- ranks : pd.DataFrame
Log conditional probability matrix (co-occurrence scores). Shape: (n_microbes, n_metabolites). Row-centered.
Entry (i, j) is the log conditional probability of metabolite j given microbe i minus the mean of row i. Equivalently, it is the log-ratio of P(m_j | μ_i) to the geometric mean of row i's conditional probabilities. Higher values indicate stronger positive associations. The matrix is row-centered (each row sums to zero) for identifiability. To obtain actual conditional probabilities, use the probabilities method.
The ranks matrix is the primary output for identifying microbe-metabolite associations. Sorting each row reveals which metabolites are most strongly associated with each microbe.
- convergence : pd.DataFrame
Training diagnostics with columns:
  - iteration: Iteration number (1-indexed).
  - loss: Negative log-posterior (lower is better).
Use this to diagnose training issues. The loss should generally decrease and then stabilize. If the loss is still decreasing at the final iteration, consider increasing max_iter. If the loss oscillates (which can happen with the Adam optimizer), try reducing learning_rate.
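All four attributes are ordinary pandas DataFrames, so the uses described above need only pandas and NumPy. The sketch below works on small hand-made tables that stand in for a fitted result. All values are invented for illustration. The embedding column names PC1/PC2 are assumptions; only the "bias" column name comes from the description above. The sketch computes cosine similarity between microbe embeddings with the bias column dropped, lists the top-ranked metabolite for each microbe, and applies simple plateau and oscillation heuristics to the convergence table. The helper functions are hypothetical and are not part of skbio.

```python
import numpy as np
import pandas as pd

# Hand-made stand-ins for MMvecResults attributes (invented values; the
# PC1/PC2 column names are assumptions, "bias" follows the docs above).
microbe_embeddings = pd.DataFrame(
    [[1.0, 0.0, 0.1], [0.9, 0.1, -0.2], [0.0, 1.0, 0.3]],
    index=["m1", "m2", "m3"],
    columns=["PC1", "PC2", "bias"],
)
ranks = pd.DataFrame(
    [[2.0, -1.0, -1.0], [1.5, 0.5, -2.0], [-1.0, 2.0, -1.0]],
    index=["m1", "m2", "m3"],
    columns=["met_a", "met_b", "met_c"],
)
convergence = pd.DataFrame(
    {"iteration": [1, 2, 3, 4, 5, 6],
     "loss": [10.0, 6.0, 4.02, 4.01, 4.0, 3.99]}
)


def cosine_similarity(embeddings: pd.DataFrame) -> pd.DataFrame:
    """Pairwise cosine similarity between rows, ignoring the bias column."""
    x = embeddings.drop(columns="bias").to_numpy()
    x = x / np.linalg.norm(x, axis=1, keepdims=True)  # unit-length rows
    return pd.DataFrame(x @ x.T, index=embeddings.index,
                        columns=embeddings.index)


def top_metabolites(ranks: pd.DataFrame, k: int = 1) -> dict:
    """Map each microbe (row) to its k highest-ranked metabolites."""
    return {microbe: row.nlargest(k).index.tolist()
            for microbe, row in ranks.iterrows()}


def loss_plateaued(convergence: pd.DataFrame, window: int = 2,
                   rel_tol: float = 0.01) -> bool:
    """Heuristic: True if the mean loss of the last `window` iterations is
    within `rel_tol` (relative) of the mean of the preceding window."""
    loss = convergence.sort_values("iteration")["loss"].to_numpy()
    prev = loss[-2 * window:-window].mean()
    last = loss[-window:].mean()
    return bool(abs(prev - last) <= rel_tol * abs(prev))


def n_loss_increases(convergence: pd.DataFrame) -> int:
    """Count iterations where the loss went up (a sign of oscillation)."""
    loss = convergence.sort_values("iteration")["loss"].to_numpy()
    return int((np.diff(loss) > 0).sum())


print(cosine_similarity(microbe_embeddings).round(3))
print(top_metabolites(ranks))  # {'m1': ['met_a'], 'm2': ['met_a'], 'm3': ['met_b']}
print(np.allclose(ranks.sum(axis=1), 0))  # True: rows are centered
print(loss_plateaued(convergence), n_loss_increases(convergence))  # True 0
```

The plateau check compares window means rather than single iterations, so small iteration-to-iteration noise does not trigger it. The window size and tolerance are arbitrary choices.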
See also
mmvec : Fit an MMvec model.
probabilities : Convert ranks to conditional probabilities.
predict : Predict metabolite distributions for new samples.
score : Evaluate predictive performance with Q².
Notes
Detecting Overfitting with Q²
Overfitting occurs when the model memorizes training data rather than learning generalizable patterns. To detect overfitting:
- Split your data into training and test sets before fitting.
- Fit the model on the training data only.
- Use score to compute Q² on the held-out test data.
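The splitting step can be sketched as follows. The sketch assumes that the microbe and metabolite tables are pandas DataFrames with samples as rows and a shared sample index. That orientation is an assumption, and the arguments that mmvec and score actually accept are not shown here. The helper split_samples is hypothetical and is not part of skbio.

```python
import numpy as np
import pandas as pd


def split_samples(microbes: pd.DataFrame, metabolites: pd.DataFrame,
                  test_fraction: float = 0.2, seed: int = 0):
    """Randomly split paired sample-by-feature tables into train/test sets.

    Assumes samples are rows and both tables share sample IDs in the index.
    """
    shared = microbes.index.intersection(metabolites.index)
    rng = np.random.default_rng(seed)
    shuffled = rng.permutation(shared.to_numpy())
    n_test = max(1, int(round(test_fraction * len(shuffled))))
    test_ids, train_ids = shuffled[:n_test], shuffled[n_test:]
    return (microbes.loc[train_ids], metabolites.loc[train_ids],
            microbes.loc[test_ids], metabolites.loc[test_ids])


# Usage with toy count tables (10 samples).
samples = [f"s{i}" for i in range(10)]
microbes = pd.DataFrame(np.arange(30).reshape(10, 3), index=samples)
metabolites = pd.DataFrame(np.arange(40).reshape(10, 4), index=samples)
mic_train, met_train, mic_test, met_test = split_samples(microbes, metabolites)
print(len(mic_train), len(mic_test))  # 8 2
```

Splitting by sample ID keeps each sample's microbe and metabolite profiles together. This matters because the model is evaluated on paired observations.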
Interpretation of Q² values:
- Q² close to 1: Excellent predictive performance.
- Q² close to 0: The model predicts no better than the mean.
- Q² negative: The model performs worse than predicting the mean, indicating overfitting or model misspecification.
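For reference, Q² is conventionally defined as 1 − PRESS/TSS. PRESS is the sum of squared prediction errors on held-out data, and TSS is the total sum of squares around the observed mean. The sketch below applies that textbook definition to plain arrays. This document does not state whether score applies it to counts, proportions, or transformed values, so treat the sketch as an assumption rather than skbio's implementation.

```python
import numpy as np


def q_squared(observed, predicted) -> float:
    """Q² = 1 - PRESS / TSS, with TSS taken around each column's mean."""
    y = np.asarray(observed, dtype=float)
    y_hat = np.asarray(predicted, dtype=float)
    press = np.sum((y - y_hat) ** 2)          # error of the model's predictions
    tss = np.sum((y - y.mean(axis=0)) ** 2)   # error of predicting the mean
    return float(1.0 - press / tss)


y = np.array([[0.2, 0.8], [0.6, 0.4]])
print(q_squared(y, y))                                # 1.0: perfect prediction
print(q_squared(y, np.tile(y.mean(axis=0), (2, 1))))  # 0.0: no better than the mean
print(q_squared(y, y[::-1]))                          # negative: worse than the mean
```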
If Q² is much lower than expected, try:
- Reducing n_components (fewer latent dimensions).
- Increasing regularization via smaller u_prior_scale and v_prior_scale values.
- Collecting more training samples.
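To see why smaller prior scales mean stronger regularization, assume (as a sketch) zero-mean Gaussian priors with standard deviation σ on the embedding weights. Under that assumption, the negative log-prior adds sum(w²) / (2σ²) to the negative log-posterior, up to terms that do not depend on w. Dividing σ by 10 therefore multiplies the penalty by 100. The helper name below is hypothetical.

```python
import numpy as np


def gaussian_prior_penalty(weights, prior_scale: float) -> float:
    """Negative log-density of i.i.d. N(0, prior_scale**2) weights,
    dropping terms that do not depend on the weights."""
    w = np.asarray(weights, dtype=float)
    return float(np.sum(w ** 2) / (2.0 * prior_scale ** 2))


U = np.array([[1.0, -2.0], [0.5, 0.0]])  # toy embedding matrix
print(gaussian_prior_penalty(U, 1.0))     # 2.625
print(gaussian_prior_penalty(U, 0.1))     # about 100x larger
```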
Embedding Interpretation
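The embeddings place microbes and metabolites in the same latent space. The inner product between a microbe embedding and a metabolite embedding, plus the two bias terms, gives the log-odds of observing that metabolite, relative to the reference metabolite, given the microbe:
\[\log \frac{P(m_j | \mu_i)}{P(m_{\text{ref}} | \mu_i)} = U_i \cdot V_j + b_{U_i} + b_{V_j}\]
Here U_i and V_j are the embedding vectors of microbe i and metabolite j (excluding the bias column), b_{U_i} and b_{V_j} are their bias terms, and m_ref is the reference metabolite. This means: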
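- Microbes pointing in similar directions associate with similar metabolites.
- Metabolites pointing in similar directions co-occur with similar microbes, for example because those microbes produce or consume them.
- The inner product U_i · V_j depends on both the angle between the microbe and metabolite vectors and their lengths. An acute angle gives a positive contribution (positive association); an obtuse angle gives a negative one.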
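The equation can be checked numerically. The NumPy sketch below starts from random toy embeddings and biases. It builds the log-odds for the non-reference metabolites and fixes the reference metabolite's log-odds at 0, placing the reference first to match the row order described for metabolite_embeddings. It then converts each row to conditional probabilities with a softmax and row-centers the log-probabilities to obtain ranks-style scores. The array and function names are hypothetical, and the construction follows the equation above rather than skbio's source code.

```python
import numpy as np


def softmax_rows(x: np.ndarray) -> np.ndarray:
    """Row-wise softmax, shifted by the row max for numerical stability."""
    z = np.exp(x - x.max(axis=1, keepdims=True))
    return z / z.sum(axis=1, keepdims=True)


def ranks_from_embeddings(U, b_U, V, b_V):
    """Row-centered log conditional probabilities from embeddings.

    U: (n_microbes, k), b_U: (n_microbes,)
    V: (n_metabolites - 1, k), b_V: (n_metabolites - 1,), i.e. only the
    non-reference metabolites; the reference gets log-odds 0.
    """
    log_odds = U @ V.T + b_U[:, None] + b_V[None, :]
    logits = np.hstack([np.zeros((U.shape[0], 1)), log_odds])  # reference first
    log_probs = np.log(softmax_rows(logits))
    ranks = log_probs - log_probs.mean(axis=1, keepdims=True)
    return logits, ranks


rng = np.random.default_rng(0)
U, b_U = rng.normal(size=(4, 2)), rng.normal(size=4)
V, b_V = rng.normal(size=(5, 2)), rng.normal(size=5)
logits, ranks = ranks_from_embeddings(U, b_U, V, b_V)

probs = softmax_rows(logits)
# log P(m_j | mu_i) / P(m_ref | mu_i) recovers the right-hand side of the equation.
print(np.allclose(np.log(probs[:, 1:] / probs[:, [0]]), logits[:, 1:]))  # True
print(np.allclose(ranks.sum(axis=1), 0))  # True: each row sums to zero
```

Because log-softmax only subtracts a per-row constant, the centered log-probabilities equal the centered logits. The row-centering step could therefore be applied to the logits directly.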
Methods
predict : Predict metabolite distributions given microbe abundances.
probabilities : Convert ranks to a probability matrix via softmax.
score : Compute Q² (coefficient of prediction) on held-out data.
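Ranks are log conditional probabilities shifted by a per-row constant. A row-wise softmax therefore undoes the centering exactly, since softmax is unchanged by adding a constant to a row. The sketch below shows that conversion together with one plausible prediction rule, which is an assumption and not taken from skbio. Under that rule, a sample's expected metabolite distribution is the mixture of the per-microbe conditional distributions, weighted by the sample's microbe proportions. Both helpers are hypothetical. The real probabilities and predict methods may differ in signature, input orientation, and output.

```python
import numpy as np
import pandas as pd


def probabilities_from_ranks(ranks: pd.DataFrame) -> pd.DataFrame:
    """Row-wise softmax of ranks; any per-row shift cancels out."""
    z = ranks.to_numpy(dtype=float)
    z = np.exp(z - z.max(axis=1, keepdims=True))
    return pd.DataFrame(z / z.sum(axis=1, keepdims=True),
                        index=ranks.index, columns=ranks.columns)


def predict_proportions(abundances: pd.DataFrame,
                        probs: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical predict rule: samples (rows) x microbes (columns) ->
    samples x metabolites, mixing rows of `probs` by microbe proportions."""
    weights = abundances[probs.index]                    # align microbe order
    weights = weights.div(weights.sum(axis=1), axis=0)   # counts -> proportions
    return weights @ probs                               # aligns on microbe labels


ranks = pd.DataFrame([[1.0, 0.0, -1.0], [0.0, 0.0, 0.0]],
                     index=["m1", "m2"], columns=["met_a", "met_b", "met_c"])
probs = probabilities_from_ranks(ranks)
abundances = pd.DataFrame([[10, 0], [5, 5]], index=["s1", "s2"],
                          columns=["m1", "m2"])
pred = predict_proportions(abundances, probs)
print(pred.round(3))
```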
Special methods
Return string representation of MMvecResults.
Special methods (inherited)
__eq__ : Return self==value.
__ge__ : Return self>=value.
__getstate__ : Helper for pickle.
__gt__ : Return self>value.
__hash__ : Return hash(self).
__le__ : Return self<=value.
__lt__ : Return self<value.
__ne__ : Return self!=value.
Details