A comparison and evaluation of statistical methods for mediation analysis with mixtures of environmental exposures
Background Environmental studies often evaluate how exposures influence health outcomes through intermediate biological processes. In practice, researchers are frequently interested in complex exposure mixtures rather than single agents. Mixtures complicate mediation analysis because exposures are often strongly correlated, only a few may be active, and their effects may be nonlinear and interactive. This study compares and evaluates approaches for mediation analysis when exposures involve complex mixtures.
Methods We review four strategies: (1) single-exposure mediation analysis, which analyzes each exposure separately; (2) principal component–based mediation analysis, which summarizes correlated exposures into orthogonal components; (3) environmental risk score–based mediation analysis, which constructs a supervised prediction score from the exposure set and treats that score as the exposure; and (4) Bayesian kernel machine regression causal mediation analysis, which flexibly models nonlinear and interactive mixture effects. For each approach, we clarify the target estimand and the assumptions required for causal interpretation. We then conduct a simulation study to evaluate how well the four methods estimate global indirect effects and identify the individual exposures contributing to the global indirect effect, under varying sample sizes and effect sizes. Finally, we illustrate these approaches in an observational birth cohort.
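To make strategies (2) and (3) concrete, the following is a minimal sketch under simplifying assumptions that are not taken from this study: linear mediator and outcome models with no exposure–mediator interaction, so each indirect effect is the product of coefficients a·b; the number of retained components `k` fixed by hand; and a hypothetical environmental risk score built by closed-form ridge regression of the outcome on standardized exposures (the penalty `lam` is illustrative). All function names and the simulated data are hypothetical. The sketch produces point estimates only (no intervals) and does not cover Bayesian kernel machine regression.

```python
import numpy as np


def standardize(X):
    """Center each exposure and scale it to unit variance."""
    return (X - X.mean(axis=0)) / X.std(axis=0)


def ols(y, design):
    """Ordinary least-squares coefficients for y regressed on a design matrix."""
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    return beta


def joint_indirect_effects(E, M, Y, C):
    """Product-of-coefficients indirect effect a_j * b for each column of E.

    Assumes linear mediator and outcome models with no exposure-mediator
    interaction; all columns of E are fitted jointly and adjusted for C.
    """
    n, p = E.shape
    ones = np.ones(n)
    # Mediator model: M ~ 1 + E + C; a_j is the coefficient on column j of E
    a = ols(M, np.column_stack([ones, E, C]))[1:1 + p]
    # Outcome model: Y ~ 1 + E + M + C; b is the coefficient on M
    b = ols(Y, np.column_stack([ones, E, M, C]))[1 + p]
    return a * b


def principal_components(X, k):
    """Scores on the first k principal components of the standardized exposures."""
    Z = standardize(X)
    # Rows of Vt are principal directions, ordered by decreasing singular value
    _, _, Vt = np.linalg.svd(Z, full_matrices=False)
    return Z @ Vt[:k].T


def environmental_risk_score(X, Y, lam=1.0):
    """Hypothetical ERS: ridge regression of the outcome on standardized exposures.

    The learner and the penalty `lam` are illustrative assumptions.
    """
    Z = standardize(X)
    p = Z.shape[1]
    w = np.linalg.solve(Z.T @ Z + lam * np.eye(p), Z.T @ (Y - Y.mean()))
    return Z @ w


# Simulated (hypothetical) data: five correlated exposures and one covariate
rng = np.random.default_rng(0)
n, p = 2000, 5
Sigma = 0.6 * np.ones((p, p)) + 0.4 * np.eye(p)
X = rng.multivariate_normal(np.zeros(p), Sigma, size=n)
C = rng.normal(size=(n, 1))
M = 0.5 * X[:, 0] + 0.3 * X[:, 1] + 0.2 * C[:, 0] + rng.normal(size=n)
Y = 0.4 * M + 0.2 * X[:, 2] + 0.1 * C[:, 0] + rng.normal(size=n)

# Strategy (2): indirect effect through M for each of the first k = 2 PCs
pc_ie = joint_indirect_effects(principal_components(X, 2), M, Y, C)
# Strategy (3): treat the ERS as a single exposure
ers_ie = joint_indirect_effects(environmental_risk_score(X, Y)[:, None], M, Y, C)
print("PC indirect effects:", pc_ie)
print("ERS indirect effect:", ers_ie)
```

Note that each PC-specific effect refers to a unit change in an abstract linear combination of exposures, and its sign depends on the arbitrary sign of the singular vector. This illustrates why the estimand differs across strategies.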
Results In the simulation study, single-exposure mediation analysis often produced highly biased estimates when co-exposures were not adjusted for, and adjusting for co-exposures substantially reduced this bias. For the methods designed to handle correlated, complex exposure mixtures, performance often depended on method-specific analytic choices, such as the number of principal components retained or the variable selection approach used in Bayesian kernel machine regression. In the data application, all methods found limited evidence of non-null global indirect effects and broadly agreed on which individual exposures were potentially active, despite differences in their assumptions and causal estimands.
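The mechanism behind the co-exposure bias can be reproduced in a toy linear setting. This is a hypothetical example, not the study's simulation design. Exposure X0 affects the mediator, while X1 is correlated with X0 (correlation 0.8) but has no effect. Assuming linear models and the product-of-coefficients estimator, a single-exposure analysis of X1 that ignores X0 is expected to return roughly 0.5 × 0.8 × 0.5 = 0.2, whereas adjusting for X0 should bring it near zero. The function name and data-generating values are illustrative assumptions.

```python
import numpy as np


def single_exposure_indirect_effect(exposure, mediator, outcome, adjust):
    """Product-of-coefficients indirect effect for one exposure.

    `adjust` is an (n, q) covariate matrix; q may be 0 (no adjustment).
    Assumes linear models with no exposure-mediator interaction.
    """
    ones = np.ones(len(exposure))
    # a: exposure coefficient in the mediator model
    a = np.linalg.lstsq(np.column_stack([ones, exposure, adjust]), mediator, rcond=None)[0][1]
    # b: mediator coefficient in the outcome model
    b = np.linalg.lstsq(np.column_stack([ones, exposure, mediator, adjust]), outcome, rcond=None)[0][2]
    return a * b


# Hypothetical data: X0 affects the mediator; X1 is correlated with X0 but inactive
rng = np.random.default_rng(1)
n, rho = 5000, 0.8
X = rng.multivariate_normal([0.0, 0.0], [[1.0, rho], [rho, 1.0]], size=n)
M = 0.5 * X[:, 0] + rng.normal(size=n)
Y = 0.5 * M + rng.normal(size=n)

unadjusted = single_exposure_indirect_effect(X[:, 1], M, Y, np.empty((n, 0)))
adjusted = single_exposure_indirect_effect(X[:, 1], M, Y, X[:, [0]])
print(f"X1 indirect effect, unadjusted: {unadjusted:.3f}")
print(f"X1 indirect effect, adjusted for X0: {adjusted:.3f}")
```

With many highly correlated co-exposures, adjusting for all of them also inflates variance, which is one motivation for the mixture-oriented strategies.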
Conclusion Multiple strategies are available for mediation analysis with exposure mixtures, each with distinct strengths. This study provides guidance on selecting and applying these methods according to study aims and data features.
Author note: Sean McGrath and Yiran Wang contributed equally to this work.
