Towards Enhancing Data Equity in Public Health Data Science

Aug 27, 2025 · Yiran Wang, Alicia E Boyd, Lillian Rountree, Yi Ren, Kate Nyhan, Ruchit Nagar, Jackson Higginbottom, Megan L Ranney, Harsh Parikh, Bhramar Mukherjee
Abstract
In public health, data-driven decisions profoundly influence policies, interventions, and prevention strategies. However, acute disparities in data representation across populations persist, often leading to skewed insights and suboptimal decisions. Recognizing, quantifying, and addressing these challenges require a structured roadmap that integrates insights across domains — including, but not limited to, public health data science and computer science — and critically examines these insights through reflexivity and critical theory. This need has brought increasing attention to the concept of data equity, which offers a guiding framework for addressing systemic bias in data use. Data equity aims to ensure the fair and inclusive representation, collection, and use of data to prevent the introduction or exacerbation of systemic biases that could lead to invalid downstream inference and decisions. We highlight the urgency of this issue by presenting three public health examples where the acute lack of representative datasets and skewed knowledge adversely affect decision-making across diverse sub-groups. The challenges illustrated in these examples mirror broader concerns raised in both public health and computer science literature. While existing public health literature emphasizes the paucity of high-quality data from specific sub-populations, computer science and statistical literature offer general criteria and metrics for assessing biases in data and modeling systems. Building upon foundational concepts from these fields, we propose a working definition of public health data equity and introduce a structured framework for self-auditing public health data science practices. This framework integrates core principles from computational science, such as fairness, accountability, transparency, ethics, privacy, and confidentiality, with key public health considerations, including selection bias, representativeness, generalizability, causality, and information bias. 
Our framework aims to guide public health researchers in evaluating and improving equity throughout the entire data life cycle: from design and collection, to measurement, analysis, interpretation, and translation. By fundamentally embedding data equity within public health research and practice, this work provides a multidisciplinary pathway toward ensuring that data-driven policies, artificial intelligence innovations, and emerging technologies foster improved health outcomes and well-being for all populations. We conclude by emphasizing the critical understanding that, although data equity is an essential first step, it does not inherently guarantee information, learning, or decision equity.