Review article
Bias recognition and mitigation strategies in artificial intelligence healthcare applications
Artificial intelligence (AI) is delivering value across many aspects of clinical practice. However, bias may exacerbate healthcare disparities. This review examines the origins of bias in healthcare AI, strategies for mitigation, and the responsibilities of relevant stakeholders towards achieving fair and equitable use. We highlight the importance of systematically identifying bias and engaging relevant mitigation activities throughout the AI model life cycle, from model conception through to deployment and longitudinal surveillance.
In the context of healthcare AI, bias can be defined as any systematic and/or unfair difference in how predictions are generated for different patient populations that could lead to disparate care delivery. Such bias introduces or exacerbates disparities in benefit or harm for specific individuals or groups, eroding the capacity for healthcare to be delivered in a fair and equitable manner.
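To make "systematic difference" concrete, the sketch below computes per-group true-positive rates and their gap (an equal-opportunity check) for a hypothetical binary classifier. The patient outcomes, group labels, and error pattern are all invented for illustration; this is not a measurement from any study cited here.

```python
import numpy as np

def true_positive_rate(y_true, y_pred):
    """Share of truly positive cases the model correctly flags (sensitivity)."""
    positives = y_true == 1
    return (y_pred[positives] == 1).mean()

# Invented cohort: outcomes, group membership, and a classifier that
# systematically misses positives in group B.
rng = np.random.default_rng(0)
y_true = rng.integers(0, 2, size=1000)
group = rng.choice(["A", "B"], size=1000)
y_pred = y_true.copy()
missed = (group == "B") & (y_true == 1) & (rng.random(1000) < 0.3)
y_pred[missed] = 0

tpr_a = true_positive_rate(y_true[group == "A"], y_pred[group == "A"])
tpr_b = true_positive_rate(y_true[group == "B"], y_pred[group == "B"])
print(f"TPR gap (A - B): {tpr_a - tpr_b:.3f}")  # a persistent gap signals systematic bias
```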
Fairness, equality, and equity are core principles of healthcare delivery that are directly influenced by bias. Fairness in healthcare encompasses both distributive justice and socio-relational dimensions, requiring holistic consideration of each individual's unique social, cultural, and environmental circumstances; it therefore goes beyond equality, which aims to ensure equal access and outcomes. Equity recognizes that certain groups may require tailored resources or support to attain comparable health benefits. Navigating these nuances and the potential trade-offs between core principles is essential, as blanket approaches to fairness may inadvertently reinforce existing disparities. Differentiating equality from equity is essential to understanding how AI bias can influence healthcare disparities. These often-competing principles must be weighed iteratively to achieve the best possible balance.
Bias may be introduced at any stage of an algorithm's life cycle, including conceptual formation, data collection and preparation, algorithm development and validation, clinical implementation, and surveillance. This complexity is compounded by the inadequacy of methods for routinely detecting or mitigating biases across these stages, emphasizing the need for comprehensive and holistic bias detection frameworks. Establishing standardized and repeatable approaches to bias mitigation is a growing societal responsibility for AI-healthcare developers and providers. An AI model's life cycle comprises conception, data collection and pre-processing, in-processing (algorithm development and validation), post-processing (clinical deployment), and post-deployment surveillance phases. Systematically considering bias across these sequential phases requires a multifaceted approach tailored to identify, quantify, and mitigate its impact on the core principles of fairness, equity, and equality, and to maintain the ethical integrity of healthcare AI.
Categories of Bias in Healthcare AI
| Bias Category | Definition | Key Examples |
|---|---|---|
| Human Biases | Biases originating from human perceptions, assumptions, or preferences, often subconscious. | Implicit, systemic, confirmation, and sensitive attribute biases |
| Data Biases | Biases introduced during data collection, affecting data representation and leading to skewed outcomes. | Representation, selection, sampling, participation, measurement, and historical data biases |
| Algorithmic Biases | Biases inherent to data preprocessing or algorithm design, training, and validation. | Aggregation, missing data, feature selection, class imbalance, and validation biases |
| Deployment Biases | Biases introduced during or after the AI model's implementation in clinical environments. | Automation, feedback loop, dismissal, evaluation, and predictive biases |
Mitigating Feature Selection Bias in AI Risk Prediction (Case Study 1)
A widely used AI risk prediction algorithm in the U.S. healthcare system, analyzed by Obermeyer et al. in 2019, included data from 43,539 White patients and 6,079 Black patients (2013-2015). The algorithm, designed to identify high-risk patients based on predicted healthcare costs, exhibited racial bias, underestimating the needs of Black patients. The study found that Black patients had 26.3% more chronic illnesses than White patients at the same risk score level (4.8 vs. 3.8 conditions). This bias stemmed from using healthcare costs as a proxy for illness severity; systemic barriers like reduced healthcare access, financial constraints, and lower trust levels led to lower costs for Black patients, causing the algorithm to misjudge their risk. To address this, researchers recalibrated the algorithm to use direct health indicators, such as chronic condition counts, instead of costs. This change nearly tripled the enrollment of high-risk Black patients in care management programs, from 17.7% to 46.5%, promoting more equitable healthcare. However, ongoing surveillance is necessary, as reliance on historical data and evolving healthcare dynamics could allow biases to re-emerge.
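The core of this mitigation is a change of training label rather than of model architecture. The following sketch illustrates that idea on invented data with a simple linear model; it is a schematic reconstruction of the label-swap, not the actual algorithm Obermeyer et al. studied.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Toy data (invented for illustration): features, true illness burden, and cost.
rng = np.random.default_rng(1)
n = 5000
x = rng.normal(size=(n, 5))                    # clinical features
black = rng.random(n) < 0.12                   # group indicator
conditions = np.maximum(0, x @ [1, 1, 1, 0.5, 0.5] + rng.normal(size=n) + 4)
# Systemic barriers suppress observed cost for the same illness burden.
cost = conditions * np.where(black, 0.7, 1.0) + rng.normal(size=n)

def top_decile_share(target):
    """Train on `target` as the risk label; return share of Black patients flagged."""
    risk = LinearRegression().fit(x, target).predict(x)
    flagged = risk >= np.quantile(risk, 0.9)
    return black[flagged].mean()

print("flagged share, cost label:      ", round(top_decile_share(cost), 3))
print("flagged share, condition label: ", round(top_decile_share(conditions), 3))
```

Because the cost label encodes the access gap, the cost-trained ranker under-flags the group whose costs are suppressed; relabeling with a direct health indicator removes that channel.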
Addressing Representation Bias (Racial) in Deep Learning-Based Cardiac MRI Segmentation (Case Study 2)
A deep learning model (nnU-Net) for cardiac MRI segmentation, trained on UK Biobank data of 5,903 subjects (80% White, 20% Black, Asian, Chinese, Mixed, and Other), initially showed racial bias, with the Dice Similarity Coefficient (DSC) at 93.5% for White subjects but as low as 84.5% for Black and Mixed-race subjects. Researchers used three distinct strategies to address this: (1) stratified batch sampling (balancing racial groups in each batch during pre-processing phase) improved DSC for Black subjects from 85.88% to 93.07% and for Mixed-race from 84.52% to 93.84%, though overall accuracy decreased slightly; (2) fair meta-learning (using a secondary classifier to predict race during in-processing phase) raised DSC to 89.60% for Black and 88.03% for Mixed-race but increased complexity; and (3) protected group models (training separate models for each group during in-processing phase) achieved the best results, with DSC reaching 92.15% for Black and 93.17% for Mixed-race subjects, reducing bias to a standard deviation of 0.89. However, this approach required racial information during inference, limiting its practicality.
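Of the three strategies, stratified batch sampling is the most straightforward to reproduce. Below is a minimal sketch using PyTorch's `WeightedRandomSampler` to balance group representation per training batch; the tensors and group labels are placeholders, not UK Biobank data or the authors' pipeline.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset, WeightedRandomSampler

# Placeholder dataset: 1000 samples, group 0 is the 80% majority.
images = torch.randn(1000, 1, 64, 64)
masks = torch.randint(0, 2, (1000, 64, 64))
groups = torch.cat([torch.zeros(800, dtype=torch.long),
                    torch.ones(200, dtype=torch.long)])

# Weight each sample inversely to its group's frequency so every batch
# is expected to contain the groups in roughly equal proportion.
group_counts = torch.bincount(groups).float()
weights = 1.0 / group_counts[groups]

sampler = WeightedRandomSampler(weights, num_samples=len(groups), replacement=True)
loader = DataLoader(TensorDataset(images, masks, groups),
                    batch_size=32, sampler=sampler)

batch = next(iter(loader))
print("group balance in one batch:", torch.bincount(batch[2]))
```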
Bias Mitigation Across the AI Model Life Cycle
A structured approach to identifying and mitigating bias at each phase of the model life cycle, from conception through post-deployment surveillance.
Conception
Defining clear research questions, aligning with DEI principles, and identifying potential bias sources from the outset. This phase addresses implicit, systemic, confirmation, and sensitive attribute biases through early team awareness and critical thinking.
Data Collection
Generating diverse and representative datasets. Focus on capturing breadth of demographics, leveraging various sources, and assessing data quality to mitigate representation, selection, sampling, participation, measurement, and historical data biases.
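In practice, this phase begins with a representation audit before any training. A minimal pandas sketch follows; the column names (`sex`, `ethnicity`) and reference proportions are invented for illustration.

```python
import pandas as pd

# Hypothetical cohort table; in practice this comes from the data pipeline.
cohort = pd.DataFrame({
    "sex": ["F", "M", "F", "M", "F", "F", "M", "F"],
    "ethnicity": ["White", "White", "Black", "Asian",
                  "White", "White", "White", "Mixed"],
})

# Reference proportions, e.g. from census or target-population statistics
# (values invented here).
reference = {"White": 0.60, "Black": 0.15, "Asian": 0.15, "Mixed": 0.10}

observed = cohort["ethnicity"].value_counts(normalize=True)
audit = pd.DataFrame({"observed": observed, "reference": pd.Series(reference)})
audit["gap"] = audit["observed"] - audit["reference"]
print(audit.round(2))  # large negative gaps flag under-represented groups
```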
Pre-processing
Cleaning and preparing raw data for model development. Meticulous attention to managing missing values, variable selection, and data augmentation to prevent aggregation, missing data, feature selection, and representation biases.
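Missingness that differs by subgroup is a common entry point for these biases. The sketch below audits missing values per group and applies group-aware median imputation as one possible mitigation; the records and the choice of imputation strategy are illustrative assumptions, and the right strategy is data-dependent.

```python
import numpy as np
import pandas as pd

# Toy records with missing lab values (invented columns).
df = pd.DataFrame({
    "group": ["A"] * 5 + ["B"] * 5,
    "lab_value": [1.2, np.nan, 0.9, 1.1, np.nan,
                  np.nan, np.nan, 0.8, np.nan, 1.0],
})

# Missingness that differs by group is a red flag: dropping or naively
# imputing those rows encodes the disparity into the training data.
missing_by_group = df.groupby("group")["lab_value"].apply(lambda s: s.isna().mean())
print(missing_by_group)

# Group-aware median imputation as one simple option (an assumption,
# not a universally recommended fix).
df["lab_value"] = df.groupby("group")["lab_value"].transform(
    lambda s: s.fillna(s.median()))
```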
In-processing (Algorithm Development & Validation)
Training and validating the AI algorithm with an emphasis on fairness and equity. Strategies include stratified sub-group analyses, counterfactual examples, and class imbalance techniques to address algorithmic, validation, and representation biases.
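The sketch below combines two of these techniques: class re-weighting to counter imbalance during training, and a stratified per-group sensitivity report afterwards. It uses scikit-learn on invented data and stands in for whatever model the project actually uses.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import recall_score

# Toy imbalanced data with a group attribute (all invented).
rng = np.random.default_rng(2)
X = rng.normal(size=(2000, 4))
y = (X[:, 0] + rng.normal(scale=2, size=2000) > 1.5).astype(int)  # minority positives
group = rng.choice(["A", "B"], size=2000, p=[0.8, 0.2])

# class_weight="balanced" reweights the loss to counter class imbalance.
model = LogisticRegression(class_weight="balanced").fit(X, y)
pred = model.predict(X)

# Stratified sub-group analysis: report sensitivity separately per group.
for g in ["A", "B"]:
    m = group == g
    print(g, "sensitivity:", round(recall_score(y[m], pred[m]), 3))
```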
Post-processing (Clinical Deployment)
Implementing the model in live clinical environments. Focus on transparent disclosure, human-in-the-loop strategies, explainability tools, and structured pre-deployment testing to address evaluation and predictive biases.
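One widely used post-processing technique, in the spirit of the structured pre-deployment testing described here, is choosing a separate decision threshold per subgroup so that a chosen error rate is equalized. A minimal sketch, assuming validation scores and labels are available; all data and the 0.9 sensitivity target are invented.

```python
import numpy as np

def threshold_for_sensitivity(scores, labels, target=0.9):
    """Pick the highest threshold whose sensitivity still meets `target`."""
    for t in np.sort(np.unique(scores))[::-1]:
        pred = scores >= t
        if pred[labels == 1].mean() >= target:
            return t
    return scores.min()

# Invented validation scores for two groups with different score distributions.
rng = np.random.default_rng(3)
labels = rng.integers(0, 2, 500)
group = rng.choice(["A", "B"], 500)
scores = np.clip(labels * 0.6
                 + rng.random(500) * np.where(group == "B", 0.7, 0.4), 0, 1)

thresholds = {g: threshold_for_sensitivity(scores[group == g], labels[group == g])
              for g in ["A", "B"]}
print(thresholds)  # per-group operating points that equalize sensitivity
```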
Post-deployment Surveillance
Ongoing monitoring, maintenance, and re-calibration of AI models. Mechanisms to track user engagement, decision impact, and detect biased sub-group behavior or concept drift to mitigate automation, feedback loop, and dismissal biases.
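A minimal version of such surveillance is a rolling per-group performance monitor that alerts when any subgroup's metric degrades. The sketch below uses pandas on an invented prediction log; the field names and the 0.8 alert floor are assumptions, not a recommended standard.

```python
import numpy as np
import pandas as pd

# Invented prediction log; in production this would come from live model telemetry.
rng = np.random.default_rng(4)
log = pd.DataFrame({
    "month": pd.period_range("2024-01", periods=6, freq="M").repeat(200),
    "group": rng.choice(["A", "B"], 1200),
    "correct": rng.random(1200) < 0.9,
})
# Simulate concept drift degrading performance for group B in later months.
late_b = (log["month"] >= pd.Period("2024-04", freq="M")) & (log["group"] == "B")
log.loc[late_b, "correct"] = rng.random(late_b.sum()) < 0.7

# Rolling per-group accuracy; alert when any group falls below the floor.
monthly = log.groupby(["month", "group"])["correct"].mean().unstack()
print(monthly.round(2))
alerts = monthly[monthly.lt(0.8)].stack()
print("ALERT:", list(alerts.index))  # (month, group) pairs needing recalibration
```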