Quick Reference 13

Responsible AI

Quick reference for fairness metrics, bias detection, explainability, privacy techniques, model cards, and audit checklists.


Responsible AI Pillars

Responsible AI is not an optional add-on -- it is a set of engineering practices that prevent your models from causing harm. These six pillars define the minimum standard for any AI system deployed to real users.

| Pillar | Core Question | Key Methods |
| --- | --- | --- |
| Fairness | Does the model treat all groups equitably? | Fairness metrics, bias audits |
| Transparency | Can we explain how the model works? | SHAP, LIME, model cards |
| Privacy | Does it protect individual data? | DP, federated learning, anonymization |
| Safety | Can it cause harm? | Red-teaming, guardrails, testing |
| Accountability | Who is responsible for outcomes? | Governance, audit trail, documentation |
| Robustness | Does it work reliably under stress? | Adversarial testing, edge cases |

Fairness Metrics

Fairness metrics quantify whether your model treats different demographic groups equitably. You cannot satisfy all fairness metrics simultaneously when base rates differ -- so choose the metric that aligns with your specific use case and regulatory requirements.

Group Fairness Metrics

| Metric | Definition | Target | Good For |
| --- | --- | --- | --- |
| Demographic Parity | P(Y=1 \| A=a) = P(Y=1 \| A=b) | Equal selection rates across groups | Hiring, lending |
| Equalized Odds | P(Y=1 \| Y_true=y, A=a) = P(Y=1 \| Y_true=y, A=b) for y in {0, 1} | Equal TPR and FPR across groups | Criminal justice |
| Equal Opportunity | P(Y=1 \| Y_true=1, A=a) = P(Y=1 \| Y_true=1, A=b) | Equal TPR across groups | Loan approval |
| Predictive Parity | P(Y_true=1 \| Y=1, A=a) = P(Y_true=1 \| Y=1, A=b) | Equal precision across groups | Medical diagnosis |
| Calibration | P(Y_true=1 \| score=s, A=a) = P(Y_true=1 \| score=s, A=b) | Equal calibration across groups | Risk scoring |
| Counterfactual Fairness | Y(a) = Y(a') for each individual | Same prediction if sensitive attribute changed | Any |

Here Y is the model's prediction, Y_true is the observed label, and A is the sensitive attribute.
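
To make the group metrics concrete, here is a minimal pure-Python sketch that derives selection rate, TPR, FPR, and precision per group from confusion counts. The function name `group_rates` and the toy data are assumptions made up for illustration; they are not part of any library.

```python
from collections import defaultdict

def group_rates(y_true, y_pred, groups):
    """Return selection rate, TPR, FPR, and precision for each group."""
    counts = defaultdict(lambda: {"tp": 0, "fp": 0, "tn": 0, "fn": 0})
    for t, p, g in zip(y_true, y_pred, groups):
        # "t"/"f" = prediction correct or not, "p"/"n" = predicted positive or negative
        key = ("t" if t == p else "f") + ("p" if p == 1 else "n")
        counts[g][key] += 1

    def ratio(num, den):
        return num / den if den else float("nan")

    rates = {}
    for g, c in counts.items():
        n = sum(c.values())
        rates[g] = {
            "selection_rate": ratio(c["tp"] + c["fp"], n),   # demographic parity
            "tpr": ratio(c["tp"], c["tp"] + c["fn"]),        # equal opportunity
            "fpr": ratio(c["fp"], c["fp"] + c["tn"]),        # with TPR: equalized odds
            "precision": ratio(c["tp"], c["tp"] + c["fp"]),  # predictive parity
        }
    return rates

# Toy data, invented for illustration only.
y_true = [1, 1, 0, 0, 1, 1, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0]
groups = ["a", "a", "a", "a", "b", "b", "b", "b"]

rates = group_rates(y_true, y_pred, groups)
dp_gap = abs(rates["a"]["selection_rate"] - rates["b"]["selection_rate"])
print(rates)
print(f"Demographic parity gap: {dp_gap:.2f}")
```

On this toy data group a is selected at 0.75 and group b at 0.25, so the demographic parity gap is 0.50 even though both groups have the same base rate.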

Metric Impossibility Theorem

When base rates differ between groups, an imperfect classifier cannot satisfy calibration (or predictive parity) and equalized odds at the same time. Choose the metric most appropriate for your context, and document the trade-off you accepted.
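
A short worked example shows why the conflict arises. By Bayes' rule, precision depends on TPR, FPR, and the group's base rate, so holding TPR and FPR equal across groups forces precision apart whenever base rates differ. The helper name and the base rates below are hypothetical values chosen only for illustration.

```python
def precision_from_rates(tpr, fpr, base_rate):
    """Precision (PPV) implied by TPR, FPR, and the group's base rate via Bayes' rule."""
    true_pos = tpr * base_rate
    false_pos = fpr * (1 - base_rate)
    return true_pos / (true_pos + false_pos)

# Same error rates for both groups, so equalized odds holds by construction.
tpr, fpr = 0.8, 0.2

# Hypothetical base rates: P(Y_true=1) differs between the two groups.
ppv_a = precision_from_rates(tpr, fpr, base_rate=0.5)
ppv_b = precision_from_rates(tpr, fpr, base_rate=0.2)

print(f"precision, group a: {ppv_a:.2f}")  # 0.80
print(f"precision, group b: {ppv_b:.2f}")  # 0.50
```

Equalized odds is satisfied, yet a positive prediction is right 80% of the time for group a and only 50% of the time for group b, so predictive parity fails.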

Computing Fairness Metrics

from fairlearn.metrics import (
    demographic_parity_difference,
    equalized_odds_difference,
    MetricFrame
)
from sklearn.metrics import accuracy_score, precision_score, recall_score

# Compute metrics by group
metric_frame = MetricFrame(
    metrics={
        "accuracy": accuracy_score,
        "precision": precision_score,
        "recall": recall_score,
    },
    y_true=y_test,
    y_pred=y_pred,
    sensitive_features=sensitive_features
)

print(metric_frame.by_group)
print(f"Demographic parity diff: {demographic_parity_difference(y_test, y_pred, sensitive_features=sensitive_features)}")
print(f"Equalized odds diff: {equalized_odds_difference(y_test, y_pred, sensitive_features=sensitive_features)}")

Bias Detection

Bias can enter your model at any stage -- from data collection through deployment. Detecting it requires systematic checks at each stage, not a one-time audit after training.

Types of Bias

| Bias Type | Stage | Description | Example |
| --- | --- | --- | --- |
| Historical | Data | Training data reflects past discrimination | Hiring data skewed toward majority group |
| Representation | Data | Under/overrepresentation of groups | Medical data mostly from one demographic |
| Measurement | Data | Features proxy for protected attributes | Zip code as proxy for race |
| Aggregation | Modeling | Single model for heterogeneous populations | One model for all age groups |
| Evaluation | Evaluation | Test data not representative | Eval set lacks minority examples |
| Deployment | Deployment | Model used in unintended context | Credit model used for hiring |

Bias Detection Checklist

  • Examine training data demographics vs target population
  • Check label distribution across protected groups
  • Test for proxy features (correlated with protected attributes)
  • Compute fairness metrics across all protected groups
  • Analyze error rates by subgroup
  • Test with counterfactual examples (change only sensitive attribute)
  • Review feature importance for proxy discrimination
  • Evaluate on diverse test sets
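
Two of the checklist items, the proxy-feature test and the counterfactual test, can be sketched in a few lines of pure Python. Everything here is an assumption for illustration: the feature names (`income`, `zip_risk`), the toy `predict` rule, and the four records are invented, and a real audit would use your own model and a much larger sample.

```python
import math

def pearson(xs, ys):
    """Pearson correlation; with a 0/1 group indicator this is the point-biserial correlation."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

def counterfactual_flip_rate(predict, rows, attr, values):
    """Share of rows whose prediction changes when only `attr` is swapped."""
    flipped = 0
    for row in rows:
        other = values[1] if row[attr] == values[0] else values[0]
        if predict({**row, attr: other}) != predict(row):
            flipped += 1
    return flipped / len(rows)

# Hypothetical records: zip_risk happens to track group membership closely.
rows = [
    {"income": 30, "zip_risk": 0.9, "group": "a"},
    {"income": 45, "zip_risk": 0.8, "group": "a"},
    {"income": 50, "zip_risk": 0.2, "group": "b"},
    {"income": 70, "zip_risk": 0.1, "group": "b"},
]

def predict(row):
    # Toy rule that never reads "group" directly but leans on zip_risk.
    return int(row["income"] - 40 * row["zip_risk"] > 20)

is_group_a = [1 if r["group"] == "a" else 0 for r in rows]
proxy_corr = pearson([r["zip_risk"] for r in rows], is_group_a)
flip_rate = counterfactual_flip_rate(predict, rows, "group", ("a", "b"))

print(f"zip_risk vs group correlation: {proxy_corr:.2f}")
print(f"counterfactual flip rate: {flip_rate:.2f}")
```

The flip rate is zero because the toy model ignores `group`, yet `zip_risk` correlates almost perfectly with it. That is why the checklist asks for both tests: a clean counterfactual check does not rule out proxy discrimination.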

Bias Mitigation Strategies

| Stage | Technique | Description |
| --- | --- | --- |
| Pre-processing | Resampling | Over/under-sample to balance groups |
| Pre-processing | Reweighting | Assign higher weights to underrepresented groups |
| Pre-processing | Feature removal | Remove or transform proxy features |
| In-processing | Constrained optimization | Add fairness constraints to loss function |
| In-processing | Adversarial debiasing | Train the model so that an adversary cannot predict the sensitive attribute from its outputs |
| Post-processing | Threshold adjustment | Different classification thresholds per group |
| Post-processing | Calibration | Equalize calibration across groups |
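
As one concrete instance of the reweighting row, the sketch below weights each (group, label) cell by its expected frequency under independence divided by its observed frequency, in the style of Kamiran and Calders' reweighing. The function names and toy data are assumptions for illustration.

```python
from collections import Counter

def reweighing_weights(groups, labels):
    """Weight each sample by P(group) * P(label) / P(group, label)."""
    n = len(labels)
    group_counts = Counter(groups)
    label_counts = Counter(labels)
    cell_counts = Counter(zip(groups, labels))
    return [
        (group_counts[g] * label_counts[y]) / (n * cell_counts[(g, y)])
        for g, y in zip(groups, labels)
    ]

def weighted_positive_rate(groups, labels, weights, group):
    """Weighted share of positive labels within one group."""
    total = sum(w for g, w in zip(groups, weights) if g == group)
    positive = sum(w for g, y, w in zip(groups, labels, weights) if g == group and y == 1)
    return positive / total

# Toy data: group a has 4/6 positive labels, group b has 1/4.
groups = ["a"] * 6 + ["b"] * 4
labels = [1, 1, 1, 1, 0, 0, 1, 0, 0, 0]

weights = reweighing_weights(groups, labels)
rate_a = weighted_positive_rate(groups, labels, weights, "a")
rate_b = weighted_positive_rate(groups, labels, weights, "b")
print(f"weighted positive rate: a={rate_a:.2f}, b={rate_b:.2f}")
```

After reweighting, both groups have the same weighted positive rate, so label and group are independent in the weighted training set. Many scikit-learn estimators accept such weights through the `sample_weight` argument of `fit`.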

Explainability

If you cannot explain why your model made a decision, you cannot debug it, audit it, or defend it in a regulatory review. Explainability methods range from fast-and-approximate to slow-and-rigorous -- choose based on your audience and stakes.

Methods Comparison

| Method | Type | Scope | Speed | Faithfulness |
| --- | --- | --- | --- | --- |
| SHAP | Model-agnostic | Local + Global | Slow (fast for tree models) | High |
| LIME | Model-agnostic | Local | Medium | Medium |
| Integrated Gradients | Gradient-based | Local | Fast | High |
| Attention weights | Model-specific | Local | Fast | Low-Medium |
| Feature importance | Model-specific | Global | Fast | Medium |
| Counterfactual | Model-agnostic | Local | Medium | High |
| TCAV (Concept-based) | Neural networks | Global | Slow | High |
| Partial Dependence | Model-agnostic | Global | Medium | Medium |

SHAP Example

import shap

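# For tree models (fast): TreeExplainer uses the tree structure directly
explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(X_test)
# Note: for classifiers, shap_values may hold one set of values per class,
# depending on the SHAP version; select the class of interest before plotting.
shap.summary_plot(shap_values, X_test)

# For any model (slower): KernelExplainer is sampling-based, so keep the
# background set and the number of explained rows small
kernel_explainer = shap.KernelExplainer(model.predict, shap.sample(X_train, 100))
kernel_shap_values = kernel_explainer.shap_values(X_test.iloc[:10])
# Force plot for the first explained row (assumes model.predict returns one value per row)
shap.force_plot(kernel_explainer.expected_value, kernel_shap_values[0], X_test.iloc[0])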
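
To show what SHAP is estimating, here is a brute-force sketch that computes exact Shapley values for a tiny three-feature scoring function by averaging marginal contributions over every feature ordering. Missing features are filled from a single baseline row, which is a simplifying assumption, and the `score` function and its inputs are hypothetical. The cost grows factorially with the number of features, which is why the library relies on faster algorithms and approximations.

```python
from itertools import permutations

def exact_shapley(f, x, baseline):
    """Exact Shapley values: average marginal contribution over all feature orderings."""
    n = len(x)
    phi = [0.0] * n
    orderings = list(permutations(range(n)))
    for order in orderings:
        current = list(baseline)      # start with every feature at its baseline value
        prev = f(current)
        for i in order:
            current[i] = x[i]         # switch feature i to its actual value
            value = f(current)
            phi[i] += value - prev    # marginal contribution of feature i in this ordering
            prev = value
    return [p / len(orderings) for p in phi]

def score(features):
    # Hypothetical model with an interaction between income and tenure.
    income, debt, tenure = features
    return 2.0 * income - 1.5 * debt + 0.5 * income * tenure

x = [3.0, 2.0, 4.0]
baseline = [1.0, 1.0, 1.0]
phi = exact_shapley(score, x, baseline)

print("Shapley values:", phi)
print("sum:", sum(phi), "f(x) - f(baseline):", score(x) - score(baseline))
```

The attributions sum to the difference between the prediction and the baseline prediction, which is the additivity property that makes Shapley-based explanations consistent across audits.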

LIME Example

from lime.lime_tabular import LimeTabularExplainer

explainer = LimeTabularExplainer(
    X_train.values,
    feature_names=X_train.columns.tolist(),
    class_names=["Denied", "Approved"],
    mode="classification"
)

# Explain single prediction
exp = explainer.explain_instance(
    X_test.iloc[0].values,
    model.predict_proba,
    num_features=10
)
exp.show_in_notebook()

When to Use Which

| Scenario | Recommended Method |
| --- | --- |
| Regulatory audit (need consistency) | SHAP (Shapley values are theoretically grounded) |
| Quick debugging a single prediction | LIME |
| Understanding overall model behavior | SHAP summary plot + Partial Dependence |
| Deep learning models | Integrated Gradients, Attention |
| "What would change the outcome?" | Counterfactual explanations |
| Non-technical stakeholder communication | LIME (intuitive), feature importance |
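
For the "What would change the outcome?" scenario, a counterfactual explanation can be as simple as searching for the smallest change to each feature that flips the decision. The sketch below is a naive single-feature grid search, not a production counterfactual method; the `approve` rule, the applicant record, and the candidate deltas are all hypothetical.

```python
def single_feature_counterfactuals(predict, row, candidate_deltas):
    """For each feature, find the smallest candidate change that flips predict(row)."""
    original = predict(row)
    found = {}
    for feature, deltas in candidate_deltas.items():
        for delta in sorted(deltas, key=abs):   # try the smallest changes first
            if predict({**row, feature: row[feature] + delta}) != original:
                found[feature] = delta
                break
    return found

def approve(row):
    # Toy integer scoring rule standing in for a real model.
    return int(5 * row["income"] - 10 * row["debt"] >= 200)

applicant = {"income": 50, "debt": 10}
candidate_deltas = {
    "income": [5, 10, 15, 20],     # only changes the applicant could act on
    "debt": [-2, -4, -6, -8],
}

changes = single_feature_counterfactuals(approve, applicant, candidate_deltas)
print(changes)  # {'income': 10, 'debt': -6}
```

The result reads as an explanation a person can act on: the application would be approved with 10 more units of income or 6 fewer units of debt. Restricting the candidate deltas to changes that are feasible keeps the explanation useful.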

Privacy Techniques

Privacy is not just about compliance -- a model that memorizes training data can leak personal information at inference time. These techniques protect individual data while preserving the model's ability to learn useful patterns.

Differential Privacy (DP)

| Concept | Description |
| --- | --- |
| Epsilon (ε) | Privacy budget. Lower = more private. Typical: 1-10 |
| Delta (δ) | Probability that the ε guarantee fails. Typical: well below 1/n, e.g. 1/n^2 (n = number of records) |
| Sensitivity | Max change one record causes in output |
| Noise mechanism | Laplace (pure DP) or Gaussian (approximate DP) |
| Composition | Privacy loss accumulates across queries, so the budget is spent over time |

# DP-SGD training with Opacus (PyTorch)
from opacus import PrivacyEngine

privacy_engine = PrivacyEngine()
model, optimizer, dataloader = privacy_engine.make_private_with_epsilon(
    module=model,
    optimizer=optimizer,
    data_loader=dataloader,
    epochs=10,
    target_epsilon=3.0,
    target_delta=1e-5,
    max_grad_norm=1.0
)
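
The sensitivity, noise mechanism, and composition concepts are easier to see on a simple counting query than inside DP-SGD. The sketch below applies the Laplace mechanism with noise scale sensitivity / epsilon. The function names and records are invented for illustration, and Python's `random` module is used only to keep the example short; a real deployment would need a vetted DP library and secure randomness.

```python
import random

def laplace_noise(scale, rng):
    """Sample Laplace(0, scale) as the difference of two exponential draws."""
    return rng.expovariate(1 / scale) - rng.expovariate(1 / scale)

def private_count(records, predicate, epsilon, rng):
    """Release a count under epsilon-DP using the Laplace mechanism."""
    true_count = sum(1 for r in records if predicate(r))
    sensitivity = 1.0  # adding or removing one record changes a count by at most 1
    return true_count + laplace_noise(sensitivity / epsilon, rng)

rng = random.Random(0)  # fixed seed only so the sketch is repeatable
ages = [23, 35, 41, 52, 29, 64, 47, 38, 55, 31]  # invented records

epsilon_per_query = 0.5
noisy_over_40 = private_count(ages, lambda a: a >= 40, epsilon_per_query, rng)
noisy_under_30 = private_count(ages, lambda a: a < 30, epsilon_per_query, rng)

# Basic sequential composition: the epsilons of the two releases add up.
total_epsilon = 2 * epsilon_per_query
print(f"noisy counts: {noisy_over_40:.1f}, {noisy_under_30:.1f}")
print(f"privacy budget spent: {total_epsilon}")
```

Lowering epsilon widens the noise, and every extra query spends more of the budget, which is the trade-off the table summarizes.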

Federated Learning

┌──────────┐     ┌──────────┐     ┌──────────┐
│ Client A │     │ Client B │     │ Client C │
│ (local   │     │ (local   │     │ (local   │
│  data)   │     │  data)   │     │  data)   │
└────┬─────┘     └────┬─────┘     └────┬─────┘
     │ gradients      │ gradients      │ gradients
     └───────────┬────┴───────────┬────┘
                 ▼                ▼
          ┌──────────────────────────┐
          │    Aggregation Server    │
          │ (aggregate gradients,    │
          │  update global model)    │
          └──────────────────────────┘
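
The aggregation step in the diagram can be sketched as a weighted average in which each client's contribution is weighted by its local sample count, as in federated averaging. The same arithmetic applies whether clients send gradients or updated parameters. The function name and the client values are assumptions for illustration.

```python
def federated_average(client_updates):
    """Average client parameter vectors, weighted by each client's sample count."""
    total = sum(n for _, n in client_updates)
    dim = len(client_updates[0][0])
    return [
        sum(params[i] * n for params, n in client_updates) / total
        for i in range(dim)
    ]

# (parameter vector, number of local samples) per client; values are invented.
client_updates = [
    ([0.10, -0.20], 100),   # Client A
    ([0.30, 0.00], 300),    # Client B
    ([0.50, 0.40], 600),    # Client C
]

global_params = federated_average(client_updates)
print(global_params)
```

Only the parameter vectors and sample counts reach the server; the raw local data never leaves the clients. Updates can still leak information, so federated learning is often combined with differential privacy or secure aggregation.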

Privacy Technique Selection

| Technique | Protects Against | Use Case |
| --- | --- | --- |
| Differential Privacy | Membership inference, reconstruction | Analytics, ML training |
| Federated Learning | Data centralization | Cross-org collaboration |
| K-anonymity | Re-identification | Data publishing |
| Secure enclaves (TEE) | Server-side access | Cloud computing |
| Homomorphic encryption | Data exposure during computation | Very sensitive data |
| Data masking/tokenization | Direct exposure | PII in logs/storage |
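
As an example of checking one of these techniques, k-anonymity can be measured by grouping records on their quasi-identifiers and taking the size of the smallest group. The column names and records below are hypothetical.

```python
from collections import Counter

def k_anonymity(rows, quasi_identifiers):
    """Return k: the size of the smallest group sharing the same quasi-identifier values."""
    groups = Counter(tuple(row[q] for q in quasi_identifiers) for row in rows)
    return min(groups.values())

# Invented records; age_band and zip3 are already generalized values.
rows = [
    {"age_band": "30-39", "zip3": "941", "diagnosis": "flu"},
    {"age_band": "30-39", "zip3": "941", "diagnosis": "asthma"},
    {"age_band": "40-49", "zip3": "941", "diagnosis": "flu"},
    {"age_band": "40-49", "zip3": "941", "diagnosis": "diabetes"},
    {"age_band": "40-49", "zip3": "100", "diagnosis": "flu"},
]

k_both = k_anonymity(rows, ["age_band", "zip3"])
k_age_only = k_anonymity(rows, ["age_band"])
print(f"k with age_band + zip3: {k_both}")
print(f"k with age_band only: {k_age_only}")
```

With both quasi-identifiers, one record is unique (k = 1) and could be re-identified. Dropping or further generalizing `zip3` raises k to 2 at the cost of less detailed data.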

Model Cards

A model card is the nutrition label for your ML model -- it documents what the model does, what it was trained on, where it works well, and where it fails. Every model deployed to production should have one.

Model Card Template

# Model Card: [Model Name]

## Model Details
- **Developer:** [Team/Organization]
- **Model date:** [Date]
- **Model version:** [Version]
- **Model type:** [Architecture]
- **License:** [License]

## Intended Use
- **Primary use:** [Intended application]
- **Out-of-scope:** [What it should NOT be used for]
- **Users:** [Intended users]

## Training Data
- **Dataset:** [Name, size, date range]
- **Preprocessing:** [Steps taken]
- **Demographics:** [Population represented]

## Evaluation
- **Metrics:** [Which metrics, why]
- **Overall performance:** [Results]
- **Disaggregated performance:** [Results by subgroup]

## Limitations
- [Known limitation 1]
- [Known limitation 2]

## Ethical Considerations
- [Potential harms]
- [Mitigations applied]

## Monitoring
- [How the model is monitored in production]
- [Feedback mechanisms]
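
A filled-in card can be checked automatically before release. The sketch below assumes the card is stored as Markdown with the section headings from the template above; it reports missing sections and any bracketed placeholders left unfilled. The function name and the draft card are hypothetical.

```python
import re

REQUIRED_SECTIONS = [
    "Model Details", "Intended Use", "Training Data", "Evaluation",
    "Limitations", "Ethical Considerations", "Monitoring",
]

def check_model_card(text):
    """Return a list of problems: missing sections and unfilled [placeholders]."""
    headings = {h.strip() for h in re.findall(r"^## (.+)$", text, flags=re.MULTILINE)}
    problems = [f"missing section: {s}" for s in REQUIRED_SECTIONS if s not in headings]
    for lineno, line in enumerate(text.splitlines(), start=1):
        # A bracketed span that is not a Markdown link is treated as an unfilled placeholder.
        if re.search(r"\[[^\]]+\](?!\()", line):
            problems.append(f"line {lineno}: unfilled placeholder")
    return problems

# Hypothetical, deliberately incomplete draft.
draft = """# Model Card: credit-risk-v2

## Model Details
- **Developer:** Risk ML team

## Intended Use
- **Primary use:** [Intended application]

## Evaluation
- **Metrics:** AUC, equalized odds difference
"""

problems = check_model_card(draft)
for problem in problems:
    print(problem)
```

Running a check like this in CI turns "model card completed" from a manual sign-off into a gate that fails when sections are missing.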

Audit Checklist

This checklist is your gate between development and production. Skipping any item does not save time -- it creates risk that compounds after deployment, when problems are typically far more expensive to fix.

Pre-Deployment

  • Model card completed and reviewed
  • Training data documented (source, demographics, known biases)
  • Fairness metrics computed across all protected groups
  • Bias mitigation applied where metrics indicate disparities
  • Explainability methods validated (SHAP/LIME outputs make sense)
  • Privacy assessment completed (PII handling, DP if applicable)
  • Red-team testing performed (adversarial inputs, edge cases)
  • Stakeholder review (legal, compliance, affected communities)
  • Performance benchmarks meet minimum thresholds for all subgroups
  • Fallback/override mechanism exists for incorrect predictions

Post-Deployment

  • Monitoring dashboards active (fairness metrics, performance by group)
  • Drift detection configured (feature, prediction, fairness drift)
  • Feedback collection mechanism live
  • Incident response plan documented
  • Regular re-evaluation scheduled (quarterly minimum)
  • Model deprecation criteria defined
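
One common way to implement the drift-detection item is the Population Stability Index (PSI), which compares the binned distribution of a feature or score in production against a reference sample. The sketch below is a minimal version with fixed bin edges; the function name, the edges, and the sample scores are assumptions for illustration.

```python
import math

def population_stability_index(expected, actual, edges):
    """PSI between a reference sample and a live sample over fixed bin edges."""
    def bin_shares(values):
        counts = [0] * (len(edges) + 1)
        for v in values:
            counts[sum(v >= e for e in edges)] += 1   # index of the bin containing v
        # Small floor avoids log(0) for empty bins.
        return [max(c / len(values), 1e-6) for c in counts]

    e, a = bin_shares(expected), bin_shares(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))

edges = [0.25, 0.5, 0.75]
reference_scores = [0.1, 0.2, 0.3, 0.4, 0.6, 0.7, 0.8, 0.9]   # e.g. validation-time scores
live_scores = [0.1, 0.6, 0.7, 0.8, 0.9, 0.8, 0.3, 0.95]       # invented production scores

psi_same = population_stability_index(reference_scores, reference_scores, edges)
psi_live = population_stability_index(reference_scores, live_scores, edges)
print(f"PSI (no drift): {psi_same:.3f}")
print(f"PSI (live):     {psi_live:.3f}")
```

A commonly quoted rule of thumb treats PSI above roughly 0.2 as a shift worth investigating; treat that as a convention rather than a guarantee. Running the same comparison per protected group is one way to monitor the fairness drift mentioned in the checklist.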

Regulations and Frameworks

AI regulation is accelerating globally, and non-compliance can mean fines, lawsuits, or deployment bans. Know which frameworks apply to your use case and jurisdiction before you ship.

| Regulation/Framework | Region | Key Requirements |
| --- | --- | --- |
| EU AI Act | EU | Risk classification, transparency, human oversight |
| NIST AI RMF | US | Govern, Map, Measure, Manage lifecycle |
| CCPA/CPRA | California | Consumer data rights, automated decision rights |
| GDPR Art. 22 | EU | Limits on solely automated decisions with legal or similarly significant effects; right to human intervention |
| NYC Local Law 144 | NYC | Bias audit for hiring AI, public disclosure |
| Canada AIDA | Canada | Proposed in Bill C-27, not enacted; targeted high-impact system assessment, transparency |
| ISO 42001 | Global | AI Management System standard |

Common Pitfalls

Responsible AI failures rarely come from malice -- they come from shortcuts, blind spots, and treating fairness as a checkbox instead of a continuous practice.

| Pitfall | Problem | Fix |
| --- | --- | --- |
| Only checking overall accuracy | Hides group disparities | Always disaggregate metrics by subgroup |
| Removing sensitive attributes | Proxies still cause bias | Test for proxy features, use counterfactuals |
| One-time bias check | Bias can emerge over time | Continuous monitoring in production |
| Explainability as afterthought | Cannot explain model | Choose interpretable models or plan for SHAP/LIME |
| Ignoring downstream effects | Model used in harmful context | Document intended use, monitor deployments |
| Privacy compliance only | Misses ethical concerns | Go beyond legal minimum |
| No human oversight | Automated harm | Human-in-the-loop for high-stakes decisions |
| Cherry-picked fairness metric | Actual fairness not achieved | Report multiple metrics, justify selection |