Awesome-Agent-Skills-for-Empirical-Research responsible-ai-guide
Resources for trustworthy, fair, and ethical AI research
install
source · Clone the upstream repo
git clone https://github.com/brycewang-stanford/Awesome-Agent-Skills-for-Empirical-Research
Claude Code · Install into ~/.claude/skills/
T=$(mktemp -d) && git clone --depth=1 https://github.com/brycewang-stanford/Awesome-Agent-Skills-for-Empirical-Research "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/43-wentorai-research-plugins/skills/domains/ai-ml/responsible-ai-guide" ~/.claude/skills/brycewang-stanford-awesome-agent-skills-for-empirical-research-responsible-ai-gu && rm -rf "$T"
manifest:
skills/43-wentorai-research-plugins/skills/domains/ai-ml/responsible-ai-guide/SKILL.md
Responsible AI Guide
Overview
A comprehensive collection of resources for building trustworthy, fair, and ethical AI systems. Covers fairness metrics, bias detection and mitigation, explainability methods, privacy-preserving techniques, robustness testing, and governance frameworks. Essential reading for researchers working on AI safety and alignment, and for anyone deploying models in high-stakes domains.
Topic Taxonomy
```
Responsible AI
├── Fairness
│   ├── Bias detection (data, model, outcome)
│   ├── Fairness metrics (demographic parity, equalized odds)
│   ├── Bias mitigation (pre/in/post-processing)
│   └── Intersectional fairness
├── Explainability
│   ├── Feature attribution (SHAP, LIME, IG)
│   ├── Concept-based (TCAV, concept bottleneck)
│   ├── Counterfactual explanations
│   └── Mechanistic interpretability
├── Privacy
│   ├── Differential privacy
│   ├── Federated learning
│   ├── Membership inference attacks
│   └── Machine unlearning
├── Robustness
│   ├── Adversarial attacks/defenses
│   ├── Distribution shift
│   ├── Uncertainty quantification
│   └── Out-of-distribution detection
├── Safety & Alignment
│   ├── RLHF and preference learning
│   ├── Constitutional AI
│   ├── Red teaming
│   └── Guardrails and filters
└── Governance
    ├── Model cards
    ├── Datasheets for datasets
    ├── AI impact assessments
    └── Regulatory compliance (EU AI Act)
```
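The fairness metrics named in the taxonomy map directly onto fairlearn helper functions. A minimal sketch with toy arrays (all data below is made up for illustration):

```python
import numpy as np
from fairlearn.metrics import (
    demographic_parity_difference,
    equalized_odds_difference,
)

# Toy binary labels, predictions, and one sensitive attribute (illustrative only)
y_true = np.array([1, 0, 1, 1, 0, 1, 0, 0])
y_pred = np.array([1, 0, 0, 1, 0, 1, 1, 0])
sex = np.array(["F", "F", "F", "M", "M", "M", "M", "F"])

# Demographic parity: gap in selection rates between groups (0 means parity)
print(demographic_parity_difference(y_true, y_pred, sensitive_features=sex))

# Equalized odds: worst-case gap in TPR/FPR between groups (0 means parity)
print(equalized_odds_difference(y_true, y_pred, sensitive_features=sex))
```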
Key Tools
| Tool | Category | Purpose |
|---|---|---|
| Fairlearn | Fairness | Bias assessment + mitigation |
| AI Fairness 360 | Fairness | IBM fairness toolkit |
| SHAP | Explainability | Shapley value explanations |
| Captum | Explainability | PyTorch interpretability |
| Opacus | Privacy | Differential privacy for PyTorch |
| ART | Robustness | Adversarial robustness toolbox |
| Alibi | Explainability | ML model explanations |
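As a quick orientation to the explainability tools in the table, here is a minimal SHAP sketch; the dataset and model are stand-ins, not part of the guide:

```python
import shap
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import GradientBoostingClassifier

# Toy tabular task; any fitted tree ensemble works similarly
X, y = load_breast_cancer(return_X_y=True, as_frame=True)
model = GradientBoostingClassifier(random_state=0).fit(X, y)

# shap.Explainer auto-selects an algorithm (TreeExplainer for tree ensembles)
explainer = shap.Explainer(model)
shap_values = explainer(X)

# Beeswarm plot: per-feature Shapley attributions across the dataset
shap.plots.beeswarm(shap_values)
```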
Fairness Assessment
```python
from fairlearn.metrics import MetricFrame
from sklearn.metrics import accuracy_score, recall_score

# Assess fairness across demographic groups
metrics = MetricFrame(
    metrics={
        "accuracy": accuracy_score,
        "recall": recall_score,
    },
    y_true=y_test,
    y_pred=y_pred,
    sensitive_features=demographics,
)

print("Overall:")
print(metrics.overall)
print("\nBy group:")
print(metrics.by_group)
print("\nDifference (max - min):")
print(metrics.difference())
```
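If the by-group gaps are large, mitigation can follow. A hedged sketch using fairlearn's reductions API; the training-split names `X_train`, `y_train`, and `demographics_train` are assumed counterparts of the test arrays above, and the estimator choice is illustrative:

```python
from fairlearn.reductions import ExponentiatedGradient, DemographicParity
from sklearn.linear_model import LogisticRegression

# In-processing mitigation: fit under a demographic-parity constraint
mitigator = ExponentiatedGradient(
    estimator=LogisticRegression(),
    constraints=DemographicParity(),
)
# X_train / y_train / demographics_train: training splits (not defined in the guide)
mitigator.fit(X_train, y_train, sensitive_features=demographics_train)

y_pred_mitigated = mitigator.predict(X_test)
# Re-run the MetricFrame above on y_pred_mitigated to compare group gaps
```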
Reading Roadmap
### Foundations
1. "Fairness and Machine Learning" (Barocas, Hardt, Narayanan)
2. "Datasheets for Datasets" (Gebru et al., 2021)
3. "Model Cards for Model Reporting" (Mitchell et al., 2019)

### Fairness
4. "On Fairness and Calibration" (Pleiss et al., 2017)
5. "Fairness Through Awareness" (Dwork et al., 2012)

### Explainability
6. "A Unified Approach to Interpreting Model Predictions" (SHAP, Lundberg & Lee, 2017)
7. "Why Should I Trust You?" (LIME, Ribeiro et al., 2016)

### Safety
8. "Constitutional AI" (Bai et al., 2022)
9. "Red Teaming Language Models" (Perez et al., 2022)
10. "Scaling Monosemanticity" (Anthropic, 2024)
Use Cases
- Bias auditing: Check models for demographic biases
- Compliance: EU AI Act and regulatory requirements
- Model documentation: Model cards and impact assessments
- Research ethics: Ethical considerations for AI research
- Course material: Teach responsible AI principles