Claude-skill-registry bio-stats-ml-reporting
Aggregate results, train ML models, and produce reports with validated references.
install
source · Clone the upstream repo
git clone https://github.com/majiayu000/claude-skill-registry
Claude Code · Install into ~/.claude/skills/
T=$(mktemp -d) && git clone --depth=1 https://github.com/majiayu000/claude-skill-registry "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/data/bio-stats-ml-reporting" ~/.claude/skills/majiayu000-claude-skill-registry-bio-stats-ml-reporting && rm -rf "$T"
manifest: skills/data/bio-stats-ml-reporting/SKILL.md
Bio Stats ML Reporting
When to use
- Use when you need to aggregate results, train ML models, and produce reports with validated references.
Prerequisites
- Tools installed via pixi (see pixi.toml).
- Results tables and metadata are available.
Inputs
- results/.parquet or results/.tsv
- metadata.tsv
Outputs
- results/bio-stats-ml-reporting/models/
- results/bio-stats-ml-reporting/metrics.tsv
- results/bio-stats-ml-reporting/report.md
- results/bio-stats-ml-reporting/logs/
Steps
- Join outputs in DuckDB and build feature tables.
- Train baseline models and evaluate with cross-validation.
- Generate reports and validate references.
QC gates
- Model performance sanity checks pass.
- Reference validation passes.
- On failure: retry with alternative parameters; if the gate still fails, record the failure in the report and exit with a non-zero status.
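One possible shape for the failure policy is sketched below. It assumes a hypothetical evaluate callable that returns a single score per parameter set and a hypothetical min_score threshold; the skill itself does not specify either. The function returns an exit code and the lines to record in the report instead of exiting directly, so the caller decides where to write them.

```python
from typing import Any, Callable, Dict, List, Sequence, Tuple


def qc_gate(
    evaluate: Callable[[Dict[str, Any]], float],
    param_sets: Sequence[Dict[str, Any]],
    min_score: float,
) -> Tuple[int, List[str]]:
    """Try each parameter set in order until one passes the gate.

    Returns (exit_code, report_lines): exit_code is 0 on the first pass,
    or 1 if every parameter set scores below min_score.
    """
    report_lines: List[str] = []
    for attempt, params in enumerate(param_sets, start=1):
        score = evaluate(params)
        passed = score >= min_score
        status = "PASS" if passed else "FAIL"
        report_lines.append(
            f"- attempt {attempt}: params={params} score={score:.3f} {status}"
        )
        if passed:
            return 0, report_lines
    # Every attempt failed: record the failure and signal a non-zero exit.
    report_lines.append(
        f"- QC gate failed after {len(param_sets)} attempt(s); "
        f"minimum score {min_score:.3f}"
    )
    return 1, report_lines
```

A caller would append report_lines to results/bio-stats-ml-reporting/report.md and then pass the returned code to sys.exit.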
Validation
- Verify input tables are readable and schema-consistent.
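A minimal sketch of this check for TSV inputs, using only the standard library. It treats "schema-consistent" as "every table has the same header and contains the required columns", which is an assumption; the function names and the required-column list are hypothetical. Parquet inputs would need a separate reader and are not covered here.

```python
import csv
from pathlib import Path
from typing import List, Optional, Sequence


def read_tsv_header(path: Path) -> List[str]:
    """Return the header row of a TSV file, raising if it is missing."""
    with path.open(newline="", encoding="utf-8") as fh:
        header = next(csv.reader(fh, delimiter="\t"), None)
    if not header:
        raise ValueError("empty file or missing header")
    return header


def check_schema_consistency(
    paths: Sequence[str], required: Sequence[str]
) -> List[str]:
    """Return a list of problems; an empty list means all tables passed."""
    problems: List[str] = []
    reference: Optional[List[str]] = None
    for path in paths:
        try:
            header = read_tsv_header(Path(path))
        except (OSError, ValueError, csv.Error) as exc:
            problems.append(f"{path}: unreadable ({exc})")
            continue
        missing = [col for col in required if col not in header]
        if missing:
            problems.append(f"{path}: missing columns {missing}")
        if reference is None:
            reference = header  # the first readable table sets the schema
        elif header != reference:
            problems.append(f"{path}: header differs from first table")
    return problems
```

Collecting all problems before failing, rather than stopping at the first one, gives a complete list to record in the logs.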
Tools
- duckdb v1.4.3
- scikit-learn v1.8.0
- xgboost v3.1.3
- crossrefapi v1.7.0
Paper summaries (2023-2025)
- summaries/ (include example use cases and tool settings used)
Tool documentation
- DuckDB - In-process analytical database for data aggregation
- scikit-learn - Machine learning library
- XGBoost - Gradient boosting framework
- Crossref API - Reference validation and metadata retrieval
References
- See ../bio-skills-references.md