Metaskill generate-report
Generate a comprehensive summary report of the latest experiment including metrics, plots, and comparison with baseline. Use this after training and evaluation to create a shareable experiment summary.
git clone https://github.com/xvirobotics/metaskill
T=$(mktemp -d) && git clone --depth=1 https://github.com/xvirobotics/metaskill "$T" && mkdir -p ~/.claude/skills && cp -r "$T/examples/data-science/.claude/skills/generate-report" ~/.claude/skills/xvirobotics-metaskill-generate-report && rm -rf "$T"
examples/data-science/.claude/skills/generate-report/SKILL.md

You are generating a comprehensive experiment report for this data science project. Your goal is to gather all available metrics, plots, and configuration details from the latest experiment and produce a clear, well-structured report that can be shared with the team.
Dynamic Context
Current branch: !git branch --show-current
Git commit: !git rev-parse --short HEAD 2>/dev/null || echo "unknown"
Recent experiment logs: !ls -lt reports/*.json experiments/*.json 2>/dev/null | head -5 || echo "No experiment logs found"
Available plots: !ls reports/figures/*.png reports/figures/*.svg 2>/dev/null | head -10 || echo "No plots found"
Checkpoints: !ls -lt checkpoints/*.pt checkpoints/*.pth 2>/dev/null | head -3 || echo "No checkpoints"
Config used: !ls configs/*.yaml configs/*.toml 2>/dev/null | head -3 || echo "No configs"
Experiment Name
If the user provided an experiment name:
$ARGUMENTS
Otherwise, derive one from the branch name, latest config file, or use the current date.
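To make that fallback concrete, here is a minimal Python sketch of one way to derive the name; the helper derive_experiment_name is purely illustrative and not part of the skill:

```python
# Illustrative fallback only: prefer a user-supplied name, then the git
# branch, then today's date. derive_experiment_name is a hypothetical helper.
import datetime
import subprocess


def derive_experiment_name(provided: str = "") -> str:
    if provided.strip():
        return provided.strip()
    branch = subprocess.run(
        ["git", "branch", "--show-current"],
        capture_output=True, text=True,
    ).stdout.strip()
    return branch or datetime.date.today().isoformat()
```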
Report Generation Process
Step 1: Gather Experiment Data
Collect all available information about the latest experiment:
- Metrics: Read the latest metrics JSON from reports/ or experiments/
- Training logs: Look for training output logs, MLflow run data, or W&B run summaries
- Configuration: Read the experiment config file (YAML/TOML)
- Checkpoint metadata: Load the best checkpoint and extract epoch, metric, config (a Python sketch follows the shell snippet below)
- Dataset statistics: Look for data profiling outputs or read from data validation logs
```bash
# Find and read latest metrics
METRICS_FILE=$(ls -t reports/*.json experiments/*.json 2>/dev/null | head -1)
if [ -n "$METRICS_FILE" ]; then
  echo "=== Latest Metrics ==="
  cat "$METRICS_FILE"
fi

# Find config used
CONFIG_FILE=$(ls -t configs/*.yaml configs/*.toml 2>/dev/null | head -1)
if [ -n "$CONFIG_FILE" ]; then
  echo "=== Configuration ==="
  cat "$CONFIG_FILE"
fi
```
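For the checkpoint-metadata item, a minimal Python sketch might look like the following; the keys epoch, best_metric, and config are assumptions about what the training script saves, so adjust them to the project's actual checkpoint format:

```python
# Sketch: inspect metadata from the most recent checkpoint. The keys below
# (epoch, best_metric, config) are assumptions -- match them to whatever the
# training script actually stores.
from pathlib import Path

import torch

checkpoints = sorted(
    Path("checkpoints").glob("*.pt*"),   # matches .pt and .pth
    key=lambda p: p.stat().st_mtime,
    reverse=True,
)
if checkpoints:
    # map_location="cpu" avoids needing a GPU; weights_only=False is needed
    # when the checkpoint stores arbitrary Python objects (trusted files only).
    state = torch.load(checkpoints[0], map_location="cpu", weights_only=False)
    if isinstance(state, dict):
        for key in ("epoch", "best_metric", "config"):
            print(f"{key}: {state.get(key, '<not stored>')}")
else:
    print("No checkpoints found")
```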
Step 2: Gather Baseline Data
Look for baseline metrics to compare against:
- Check for a reports/baseline_metrics.json or experiments/baseline.json
- Check git history for previous metrics files: git log --oneline --all -- reports/*.json
- If MLflow is configured, query for the baseline run
- If no baseline exists, note this in the report
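Once a baseline file is found, the comparison itself can be sketched roughly as below; this assumes both files are flat JSON dicts of metric name to numeric value, and reports/metrics.json is only a guess at the current run's path:

```python
# Sketch: compute per-metric deltas against a baseline, if one exists.
# Assumes both files are flat {"metric_name": value} JSON dicts.
import json
from pathlib import Path

current_path = Path("reports/metrics.json")  # guessed path for the current run
baseline_path = next(
    (p for p in (Path("reports/baseline_metrics.json"),
                 Path("experiments/baseline.json")) if p.exists()),
    None,
)

if baseline_path is None or not current_path.exists():
    print("No baseline available -- note this in the report")
else:
    current = json.loads(current_path.read_text())
    baseline = json.loads(baseline_path.read_text())
    for name in sorted(current):
        if name in baseline and isinstance(current[name], (int, float)):
            delta = current[name] - baseline[name]
            print(f"{name}: baseline={baseline[name]} "
                  f"current={current[name]} delta={delta:+.4f}")
```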
Step 3: Generate Visualizations
If plots do not already exist, generate them:
```bash
python3 -c "
import json
from pathlib import Path

# Check if visualization script exists
viz_script = Path('src/evaluation/visualize.py')
if viz_script.exists():
    print('Visualization script found')
else:
    print('No visualization script found -- will generate basic plots')
"
```
Key visualizations to include (a plotting sketch follows this list):
- Training curves: loss and metric over epochs (train vs. validation)
- Confusion matrix: if classification task
- Metric comparison bar chart: current vs. baseline
- Feature importance: if available from the model or analysis
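If the project has no visualization script, a basic matplotlib sketch along these lines can cover the training curves and the baseline comparison chart; the JSON layout (per-epoch train_loss/val_loss lists plus a "final" metrics dict) and the file paths are assumptions to adapt:

```python
# Sketch: basic training-curve and baseline-comparison plots. The metrics
# layout (train_loss/val_loss lists, a "final" dict) and the paths are
# assumptions -- adjust to the project's actual schema.
import json
from pathlib import Path

import matplotlib.pyplot as plt

metrics = json.loads(Path("reports/metrics.json").read_text())
figures = Path("reports/figures")
figures.mkdir(parents=True, exist_ok=True)

# Training curves: train vs. validation loss per epoch
if "train_loss" in metrics and "val_loss" in metrics:
    plt.figure()
    plt.plot(metrics["train_loss"], label="train")
    plt.plot(metrics["val_loss"], label="validation")
    plt.xlabel("epoch")
    plt.ylabel("loss")
    plt.legend()
    plt.savefig(figures / "training_curves.png", dpi=150)
    plt.close()

# Metric comparison bar chart: current vs. baseline
baseline_path = Path("reports/baseline_metrics.json")
if baseline_path.exists() and "final" in metrics:
    baseline = json.loads(baseline_path.read_text())
    names = [k for k in metrics["final"] if k in baseline]
    x = range(len(names))
    plt.figure()
    plt.bar([i - 0.2 for i in x], [baseline[k] for k in names],
            width=0.4, label="baseline")
    plt.bar([i + 0.2 for i in x], [metrics["final"][k] for k in names],
            width=0.4, label="current")
    plt.xticks(list(x), names, rotation=45, ha="right")
    plt.legend()
    plt.tight_layout()
    plt.savefig(figures / "metric_comparison.png", dpi=150)
    plt.close()
```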
Step 4: Write the Report
Generate the report as a Markdown file at reports/experiment_report.md:
```markdown
# Experiment Report: [Experiment Name]

**Date:** [current date]
**Branch:** [git branch]
**Commit:** [git commit hash]
**Author:** [generated by /generate-report skill]

---

## Executive Summary

[2-3 sentences: what was the experiment, what was the key result, and is it better than baseline?]

## Experiment Configuration

| Parameter | Value |
|-----------|-------|
| Model architecture | [from config] |
| Learning rate | [from config] |
| Batch size | [from config] |
| Epochs | [from config] |
| Optimizer | [from config] |
| Scheduler | [from config] |
| Random seed | [from config] |
| Dataset version | [from config or DVC] |

## Dataset Summary

| Split | Samples | Features | Classes |
|-------|---------|----------|---------|
| Train | [count] | [count] | [count or N/A] |
| Validation | [count] | [count] | [count or N/A] |
| Test | [count] | [count] | [count or N/A] |

## Results

### Final Metrics

| Metric | Value |
|--------|-------|
| [metric 1] | [value] |
| [metric 2] | [value] |
| ... | ... |

### Comparison with Baseline

| Metric | Baseline | Current | Delta | Improvement? |
|--------|----------|---------|-------|--------------|
| [metric 1] | [value] | [value] | [+/- value] | [Yes/No] |
| ... | ... | ... | ... | ... |

### Training Curves

![Training curves]([path to training curves figure])

### Confusion Matrix

![Confusion matrix]([path to confusion matrix figure])

## Analysis

### Key Findings

- [Finding 1: most important result]
- [Finding 2: notable pattern or observation]
- [Finding 3: any concerning behavior]

### Error Analysis

- [What types of errors does the model make?]
- [Are errors concentrated in specific classes or data subsets?]

### Comparison with Previous Experiments

- [How does this compare to previous runs?]
- [What changed and what impact did it have?]

## Recommendations

### Next Steps

1. [Actionable recommendation 1]
2. [Actionable recommendation 2]
3. [Actionable recommendation 3]

### Potential Improvements

- [Idea for model improvement]
- [Idea for data improvement]
- [Idea for training procedure improvement]

## Artifacts

| Artifact | Path |
|----------|------|
| Best checkpoint | checkpoints/best_model.pt |
| Metrics JSON | reports/metrics.json |
| Config file | configs/experiment.yaml |
| Training logs | experiments/[run-id]/ |
| Figures | reports/figures/ |

---

*Report generated automatically by the /generate-report skill.*
```
Step 5: Verify Report Quality
After writing the report:
- Read it back and verify all placeholders are filled with actual data
- Verify all referenced figure paths exist
- Verify metrics values are reasonable (not NaN, not obviously wrong)
- Ensure the executive summary accurately reflects the detailed results
- Check that recommendations are specific and actionable, not generic
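The first two checks can be partly automated with a small script; the bracket-placeholder heuristic below simply mirrors the template above and is not exhaustive:

```python
# Sketch: flag leftover template placeholders and missing figure files in the
# generated report. The placeholder patterns are a heuristic matching the
# template above, not a complete check.
import re
from pathlib import Path

report = Path("reports/experiment_report.md")
text = report.read_text()

# Anything that still looks like a template placeholder is probably unfilled.
leftovers = re.findall(
    r"\[(?:from config[^\]]*|value|count[^\]]*|current date|metric \d+)\]", text
)
if leftovers:
    print("Possible unfilled placeholders:", sorted(set(leftovers)))

# Every image referenced from the report should exist on disk.
for rel in re.findall(r"!\[[^\]]*\]\(([^)]+)\)", text):
    if not (report.parent / rel).exists():
        print("Missing figure:", rel)
```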
Report the path to the generated report file when complete.