install
source · Clone the upstream repo
git clone https://github.com/ai-analyst-lab/ai-analyst
Claude Code · Install into ~/.claude/skills/
T=$(mktemp -d) && git clone --depth=1 https://github.com/ai-analyst-lab/ai-analyst "$T" && mkdir -p ~/.claude/skills && cp -r "$T/.claude/skills/compare-datasets" ~/.claude/skills/ai-analyst-lab-ai-analyst-compare-datasets && rm -rf "$T"
manifest:
.claude/skills/compare-datasets/skill.md · source content
Skill: Compare Datasets
Purpose
Compare metrics, findings, and patterns across two or more connected datasets. Helps identify cross-dataset patterns (e.g., "conversion funnel behavior is similar across both product lines") and dataset-specific anomalies.
When to Use
- User says /compare-datasets or "compare across datasets"
- After analyzing multiple datasets, to find commonalities
- When the user asks "is this pattern unique to this dataset?"
Invocation
/compare-datasets — compare active dataset with all others
/compare-datasets {id1} {id2} — compare two specific datasets
/compare-datasets metric={name} — compare a specific metric across datasets
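The three invocation forms above could be dispatched as follows. This is a minimal sketch; the function name and returned keys are illustrative, not part of the skill definition.

```python
def parse_invocation(args):
    """Map /compare-datasets arguments to a comparison request (sketch)."""
    if not args:
        return {"mode": "active_vs_all"}            # compare active with all others
    if len(args) == 1 and args[0].startswith("metric="):
        return {"mode": "metric", "metric": args[0].split("=", 1)[1]}
    if len(args) == 2:
        return {"mode": "pair", "datasets": args}   # two specific dataset IDs
    raise ValueError(f"unrecognized arguments: {args}")
```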
Instructions
Step 1: Identify Datasets to Compare
- Read .knowledge/datasets/ to enumerate all connected datasets.
- If specific datasets are named, validate they exist.
- If no datasets specified, use active + all others.
- Require at least 2 datasets. If only 1 exists: "Only one dataset connected. Use /connect-data to add another."
Step 2: Load Metric Dictionaries
For each dataset:
- Read .knowledge/datasets/{id}/metrics/index.yaml
- Build a union of all metric IDs across datasets
- Identify shared metrics (same ID or same name) vs. dataset-specific metrics
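The union-and-partition step above can be sketched with plain set operations. This assumes the per-dataset metric dictionaries have already been parsed from metrics/index.yaml into a mapping of dataset ID to metric entries; the structure is illustrative.

```python
def partition_metrics(dictionaries):
    """Split metric IDs into shared (present in 2+ datasets) vs dataset-specific.

    dictionaries: {dataset_id: {metric_id: metadata}} — assumed pre-loaded
    from each dataset's metrics/index.yaml.
    """
    all_ids = set().union(*dictionaries.values())
    shared = {m for m in all_ids
              if sum(m in d for d in dictionaries.values()) >= 2}
    specific = {ds: set(metrics) - shared for ds, metrics in dictionaries.items()}
    return shared, specific
```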
Step 3: Compare Shared Metrics
For each metric that exists in 2+ datasets:
- Load the metric YAML from each dataset
- Compare: definition match? (same formula, same unit)
- Compare: typical range overlap? (do the datasets have similar baselines?)
- Compare: guardrails alignment? (are thresholds consistent?)
- Flag discrepancies: "conversion_rate is defined differently in {dataset_a} vs {dataset_b}"
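The three checks above (definition, range overlap, guardrails) can be sketched per metric. Field names like "formula", "unit", "typical_range", and "guardrails" are assumptions about the metric YAML schema, not confirmed by the source.

```python
def compare_metric(metric_id, a, b, ds_a, ds_b):
    """Return discrepancy flags for one metric loaded from two datasets."""
    flags = []
    # Definition match: same formula and same unit.
    if (a.get("formula"), a.get("unit")) != (b.get("formula"), b.get("unit")):
        flags.append(f"{metric_id} is defined differently in {ds_a} vs {ds_b}")
    # Typical range overlap: do the baselines intersect at all?
    lo_a, hi_a = a.get("typical_range", (None, None))
    lo_b, hi_b = b.get("typical_range", (None, None))
    if None not in (lo_a, hi_a, lo_b, hi_b) and (hi_a < lo_b or hi_b < lo_a):
        flags.append(f"{metric_id} typical ranges do not overlap ({ds_a} vs {ds_b})")
    # Guardrails alignment: are thresholds consistent?
    if a.get("guardrails") != b.get("guardrails"):
        flags.append(f"{metric_id} guardrail thresholds differ ({ds_a} vs {ds_b})")
    return flags
```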
Step 4: Compare Analysis History
For each dataset:
- Read .knowledge/analyses/index.yaml
- Extract key findings from recent analyses
- Look for cross-dataset patterns:
- Same finding appearing in multiple datasets
- Opposite findings (metric up in one, down in another)
- Same root cause identified independently
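The first two pattern types above (same finding in multiple datasets, opposite findings) can be sketched by grouping findings per metric. It assumes findings have been reduced to (metric, direction) pairs per dataset, which is a simplification of whatever the analysis index actually stores.

```python
def cross_dataset_patterns(findings):
    """Classify metrics seen in 2+ datasets as shared or opposite patterns.

    findings: {dataset_id: [(metric, direction), ...]} — assumed extracted
    from each dataset's analysis history.
    """
    by_metric = {}
    for ds, items in findings.items():
        for metric, direction in items:
            by_metric.setdefault(metric, {})[ds] = direction
    shared, opposite = [], []
    for metric, dirs in by_metric.items():
        if len(dirs) < 2:
            continue  # appears in only one dataset — not a cross-dataset pattern
        if len(set(dirs.values())) == 1:
            shared.append(metric)   # same finding in multiple datasets
        else:
            opposite.append(metric)  # metric up in one, down in another
    return {"shared": shared, "opposite": opposite}
```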
Step 5: Generate Cross-Dataset Observations
Write findings to .knowledge/global/cross_dataset_observations.yaml:
- Shared patterns: behaviors that appear across datasets
- Divergences: where datasets behave differently
- Metric alignment: which metrics are consistently defined
- Suggested investigations: questions raised by the comparison
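The record written in Step 5 could be assembled as below. The four field names mirror the four bullets above but are an assumed schema; serializing the dict to the YAML file is left to the skill runtime.

```python
def build_observations(shared_patterns, divergences, metric_alignment, suggestions):
    """Assemble the cross-dataset observations record (assumed schema)."""
    return {
        "shared_patterns": shared_patterns,            # behaviors seen across datasets
        "divergences": divergences,                    # where datasets behave differently
        "metric_alignment": metric_alignment,          # consistently defined metrics
        "suggested_investigations": suggestions,       # questions raised by the comparison
    }
```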
Step 6: Present Results
Display a comparison table:
Cross-Dataset Comparison: {dataset_a} vs {dataset_b}

Shared Metrics: {N} ({M} with matching definitions)
Metric Discrepancies: {list}

Shared Patterns:
- {pattern description} (seen in both datasets)

Divergences:
- {metric} is {direction} in {dataset_a} but {direction} in {dataset_b}

Suggested Next:
- "Investigate why {pattern} differs between datasets"
- "Align {metric} definitions across datasets"
Edge Cases
- Only 1 dataset: Cannot compare — suggest connecting another
- No shared metrics: Report this — datasets may serve different purposes
- No analysis history: Compare schemas and metric definitions only
- Many datasets (>5): Compare pairwise with the active dataset only
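The last edge case can be sketched as a pair-selection rule: with more than 5 datasets, compare only active-vs-other pairs; otherwise all pairs. The all-pairs fallback for the small case is an assumption, since the skill text does not specify it.

```python
from itertools import combinations

def comparison_pairs(active, datasets, limit=5):
    """Choose which dataset pairs to compare (active-only beyond `limit`)."""
    if len(datasets) > limit:
        # Many datasets: pairwise with the active dataset only.
        return [(active, d) for d in datasets if d != active]
    return list(combinations(datasets, 2))
```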