# querying-mlflow-metrics

Fetches aggregated trace metrics (token usage, latency, trace counts, quality evaluations) from an MLflow tracking server. Triggers on requests to show metrics, analyze token usage, view LLM costs, check usage trends, or query trace statistics.
## Install

Clone the upstream repo:

```shell
git clone https://github.com/msbaek/dotfiles
```

Claude Code: install into `~/.claude/skills/`:

```shell
T=$(mktemp -d) && git clone --depth=1 https://github.com/msbaek/dotfiles "$T" \
  && mkdir -p ~/.claude/skills \
  && cp -r "$T/.claude/skills/querying-mlflow-metrics" ~/.claude/skills/msbaek-dotfiles-querying-mlflow-metrics \
  && rm -rf "$T"
```

Manifest: `.claude/skills/querying-mlflow-metrics/SKILL.md`
## MLflow Metrics

Run `scripts/fetch_metrics.py` to query metrics from an MLflow tracking server.

### Examples
Token usage summary:

```shell
python scripts/fetch_metrics.py -s http://localhost:5000 -x 1 -m total_tokens -a SUM,AVG
```

Output: `AVG: 223.91 SUM: 7613`
Hourly token trend (last 24h):

```shell
python scripts/fetch_metrics.py -s http://localhost:5000 -x 1 -m total_tokens -a SUM \
  -t 3600 --start-time="-24h" --end-time=now
```

Output: time-bucketed token sums per hour.
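The time-bucketed aggregation behind `-t` amounts to flooring each trace's timestamp to a bucket boundary and summing the metric per bucket. A minimal sketch of that idea (illustrative only, not the script's actual code; timestamps assumed to be epoch milliseconds):

```python
from collections import defaultdict

def bucket_sum(points, bucket_secs):
    """Sum metric values into fixed-size time buckets.

    points: iterable of (timestamp_ms, value) pairs.
    Returns {bucket_start_ms: summed_value} for each non-empty bucket.
    """
    bucket_ms = bucket_secs * 1000
    out = defaultdict(float)
    for ts_ms, value in points:
        out[ts_ms - ts_ms % bucket_ms] += value  # floor to bucket boundary
    return dict(out)

# Two traces in the first hour, one in the next (3600-second buckets).
points = [(0, 100), (1_800_000, 50), (3_600_000, 25)]
print(bucket_sum(points, 3600))  # {0: 150.0, 3600000: 25.0}
```

With `-t 86400` the same logic yields daily totals; only the bucket width changes.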
Latency percentiles by trace:

```shell
python scripts/fetch_metrics.py -s http://localhost:5000 -x 1 -m latency -a AVG,P95 -d trace_name
```
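The `-a AVG,P95 -d trace_name` combination corresponds to grouping latencies by trace name, then computing an average and a 95th percentile per group. A rough sketch using nearest-rank percentiles (the script's exact percentile method is not documented here, so this is an assumption):

```python
import math
from collections import defaultdict

def agg_by_group(rows, pct):
    """rows: iterable of (group, value) pairs.
    Returns {group: (avg, percentile)} using nearest-rank percentiles."""
    groups = defaultdict(list)
    for name, value in rows:
        groups[name].append(value)
    result = {}
    for name, values in groups.items():
        values.sort()
        rank = math.ceil(pct / 100 * len(values))  # nearest-rank: 1-based index
        result[name] = (sum(values) / len(values), values[rank - 1])
    return result

rows = [("chat", 100), ("chat", 200), ("chat", 900), ("embed", 50)]
print(agg_by_group(rows, 95))  # {'chat': (400.0, 900), 'embed': (50.0, 50)}
```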
Error rate by status:

```shell
python scripts/fetch_metrics.py -s http://localhost:5000 -x 1 -m trace_count -a COUNT -d trace_status
```
Quality scores by evaluator (assessments):

```shell
python scripts/fetch_metrics.py -s http://localhost:5000 -x 1 -v ASSESSMENTS \
  -m assessment_value -a AVG,P50 -d assessment_name
```

Output: average and median scores for each evaluator (e.g., correctness, relevance).
Assessment count by name:

```shell
python scripts/fetch_metrics.py -s http://localhost:5000 -x 1 -v ASSESSMENTS \
  -m assessment_count -a COUNT -d assessment_name
```
JSON output: add `-o json` to any command.
### Arguments

| Arg | Required | Description |
|---|---|---|
| `-s` | Yes | MLflow server URL |
| `-x` | Yes | Experiment IDs (comma-separated) |
| `-m` | Yes | Metric name, e.g. `total_tokens`, `latency`, `trace_count`, `assessment_value`, `assessment_count` |
| `-a` | Yes | Aggregations (comma-separated), e.g. `SUM`, `AVG`, `COUNT`, `P50`, `P95` |
| `-d` | No | Group by dimension, e.g. `trace_name`, `trace_status`, `assessment_name` |
| `-t` | No | Bucket size in seconds (3600=hourly, 86400=daily) |
| `--start-time` | No | `now`, relative offset (e.g. `-24h`), ISO 8601, or epoch ms |
| `--end-time` | No | Same formats as `--start-time` |
| `-o` | No | Output format: text (default) or `json` |
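The `--start-time`/`--end-time` values listed above (`now`, relative offsets such as `-24h`, ISO 8601, epoch ms) can all be normalized to epoch milliseconds. A hypothetical parser sketch; the accepted formats come from the examples, but the parsing logic itself is an assumption:

```python
import re
from datetime import datetime, timezone

def parse_time(spec, now_ms=None):
    """Normalize a time spec to epoch milliseconds.

    Accepts 'now', relative offsets like '-24h' / '-7d' / '-30m',
    ISO 8601 strings, or raw epoch-millisecond integers.
    """
    if now_ms is None:
        now_ms = int(datetime.now(timezone.utc).timestamp() * 1000)
    if spec == "now":
        return now_ms
    m = re.fullmatch(r"(-?\d+)([smhd])", spec)
    if m:  # relative offset from now
        unit_ms = {"s": 1000, "m": 60_000, "h": 3_600_000, "d": 86_400_000}
        return now_ms + int(m.group(1)) * unit_ms[m.group(2)]
    if spec.isdigit():  # raw epoch milliseconds
        return int(spec)
    # ISO 8601; assume UTC when no timezone is given
    dt = datetime.fromisoformat(spec)
    if dt.tzinfo is None:
        dt = dt.replace(tzinfo=timezone.utc)
    return int(dt.timestamp() * 1000)

print(parse_time("-24h", now_ms=86_400_000))  # → 0
```

This mirrors the `--start-time="-24h" --end-time=now` usage in the hourly-trend example.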
For SPANS metrics (`span_count`, `latency`), add `-v SPANS`. For ASSESSMENTS metrics, add `-v ASSESSMENTS`.
See `references/api_reference.md` for filter syntax and full API details.