skills-eval

Evaluate and improve Claude skill quality through auditing

install
source · Clone the upstream repo
git clone https://github.com/openclaw/skills
Claude Code · Install into ~/.claude/skills/
T=$(mktemp -d) && git clone --depth=1 https://github.com/openclaw/skills "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/athola/nm-abstract-skills-eval" ~/.claude/skills/openclaw-skills-skills-eval && rm -rf "$T"
OpenClaw · Install into ~/.openclaw/skills/
T=$(mktemp -d) && git clone --depth=1 https://github.com/openclaw/skills "$T" && mkdir -p ~/.openclaw/skills && cp -r "$T/skills/athola/nm-abstract-skills-eval" ~/.openclaw/skills/openclaw-skills-skills-eval && rm -rf "$T"
manifest: skills/athola/nm-abstract-skills-eval/SKILL.md

Night Market Skill — ported from claude-night-market/abstract. For the full experience with agents, hooks, and commands, install the Claude Code plugin.

Skills Evaluation and Improvement

Table of Contents

  1. Overview
  2. Quick Start
  3. Evaluation Workflow
  4. Evaluation and Optimization
  5. Resources

Overview

This framework audits Claude skills against quality standards to improve performance and reduce token consumption. Automated tools analyze skill structure, measure context usage, and identify specific technical improvements. Run verification commands after each audit to confirm fixes work correctly.

The skills-auditor provides structural analysis, while the improvement-suggester ranks fixes by impact. Compliance is verified through the compliance-checker. Runtime efficiency is monitored by the tool-performance-analyzer and token-usage-tracker.

Quick Start

Basic Audit

Run a full audit of all skills or target a specific file to identify structural issues.

# Audit all skills
make audit-all

# Audit specific skill
make audit-skill TARGET=path/to/skill/SKILL.md

Analysis and Optimization

Use skill_analyzer.py for complexity checks and token_estimator.py to verify the context budget.

make analyze-skill TARGET=path/to/skill/SKILL.md
make estimate-tokens TARGET=path/to/skill/SKILL.md

Improvements

Generate a prioritized plan and verify standards compliance using improvement_suggester.py and compliance_checker.py.

make improve-skill TARGET=path/to/skill/SKILL.md
make check-compliance TARGET=path/to/skill/SKILL.md

Evaluation Workflow

Start with make audit-all to inventory skills and identify high-priority targets. For each skill requiring attention, run analyze-skill to map complexity. Generate an improvement plan with improve-skill, apply the fixes, and run check-compliance to verify the skill meets project standards. Finalize with estimate-tokens to confirm the skill stays within its context budget.
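As a sketch, the whole loop can be scripted with the make targets above. This assumes each target exits nonzero on failure, which is conventional for make but not documented for this skill:

#!/usr/bin/env bash
# Per-skill evaluation loop; assumes the make targets above exit nonzero
# on failure (an assumption, not documented behavior).
set -euo pipefail

SKILL="${1:?usage: $0 path/to/skill/SKILL.md}"

make analyze-skill TARGET="$SKILL"      # map complexity
make improve-skill TARGET="$SKILL"      # generate a prioritized fix plan
# ...apply the suggested fixes by hand, then verify:
make check-compliance TARGET="$SKILL"   # confirm project standards
make estimate-tokens TARGET="$SKILL"    # confirm the context budget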

Evaluation and Optimization

Quality assessments use the skills-auditor and improvement-suggester to generate detailed reports. Performance analysis focuses on token efficiency through the token-usage-tracker and tool performance via the tool-performance-analyzer. For standards compliance, the compliance-checker automates common fixes for structural issues.

Scoring and Prioritization

We evaluate skills across five dimensions: structure compliance, content quality, token efficiency, activation reliability, and tool integration. Scores above 90 represent production-ready skills, while scores below 50 indicate critical issues requiring immediate attention.
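A minimal gate on these thresholds might look like the sketch below. It assumes the audit can emit a bare integer score as the last line of its output, which is hypothetical; the real report format is not specified here:

# Hypothetical quality gate; the bare-integer score output is assumed.
SKILL=path/to/skill/SKILL.md
SCORE=$(make -s audit-skill TARGET="$SKILL" | tail -n1)
if [ "$SCORE" -lt 50 ]; then
  echo "critical: score $SCORE requires immediate attention" >&2
  exit 1
elif [ "$SCORE" -lt 90 ]; then
  echo "warning: score $SCORE is below the production-ready threshold" >&2
fi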

Improvements are prioritized by impact. Critical issues include security vulnerabilities or broken functionality. High-priority items cover structural flaws that hinder discoverability. Medium and low priorities focus on best practices and minor optimizations.

Structural Patterns

Deprecated: skills/shared/modules/ directories. Shared modules must be relocated into the consuming skill's own modules/ directory. The evaluator flags any remaining skills/shared/ as a structural warning.

Current: Each skill owns its modules at skills/<skill-name>/modules/. Cross-skill references use relative paths (e.g., ../skill-authoring/modules/anti-rationalization.md).
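Migrating a deprecated shared module might look like the following sketch. The module and skill names are illustrative, and the sed invocation assumes GNU sed:

# Relocate a shared module into the consuming skill (names illustrative).
git mv skills/shared/modules/anti-rationalization.md \
       skills/skill-authoring/modules/anti-rationalization.md

# Rewrite remaining references to the old location; cross-skill consumers
# should switch to the relative path ../skill-authoring/modules/anti-rationalization.md
grep -rl 'skills/shared/modules/anti-rationalization.md' skills/ \
  | xargs -r sed -i 's#skills/shared/modules/anti-rationalization.md#skills/skill-authoring/modules/anti-rationalization.md#g'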

Resources

Shared Modules: Cross-Skill Patterns

Skill-Specific Modules

  • Trigger Isolation Analysis: See
    modules/trigger-isolation-analysis.md
  • Skill Authoring Best Practices: See
    modules/skill-authoring-best-practices.md
  • Authoring Checklist: See
    modules/authoring-checklist.md
  • Evaluation Workflows: See
    modules/evaluation-workflows.md
  • Quality Metrics: See
    modules/quality-metrics.md
  • Advanced Tool Use Analysis: See
    modules/advanced-tool-use-analysis.md
  • Evaluation Framework: See
    modules/evaluation-framework.md
  • Integration Patterns: See
    modules/integration.md
  • Troubleshooting: See
    modules/troubleshooting.md
  • Pressure Testing: See
    modules/pressure-testing.md
  • Integration Testing: See
    modules/integration-testing.md
  • Multi-Metric Evaluation: See
    modules/multi-metric-evaluation-methodology.md
  • Performance Benchmarking: See
    modules/performance-benchmarking.md

Tools and Automation

  • Tools: Executable analysis utilities in the scripts/ directory.
  • Automation: Setup and validation scripts in scripts/automation/.