Awesome-omni-skill response-rater

Run headless AI CLIs (Claude Code, Gemini, OpenAI Codex, Cursor Agent, GitHub Copilot) to rate an assistant response against a rubric and return actionable feedback plus a rewritten improved response; use for response quality audits, prompt/docs reviews, and "have another AI critique this answer" workflows.

install
source · Clone the upstream repo
git clone https://github.com/diegosouzapw/awesome-omni-skill
Claude Code · Install into ~/.claude/skills/
T=$(mktemp -d) && git clone --depth=1 https://github.com/diegosouzapw/awesome-omni-skill "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/ai-agents/response-rater" ~/.claude/skills/diegosouzapw-awesome-omni-skill-response-rater && rm -rf "$T"
manifest: skills/ai-agents/response-rater/SKILL.md
safety · automated scan (high risk)
This is a pattern-based risk scan, not a security review. Our crawler flagged:
  • curl piped into shell
  • global npm install
  • makes HTTP requests (curl)
Always read a skill's source content before installing. Patterns alone don't mean the skill is malicious — but they warrant attention.
source content

Response Rater

Use this skill to get independent critiques of an assistant response by calling external AI CLIs in headless mode and aggregating the results.

Supported Providers

ProviderCLI CommandAuthDefault Model
Claude Code
claude
Session or
ANTHROPIC_API_KEY
(CLI default)
Gemini CLI
gemini
Session or
GEMINI_API_KEY
/
GOOGLE_API_KEY
gemini-3-pro-preview
OpenAI Codex
codex
OPENAI_API_KEY
or
CODEX_API_KEY
gpt-5.1-codex-max
Cursor Agent
cursor-agent
(via WSL on Windows)
Session or
CURSOR_API_KEY
auto
GitHub Copilot
copilot
GitHub auth (
gh auth login
)
claude-sonnet-4.5

Quick Start

  1. Save the response you want reviewed to a file (or pipe it via stdin).

  2. Ensure you have at least one provider authenticated (see table above).

  3. Run:

node .claude/skills/response-rater/scripts/rate.cjs --response-file <path> --providers claude,gemini

Or via stdin (PowerShell):

Get-Content response.txt | node .claude/skills/response-rater/scripts/rate.cjs --providers claude,gemini

Or via stdin (Bash):

cat response.txt | node .claude/skills/response-rater/scripts/rate.cjs --providers claude,gemini

All Providers Example

node .claude/skills/response-rater/scripts/rate.cjs \
  --response-file response.txt \
  --providers claude,gemini,codex,cursor,copilot

Model Selection

Each provider has a configurable model:

# Use specific models for each provider
node .claude/skills/response-rater/scripts/rate.cjs \
  --response-file response.txt \
  --providers gemini,codex,copilot \
  --gemini-model gemini-2.5-pro \
  --codex-model gpt-5.1-codex \
  --copilot-model gpt-5

Available Models by Provider

Gemini CLI:

  • gemini-3-pro-preview
    (default, latest flagship)
  • gemini-3-flash-preview
    (fast, latest)
  • gemini-2.5-pro
    (stable pro)
  • gemini-2.5-flash
    (stable fast)
  • gemini-2.5-flash-lite
    (lightweight)

OpenAI Codex:

  • gpt-5.1-codex-max
    (default, flagship)
  • gpt-5.1-codex
    (standard)
  • gpt-5.1-codex-mini
    (faster/cheaper)

Cursor Agent:

  • auto
    (default, smart selection)
  • gpt-5.1-high
    ,
    gpt-5.1-codex-high
  • opus-4.5
    ,
    sonnet-4.5
  • gemini-3-pro

GitHub Copilot:

  • claude-sonnet-4.5
    (default)
  • claude-opus-4.5
    ,
    claude-haiku-4.5
    ,
    claude-sonnet-4
  • gpt-5.1-codex-max
    ,
    gpt-5.1-codex
    ,
    gpt-5.1-codex-mini
  • gpt-5.2
    ,
    gpt-5.1
    ,
    gpt-5
    ,
    gpt-5-mini
    ,
    gpt-4.1
  • gemini-3-pro-preview

Templates

# Response review (default) - critique against rubric
--template response-review

# Vocabulary review - security audit for LLM vocabulary files
--template vocab-review

Vocabulary Review Example

Get-Content vocabulary.json | node .claude/skills/response-rater/scripts/rate.cjs \
  --providers claude,gemini \
  --template vocab-review

Auth Behavior

By default the runner uses

--auth-mode session-first
:

  1. Try the CLI using your existing logged-in session/subscription
  2. If that fails and env keys exist, retry using env keys

To flip the order:

node .claude/skills/response-rater/scripts/rate.cjs \
  --response-file response.txt \
  --auth-mode env-first \
  --providers claude,gemini

Direct Headless Commands (No Script)

Claude Code:

Get-Content response.txt | claude -p --output-format json --permission-mode bypassPermissions

Gemini CLI:

Get-Content response.txt | gemini --output-format json --model gemini-3-pro-preview

OpenAI Codex:

codex exec --json --color never --model gpt-5.1-codex-max --skip-git-repo-check "Your prompt"

Cursor Agent (via WSL on Windows):

wsl bash -lc "cursor-agent -p 'Your prompt' --output-format json --model auto"

GitHub Copilot:

copilot -p --silent --no-color --model claude-sonnet-4.5 "Your prompt"

Output Format

JSON to stdout with:

  • promptVersion
    : Schema version
  • template
    : Template used
  • authMode
    : Auth mode used
  • providers
    : Object with per-provider results:
    • ok
      : Boolean success
    • authUsed
      : Which auth method worked
    • raw
      : Truncated raw output
    • parsed
      : Extracted JSON with
      scores
      ,
      summary
      ,
      improvements
      ,
      rewrite
    • attempts
      : Auth attempt history

Sample Output

{
  "promptVersion": 3,
  "template": "response-review",
  "authMode": "session-first",
  "providers": {
    "claude": {
      "ok": true,
      "authUsed": "session",
      "parsed": {
        "scores": {
          "correctness": 8,
          "completeness": 7,
          "clarity": 9,
          "actionability": 8,
          "risk_management": 6,
          "constraint_alignment": 8,
          "brevity": 7
        },
        "summary": "The response is well-structured...",
        "improvements": ["Add error handling for...", "..."],
        "rewrite": "Improved version..."
      }
    },
    "gemini": { "ok": true, "..." },
    "codex": { "ok": true, "..." }
  }
}

Notes / Constraints

  • This skill prefers network access to contact AI providers for best results.
  • Offline Fallback: If all providers fail due to network issues, automatic offline heuristic scoring activates.
  • If a provider is missing credentials, it will be skipped with a clear error.
  • Keep the reviewed response reasonably sized; start with the exact section you want critiqued.
  • Timeout is 180 seconds per provider (3 minutes).

Offline Fallback Mode

Overview

When network access is unavailable, the response-rater automatically falls back to offline heuristic scoring. This enables plan rating in air-gapped environments, CI/CD pipelines without network, and development environments with intermittent connectivity.

How It Works

Automatic Detection:

  1. All configured providers are attempted (e.g.,
    claude,gemini
    )
  2. If all providers fail with network errors (ENOTFOUND, ETIMEDOUT, ECONNREFUSED)
  3. Offline fallback activates automatically (no manual intervention required)

Scoring Algorithm:

The offline rater uses structural analysis to score plans across 5 dimensions:

DimensionWeightScoring Method
Completeness20%Checks for required sections: objectives, context, steps, etc.
Feasibility20%Analyzes time estimates, dependencies, resource requirements
Risk Mitigation20%Counts identified risks and validates mitigation strategies
Agent Coverage20%Verifies agent assignments and diversity of agent types
Integration20%Checks integration points, data flow, API contracts

Overall Score: Average of all 5 dimensions (equal weights)

Offline vs Online Scoring

FeatureOnline (AI Providers)Offline (Heuristic)
Network Required✅ Yes❌ No
Scoring MethodAI semantic analysisStructural heuristics
AccuracyHigh (95%+)Good (85-90%)
Speed10-60 seconds<1 second
Improvement FeedbackDetailed, actionableTemplate-based
Minimum Score7/10 (standard)7/10 (same threshold)
Score ToleranceExact scores±1 point variance

Usage

Automatic (Recommended):

# Try online providers first; fallback to offline if network unavailable
node .claude/skills/response-rater/scripts/rate.cjs \
  --response-file plan.json \
  --providers claude,gemini

Explicit Offline (Force offline mode):

# Use offline rater directly (bypasses network attempt)
node .claude/skills/response-rater/scripts/offline-rater.mjs plan.json

Output Format (Offline Fallback)

When offline fallback activates, output includes:

{
  "promptVersion": 3,
  "template": "plan-review",
  "authMode": "session-first",
  "method": "offline",
  "offline_fallback": true,
  "providers": {
    "claude": { "ok": false, "error": "ETIMEDOUT" },
    "gemini": { "ok": false, "error": "ENOTFOUND" }
  },
  "offline_rating": {
    "ok": true,
    "method": "offline",
    "duration_ms": 45,
    "scores": {
      "completeness": 8,
      "feasibility": 7,
      "risk_mitigation": 6,
      "agent_coverage": 9,
      "integration": 7
    },
    "overall_score": 7.4,
    "summary": "Plan scored 7.4/10 using offline heuristic analysis...",
    "improvements": [
      "Add missing plan sections: objectives, context, success criteria",
      "Include time estimates and resource requirements for each step"
    ],
    "note": "Offline scoring uses heuristic analysis. For production use, prefer online scoring with AI providers."
  }
}

Testing Offline Mode

Run test suite to verify offline scoring:

node .claude/skills/response-rater/tests/offline-scoring.test.mjs

Test Coverage:

  • High-quality plan (should score >= 7)
  • Low-quality plan (should score < 7)
  • Medium-quality plan (should score ~7)
  • Performance test (<1 second)
  • Invalid plan handling
  • Improvement suggestion generation

Limitations

Offline mode cannot:

  • Perform semantic analysis (detects structure, not content quality)
  • Validate business logic correctness
  • Detect subtle plan flaws (e.g., circular dependencies)
  • Provide nuanced improvement suggestions

Offline mode is suitable for:

  • Air-gapped environments (no network access)
  • CI/CD pipelines without external network
  • Development environments with intermittent connectivity
  • Emergency situations requiring immediate plan validation

Recommendation: Use online scoring for production workflows; offline mode is a fallback only.


Installation Requirements

Install the CLIs you want to use:

# Claude Code (via npm)
npm install -g @anthropic-ai/claude-code

# Gemini CLI (via npm)
npm install -g @anthropic-ai/gemini

# OpenAI Codex (via npm)
npm install -g @openai/codex

# Cursor Agent (via curl, inside WSL on Windows)
wsl bash -lc "curl https://cursor.com/install -fsS | bash"

# GitHub Copilot (via npm)
npm install -g @github/copilot

Note: Offline fallback requires no installation (built-in heuristic scoring).

Skill Invocation

Natural language (recommended):

"Rate this response against the rubric"
"Have another AI critique this answer"
"Review the vocabulary file for security issues"
"Get feedback from Claude, Gemini, and Codex on this response"

Direct CLI:

node .claude/skills/response-rater/scripts/rate.cjs --response-file response.txt --providers claude,gemini,codex,cursor,copilot

CLI Reference

Usage:
  node .claude/skills/response-rater/scripts/rate.cjs --response-file <path> [options]
  cat response.txt | node .claude/skills/response-rater/scripts/rate.cjs [options]

Options:
  --response-file <path>   # file containing the response to review
  --question-file <path>   # optional; original question/request for context
  --providers <list>       # comma-separated: claude,gemini,codex,cursor,copilot

Models:
  --gemini-model <model>   # default: gemini-3-pro-preview
  --codex-model <model>    # default: gpt-5.1-codex-max
  --cursor-model <model>   # default: auto
  --copilot-model <model>  # default: claude-sonnet-4.5

Templates:
  --template response-review   # default
  --template vocab-review      # security audit

Auth:
  --auth-mode session-first   # default
  --auth-mode env-first       # try env keys first

Plan Rating for Orchestration

Overview

The response-rater skill is mandatory for orchestrators to validate plan quality before workflow execution. This ensures all plans meet minimum quality standards and reduces execution failures.

Plan Rating Usage for Orchestrators

When to Use: After Planner creates a plan, before workflow execution starts

Workflow Pattern:

  1. Planner creates
    plan-{{workflow_id}}.json
  2. Orchestrator invokes response-rater skill with plan content
  3. Response-rater evaluates plan using rubric
  4. If score >= 7/10: Proceed with execution
  5. If score < 7/10: Return to Planner with feedback (max 3 attempts)

Command:

node .claude/skills/response-rater/scripts/rate.cjs \
  --response-file .claude/context/artifacts/plan-{{workflow_id}}.json \
  --providers claude,gemini \
  --template plan-review

Rubric Dimensions

Plans are evaluated on 5 key dimensions (equal weight):

DimensionWeightDescription
completeness20%All required information present, no gaps
feasibility20%Plan is realistic and achievable
risk_mitigation20%Risks identified and mitigation strategies defined
agent_coverage20%Appropriate agents assigned to each task
integration20%Plan integrates properly with existing systems

Overall Score: Average of all 5 dimensions

Minimum Score: 7/10 (standard), 8/10 (enterprise), 9/10 (critical)

Feedback Format

Response-rater returns structured JSON with:

{
  "scores": {
    "completeness": 8,
    "feasibility": 7,
    "risk_mitigation": 6,
    "agent_coverage": 9,
    "integration": 7
  },
  "overall_score": 7.4,
  "summary": "Plan is generally solid with strong agent coverage...",
  "improvements": [
    "Add explicit error handling strategy for each phase",
    "Define fallback agents for critical steps",
    "Include rollback procedures for failed steps"
  ],
  "rewrite": "Improved plan with enhanced risk mitigation..."
}

Minimum Scores by Task Type

Task TypeMinimum ScoreExample
Emergency5/10Production outages, critical hotfixes
Standard7/10Regular features, bug fixes (default)
Enterprise8/10Enterprise integrations, migrations
Critical9/10Security, compliance, data protection

Provider Configuration for Plan Rating

Default Providers

Standard plan rating uses 2 providers for consensus:

--providers claude,gemini

Enterprise plan rating uses 3 providers:

--providers claude,gemini,codex

Critical plan rating uses all 5 providers:

--providers claude,gemini,codex,cursor,copilot

Provider Selection Strategy

ScenarioProvidersRationale
Standard plansclaude,geminiFast, reliable, good consensus
Enterprise plansclaude,gemini,codexAdditional perspective for complex plans
Critical plansAll 5Maximum validation for mission-critical work

Timeout and Failure Handling

Timeout Configuration

Default timeout: 180 seconds per provider (3 minutes)

Rationale: Plan rating requires deep analysis; allow sufficient time

Increase timeout for complex plans:

# Enterprise migration plan (5 minutes per provider)
node .claude/skills/response-rater/scripts/rate.cjs \
  --response-file plan-enterprise-migration.json \
  --providers claude,gemini,codex \
  --timeout 300000

# Critical security audit plan (10 minutes per provider)
node .claude/skills/response-rater/scripts/rate.cjs \
  --response-file plan-security-audit.json \
  --providers claude,gemini,codex,cursor,copilot \
  --timeout 600000

Failure Handling

Scenario 1: One provider fails

  • Action: Continue with successful providers
  • Minimum: Require at least 1 successful provider
  • Log: Record failure in reasoning file

Scenario 2: All providers fail

  • Action: Retry with exponential backoff (3 attempts)
  • If still failing: Escalate to user
  • Options: Manual review, skip rating (force-proceed), cancel workflow

Scenario 3: Provider timeout

  • Action: Retry provider once
  • If timeout persists: Skip provider, continue with others
  • Log: Record timeout in reasoning file

Scenario 4: Authentication failure

  • Action: Retry with environment-based auth (if available)
  • If auth still fails: Skip provider, log error
  • Minimum: Require at least 1 authenticated provider

Workflow Integration for Plan Rating

Step 0.1: Plan Rating Gate (Standard Pattern)

All workflows include a plan rating gate after planning:

steps:
  - step: 0.1
    name: 'Plan Rating Gate'
    agent: orchestrator
    type: validation
    skill: response-rater
    inputs:
      - plan-{{workflow_id}}.json (from step 0)
    outputs:
      - .claude/context/runs/<run_id>/plans/<plan_id>-rating.json
      - reasoning: .claude/context/history/reasoning/{{workflow_id}}/00.1-orchestrator.json
    validation:
      minimum_score: 7
      rubric_file: .claude/context/artifacts/standard-plan-rubric.json
      gate: .claude/context/history/gates/{{workflow_id}}/00.1-orchestrator.json
    retry:
      max_attempts: 3
      on_failure: escalate_to_human
    description: |
      Rate plan quality using response-rater skill.
      - Minimum passing score: 7/10
      - If score < 7: Return to Planner with feedback
      - If score >= 7: Proceed with workflow execution

Rating File Location

Standard path:

.claude/context/runs/<run_id>/plans/<plan_id>-rating.json

Example:

.claude/context/runs/run-001/plans/plan-greenfield-2025-01-06-rating.json

Orchestrator Responsibilities

  1. After Planner completes: Invoke response-rater skill with plan content
  2. Process rating results: Extract overall score and rubric scores
  3. Save rating: Persist to standard path for audit trail
  4. Decision logic:
    • If score >= minimum: Proceed to next step
    • If score < minimum: Return to Planner with feedback (max 3 attempts)
    • If 3 failures: Escalate to user for manual review

Retry Logic

  1. Attempt 1: Rate → if < 7 → return to Planner with feedback
  2. Attempt 2: Planner revises → re-rate → if < 7 → return again
  3. Attempt 3: Final revision → re-rate → if < 7 → escalate to user

Force-proceed: User can approve execution despite low score (requires explicit acknowledgment of risks)


Timeout Escalation

When a provider times out, the response-rater follows this escalation pattern:

Escalation Flow

  1. Primary Provider Timeout (after 180s)

    • Log timeout with provider name
    • Mark provider as failed for this request
    • Immediately try next provider in list
  2. Secondary Provider Timeout

    • Log cumulative timeout
    • If more providers available, try next
    • If total time > 600s, stop and use available results
  3. All Providers Timeout

    • Check for cached rating (< 1 hour old)
    • If cached: Use cached rating with warning
    • If no cache: Escalate to manual review

Configuration

Timeouts are configured in

.claude/config/response-rater.yaml
:

timeouts:
  per_provider: 180 # Per-provider timeout
  total_max: 600 # Max total time
  connection: 30 # Connection timeout

Example Timeout Scenario

Provider: claude (timeout after 180s) → FAILED
Provider: gemini (success in 45s) → Score: 7.2
Provider: codex (timeout after 180s) → FAILED

Result: Score 7.2 from gemini (single provider)
Warning: "2 of 3 providers timed out"

Fallback Behavior

fallback:
  on_all_fail: use_cached_or_manual_review
  cache_ttl: 3600 # Use cache if < 1 hour old
  manual_review_enabled: true

When all providers fail:

  1. Check rating cache for identical plan hash
  2. If valid cache exists, return cached rating with
    source: cache
    flag
  3. If no cache, return
    { status: "manual_review_required" }
  4. Log escalation for monitoring

Provider Selection by Workflow

Response-rater automatically selects providers based on workflow criticality:

Workflow TypeTierProvidersRationale
incident-flowstandardclaude, geminiFast response for emergencies
quick-flowstandardclaude, geminiQuick validation for minor changes
enterprise-trackenterpriseclaude, gemini, codexEnhanced validation for enterprise
automated-enterprise-flowenterpriseclaude, gemini, codexEnterprise compliance requirements
legacy-modernization-flowenterpriseclaude, gemini, codexComplex migration validation
ai-system-flowcriticalall 5 providersMaximum validation for AI systems
security-flowcriticalall 5 providersCritical security validation
compliance-flowcriticalall 5 providersRegulatory compliance requirements

Provider selection tool:

node .claude/tools/response-rater-provider-selector.mjs --workflow <name>

Security

API Key Protection

This skill implements defense-in-depth for credential protection following OWASP A02:2021 guidelines.

Automatic Sanitization:

  • All error messages are automatically sanitized to prevent API key leakage
  • Stack traces and stderr output are filtered before logging
  • Environment variable values are redacted in error contexts
  • Provider authentication failures never expose actual credentials

Sanitized Patterns:

  • Anthropic API keys (
    sk-ant-...
    )
  • OpenAI API keys (
    sk-...
    )
  • Google/Gemini API keys (
    AIza...
    )
  • GitHub tokens (
    ghp_
    ,
    gho_
    ,
    ghu_
    ,
    ghs_
    ,
    ghr_
    )
  • JWT tokens
  • Authorization headers (Bearer, Basic)
  • Environment variable assignments (
    KEY=value
    )
  • Long alphanumeric tokens (40+ characters)

Best Practices:

  1. Use session-first authentication - Avoids storing API keys in environment
  2. Never commit API keys - Use environment variables or secret managers
  3. Rotate keys regularly - Especially after suspected exposure
  4. Monitor logs for
    [REDACTED]
    markers
    - Indicates potential key exposure attempt was blocked
  5. Use CI secrets - Store keys in CI/CD secret management, not in code

Implementation:

// Error handling in rate.js automatically sanitizes
const { sanitize } = require('../../shared/sanitize-secrets.js');
console.error(`fatal: ${sanitize(error.message)}`); // Safe to log

Security Audit:

  • Run
    node .claude/tools/test-sanitization.mjs
    to verify sanitization patterns
  • All 15 test cases must pass before deployment

References

  • Plan Rating Guide:
    .claude/docs/PLAN_RATING_GUIDE.md
    (comprehensive documentation)
  • Enforcement Gate:
    .claude/tools/enforcement-gate.mjs
    (validation CLI)
  • Plan Review Matrix:
    .claude/context/plan-review-matrix.json
    (minimum scores by task type)
  • Workflow Guide:
    .claude/workflows/WORKFLOW-GUIDE.md
    (workflow integration)
  • Provider Config:
    .claude/config/response-rater.yaml
    (provider selection and timeouts)