Claude-Skills senior-prompt-engineer

install

source · Clone the upstream repo

git clone https://github.com/borghei/Claude-Skills

Claude Code · Install into ~/.claude/skills/

T=$(mktemp -d) && git clone --depth=1 https://github.com/borghei/Claude-Skills "$T" && mkdir -p ~/.claude/skills && cp -r "$T/engineering/senior-prompt-engineer" ~/.claude/skills/borghei-claude-skills-senior-prompt-engineer && rm -rf "$T"

manifest: engineering/senior-prompt-engineer/SKILL.md


Senior Prompt Engineer

Prompt engineering patterns, LLM evaluation frameworks, and agentic system design.

Quick Start
Tools Overview
Prompt Engineering Workflows
Reference Documentation
Common Patterns Quick Reference

Quick Start

# Analyze and optimize a prompt file
python scripts/prompt_optimizer.py prompts/my_prompt.txt --analyze

# Evaluate RAG retrieval quality
python scripts/rag_evaluator.py --contexts contexts.json --questions questions.json

# Visualize agent workflow from definition
python scripts/agent_orchestrator.py agent_config.yaml --visualize

Tools Overview

1. Prompt Optimizer

Analyzes prompts for token efficiency, clarity, and structure, and generates optimized versions (the `--optimize` pass normalizes whitespace only).

Input: Prompt text file or string
Output: Analysis report with optimization suggestions

Usage:

# Analyze a prompt file
python scripts/prompt_optimizer.py prompt.txt --analyze

# Output:
# Token count: 847
# Estimated cost: $0.0025 (GPT-4)
# Clarity score: 72/100
# Issues found:
#   - Ambiguous instruction at line 3
#   - Missing output format specification
#   - Redundant context (lines 12-15 repeat lines 5-8)
# Suggestions:
#   1. Add explicit output format: "Respond in JSON with keys: ..."
#   2. Remove redundant context to save 89 tokens
#   3. Clarify "analyze" -> "list the top 3 issues with severity ratings"

# Generate optimized version
python scripts/prompt_optimizer.py prompt.txt --optimize --output optimized.txt

# Count tokens for cost estimation
python scripts/prompt_optimizer.py prompt.txt --tokens --model gpt-4

# Extract and manage few-shot examples
python scripts/prompt_optimizer.py prompt.txt --extract-examples --output examples.json
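The token counts and cost figures above come from a character-based estimate rather than a real tokenizer (see Troubleshooting). A minimal sketch of that idea, assuming the 4.0 and 3.5 characters-per-token ratios quoted in the Troubleshooting table and a caller-supplied price; `estimate_tokens` and `estimate_cost` are hypothetical names, not the script's API, and no real pricing is encoded.

```python
# Sketch of character-based token and cost estimation (hypothetical helpers).
# Assumption: ~4.0 chars/token for GPT models, ~3.5 for Claude models.
import math

CHARS_PER_TOKEN = {"gpt": 4.0, "claude": 3.5}


def estimate_tokens(text: str, model: str) -> int:
    """Estimate tokens from character length using a per-family ratio."""
    family = "claude" if model.startswith("claude") else "gpt"
    return math.ceil(len(text) / CHARS_PER_TOKEN[family])


def estimate_cost(text: str, model: str, price_per_1k_tokens: float) -> float:
    """Estimate input cost from a caller-supplied price per 1,000 tokens."""
    return estimate_tokens(text, model) / 1000 * price_per_1k_tokens
```

Because the ratio differs by model family, the same prompt yields a higher estimate for Claude models, which is why passing a `--model` that matches your target matters.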

2. RAG Evaluator

Evaluates Retrieval-Augmented Generation quality by measuring context relevance and answer faithfulness.

Input: Retrieved contexts (JSON) and questions/answers
Output: Evaluation metrics and quality report

Usage:

# Evaluate retrieval quality
python scripts/rag_evaluator.py --contexts retrieved.json --questions eval_set.json

# Output:
# === RAG Evaluation Report ===
# Questions evaluated: 50
#
# Retrieval Metrics:
#   Context Relevance: 0.78 (target: >0.80)
#   Retrieval Precision@5: 0.72
#   Coverage: 0.85
#
# Generation Metrics:
#   Answer Faithfulness: 0.91
#   Groundedness: 0.88
#
# Issues Found:
#   - 8 questions had no relevant context in top-5
#   - 3 answers contained information not in context
#
# Recommendations:
#   1. Improve chunking strategy for technical documents
#   2. Add metadata filtering for date-sensitive queries

# Evaluate with custom metrics
python scripts/rag_evaluator.py --contexts retrieved.json --questions eval_set.json \
    --metrics relevance,faithfulness,coverage

# Export detailed results
python scripts/rag_evaluator.py --contexts retrieved.json --questions eval_set.json \
    --output report.json --verbose
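The retrieval metrics in the report (Precision@K, MRR, NDCG) have standard definitions. A sketch of the binary-relevance versions for reference, assuming each query has a ranked list of document IDs and a set of IDs judged relevant; these are textbook formulas and not necessarily the exact ones `rag_evaluator.py` implements.

```python
# Standard binary-relevance retrieval metrics (textbook definitions).
import math


def precision_at_k(retrieved: list[str], relevant: set[str], k: int) -> float:
    """Fraction of the top-k results that are relevant."""
    if k <= 0:
        return 0.0
    return sum(doc in relevant for doc in retrieved[:k]) / k


def reciprocal_rank(retrieved: list[str], relevant: set[str]) -> float:
    """1 / rank of the first relevant result, or 0.0 if none is found."""
    for rank, doc in enumerate(retrieved, start=1):
        if doc in relevant:
            return 1.0 / rank
    return 0.0


def mean_reciprocal_rank(queries: list[tuple[list[str], set[str]]]) -> float:
    """MRR: average reciprocal rank over (retrieved, relevant) pairs."""
    if not queries:
        return 0.0
    return sum(reciprocal_rank(r, rel) for r, rel in queries) / len(queries)


def ndcg_at_k(retrieved: list[str], relevant: set[str], k: int) -> float:
    """DCG of the top-k ranking divided by the best achievable DCG."""
    dcg = sum(1.0 / math.log2(rank + 1)
              for rank, doc in enumerate(retrieved[:k], start=1)
              if doc in relevant)
    ideal_hits = min(len(relevant), k)
    idcg = sum(1.0 / math.log2(rank + 1) for rank in range(1, ideal_hits + 1))
    return dcg / idcg if idcg > 0 else 0.0
```

Precision@K ignores order within the top K, while MRR and NDCG reward placing relevant contexts earlier.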

3. Agent Orchestrator

Parses agent definitions and visualizes execution flows. Validates tool configurations.

Input: Agent configuration (YAML/JSON)
Output: Workflow visualization, validation report

Usage:

# Validate agent configuration
python scripts/agent_orchestrator.py agent.yaml --validate

# Output:
# === Agent Validation Report ===
# Agent: research_assistant
# Pattern: ReAct
#
# Tools (4 registered):
#   [OK] web_search - API key configured
#   [OK] calculator - No config needed
#   [WARN] file_reader - Missing allowed_paths
#   [OK] summarizer - Prompt template valid
#
# Flow Analysis:
#   Max depth: 5 iterations
#   Estimated tokens/run: 2,400-4,800
#   Potential infinite loop: No
#
# Recommendations:
#   1. Add allowed_paths to file_reader for security
#   2. Consider adding early exit condition for simple queries
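The validation report above flags tools with missing settings and checks whether the loop can terminate. A minimal sketch of that kind of check, assuming a hypothetical config shape (`name`, `max_iterations`, and a `tools` list of dicts) and a hypothetical rule that `file_reader` needs `allowed_paths`; the real script's schema and rules may differ. The returned keys mirror part of the `ValidationResult` object documented under Tool Reference.

```python
# Sketch of agent config validation. The config shape and per-tool rules
# are hypothetical, chosen to reproduce the warning shown above.
REQUIRED_TOOL_KEYS = {"file_reader": ["allowed_paths"]}


def validate_agent(config: dict) -> dict:
    errors: list[str] = []
    warnings: list[str] = []
    if not config.get("name"):
        errors.append("missing agent name")
    tools = config.get("tools", [])
    if not tools:
        errors.append("no tools registered")
    for tool in tools:
        name = tool.get("name", "<unnamed>")
        for key in REQUIRED_TOOL_KEYS.get(name, []):
            if key not in tool:
                warnings.append(f"{name}: missing {key}")
    # Without a positive iteration cap, a ReAct-style loop may never stop.
    max_iter = config.get("max_iterations")
    potential_loop = not isinstance(max_iter, int) or max_iter <= 0
    if potential_loop:
        warnings.append("no positive max_iterations: loop may not terminate")
    return {
        "is_valid": not errors,
        "errors": errors,
        "warnings": warnings,
        "potential_infinite_loop": potential_loop,
    }
```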

# Visualize agent workflow (ASCII)
python scripts/agent_orchestrator.py agent.yaml --visualize

# Output:
# ┌─────────────────────────────────────────┐
# │            research_assistant           │
# │              (ReAct Pattern)            │
# └─────────────────┬───────────────────────┘
#                   │
#          ┌────────▼────────┐
#          │   User Query    │
#          └────────┬────────┘
#                   │
#          ┌────────▼────────┐
#          │     Think       │◄──────┐
#          └────────┬────────┘       │
#                   │                │
#          ┌────────▼────────┐       │
#          │   Select Tool   │       │
#          └────────┬────────┘       │
#                   │                │
#     ┌─────────────┼─────────────┐  │
#     ▼             ▼             ▼  │
# [web_search] [calculator] [file_reader]
#     │             │             │  │
#     └─────────────┼─────────────┘  │
#                   │                │
#          ┌────────▼────────┐       │
#          │    Observe      │───────┘
#          └────────┬────────┘
#                   │
#          ┌────────▼────────┐
#          │  Final Answer   │
#          └─────────────────┘

# Export workflow as Mermaid diagram
python scripts/agent_orchestrator.py agent.yaml --visualize --format mermaid
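The ASCII diagram shows the ReAct loop: Think, Select Tool, act, Observe, and repeat until a final answer. A minimal sketch of that control flow, assuming a hypothetical `policy` callable that stands in for the LLM's reasoning step and a plain dict of tool functions; it makes no model calls and is not the orchestrator's code.

```python
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical policy interface standing in for the LLM "Think" step.
# It returns ("tool", tool_name, tool_input) or ("final", answer, None).
Policy = Callable[[str, List[Tuple[str, str]]], Tuple[str, str, Optional[str]]]


def run_react(question: str, policy: Policy,
              tools: Dict[str, Callable[[str], str]],
              max_iterations: int = 5) -> str:
    """Think -> Select Tool -> Observe loop, bounded by max_iterations."""
    observations: List[Tuple[str, str]] = []
    for _ in range(max_iterations):
        kind, value, tool_input = policy(question, observations)  # Think
        if kind == "final":
            return value  # Final Answer
        tool = tools.get(value)  # Select Tool
        if tool is None:
            observations.append((value, f"error: unknown tool {value!r}"))
            continue
        observations.append((value, tool(tool_input or "")))  # Act, then Observe
    return "stopped: max_iterations reached"
```

The `max_iterations` bound is what keeps a policy that never emits a final answer from looping forever, which is the condition the validator's "Potential infinite loop" line reports on.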

Prompt Engineering Workflows

Prompt Optimization Workflow

Use when improving an existing prompt's performance or reducing token costs.

Step 1: Baseline current prompt

python scripts/prompt_optimizer.py current_prompt.txt --analyze --json --output baseline.json

Step 2: Identify issues

Review the analysis report for:

Token waste (redundant instructions, verbose examples)
Ambiguous instructions (unclear output format, vague verbs)
Missing constraints (no length limits, no format specification)

Step 3: Apply optimization patterns

Issue	Pattern to Apply
Ambiguous output	Add explicit format specification
Too verbose	Extract to few-shot examples
Inconsistent results	Add role/persona framing
Missing edge cases	Add constraint boundaries

Step 4: Generate optimized version

python scripts/prompt_optimizer.py current_prompt.txt --optimize --output optimized.txt

Step 5: Compare results

python scripts/prompt_optimizer.py optimized.txt --analyze --compare baseline.json
# Shows: token reduction, clarity improvement, issues resolved
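The comparison boils down to a few differences between two analysis results. A sketch of that arithmetic, assuming both files are the JSON `PromptAnalysis` objects described under Tool Reference with `token_count`, `clarity_score`, and a list-valued `issues`; `compare_analyses` is a hypothetical helper, and the sample numbers in the usage note are illustrative (847 tokens minus the 89 redundant ones from the earlier example).

```python
# Sketch of a baseline vs. optimized comparison (hypothetical helper).
import json


def compare_analyses(baseline: dict, optimized: dict) -> dict:
    base_tokens = baseline["token_count"]
    new_tokens = optimized["token_count"]
    reduction = (base_tokens - new_tokens) / base_tokens * 100 if base_tokens else 0.0
    return {
        "token_reduction_pct": round(reduction, 1),
        "clarity_delta": optimized["clarity_score"] - baseline["clarity_score"],
        "issues_resolved": len(baseline["issues"]) - len(optimized["issues"]),
    }


def compare_files(baseline_path: str, optimized_path: str) -> dict:
    """Load two JSON analysis files and compare them."""
    with open(baseline_path) as f:
        baseline = json.load(f)
    with open(optimized_path) as f:
        optimized = json.load(f)
    return compare_analyses(baseline, optimized)
```

With a baseline of 847 tokens and an optimized prompt of 758 tokens, `token_reduction_pct` is 10.5, short of the 30% target in Success Criteria.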

Step 6: Validate with test cases

Run both prompts against your evaluation set and compare outputs.

Few-Shot Example Design Workflow

Use when creating examples for in-context learning.

Step 1: Define the task clearly

Task: Extract product entities from customer reviews
Input: Review text
Output: JSON with {product_name, sentiment, features_mentioned}

Step 2: Select diverse examples (3-5 recommended)

Example Type	Purpose
Simple case	Shows basic pattern
Edge case	Handles ambiguity
Complex case	Multiple entities
Negative case	What NOT to extract

Step 3: Format consistently

Example 1:
Input: "Love my new iPhone 15, the camera is amazing!"
Output: {"product_name": "iPhone 15", "sentiment": "positive", "features_mentioned": ["camera"]}

Example 2:
Input: "The laptop was okay but battery life is terrible."
Output: {"product_name": "laptop", "sentiment": "mixed", "features_mentioned": ["battery life"]}
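Formatting examples by hand invites drift (stray quotes, reordered keys). A sketch that renders examples in the `Example N:` / `Input:` / `Output:` layout shown above, assuming each example is a hypothetical dict with `input` (string) and `output` (dict) keys; `json.dumps` keeps every Output line valid JSON with the same key order.

```python
# Sketch: render few-shot examples in a consistent Input/Output layout.
import json


def format_few_shot(examples: list[dict]) -> str:
    blocks = []
    for i, ex in enumerate(examples, start=1):
        blocks.append(
            f"Example {i}:\n"
            f"Input: {json.dumps(ex['input'])}\n"   # quoted, escaped string
            f"Output: {json.dumps(ex['output'])}"   # valid JSON, keys in order
        )
    return "\n\n".join(blocks)
```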

Step 4: Validate example quality

python scripts/prompt_optimizer.py prompt_with_examples.txt --validate-examples
# Checks: consistency, coverage, format alignment
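A sketch of what extraction and a format-alignment check might look like, assuming examples follow the `Input:` / `Output:` labeling convention (as Troubleshooting requires) and produce objects shaped like the `{input_text, output_text, index}` records documented for `--extract-examples`. The 0-based `index`, the regex, and both function names are assumptions, not the script's implementation.

```python
# Sketch: extract Input/Output pairs and check that outputs share one JSON shape.
import json
import re

PAIR_RE = re.compile(r"^Input:[ \t]*(.+?)[ \t]*\nOutput:[ \t]*(.+?)[ \t]*$", re.MULTILINE)


def extract_examples(prompt_text: str) -> list[dict]:
    return [
        {"input_text": m.group(1), "output_text": m.group(2), "index": i}
        for i, m in enumerate(PAIR_RE.finditer(prompt_text))
    ]


def check_output_consistency(examples: list[dict]) -> list[str]:
    """Flag outputs that are not JSON objects or that use different keys."""
    problems: list[str] = []
    key_sets = set()
    for ex in examples:
        try:
            parsed = json.loads(ex["output_text"])
        except json.JSONDecodeError:
            problems.append(f"example {ex['index']}: output is not valid JSON")
            continue
        if not isinstance(parsed, dict):
            problems.append(f"example {ex['index']}: output is not a JSON object")
            continue
        key_sets.add(frozenset(parsed))
    if len(key_sets) > 1:
        problems.append("outputs use different key sets")
    return problems
```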

Step 5: Test with held-out cases

Confirm that the model generalizes beyond your examples.

Structured Output Design Workflow

Use when you need reliable JSON/XML/structured responses.

Step 1: Define schema

{
  "type": "object",
  "properties": {
    "summary": {"type": "string", "maxLength": 200},
    "sentiment": {"enum": ["positive", "negative", "neutral"]},
    "confidence": {"type": "number", "minimum": 0, "maximum": 1}
  },
  "required": ["summary", "sentiment"]
}

Step 2: Include schema in prompt

Respond with JSON matching this schema:
- summary (string, max 200 chars): Brief summary of the content
- sentiment (enum): One of "positive", "negative", "neutral"
- confidence (number 0-1): Your confidence in the sentiment

Step 3: Add format enforcement

IMPORTANT: Respond ONLY with valid JSON. No markdown, no explanation.
Start your response with { and end with }
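Even with format enforcement, models occasionally wrap JSON in prose or a markdown fence. A defensive parsing sketch on the consuming side, assuming the response contains a single top-level JSON object; `parse_json_response` is a hypothetical helper that slices from the first `{` to the last `}` before calling `json.loads`.

```python
# Sketch: recover a single JSON object from a possibly wrapped response.
import json


def parse_json_response(response: str) -> dict:
    start = response.find("{")
    end = response.rfind("}")
    if start == -1 or end < start:
        raise ValueError("no JSON object found in response")
    # json.JSONDecodeError is a subclass of ValueError, so callers can
    # handle both failure modes with a single except clause.
    return json.loads(response[start:end + 1])
```

This is a fallback, not a substitute for the prompt instruction: it will fail if the response contains several objects or stray braces in surrounding text.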

Step 4: Validate outputs

python scripts/prompt_optimizer.py structured_prompt.txt --validate-schema schema.json
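To check model outputs in your own code, a stdlib-only sketch that handles just the keywords used in the Step 1 schema (`required`, `type` string/number, `enum`, `maxLength`, `minimum`, `maximum`). It is a deliberately partial validator under that assumption; for full JSON Schema support, a dedicated library such as `jsonschema` is the usual choice.

```python
# Partial JSON Schema check covering only the keywords in the schema above.
SCHEMA = {
    "type": "object",
    "properties": {
        "summary": {"type": "string", "maxLength": 200},
        "sentiment": {"enum": ["positive", "negative", "neutral"]},
        "confidence": {"type": "number", "minimum": 0, "maximum": 1},
    },
    "required": ["summary", "sentiment"],
}


def validate_output(data, schema: dict) -> list[str]:
    if not isinstance(data, dict):
        return ["response is not a JSON object"]
    errors = [f"missing required key: {k}" for k in schema.get("required", []) if k not in data]
    for key, rules in schema.get("properties", {}).items():
        if key not in data:
            continue
        value = data[key]
        is_number = isinstance(value, (int, float)) and not isinstance(value, bool)
        if rules.get("type") == "string" and not isinstance(value, str):
            errors.append(f"{key}: expected string")
        if rules.get("type") == "number" and not is_number:
            errors.append(f"{key}: expected number")
        if "enum" in rules and value not in rules["enum"]:
            errors.append(f"{key}: {value!r} not in {rules['enum']}")
        if isinstance(value, str) and len(value) > rules.get("maxLength", float("inf")):
            errors.append(f"{key}: longer than {rules['maxLength']} chars")
        if is_number and "minimum" in rules and value < rules["minimum"]:
            errors.append(f"{key}: below minimum {rules['minimum']}")
        if is_number and "maximum" in rules and value > rules["maximum"]:
            errors.append(f"{key}: above maximum {rules['maximum']}")
    return errors
```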

Reference Documentation

File	Contains	Load when user asks about
`references/prompt_engineering_patterns.md`	10 prompt patterns with input/output examples	"which pattern?", "few-shot", "chain-of-thought", "role prompting"
`references/llm_evaluation_frameworks.md`	Evaluation metrics, scoring methods, A/B testing	"how to evaluate?", "measure quality", "compare prompts"
`references/agentic_system_design.md`	Agent architectures (ReAct, Plan-Execute, Tool Use)	"build agent", "tool calling", "multi-agent"

Common Patterns Quick Reference

Pattern	When to Use	Example
Zero-shot	Simple, well-defined tasks	"Classify this email as spam or not spam"
Few-shot	Complex tasks, consistent format needed	Provide 3-5 examples before the task
Chain-of-Thought	Reasoning, math, multi-step logic	"Think step by step..."
Role Prompting	Expertise needed, specific perspective	"You are an expert tax accountant..."
Structured Output	Need parseable JSON/XML	Include schema + format enforcement

Common Commands

# Prompt Analysis
python scripts/prompt_optimizer.py prompt.txt --analyze          # Full analysis
python scripts/prompt_optimizer.py prompt.txt --tokens           # Token count only
python scripts/prompt_optimizer.py prompt.txt --optimize         # Generate optimized version

# RAG Evaluation
python scripts/rag_evaluator.py --contexts ctx.json --questions q.json  # Evaluate
python scripts/rag_evaluator.py --contexts ctx.json --questions q.json --compare baseline.json  # Compare to baseline

# Agent Development
python scripts/agent_orchestrator.py agent.yaml --validate       # Validate config
python scripts/agent_orchestrator.py agent.yaml --visualize      # Show workflow
python scripts/agent_orchestrator.py agent.yaml --estimate-cost  # Token estimation

Troubleshooting

Problem	Cause	Solution
Token count seems inaccurate	Character-based estimation varies by language and special characters	Use the `--model` flag matching your target model; the estimator assumes about 3.5 characters per token for Claude models vs 4.0 for GPT models
Clarity score is low despite clear prompt	Vague-pattern detector flags common words like "analyze" or "some" even in valid contexts	Review flagged lines individually; not every match is a true issue, so focus on genuinely ambiguous instructions
Few-shot examples not detected	Examples do not follow the `Input:/Output:` or `Example N:` labeling convention	Format examples with explicit `Input:` and `Output:` prefixes so the extractor can parse them
RAG evaluator shows 0.0 for all metrics	Input JSON schema mismatch: missing `question`, `content`, or `question_id` keys	Verify the JSON uses the expected keys (`question`/`query`, `content`/`text`, `question_id`/`query_id`)
Agent YAML parsing fails	Built-in YAML parser is simplified and cannot handle advanced syntax (anchors, multi-line blocks)	Convert config to JSON, or restructure YAML to use only simple key-value pairs and dash-prefixed lists
Optimization produces minimal changes	`--optimize` only performs whitespace normalization, not semantic rewriting	Use `--analyze` first to get suggestions, then manually apply structural improvements before re-running `--optimize`
Mermaid diagram renders incorrectly	More than 6 tools overflow the generated subgraph	Reduce tool count in the config or manually edit the Mermaid output to split into sub-diagrams
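For the 0.0-metrics case above, a quick pre-flight check on your input files can save a run. A sketch assuming the key aliases listed in that row and in the `--contexts`/`--questions` parameter descriptions; `find_schema_problems` is a hypothetical helper, not part of `rag_evaluator.py`.

```python
# Sketch: check RAG input records for the key aliases the evaluator expects.
KEY_ALIASES = {
    "question": ("question", "query"),
    "content": ("content", "text"),
    "question_id": ("question_id", "query_id"),
}


def pick(record: dict, canonical: str):
    """Return the value under the first alias present, else None."""
    for key in KEY_ALIASES[canonical]:
        if key in record:
            return record[key]
    return None


def find_schema_problems(contexts: list[dict], questions: list[dict]) -> list[str]:
    problems: list[str] = []
    for i, ctx in enumerate(contexts):
        if pick(ctx, "question_id") is None:
            problems.append(f"context {i}: no question_id/query_id")
        if pick(ctx, "content") is None:
            problems.append(f"context {i}: no content/text")
    for i, q in enumerate(questions):
        if pick(q, "question") is None:
            problems.append(f"question {i}: no question/query")
    return problems
```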

Success Criteria

Prompt clarity score above 70/100 on all production prompts, measured via `prompt_optimizer.py --analyze`
Token efficiency improved by 30%+ after applying optimization suggestions and removing redundant content
RAG context relevance at or above 0.80 across evaluation sets, verified by `rag_evaluator.py`
Answer faithfulness at or above 0.95 with zero unsupported claims in critical workflows
Agent validation passes with zero errors for all deployed agent configurations
Cost per agent run within budget, with estimated monthly spend confirmed via `agent_orchestrator.py --estimate-cost`
Few-shot example coverage includes edge cases: at least 1 simple, 1 complex, and 1 negative example per prompt template
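Several of these thresholds can be enforced automatically, for example as a CI gate. A sketch assuming the JSON outputs documented under Tool Reference (`clarity_score` from `PromptAnalysis`; `avg_context_relevance` and `avg_faithfulness` from `RAGEvaluationReport`); `check_success` and its default thresholds are hypothetical wiring of the criteria above.

```python
# Sketch: gate on the numeric success criteria (hypothetical helper).
def check_success(prompt_analysis: dict, rag_report: dict,
                  min_clarity: float = 70, min_relevance: float = 0.80,
                  min_faithfulness: float = 0.95) -> list[str]:
    failures: list[str] = []
    if prompt_analysis["clarity_score"] <= min_clarity:  # criterion: "above 70"
        failures.append(f"clarity {prompt_analysis['clarity_score']} not above {min_clarity}")
    if rag_report["avg_context_relevance"] < min_relevance:
        failures.append(f"context relevance {rag_report['avg_context_relevance']} below {min_relevance}")
    if rag_report["avg_faithfulness"] < min_faithfulness:
        failures.append(f"faithfulness {rag_report['avg_faithfulness']} below {min_faithfulness}")
    return failures
```

Fed the sample figures from the Tools Overview reports (clarity 72, relevance 0.78, faithfulness 0.91), this gate would report two failures.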

Scope & Limitations

This skill covers:

Static prompt analysis: token counting, clarity scoring, structure detection, and optimization suggestions
RAG evaluation: context relevance, answer faithfulness, groundedness (ROUGE-L), and retrieval metrics (Precision@K, MRR, NDCG)
Agent workflow design: configuration validation, ASCII/Mermaid visualization, and token cost estimation
Few-shot example extraction and management from existing prompts

This skill does NOT cover:

Live LLM calls or runtime prompt testing: all analysis is static/deterministic (see `senior-ml-engineer` for LLM integration)
Vector database setup or embedding generation: the RAG evaluator scores pre-retrieved contexts only (see `senior-data-engineer` for pipeline orchestration)
Fine-tuning, RLHF, or model training workflows (see `senior-ml-engineer` for model deployment)
Production monitoring, A/B test execution, or real-time drift detection (see `senior-data-scientist` for experiment design)

Integration Points

Skill	Integration	Data Flow
`senior-ml-engineer`	LLM integration and model deployment	Optimized prompts from this skill feed into `llm_integration_builder.py` prompt templates
`senior-data-scientist`	A/B test design for prompt experiments	`experiment_designer.py` defines test parameters; this skill provides the prompt variants to compare
`senior-data-engineer`	RAG pipeline orchestration	`pipeline_orchestrator.py` builds the retrieval pipeline; this skill evaluates its output quality
`senior-fullstack`	End-to-end application scaffolding	Fullstack apps consume agent configs validated by `agent_orchestrator.py`
`senior-security`	Prompt injection and adversarial input review	Security analysis covers the attack surface; this skill ensures prompts include defensive constraints
`senior-qa`	Quality assurance for AI-powered features	QA test suites validate that optimized prompts produce consistent outputs in production

Tool Reference

prompt_optimizer.py

Purpose: Static analysis tool for prompt engineering. Estimates token counts, scores clarity and structure, detects ambiguous instructions and redundant content, extracts few-shot examples, and generates optimized prompt versions.

Usage:

python scripts/prompt_optimizer.py <prompt_file> [options]

Parameters:

Flag	Short	Type	Default	Description
`prompt`	(positional)	string	(required)	Path to the prompt text file to analyze
`--analyze`	`-a`	flag	off	Run full analysis (clarity, structure, issues, suggestions)
`--tokens`	`-t`	flag	off	Count tokens and estimate cost only
`--optimize`	`-O`	flag	off	Generate whitespace-optimized version of the prompt
`--extract-examples`	`-e`	flag	off	Extract few-shot examples (Input/Output pairs) as JSON
`--model`	`-m`	choice	`gpt-4`	Model for token/cost estimation. Choices: `gpt-4`, `gpt-4-turbo`, `gpt-3.5-turbo`, `claude-3-opus`, `claude-3-sonnet`, `claude-3-haiku`
`--output`	`-o`	string	(none)	Write results to this file path
`--json`	`-j`	flag	off	Output analysis as JSON instead of human-readable report
`--compare`	`-c`	string	(none)	Path to a baseline analysis JSON file for comparison

Example:

python scripts/prompt_optimizer.py prompt.txt --analyze --model claude-3-sonnet --json

Output Formats:

Default (text): Human-readable report with metrics, scores, detected sections, issues, and suggestions

JSON (`--json`): Structured `PromptAnalysis` object with keys: `token_count`, `estimated_cost`, `model`, `clarity_score`, `structure_score`, `issues`, `suggestions`, `sections`, `has_examples`, `example_count`, `has_output_format`, `word_count`, `line_count`

Token-only (`--tokens`): Single-line token count and cost estimate

Examples (`--extract-examples`): JSON array of `{input_text, output_text, index}` objects

Optimized (`--optimize`): Cleaned prompt text with normalized whitespace

rag_evaluator.py

Purpose: Evaluates Retrieval-Augmented Generation quality by measuring context relevance (lexical overlap, term coverage), answer faithfulness (claim-level verification), groundedness (ROUGE-L), and retrieval metrics (Precision@K, MRR, NDCG).

Usage:

python scripts/rag_evaluator.py --contexts <contexts.json> --questions <questions.json> [options]

Parameters:

Flag	Short	Type	Default	Description
`--contexts`	`-c`	string	(required)	Path to JSON file with retrieved contexts. Expected keys per object: `question_id`/`query_id`, `content`/`text`
`--questions`	`-q`	string	(required)	Path to JSON file with questions and answers. Expected keys per object: `id`, `question`/`query`, `answer`/`response`, `expected`/`ground_truth`
`--k`		int	`5`	Number of top contexts to evaluate per question
`--output`	`-o`	string	(none)	Write detailed report to this JSON file
`--json`	`-j`	flag	off	Output as JSON instead of human-readable text
`--verbose`	`-v`	flag	off	Include per-question detail breakdowns in the report
`--compare`		string	(none)	Path to a baseline report JSON for metric comparison

Example:

python scripts/rag_evaluator.py --contexts retrieved.json --questions eval_set.json --k 10 --verbose --output report.json

Output Formats:

Default (text): Human-readable report with summary, retrieval metrics (context relevance, Precision@K), generation metrics (faithfulness, groundedness), issues, and recommendations

JSON (`--json`): Structured `RAGEvaluationReport` object with keys: `total_questions`, `avg_context_relevance`, `avg_faithfulness`, `avg_groundedness`, `retrieval_metrics`, `coverage`, `issues`, `recommendations`, `question_details`

Verbose (`--verbose`): Adds a per-question `question_details` array containing individual context scores and faithfulness breakdowns
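The purpose statement names two lexical measures: term coverage for context relevance and ROUGE-L for groundedness. A sketch of both, assuming lowercase alphanumeric tokenization; ROUGE-L is computed here as the F1 of the longest common subsequence between answer and context, which is the common formulation but not necessarily the script's exact variant.

```python
# Sketch of lexical relevance/groundedness measures (assumed tokenization).
import re


def tokenize(text: str) -> list[str]:
    return re.findall(r"[a-z0-9]+", text.lower())


def term_coverage(question: str, context: str) -> float:
    """Fraction of distinct question terms that appear in the context."""
    q_terms = set(tokenize(question))
    if not q_terms:
        return 0.0
    return len(q_terms & set(tokenize(context))) / len(q_terms)


def lcs_length(a: list[str], b: list[str]) -> int:
    """Longest common subsequence length via row-by-row dynamic programming."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            curr.append(prev[j - 1] + 1 if x == y else max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l_f1(candidate: str, reference: str) -> float:
    c, r = tokenize(candidate), tokenize(reference)
    lcs = lcs_length(c, r)
    if lcs == 0:
        return 0.0
    precision, recall = lcs / len(c), lcs / len(r)
    return 2 * precision * recall / (precision + recall)
```

Both measures are purely lexical, so a faithful paraphrase can score low; this is one reason the static evaluator's numbers are best read as trends rather than absolute judgments.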

agent_orchestrator.py

Purpose: Parses agent configurations (YAML or JSON), validates tool registrations and flow correctness, generates ASCII or Mermaid workflow diagrams, and estimates token costs per run and monthly spend.

Usage:

python scripts/agent_orchestrator.py <config_file> [options]

Parameters:

Flag	Short	Type	Default	Description
`config`	(positional)	string	(required)	Path to agent configuration file (YAML or JSON)
`--validate`	`-V`	flag	off	Validate agent configuration (errors, warnings, tool status). Runs by default if no other action is specified
`--visualize`	`-v`	flag	off	Generate workflow diagram
`--format`	`-f`	choice	`ascii`	Visualization format. Choices: `ascii`, `mermaid`
`--estimate-cost`	`-e`	flag	off	Estimate token usage and costs
`--runs`	`-r`	int	`100`	Daily run count for monthly cost projection
`--output`	`-o`	string	(none)	Write output to this file path
`--json`	`-j`	flag	off	Output validation and cost results as JSON

Example:

python scripts/agent_orchestrator.py agent.yaml --validate --visualize --format mermaid --output workflow.md
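A sketch of how a ReAct flow like the ASCII diagram could be emitted as a Mermaid flowchart, assuming a hypothetical `react_mermaid` helper that takes the agent name and a list of tool names; the node IDs and layout are illustrative and will not match the script's output exactly. One node per tool is why long tool lists crowd the diagram, as noted in Troubleshooting.

```python
# Sketch: emit a Mermaid flowchart for a ReAct agent (hypothetical helper).
def react_mermaid(agent_name: str, tools: list[str]) -> str:
    lines = [
        "flowchart TD",
        f'    Q["User Query ({agent_name})"] --> T[Think]',
        "    T --> S[Select Tool]",
    ]
    for i, tool in enumerate(tools):
        node = f"tool{i}"  # stable IDs; labels carry the tool names
        lines.append(f'    S --> {node}["{tool}"]')
        lines.append(f"    {node} --> O[Observe]")
    lines.append("    O --> T")  # loop back to reasoning
    lines.append('    O --> F["Final Answer"]')
    return "\n".join(lines)
```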

Output Formats:

Validation (text): Agent info, tool status with OK/WARN indicators, flow analysis (max iterations, token estimate, loop detection), errors, and warnings

Validation (JSON, `--json`): Structured `ValidationResult` object with keys: `is_valid`, `errors`, `warnings`, `tool_status`, `estimated_tokens_per_run`, `potential_infinite_loop`, `max_depth`

Visualization (`--visualize`): ASCII box-drawing diagram (default) or Mermaid flowchart (`--format mermaid`) showing the agent pattern flow and registered tools

Cost estimation (`--estimate-cost`): Token range per run, cost range per run, and projected monthly cost at the specified daily run rate
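The monthly projection is simple arithmetic on the per-run token range. A sketch assuming a caller-supplied price per 1,000 tokens (placeholder, not real pricing), the `--runs` default of 100 runs per day, and a 30-day month; `project_costs` and `DAYS_PER_MONTH` are hypothetical, and the script may use a different month length.

```python
# Sketch of the cost projection arithmetic (hypothetical helper).
DAYS_PER_MONTH = 30  # assumption; the script may use a different value


def project_costs(tokens_per_run: tuple[int, int], price_per_1k_tokens: float,
                  runs_per_day: int = 100) -> dict:
    low, high = tokens_per_run
    per_run = (low / 1000 * price_per_1k_tokens, high / 1000 * price_per_1k_tokens)
    monthly = tuple(cost * runs_per_day * DAYS_PER_MONTH for cost in per_run)
    return {"cost_per_run": per_run, "monthly_cost": monthly}
```

For the 2,400-4,800 tokens/run range from the validation example and a placeholder price of $0.01 per 1K tokens, this gives $0.024-$0.048 per run and $72-$144 per month at 100 runs/day.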
