Claude-skill-registry lean4-prove

install
source · Clone the upstream repo
git clone https://github.com/majiayu000/claude-skill-registry
Claude Code · Install into ~/.claude/skills/
T=$(mktemp -d) && git clone --depth=1 https://github.com/majiayu000/claude-skill-registry "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/data/lean4-prove" ~/.claude/skills/majiayu000-claude-skill-registry-lean4-prove && rm -rf "$T"
manifest: skills/data/lean4-prove/SKILL.md
source content

lean4-prove

Retrieval-augmented Lean4 proof generation for engineering requirements. Uses 94,000+ proven theorems from DeepSeek-Prover V1+V2 to guide proof synthesis via hybrid search (BM25 + semantic + graph traversal).

Architecture

Requirement + Tactics + Persona
        │
        ▼
┌───────────────────────────┐
│ 1. RECALL similar proofs  │  ← Hybrid search on ArangoDB
│    from 94k+ exemplars    │     (BM25 + semantic + graph)
└───────────────────────────┘
        │
        ▼
┌───────────────────────────┐
│ 2. BUILD support pack     │
│    - Validated imports    │
│    - Tactic patterns      │
│    - Similar proofs       │
└───────────────────────────┘
        │
        ▼
┌───────────────────────────┐
│ 3. GENERATE N candidates  │  ← Claude with exemplar context
│    constrained by corpus  │
└───────────────────────────┘
        │
        ▼
┌───────────────────────────┐
│ 4. COMPILE each in        │  ← lean_runner Docker
│    lean_runner container  │
└───────────────────────────┘
        │
   ┌────┴────┐
   │         │
 Success   Failure
   │         │
   ▼         ▼
 Return   Retry with error feedback
          (up to max_retries)

Why Retrieval-Augmented?

  1. Determinism - Exact provenance: "used these 3 proofs as templates"
  2. Version alignment - Exemplars use imports that actually work
  3. Fewer hallucinations - Constrained to lemmas that exist
  4. Tactic idioms - Transfers working
    simp
    sets and proof patterns

Usage

# Basic proof
./run.sh --requirement "Prove n + 0 = n"

# With tactics preference
./run.sh -r "Prove commutativity of addition" -t "simp,ring,omega"

# With persona context
./run.sh -r "Prove message integrity" -p "cryptographer"

# Via stdin (JSON)
echo '{"requirement": "Prove n + 0 = n", "tactics": ["rfl"]}' | ./run.sh

# Custom settings
./run.sh -r "Prove theorem" --candidates 5 --retries 5 --model claude-sonnet-4-20250514

Output

{
  "success": true,
  "code": "import Mathlib\n\ntheorem list_append_length (xs ys : List α) :\n    (xs ++ ys).length = xs.length + ys.length := by\n  induction xs with\n  | nil => simp\n  | cons x xs ih => simp [List.cons_append, ih]",
  "attempts": 1,
  "candidate": 0,
  "errors": null,
  "retrieval": {
    "retrieved": 5,
    "tactics_added": ["simp", "aesop", "norm_num", "intro"],
    "imports_count": 3
  }
}

On failure:

{
  "success": false,
  "code": null,
  "attempts": 9,
  "errors": [
    "Candidate 0 attempt 1: unknown identifier 'natAdd'",
    "Candidate 1 attempt 1: type mismatch..."
  ],
  "retrieval": {
    "retrieved": 5,
    "tactics_added": ["simp", "exact"],
    "imports_count": 3
  }
}

Parameters

ParameterDefaultDescription
--requirement, -r
(required)Theorem to prove
--tactics, -t
noneComma-separated preferred tactics
--persona, -p
nonePersona context for generation
--candidates, -n
3Parallel proof candidates
--retries
3Max retries per candidate
--model
claude-sonnet-4-20250514Claude model
--container
lean_runnerDocker container name
--timeout
120Compilation timeout (seconds)

Environment Variables

# Proof generation
LEAN4_CONTAINER=lean_runner      # Docker container
LEAN4_TIMEOUT=120                # Compile timeout
LEAN4_MAX_RETRIES=3              # Retries per candidate
LEAN4_CANDIDATES=3               # Parallel candidates
LEAN4_PROVE_MODEL=opus           # Claude model (opus recommended for proofs)

# Retrieval (requires ArangoDB with ingested dataset)
LEAN4_RETRIEVAL=1                # Enable/disable retrieval (default: 1)
LEAN4_RETRIEVAL_K=5              # Number of exemplars to retrieve
ARANGO_URL=http://127.0.0.1:8529 # ArangoDB connection
ARANGO_DB=memory                 # Database name (same as memory skill)

Dataset Setup

The skill uses 94,000+ theorems from DeepSeek-Prover V1+V2 for retrieval:

  • V1: 27,503 theorems (
    status: "proven"
    )
  • V2: 66,708 theorems (
    status: "ok"
    for 11,689 proven + others)

One-time ingest:

# Ingest full dataset (~5 min)
./ingest.sh

# Or limit for testing
./ingest.sh --limit 1000

This populates the

lean_theorems
collection in ArangoDB with:

  • formal_statement
    - The theorem statement
  • formal_proof
    - Working proof code
  • header
    - Validated imports (Mathlib, Aesop, etc.)
  • tactics
    - Extracted tactic names
  • source
    - "deepseek-prover-v1" or "deepseek-prover-v2"
  • status
    - "proven" (V1) or "ok"/"failed"/etc. (V2)

Authentication

Uses Claude Code CLI (

claude -p
) in headless non-interactive mode.

The CLI is called with:

  • -p
    flag for print/headless mode
  • --output-format text
    for plain text output
  • --max-turns 1
    for single-turn operation

Environment variables

CLAUDE_CODE
and
CLAUDECODE
are cleared to avoid recursion detection when called from within Claude Code.

No separate API key required - authentication is handled via your Claude subscription.

Requirements

  1. Docker with
    lean_runner
    container running (Lean4 + Mathlib installed)
  2. Claude Code CLI (
    claude
    ) in PATH with valid authentication
  3. ArangoDB running locally (for retrieval - optional but recommended)
  4. Dataset ingested via
    ./ingest.sh
    (one-time setup)

Tactics

Common Lean4/Mathlib tactics to suggest:

TacticUse For
rfl
Reflexivity proofs
simp
Simplification
ring
Ring arithmetic
omega
Linear arithmetic
decide
Decidable propositions
exact
Exact term construction
apply
Apply lemmas
induction
Inductive proofs

Examples

Engineering: List operations

./run.sh -r "Prove length(xs ++ ys) = length(xs) + length(ys)" -t "simp,induction"

Engineering: State machine property

./run.sh -r "When mux_enable is false, output shall equal default_value" \
  -p "embedded systems engineer" -t "simp,cases,decide"

Engineering: Protocol correctness

./run.sh -r "Prove message append preserves checksum: checksum(msg ++ data) = update(checksum(msg), data)" \
  -t "simp,induction,ring"

Cryptography

./run.sh -r "Prove that XOR is self-inverse: a ⊕ a = 0" -p "cryptographer" -t "simp,decide"

Complex theorem

./run.sh -r "Prove the sum of first n natural numbers equals n*(n+1)/2" \
  -t "induction,simp,ring" \
  --candidates 5 \
  --retries 5

Difference from lean4-verify

SkillPurpose
lean4-verify
Compile-only. Takes Lean4 code, returns pass/fail
lean4-prove
Full pipeline. Takes requirement, generates + compiles + retries

Use

lean4-verify
when you already have Lean4 code to check. Use
lean4-prove
when you need to generate the proof from a requirement.

Advanced: Memory Integration

The skill integrates with the memory project for hybrid retrieval (BM25 + semantic + graph traversal):

# One-time setup: embed theorems and create edges
ARANGO_PASS=yourpass ARANGO_DB=memory python integrate_memory.py

This creates:

  • Embeddings: 39k+ theorem embeddings in
    lesson_embeddings
    for semantic search
  • Tactic edges: 10k+ edges between theorems sharing primary tactics
  • Similarity edges: Edges between semantically similar theorems (cosine > 0.7)

This enables:

  • Semantic search: Find theorems by meaning, not just keywords
  • Multi-hop traversal: "What proofs use similar tactics?"
  • Impact analysis: "If I change lemma X, what breaks?"

Lemma Dependency Graph (Planned)

Extract lemma dependencies during compilation to build an impact graph:

Query: "What's affected if mux_default changes?"

Answer:
  mux_default
    └── system_safe_state (NEEDS RE-VERIFY)
          └── requirement_42 (NEEDS RE-VERIFY)

Engineering Use Cases

  1. Ambiguity detection: Formalization fails → requirement is vague
  2. Completeness check: "Are all state transitions covered?"
  3. Regression impact: "Change to X affects theorems Y, Z"
  4. Cross-reference: "Which requirements share this lemma?"