Claude-skill-registry lean4-prove

install

source · Clone the upstream repo

git clone https://github.com/majiayu000/claude-skill-registry

Claude Code · Install into ~/.claude/skills/

T=$(mktemp -d) && git clone --depth=1 https://github.com/majiayu000/claude-skill-registry "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/data/lean4-prove" ~/.claude/skills/majiayu000-claude-skill-registry-lean4-prove && rm -rf "$T"

manifest: skills/data/lean4-prove/SKILL.md

lean4-prove

Retrieval-augmented Lean4 proof generation for engineering requirements. Uses 94,000+ proven theorems from DeepSeek-Prover V1+V2 to guide proof synthesis via hybrid search (BM25 + semantic + graph traversal).

Architecture

Requirement + Tactics + Persona
        │
        ▼
┌───────────────────────────┐
│ 1. RECALL similar proofs  │  ← Hybrid search on ArangoDB
│    from 94k+ exemplars    │     (BM25 + semantic + graph)
└───────────────────────────┘
        │
        ▼
┌───────────────────────────┐
│ 2. BUILD support pack     │
│    - Validated imports    │
│    - Tactic patterns      │
│    - Similar proofs       │
└───────────────────────────┘
        │
        ▼
┌───────────────────────────┐
│ 3. GENERATE N candidates  │  ← Claude with exemplar context
│    constrained by corpus  │
└───────────────────────────┘
        │
        ▼
┌───────────────────────────┐
│ 4. COMPILE each in        │  ← lean_runner Docker
│    lean_runner container  │
└───────────────────────────┘
        │
   ┌────┴────┐
   │         │
 Success   Failure
   │         │
   ▼         ▼
 Return   Retry with error feedback
          (up to max_retries)

Why Retrieval-Augmented?

Determinism - Exact provenance: "used these 3 proofs as templates"
Version alignment - Exemplars use imports that actually work
Fewer hallucinations - Constrained to lemmas that exist
Tactic idioms - Transfers working
```
simp
```
sets and proof patterns

Usage

# Basic proof
./run.sh --requirement "Prove n + 0 = n"

# With tactics preference
./run.sh -r "Prove commutativity of addition" -t "simp,ring,omega"

# With persona context
./run.sh -r "Prove message integrity" -p "cryptographer"

# Via stdin (JSON)
echo '{"requirement": "Prove n + 0 = n", "tactics": ["rfl"]}' | ./run.sh

# Custom settings
./run.sh -r "Prove theorem" --candidates 5 --retries 5 --model claude-sonnet-4-20250514

Output

{
  "success": true,
  "code": "import Mathlib\n\ntheorem list_append_length (xs ys : List α) :\n    (xs ++ ys).length = xs.length + ys.length := by\n  induction xs with\n  | nil => simp\n  | cons x xs ih => simp [List.cons_append, ih]",
  "attempts": 1,
  "candidate": 0,
  "errors": null,
  "retrieval": {
    "retrieved": 5,
    "tactics_added": ["simp", "aesop", "norm_num", "intro"],
    "imports_count": 3
  }
}

On failure:

{
  "success": false,
  "code": null,
  "attempts": 9,
  "errors": [
    "Candidate 0 attempt 1: unknown identifier 'natAdd'",
    "Candidate 1 attempt 1: type mismatch..."
  ],
  "retrieval": {
    "retrieved": 5,
    "tactics_added": ["simp", "exact"],
    "imports_count": 3
  }
}

Parameters

Parameter	Default	Description
`--requirement, -r`	(required)	Theorem to prove
`--tactics, -t`	none	Comma-separated preferred tactics
`--persona, -p`	none	Persona context for generation
`--candidates, -n`	3	Parallel proof candidates
`--retries`	3	Max retries per candidate
`--model`	claude-sonnet-4-20250514	Claude model
`--container`	lean_runner	Docker container name
`--timeout`	120	Compilation timeout (seconds)

Environment Variables

# Proof generation
LEAN4_CONTAINER=lean_runner      # Docker container
LEAN4_TIMEOUT=120                # Compile timeout
LEAN4_MAX_RETRIES=3              # Retries per candidate
LEAN4_CANDIDATES=3               # Parallel candidates
LEAN4_PROVE_MODEL=opus           # Claude model (opus recommended for proofs)

# Retrieval (requires ArangoDB with ingested dataset)
LEAN4_RETRIEVAL=1                # Enable/disable retrieval (default: 1)
LEAN4_RETRIEVAL_K=5              # Number of exemplars to retrieve
ARANGO_URL=http://127.0.0.1:8529 # ArangoDB connection
ARANGO_DB=memory                 # Database name (same as memory skill)

Dataset Setup

The skill uses 94,000+ theorems from DeepSeek-Prover V1+V2 for retrieval:

V1: 27,503 theorems (
```
status: "proven"
```
)
V2: 66,708 theorems (
```
status: "ok"
```
for 11,689 proven + others)

One-time ingest:

# Ingest full dataset (~5 min)
./ingest.sh

# Or limit for testing
./ingest.sh --limit 1000

This populates the

lean_theorems

collection in ArangoDB with:

```
formal_statement
```
- The theorem statement
```
formal_proof
```
- Working proof code
```
header
```
- Validated imports (Mathlib, Aesop, etc.)
```
tactics
```
- Extracted tactic names
```
source
```
- "deepseek-prover-v1" or "deepseek-prover-v2"
```
status
```
- "proven" (V1) or "ok"/"failed"/etc. (V2)

Authentication

Uses Claude Code CLI (

claude -p

) in headless non-interactive mode.

The CLI is called with:

```
-p
```
flag for print/headless mode
```
--output-format text
```
for plain text output
```
--max-turns 1
```
for single-turn operation

Environment variables

CLAUDE_CODE

and

CLAUDECODE

are cleared to avoid recursion detection when called from within Claude Code.

No separate API key required - authentication is handled via your Claude subscription.

Requirements

Docker with
```
lean_runner
```
container running (Lean4 + Mathlib installed)
Claude Code CLI (
```
claude
```
) in PATH with valid authentication
ArangoDB running locally (for retrieval - optional but recommended)
Dataset ingested via
```
./ingest.sh
```
(one-time setup)

Tactics

Common Lean4/Mathlib tactics to suggest:

Tactic	Use For
`rfl`	Reflexivity proofs
`simp`	Simplification
`ring`	Ring arithmetic
`omega`	Linear arithmetic
`decide`	Decidable propositions
`exact`	Exact term construction
`apply`	Apply lemmas
`induction`	Inductive proofs

Examples

Engineering: List operations

./run.sh -r "Prove length(xs ++ ys) = length(xs) + length(ys)" -t "simp,induction"

Engineering: State machine property

./run.sh -r "When mux_enable is false, output shall equal default_value" \
  -p "embedded systems engineer" -t "simp,cases,decide"

Engineering: Protocol correctness

./run.sh -r "Prove message append preserves checksum: checksum(msg ++ data) = update(checksum(msg), data)" \
  -t "simp,induction,ring"

Cryptography

./run.sh -r "Prove that XOR is self-inverse: a ⊕ a = 0" -p "cryptographer" -t "simp,decide"

Complex theorem

./run.sh -r "Prove the sum of first n natural numbers equals n*(n+1)/2" \
  -t "induction,simp,ring" \
  --candidates 5 \
  --retries 5

Difference from lean4-verify

Skill	Purpose
`lean4-verify`	Compile-only. Takes Lean4 code, returns pass/fail
`lean4-prove`	Full pipeline. Takes requirement, generates + compiles + retries

Use

lean4-verify

when you already have Lean4 code to check. Use

lean4-prove

when you need to generate the proof from a requirement.

Advanced: Memory Integration

The skill integrates with the memory project for hybrid retrieval (BM25 + semantic + graph traversal):

# One-time setup: embed theorems and create edges
ARANGO_PASS=yourpass ARANGO_DB=memory python integrate_memory.py

This creates:

Embeddings: 39k+ theorem embeddings in
```
lesson_embeddings
```
for semantic search
Tactic edges: 10k+ edges between theorems sharing primary tactics
Similarity edges: Edges between semantically similar theorems (cosine > 0.7)

This enables:

Semantic search: Find theorems by meaning, not just keywords
Multi-hop traversal: "What proofs use similar tactics?"
Impact analysis: "If I change lemma X, what breaks?"

Lemma Dependency Graph (Planned)

Extract lemma dependencies during compilation to build an impact graph:

Query: "What's affected if mux_default changes?"

Answer:
  mux_default
    └── system_safe_state (NEEDS RE-VERIFY)
          └── requirement_42 (NEEDS RE-VERIFY)

Engineering Use Cases

Ambiguity detection: Formalization fails → requirement is vague
Completeness check: "Are all state transitions covered?"
Regression impact: "Change to X affects theorems Y, Z"
Cross-reference: "Which requirements share this lemma?"