Claude-skill-registry-data mathematical-theorem-prover

Implement comprehensive mathematical theorem proving capabilities with SFT+GRPO training, MCP/A2A agent integration, and imatrix quantization protection to surpass Boreas-phi3.5-instinct-jp in formal proof generation and scientific discovery. Use when building mathematical reasoning systems, formal verification tools, or AI-assisted theorem proving environments.

install
source · Clone the upstream repo
git clone https://github.com/majiayu000/claude-skill-registry-data
Claude Code · Install into ~/.claude/skills/
T=$(mktemp -d) && git clone --depth=1 https://github.com/majiayu000/claude-skill-registry-data "$T" && mkdir -p ~/.claude/skills && cp -r "$T/data/mathematical-theorem-prover" ~/.claude/skills/majiayu000-claude-skill-registry-data-mathematical-theorem-prover && rm -rf "$T"
manifest: data/mathematical-theorem-prover/SKILL.md
source content

Mathematical Theorem-Proving Capability Enhancement System

Overview

Based on the SO8T/AEGIS integration strategy, this skill implements comprehensive strengthening of mathematical theorem-proving capabilities. Through an SFT+GRPO training strategy, MCP/A2A general-purpose AI agent integration, and imatrix quantization protection, it aims to exceed Boreas-phi3.5-instinct-jp in formal proof generation and scientific-discovery support.

Core Strategies

1. SFT+GRPO Training Strategy

Phase 1: Mathematical Foundation SFT

math_sft_config = {
    "base_model": "microsoft/wavecoder-ultra",  # SO8T base
    "math_datasets": [
        "Proof-Pile-2",  # Llemma-style mathematical corpus
        "Lean-Workbook",  # formal proof data
        "MATH",  # competition math problems
        "miniF2F"  # formal proof benchmark
    ],
    "training_objective": "next_token_prediction + proof_verification",
    "math_weight": 0.7  # weighting for mathematical data
}
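The math_weight of 0.7 above implies mixing mathematical and general data at a fixed ratio during SFT batching. A minimal sketch of such weighted sampling, assuming a simple per-example coin flip (the dataset contents and the sample_batch helper are illustrative, not part of the skill):

```python
import random

def sample_batch(math_data, general_data, batch_size, math_weight=0.7, seed=None):
    """Draw a batch where roughly `math_weight` of examples come from math_data."""
    rng = random.Random(seed)
    batch = []
    for _ in range(batch_size):
        # Coin flip decides which pool this example is drawn from.
        pool = math_data if rng.random() < math_weight else general_data
        batch.append(rng.choice(pool))
    return batch

# Toy pools standing in for Proof-Pile-2 and a general web corpus.
math = [{"src": "Proof-Pile-2", "text": "theorem ..."}] * 10
general = [{"src": "web", "text": "..."}] * 10
batch = sample_batch(math, general, batch_size=100, math_weight=0.7, seed=0)
math_frac = sum(1 for ex in batch if ex["src"] == "Proof-Pile-2") / len(batch)
```

In practice a framework-level weighted sampler would be used instead, but the ratio semantics are the same.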

Phase 2: GRPO Reinforcement for Theorem Proving

grpo_config = {
    "reward_functions": [
        "formal_proof_correctness",  # Lean/Isabelle verification
        "proof_completeness",  # subgoal resolution rate
        "mathematical_novelty",  # novel lemma generation
        "proof_efficiency"  # proof length minimization
    ],
    "group_size": 8,  # GRPO group size
    "theorem_proving_env": "Lean4-interactive",
    "max_proof_depth": 50,
    "synthetic_data_generation": True
}
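GRPO scores a group of candidate proofs per theorem (group_size = 8 above) and normalizes each reward against the group's statistics. A minimal sketch of that group-relative advantage computation, with made-up reward values:

```python
from statistics import mean, pstdev

def group_relative_advantages(rewards, eps=1e-8):
    """GRPO-style advantage: reward minus group mean, scaled by group std."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]

# Hypothetical scores for 8 candidate proofs of one theorem, as produced
# by the reward functions above (correctness, completeness, novelty, efficiency).
rewards = [1.0, 0.0, 0.5, 1.0, 0.0, 0.0, 1.0, 0.5]
advs = group_relative_advantages(rewards)
```

Candidates above the group mean get positive advantages and are reinforced; those below get negative ones, without needing a learned value function.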

2. MCP/A2A General-Purpose AI Agent Integration

Mathematical Reasoning Agent Architecture

class MathematicalReasoningAgent:
    def __init__(self):
        self.theorem_prover = MCPTool("lean4-prover")
        self.symbolic_solver = MCPTool("sympy-solver")
        self.hypothesis_generator = MCPTool("scientific-hypothesis-gen")
        self.formal_verifier = MCPTool("coq-verifier")

    async def prove_theorem(self, statement: str) -> ProofResult:
        # 1. Hypothesis generation and formalization
        hypotheses = await self.hypothesis_generator.generate(statement)
        formalized = await self.theorem_prover.autoformalize(hypotheses)

        # 2. Proof search
        proof_candidates = await self.theorem_prover.search_proofs(formalized)

        # 3. Symbolic verification
        verified_proofs = await self.symbolic_solver.verify(proof_candidates)

        # 4. Formal certification
        final_proof = await self.formal_verifier.certify(verified_proofs)

        return final_proof

A2A Coordination Mechanisms

  • Agent specialization: proof generation, verification, lemma discovery, hypothesis generation
  • Knowledge sharing: proof libraries, mathematical concept graphs
  • Fault tolerance: parallel verification by multiple agents
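The fault-tolerance point can be sketched with asyncio: dispatch the same proof to several verifier agents in parallel, tolerate agent failures, and accept by majority vote. The agent behaviors here are stubs, not real MCP calls:

```python
import asyncio

async def verify_with_agent(agent_name, proof):
    """Stub verifier agent; a real agent would call its MCP tool."""
    await asyncio.sleep(0)  # stand-in for network I/O
    if agent_name == "flaky-agent":
        raise RuntimeError("agent unavailable")
    return proof.endswith("QED")

async def fault_tolerant_verify(proof, agents):
    """Accept the proof if a majority of reachable agents verify it."""
    results = await asyncio.gather(
        *(verify_with_agent(a, proof) for a in agents),
        return_exceptions=True,  # a crashed agent must not sink the whole vote
    )
    votes = [r for r in results if isinstance(r, bool)]
    return bool(votes) and sum(votes) > len(votes) / 2

agents = ["lean4-agent", "isabelle-agent", "flaky-agent"]
accepted = asyncio.run(fault_tolerant_verify("by induction ... QED", agents))
```

`return_exceptions=True` is what makes the gather fault-tolerant: a failed agent contributes no vote rather than aborting the verification round.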

3. Imatrix Quantization-Protected Data Strategy

Quantization Protection for Mathematical Knowledge

def compute_math_importance_matrix(model, math_datasets):
    importance_scores = {}

    # Evaluate the importance of mathematical tokens
    for layer in model.layers:
        for token in math_vocab:
            # Activation importance during proof generation
            proof_importance = evaluate_token_importance(
                token, layer, math_proof_tasks
            )
            # Importance during formal verification
            verification_importance = evaluate_token_importance(
                token, layer, formal_verification_tasks
            )

            importance_scores[(layer, token)] = (
                0.7 * proof_importance + 0.3 * verification_importance
            )

    return importance_scores

quantization_config = {
    "method": "imatrix",
    "importance_matrix": compute_math_importance_matrix(model, math_data),
    "protected_tokens": [
        "theorem", "proof", "lemma", "assume", "therefore",
        "∀", "∃", "∈", "⊂", "→", "∧", "∨"
    ],
    "precision_levels": {
        "math_core": "fp16",  # keep the mathematical core at high precision
        "general": "int8",    # quantize general-purpose parts
        "proof_engine": "fp16"  # keep the proof engine at high precision
    }
}
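The precision_levels above imply a mapping from model parameters to precision buckets. A minimal sketch of that assignment, assuming name-based matching rules and hypothetical layer names (the real criterion would come from the importance matrix):

```python
def assign_precision(param_name, precision_levels):
    """Pick a precision bucket for a parameter by name (rules are illustrative)."""
    if param_name.startswith("proof_engine"):
        return precision_levels["proof_engine"]
    if "math" in param_name:
        return precision_levels["math_core"]
    return precision_levels["general"]

precision_levels = {"math_core": "fp16", "general": "int8", "proof_engine": "fp16"}
layer_names = ["layers.0.attn", "layers.1.math_head", "proof_engine.mlp"]
plan = {name: assign_precision(name, precision_levels) for name in layer_names}
```

A real imatrix pipeline would rank tensors by their aggregated importance scores rather than by name, but the resulting per-tensor precision plan has this shape.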

Strategy for Surpassing Boreas-phi3.5-instinct-jp

1. Performance Comparison Analysis

Strengths of Boreas-phi3.5-instinct-jp:

  • Japanese mathematics-education data
  • Intuitive reasoning ability
  • Compact model size

Points where this skill aims to surpass it:

  • Formal proof capability: rigorous proofs in Lean4/Isabelle
  • Scientific-discovery support: integrated hypothesis generation and verification
  • Scalable reasoning: handling long mathematical proofs

2. Concrete Surpassing Strategies

Strategy 1: Hybrid Proof System

import asyncio

class HybridProver:
    def __init__(self):
        self.informal_prover = BoreasPhi35InstinctJP()  # intuitive proving
        self.formal_prover = Lean4Prover()  # formal proving
        self.verifier = IsabelleVerifier()  # verification

    async def enhanced_prove(self, theorem):
        # 1. Generate an intuitive proof with Boreas
        informal_proof = await self.informal_prover.generate_proof(theorem)

        # 2. Convert it into a formal proof
        formal_proof = await self.formal_prover.formalize(informal_proof)

        # 3. Verify with multiple systems
        verification_results = await asyncio.gather(
            self.verifier.verify(formal_proof),
            self.formal_prover.check_consistency(formal_proof)
        )

        # 4. Repair and strengthen the proof if verification fails
        if not all(verification_results):
            enhanced_proof = await self._repair_proof(
                formal_proof, verification_results
            )
            return enhanced_proof

        return formal_proof

Strategy 2: Scientific Discovery Extension

class ScientificDiscoveryAgent:
    def __init__(self):
        self.hypothesis_generator = BoreasPhi35InstinctJP()
        self.mathematical_formalizer = Lean4Formalizer()
        self.experimental_validator = SymbolicSimulator()

    async def discover_and_verify(self, domain_problem):
        # 1. Hypothesis generation (leveraging Boreas's intuition)
        hypotheses = await self.hypothesis_generator.generate_hypotheses(
            domain_problem, num_candidates=10
        )

        # 2. Mathematical formalization
        formalized_hypotheses = []
        for hyp in hypotheses:
            try:
                formal = await self.mathematical_formalizer.formalize(hyp)
                formalized_hypotheses.append(formal)
            except FormalizationError:
                continue

        # 3. Symbolic verification and experimental testing
        verified_hypotheses = []
        for formal_hyp in formalized_hypotheses:
            symbolic_check = await self.experimental_validator.check_symbolic(
                formal_hyp
            )
            experimental_check = await self.experimental_validator.simulate(
                formal_hyp
            )

            if symbolic_check and experimental_check:
                verified_hypotheses.append(formal_hyp)

        return verified_hypotheses

Practical Training-Data Structuring

1. Data Architecture

from dataclasses import dataclass
from typing import Any, Dict, List, Optional

@dataclass
class MathematicalTrainingData:
    theorem_id: str
    domain: str  # algebra, geometry, analysis, etc.
    difficulty: int  # 1-5

    # Multiple representations
    natural_language: str
    formal_statement: str  # Lean4/Isabelle form
    symbolic_representation: str  # LaTeX/MathJax

    # Proof information
    informal_proof: str
    formal_proof: str
    proof_steps: List[str]

    # Metadata
    lemmas_used: List[str]
    prerequisites: List[str]
    verification_status: bool

    # Extended information
    alternative_proofs: List[str]
    counterexamples: List[str]
    related_theorems: List[str]

    # Scientific context
    scientific_context: str
    experimental_validation: Optional[str]

@dataclass
class ScientificDiscoveryData:
    hypothesis: str
    domain: str
    mathematical_formulation: str
    experimental_design: str
    validation_results: Dict[str, Any]
    novelty_score: float
    impact_assessment: str
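Records in this shape are ultimately serialized to JSONL for the training scripts in the workflow below (cf. datasets/mathematical_proofs.jsonl). A minimal sketch, assuming a trimmed illustrative record type rather than the full dataclass above:

```python
import dataclasses
import json
from typing import List

@dataclasses.dataclass
class TheoremRecord:
    """Trimmed illustrative subset of MathematicalTrainingData."""
    theorem_id: str
    domain: str
    difficulty: int
    natural_language: str
    formal_statement: str
    verification_status: bool

def to_jsonl(records: List[TheoremRecord]) -> str:
    """One JSON object per line, as the training scripts expect."""
    return "\n".join(json.dumps(dataclasses.asdict(r)) for r in records)

rec = TheoremRecord(
    theorem_id="thm-0001",
    domain="algebra",
    difficulty=2,
    natural_language="The sum of two even integers is even.",
    formal_statement="theorem even_add_even (a b : ℤ) ...",
    verification_status=True,
)
line = to_jsonl([rec])
```

`dataclasses.asdict` handles nested dataclasses too, so the same pattern extends to the full schema.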

2. Data Generation Pipeline

import asyncio

class MathDataGenerator:
    async def generate_training_data(self):
        # 1. Collect seed theorems
        seed_theorems = await self.collect_seed_theorems()

        # 2. Augment for diversity
        augmented_theorems = await self.augment_diversity(seed_theorems)

        # 3. Generate formal proofs
        formal_proofs = await self.generate_formal_proofs(augmented_theorems)

        # 4. Verify quality
        verified_data = await self.verify_and_filter(formal_proofs)

        # 5. Attach metadata
        enriched_data = await self.enrich_metadata(verified_data)

        return enriched_data

    async def collect_seed_theorems(self):
        sources = [
            "arXiv_math_papers",
            "mathematical_competitions",
            "textbook_exercises",
            "research_papers"
        ]

        theorems = []
        for source in sources:
            theorems.extend(await self.scrape_theorems(source))

        return theorems

    async def generate_formal_proofs(self, theorems):
        formal_proofs = []

        for theorem in theorems:
            # Autoformalization
            formalized = await self.autoformalize(theorem)

            # Proof generation (multiple approaches)
            proofs = await asyncio.gather(
                self.generate_lean_proof(formalized),
                self.generate_isabelle_proof(formalized),
                self.generate_coq_proof(formalized)
            )

            # Select the best proof
            best_proof = self.select_best_proof(proofs)
            formal_proofs.append(best_proof)

        return formal_proofs

3. Implementing the Training Strategy

class AEGISTrainer:
    def __init__(self):
        self.base_model = "microsoft/wavecoder-ultra"
        self.math_data_generator = MathDataGenerator()

    async def train_mathematical_model(self):
        # Store method references (not call results) so each phase is
        # invoked with the model produced by the previous phase.
        phases = [
            self.phase1_foundation_training,
            self.phase2_mathematical_specialization,
            self.phase3_reasoning_enhancement,
            self.phase4_integration_training
        ]

        model = self.base_model
        for phase in phases:
            model = await phase(model)

        return model

    async def phase2_mathematical_specialization(self, model):
        math_data = await self.math_data_generator.generate_training_data()

        training_config = {
            "learning_rate": 2e-5,
            "batch_size": 8,
            "gradient_accumulation": 4,
            "math_data_ratio": 0.8,
            "formal_verification_loss": True,
            "curriculum_learning": True
        }

        return await self.fine_tune(model, math_data, training_config)

    async def phase3_reasoning_enhancement(self, model):
        reasoning_data = await self.generate_reasoning_tasks()

        grpo_config = {
            "num_generations": 8,
            "reward_functions": [
                "proof_correctness",
                "reasoning_coherence",
                "mathematical_accuracy"
            ],
            "kl_penalty": 0.1
        }

        return await self.grpo_train(model, reasoning_data, grpo_config)

Execution Workflow

1. Data Collection Phase

# Generate mathematical training data
python scripts/data/generate_mathematical_training_data.py \
    --sources arxiv,competitions,textbooks \
    --formal-systems lean4,isabelle,coq \
    --output datasets/mathematical_proofs.jsonl

2. Formal-Proving Environment Setup

# Set up the Lean4 environment
python scripts/setup_formal_proving_environment.py \
    --systems lean4,isabelle,coq \
    --mathlib-version latest \
    --verification-tools true

3. GRPO Training Pipeline

# Proof-generation-focused GRPO training
python scripts/training/grpo_mathematical_training.py \
    --model models/aegis_v25_base \
    --math-data datasets/mathematical_proofs.jsonl \
    --reward-functions proof_correctness,completeness,novelty,efficiency \
    --group-size 8 \
    --output models/aegis_v25_mathematical

4. MCP/A2A Agent Integration

# Build the mathematical reasoning agents
python scripts/agents/develop_mathematical_agents.py \
    --base-model models/aegis_v25_mathematical \
    --agent-types theorem_prover,symbolic_solver,hypothesis_generator,formal_verifier \
    --mcp-tools lean4-prover,sympy-solver,coq-verifier \
    --output agents/mathematical_reasoning_agents/

5. Applying Quantization Protection

# imatrix-based quantization
python scripts/quantization/apply_math_protected_quantization.py \
    --model models/aegis_v25_mathematical \
    --math-data datasets/mathematical_proofs.jsonl \
    --protected-tokens theorem,proof,lemma,assume,therefore \
    --output models/aegis_v25_quantized

Evaluation and Validation

Performance Metrics

  • miniF2F benchmark: 75% (target for surpassing Boreas)
  • Formal proof generation: minimize proof length, maximize correctness
  • Scientific hypothesis generation: 90% accuracy after formal verification
  • Post-quantization performance retention: 95% or higher
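The miniF2F target above is a pass rate over benchmark problems: the fraction whose generated proof is accepted by the formal checker. A minimal sketch of that computation, with hypothetical verification outcomes:

```python
def pass_rate(results):
    """Fraction of benchmark problems whose proof was verified."""
    return sum(results) / len(results)

def meets_target(results, target=0.75):
    """Check the pass rate against the miniF2F target above."""
    return pass_rate(results) >= target

# Hypothetical verification outcomes on 20 miniF2F problems.
results = [True] * 16 + [False] * 4
rate = pass_rate(results)
```

The same helper applies to the post-quantization retention metric by comparing pass rates before and after quantization.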

Comparative Evaluation

class PerformanceComparator:
    def compare_with_boreas(self):
        benchmarks = [
            "miniF2F_formal", "MATH_symbolic", "ARC_science",
            "theorem_proving_efficiency", "proof_verification_accuracy"
        ]

        results = {}
        for benchmark in benchmarks:
            aegis_score = self.evaluate_aegis(benchmark)
            boreas_score = self.evaluate_boreas(benchmark)
            improvement = (aegis_score - boreas_score) / boreas_score * 100

            results[benchmark] = {
                "aegis": aegis_score,
                "boreas": boreas_score,
                "improvement_percent": improvement
            }

        return results

Extensions

Integrating a New Formal-Proof System

def integrate_new_formal_system(self, system_name, config):
    """Integrate a new formal-proof system."""
    if system_name == "new_prover":
        self.formal_provers[system_name] = config
        # Learn autoformalization rules
        self._learn_formalization_rules(config)
        # Establish proof-translation rules
        self._establish_proof_translation_rules(config)

Developing Custom Reward Functions

def create_custom_reward_function(self, reward_type, domain_specific_config):
    """ドメイン特化報酬関数の作成"""
    if reward_type == "mathematical_novelty":
        return self._create_novelty_reward(domain_specific_config)
    elif reward_type == "proof_efficiency":
        return self._create_efficiency_reward(domain_specific_config)
    elif reward_type == "scientific_impact":
        return self._create_impact_reward(domain_specific_config)

Conclusion

This skill delivers a comprehensive mathematical theorem-proving system that preserves the intuitive reasoning strength of Boreas-phi3.5-instinct-jp while surpassing it in formal proof capability and scientific-discovery support. Through SFT+GRPO training, MCP/A2A agent integration, and imatrix quantization protection, it opens a new era of AI-driven mathematical discovery and proof.

Mathematical AI innovation, right here! 🧮⚡🔬