```shell
git clone https://github.com/plurigrid/asi

T=$(mktemp -d) && git clone --depth=1 https://github.com/plurigrid/asi "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/entropy-regularized-inference" ~/.claude/skills/plurigrid-asi-entropy-regularized-inference && rm -rf "$T"
```
`skills/entropy-regularized-inference/SKILL.md`

Entropy-Regularized Inference (Third-Order Meta-Skill)
"The entropy regularizer is not ad-hoc—it's the principled mechanism by which agents acknowledge uncertainty about their own predictions." — Synthesis of Kenny, Friston, Kidger, Bolte
Trigger Conditions
- User asks about connecting active inference to practical RL
- Questions about why entropy bonus improves PPO training
- Bridging continuous ODE dynamics with discrete state-space models
- Understanding scale-free inference across hierarchical systems
- Unifying predictive coding, active inference, and robot control
Overview
Third-order meta-skill emerging from the constructive collision of four expert threads, each discovered via 2-3-5-7 prime sieve refinement:
| Prime | Expert | Thread | Key Insight |
|---|---|---|---|
| 2 | Patrick Kenny | Discrete Active Inference | PAD ≠ EFE by entropy regularizer |
| 3 | Karl Friston / Da Costa | Scale-Free Active Inference | RGM = discrete homologues of deep CNNs |
| 5 | Patrick Kidger | JPC/Diffrax | Inference as gradient flow ODE: ż = -∂ℱ/∂z |
| 7 | Ben Bolte | K-Scale Robotics | entropy_coef=0.01 prevents policy collapse |
The Four-Way Collision
```
┌───────────────────────────────────────────────────────────┐
│       ENTROPY REGULARIZATION: THE UNIVERSAL BRIDGE        │
│                                                           │
│  ┌──────────────────┐          ┌──────────────────┐       │
│  │ Kenny (2025)     │          │ Friston (2024)   │       │
│  │ Discrete ActInf  │          │ Scale-Free AI    │       │
│  │                  │          │                  │       │
│  │ PAD = VFE +      │          │ RGM = discrete   │       │
│  │ KL(future)       │          │ deep CNNs via    │       │
│  │                  │          │ renormalization  │       │
│  │ ↓ differs from   │          │                  │       │
│  │ EFE by entropy   │          │ ↓ scale          │       │
│  │ regularizer      │          │ invariance       │       │
│  └────────┬─────────┘          └────────┬─────────┘       │
│           │                             │                 │
│           └──────────────┬──────────────┘                 │
│                          ▼                                │
│           ┌─────────────────────────────┐                 │
│           │ COLLISION POINT:            │                 │
│           │ Entropy bounds prediction   │                 │
│           │ confidence at ALL scales    │                 │
│           └──────────────┬──────────────┘                 │
│                          │                                │
│           ┌──────────────┴──────────────┐                 │
│           ▼                             ▼                 │
│  ┌──────────────────┐          ┌──────────────────┐       │
│  │ Kidger (2024)    │          │ Bolte (2025)     │       │
│  │ JPC/Diffrax      │          │ K-Scale Labs     │       │
│  │                  │          │                  │       │
│  │ Inference as     │          │ entropy_coef=    │       │
│  │ gradient flow:   │          │ 0.01 prevents    │       │
│  │                  │          │ policy collapse  │       │
│  │ ż = -∂ℱ/∂z       │          │                  │       │
│  │                  │          │ "RL-based        │       │
│  │ Heun solver      │          │ closed-loop      │       │
│  │ beats Euler      │          │ control has      │       │
│  │                  │          │ firmly won"      │       │
│  └──────────────────┘          └──────────────────┘       │
└───────────────────────────────────────────────────────────┘
```
The Unifying Principle
All four threads converge on the same mathematical structure:
```
OBJECTIVE = PREDICTION_ERROR + λ · ENTROPY
```

Where:

- PREDICTION_ERROR measures mismatch with observations
- ENTROPY prevents overconfident predictions (in a minimized loss this term enters with a negative sign, as in PPO's `-entropy_coef * entropy`)
- λ is the regularization coefficient (`entropy_coef` in PPO)
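This shared objective can be sketched in a few lines of NumPy. Everything here is illustrative: `unified_objective` and the example distributions are hypothetical, and the sign convention treats the entropy bonus as subtracted from a loss that is being minimized (matching PPO's `-entropy_coef * entropy`).

```python
import numpy as np

def unified_objective(prediction_error, policy_probs, lam=0.01):
    """Loss to minimize: prediction error minus an entropy bonus."""
    entropy = -np.sum(policy_probs * np.log(policy_probs + 1e-12))
    return prediction_error - lam * entropy

uniform = np.full(4, 0.25)                    # maximum-entropy policy
peaked = np.array([0.97, 0.01, 0.01, 0.01])   # near-deterministic policy

# With identical prediction error, the higher-entropy policy scores lower loss
assert unified_objective(0.5, uniform) < unified_objective(0.5, peaked)
```

The assertion makes the regularizer's preference explicit: when two policies explain the data equally well, the objective breaks the tie toward the less confident one.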
Thread-Specific Instantiations
| Thread | Prediction Error | Entropy Term | λ |
|---|---|---|---|
| Kenny PAD | VFE(past) + KL(future) | H[Q(s)] | Implicit in PAD |
| Friston RGM | Renormalized VFE | Scale-invariant H | Per-level λ_ℓ |
| Kidger JPC | Σ‖z_ℓ - f_ℓ(W_ℓz_{ℓ-1})‖² | Solver regularization | Step size |
| Bolte PPO | Policy loss + Value loss | -H[π(a|s)] | entropy_coef |
Why Entropy Regularization Works
Biological Rationale (Kenny)
"If I am confident in my predictions about future observations, and I am bad at predicting my future observations, then my perception/action divergence criterion is going to be very high."
The entropy regularizer forces agents to acknowledge uncertainty about predictions they can't reliably make. This prevents:
- Premature convergence to suboptimal policies
- Overconfident predictions about future states
- Exploitation-only behavior that ignores exploration
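Kenny's rationale can be checked numerically: a confident but wrong predictive distribution diverges from what actually happens far more than a hedged one. A minimal sketch, with made-up illustrative distributions:

```python
import numpy as np

def kl(p, q):
    """KL divergence D_KL[p || q] for discrete distributions."""
    return np.sum(p * np.log((p + 1e-12) / (q + 1e-12)))

actual = np.array([0.25, 0.25, 0.25, 0.25])     # what the world actually does

confident = np.array([0.97, 0.01, 0.01, 0.01])  # overconfident prediction
hedged    = np.array([0.40, 0.20, 0.20, 0.20])  # entropy-regularized prediction

# "Confident and bad at predicting" → divergence is very high
assert kl(actual, confident) > kl(actual, hedged)
```

Here the overconfident predictor pays roughly 2 nats of divergence while the hedged one pays about 0.05: being wrong is cheap only if you admitted you might be.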
Scale-Free Rationale (Friston/Da Costa)
Renormalizing Generative Models maintain entropy bounds at each hierarchical level:
```
Level L:   High-level goals (compositional structure)
   ↓ entropy preserved across coarse-graining
Level L-1: Trajectory patterns (temporal composition)
   ↓ entropy preserved across coarse-graining
Level L-2: Action primitives (motor commands)
   ↓ entropy preserved across coarse-graining
Level 0:   Raw actuator signals
```
The RGM framework shows that scale invariance requires entropy preservation—the same regularization principle applies at every level of the hierarchy.
Continuous Dynamics Rationale (Kidger)
JPC's gradient flow formulation:
```
dz_ℓ/dt = -∂ℱ/∂z_ℓ

where ℱ = Σ_ℓ ‖z_ℓ - f_ℓ(W_ℓ z_{ℓ-1})‖²
```
The ODE solver's step size acts as an implicit regularizer. Heun's method (2nd order Runge-Kutta) outperforms Euler because it better preserves the entropy of the dynamical flow—avoiding numerical artifacts that create spurious certainty.
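The order-of-accuracy claim is easy to check on the simplest gradient flow, dz/dt = -z (the flow of ℱ = z²/2, with exact solution e^{-t}). This sketch compares hand-rolled Euler and Heun steps; it illustrates solver order only, not the entropy-preservation argument itself:

```python
import numpy as np

# Gradient flow of F(z) = z^2 / 2, i.e. dz/dt = -z; exact solution z(t) = e^{-t}
f = lambda z: -z
dt, steps = 0.1, 10
z_euler = z_heun = 1.0

for _ in range(steps):
    # Euler: one slope sample per step (1st order)
    z_euler = z_euler + dt * f(z_euler)
    # Heun: average the slopes at both ends of the step (2nd order)
    k1 = f(z_heun)
    k2 = f(z_heun + dt * k1)
    z_heun = z_heun + dt * (k1 + k2) / 2

exact = np.exp(-1.0)
assert abs(z_heun - exact) < abs(z_euler - exact)
```

At t = 1 with dt = 0.1, Euler's error is roughly 30x larger than Heun's for the same number of steps, which is why JPC defaults to the second-order solver.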
Practical Rationale (Bolte/K-Scale)
```python
# From ksim PPO implementation
loss = policy_loss + vf_coef * value_loss - entropy_coef * entropy

# entropy_coef = 0.01 is the standard value
# Too low (0.001): policy collapses to deterministic, fails on novel states
# Too high (0.1):  policy stays too random, never converges
# 0.01: Goldilocks zone—explores enough, exploits enough
```
The `entropy_coef=0.01` heuristic, discovered empirically by RL practitioners, is the same regularization principle derived theoretically by active inference researchers.
Implementation: Unified Inference Engine
```python
import jax
import jax.numpy as jnp
from diffrax import diffeqsolve, Heun, ODETerm
import equinox as eqx


class EntropyRegularizedInference(eqx.Module):
    """
    Third-order skill: Unified inference across all four threads.

    Combines:
    - Kenny's PAD formulation (discrete state spaces)
    - Friston's RGM (hierarchical scale-free)
    - Kidger's JPC (continuous ODE dynamics)
    - Bolte's PPO (practical robot control)
    """

    # Hierarchical predictive model (RGM-style)
    levels: list[eqx.nn.Linear]

    # Entropy coefficient (Bolte-style)
    entropy_coef: float = 0.01

    # ODE solver settings (Kidger-style)
    solver: str = "heun"  # 2nd order beats Euler

    def predictive_coding_loss(
        self,
        observations: jnp.ndarray,
        activities: list[jnp.ndarray]
    ) -> tuple[float, dict]:
        """JPC-style prediction error across levels."""
        total_loss = 0.0
        level_losses = {}
        for ell, (z_ell, W_ell) in enumerate(zip(activities, self.levels)):
            target = observations if ell == 0 else activities[ell - 1]
            prediction = W_ell(target)
            error = jnp.sum((z_ell - prediction) ** 2)
            level_losses[f"level_{ell}"] = error
            total_loss += error
        return total_loss, level_losses

    def entropy_regularizer(self, policy_logits: jnp.ndarray) -> float:
        """
        Kenny/Bolte-style entropy term.

        This is the KEY INSIGHT: entropy regularization is not ad-hoc,
        it's the principled way to avoid overconfident predictions.
        """
        probs = jax.nn.softmax(policy_logits)
        return -jnp.sum(probs * jnp.log(probs + 1e-8))

    def perception_action_divergence(
        self,
        observations: jnp.ndarray,
        beliefs: jnp.ndarray,
        policy_logits: jnp.ndarray
    ) -> float:
        """
        Kenny's PAD criterion:

            PAD = VFE(past) + KL(future)
                = prediction_error - entropy_bonus

        Note: PAD differs from EFE by the entropy regularizer.
        """
        # VFE component (prediction error)
        vfe, _ = self.predictive_coding_loss(observations, beliefs)
        # Entropy component (regularizer)
        entropy = self.entropy_regularizer(policy_logits)
        # PAD = VFE - entropy (lower is better)
        return vfe - self.entropy_coef * entropy

    def inference_dynamics(
        self,
        t: float,
        activities: jnp.ndarray,
        observations: jnp.ndarray
    ) -> jnp.ndarray:
        """
        Kidger-style gradient flow ODE: dz/dt = -∂ℱ/∂z

        Solved with Heun (2nd order) for better entropy preservation.
        """
        def free_energy(z):
            loss, _ = self.predictive_coding_loss(observations, z)
            return loss

        # Gradient of free energy w.r.t. activities
        grad_F = jax.grad(free_energy)(activities)
        return -grad_F  # Gradient descent dynamics

    def run_inference(
        self,
        observations: jnp.ndarray,
        initial_activities: jnp.ndarray,
        t_span: tuple[float, float] = (0.0, 1.0)
    ) -> jnp.ndarray:
        """Solve the inference dynamics using Diffrax."""
        term = ODETerm(
            lambda t, y, args: self.inference_dynamics(t, y, observations)
        )
        solver = Heun()  # 2nd order Runge-Kutta
        solution = diffeqsolve(
            term, solver,
            t0=t_span[0], t1=t_span[1], dt0=0.1,
            y0=initial_activities
        )
        return solution.ys[-1]  # Final activities


# Unified training loop combining all four threads
def train_step(
    model: EntropyRegularizedInference,
    trajectory: dict,
    ppo_config: dict
) -> dict:
    """
    K-Scale style PPO with principled entropy regularization.

    Assumes the rollout tensors live in `trajectory` and that
    `ppo_policy_loss` / `ppo_value_loss` are defined elsewhere.
    """
    observations = trajectory["observations"]
    actions = trajectory["actions"]
    returns = trajectory["returns"]
    advantages = trajectory["advantages"]
    policy_logits = trajectory["policy_logits"]
    value_preds = trajectory["value_preds"]

    # Run inference (Kidger ODE dynamics) from an initial guess
    activities = model.run_inference(
        observations, trajectory["initial_activities"]
    )

    # Compute PAD (Kenny criterion)
    pad = model.perception_action_divergence(
        observations, activities, policy_logits
    )

    # PPO loss (Bolte practical implementation)
    policy_loss = ppo_policy_loss(policy_logits, actions, advantages)
    value_loss = ppo_value_loss(value_preds, returns)

    # Entropy bonus (the universal regularizer!)
    entropy = model.entropy_regularizer(policy_logits)

    # Total loss: prediction + value - entropy
    total_loss = (
        policy_loss
        + ppo_config["vf_coef"] * value_loss
        - ppo_config["entropy_coef"] * entropy  # ← THE KEY
    )

    return {
        "total_loss": total_loss,
        "pad": pad,
        "entropy": entropy,
        "policy_loss": policy_loss,
        "value_loss": value_loss,
    }
```
GF(3) Trit Assignment
```
Trit:  0 (ERGODIC)
Role:  Coordination (third-order meta-synthesis)
Color: #E3136C
URI:   skill://entropy-regularized-inference#E3136C
```
Balanced Quad
```
entropy-regularized-inference (0)
  ⊗ active-inference-robotics (+1)
  ⊗ jpc-predictive-coding (+1)
  ⊗ scale-free-rgm (+1)

0 + 1 + 1 + 1 = 3 ≡ 0 (mod 3) ✓
```

This is a "generative triad": all +1 generators balanced by the ergodic (0) meta-skill that coordinates them.
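The conservation check is a one-liner; skill names and trits are taken directly from the quad above:

```python
# Trit assignment from the balanced quad
trits = {
    "entropy-regularized-inference": 0,   # ergodic meta-coordinator
    "active-inference-robotics": +1,      # generator
    "jpc-predictive-coding": +1,          # generator
    "scale-free-rgm": +1,                 # generator
}

# GF(3) conservation: 0 + 1 + 1 + 1 = 3 ≡ 0 (mod 3)
assert sum(trits.values()) % 3 == 0
```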
Skill Colors (Gay.jl deterministic)
| Skill | Trit | Color | Role |
|---|---|---|---|
| entropy-regularized-inference | 0 | #E3136C | Meta-coordinator |
| active-inference-robotics | +1 | | Generator (theory→practice) |
| jpc-predictive-coding | +1 | | Generator (continuous dynamics) |
| scale-free-rgm | +1 | | Generator (hierarchical structure) |
Mutual Awareness Graph
```yaml
synthesizes:
  - active-inference-robotics   # Kenny PAD
  - sim2real-predictive-coding  # Transfer as inference
  - kscale-ksim                 # Practical PPO implementation
draws_from:
  - jpc-predictive-coding       # Kidger ODE formulation (hypothetical)
  - scale-free-rgm              # Friston hierarchical inference (hypothetical)
enables:
  - cognitive-superposition          # Team mental models with entropy bounds
  - parametrised-optics-cybernetics  # Categorical composition
  - hierarchical-control             # Multi-level reference signals
```
Key Equations Summary
Kenny: Perception/Action Divergence
```
PAD = D_KL[Q(H_{1:t}) ‖ P(H_{1:t} | O_{1:t})]
    + D_KL[Q(S_{t+1:T}) ‖ P(S_{t+1:T} | H_{1:t})]
```

Note: Observable emissions cancel in the future KL!
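The cancellation claim can be verified numerically: when Q and P share the same emission model P(o|s), the KL between the joint distributions over (s, o) collapses to the KL between the state marginals alone. A small sketch with made-up numbers:

```python
import numpy as np

def kl(p, q):
    """KL divergence D_KL[p || q] for discrete distributions."""
    return np.sum(p * np.log((p + 1e-12) / (q + 1e-12)))

Q_s = np.array([0.7, 0.3])     # approximate posterior over states
P_s = np.array([0.4, 0.6])     # prior over states
emit = np.array([[0.9, 0.1],   # shared emission model P(o|s)
                 [0.2, 0.8]])

# Joint distributions over (s, o), built from the SAME emission model
Q_so = (Q_s[:, None] * emit).ravel()
P_so = (P_s[:, None] * emit).ravel()

# KL over joints equals KL over states: the emission terms cancel
assert np.isclose(kl(Q_so, P_so), kl(Q_s, P_s))
```

Algebraically, log[Q(s)P(o|s)] - log[P(s)P(o|s)] drops the shared P(o|s) term, which is exactly why the future KL in PAD needs no emission model.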
Friston: Scale-Free Free Energy
```
F_ℓ = E_Q[log Q(s_ℓ) - log P(o_ℓ, s_ℓ | s_{ℓ+1})]

Renormalization: F_total = Σ_ℓ F_ℓ with scale-invariant structure
```
Kidger: Predictive Coding Dynamics
```
dz_ℓ/dt = -∂ℱ/∂z_ℓ

ℱ = Σ_ℓ ‖z_ℓ - f_ℓ(W_ℓ z_{ℓ-1})‖²
```
Bolte: PPO with Entropy
```
L = L_policy + c_1 · L_value - c_2 · H[π]
```

Where c_2 = entropy_coef = 0.01 (empirically optimal)
References
- Kenny (2025) Active Inference from First Principles
- Friston et al. (2024) From Pixels to Planning: Scale-Free Active Inference
- JPC: Flexible Inference for Predictive Coding Networks
- Patrick Kidger - Diffrax Documentation
- K-Scale Labs - ksim
- Ben Bolte - RL Papers Collection
- MuJoCo Playground Technical Report
Narya Compatibility (Structure-Aware Diffing)
| Field | Definition |
|---|---|
| `before` | Inference state: (beliefs Q, policy π, entropy H[π]) |
| `after` | Updated state after gradient step or belief revision |
| `delta` | Free energy change ΔF with entropy regularization term |
| | Maximum entropy prior (uniform beliefs, random policy) |
| `impact` | 1 if entropy collapsed (H[π] < threshold), 0 otherwise |
Third-Order Synthesis Event Structure
```python
from __future__ import annotations  # InferenceState, ENTROPY_FLOOR defined elsewhere

from dataclasses import dataclass


@dataclass
class EntropyDelta:
    free_energy_change: float   # ΔF (raw)
    entropy_change: float       # ΔH[π]
    regularized_change: float   # ΔF - entropy_coef * ΔH
    entropy_coef: float         # The universal constant (≈0.01)


@dataclass
class EntropyRegularizedNaryaEvent:
    """Structure-aware diff tracking entropy regularization across frameworks."""
    event_id: str
    before: InferenceState   # (Q, π, H[π], F)
    after: InferenceState    # Updated state
    delta: EntropyDelta      # Change with regularization decomposition
    trit: int                # GF(3): -1=entropy_decrease, 0=stable, +1=entropy_increase
    framework: str           # "kenny_pad" | "friston_rgm" | "kidger_jpc" | "bolte_ppo"

    @property
    def impact(self) -> int:
        """1 if entropy collapsed below safe threshold."""
        return 1 if self.after.entropy < ENTROPY_FLOOR else 0
```
Cross-Framework Unification
```python
def unify_inference_events(
    kenny_events: list[ActiveInferenceNaryaEvent],
    ksim_events: list[KsimNaryaEvent],
    entropy_coef: float = 0.01
) -> list[EntropyRegularizedNaryaEvent]:
    """Map diverse frameworks to common entropy-regularized structure."""
    unified = []
    for kenny, ksim in zip(kenny_events, ksim_events):
        # Kenny's PAD already includes entropy via divergence
        kenny_entropy = -kenny.delta.kl_future  # Entropy ≈ -KL from uniform

        # Bolte's PPO explicitly tracks entropy bonus
        ksim_entropy = compute_policy_entropy(ksim.delta.action_distribution)

        unified.append(EntropyRegularizedNaryaEvent(
            event_id=f"unified_{kenny.event_id}",
            before=InferenceState(
                beliefs=kenny.before,
                policy=ksim.before.policy,
                entropy=kenny_entropy,
                free_energy=kenny.delta.vfe
            ),
            after=InferenceState(
                beliefs=kenny.after,
                policy=ksim.after.policy,
                entropy=ksim_entropy,
                free_energy=kenny.delta.vfe - kenny.delta.vfe  # Post-update
            ),
            delta=EntropyDelta(
                free_energy_change=kenny.delta.vfe,
                entropy_change=ksim_entropy - kenny_entropy,
                regularized_change=(
                    kenny.delta.vfe
                    - entropy_coef * (ksim_entropy - kenny_entropy)
                ),
                entropy_coef=entropy_coef
            ),
            trit=sign(ksim_entropy - kenny_entropy),  # Entropy direction
            framework="unified"
        ))
    return unified
```
Entropy Conservation Verification
```python
import json
from hashlib import sha256


def verify_entropy_health(
    events: list[EntropyRegularizedNaryaEvent]
) -> ProofBundle:
    """Verify entropy regularization prevents policy collapse."""
    verifiers = {
        "entropy_floor": all(e.after.entropy > ENTROPY_FLOOR for e in events),
        "entropy_ceiling": all(e.after.entropy < ENTROPY_CEILING for e in events),
        "regularization_active": all(e.delta.entropy_coef > 0 for e in events),
        "gf3_conservation": sum(e.trit for e in events) % 3 == 0,
    }
    return ProofBundle(
        verifiers=verifiers,
        overall="VERIFIED" if all(verifiers.values()) else "FAILED",
        proof_hash=sha256(
            json.dumps([e.to_dict() for e in events]).encode()
        ).hexdigest(),
    )
```
ACSet Schema
```julia
@present SchEntropyRegularizedInference(FreeSchema) begin
  # Objects (from all four threads)
  State::Ob        # Latent state (discrete or continuous)
  Observation::Ob  # Sensory input
  Activity::Ob     # Neural activity (JPC)
  Policy::Ob       # Action distribution
  Level::Ob        # Hierarchical level (RGM)

  # Morphisms
  predict::Hom(State, Observation)  # Generative model
  infer::Hom(Observation, State)    # Recognition model
  flow::Hom(Activity, Activity)     # ODE dynamics
  coarsen::Hom(Level, Level)        # Renormalization

  # Attributes
  Scalar::AttrType
  free_energy::Attr(State, Scalar)
  entropy::Attr(Policy, Scalar)      # THE KEY REGULARIZER
  pad::Attr(State × Policy, Scalar)  # Kenny's criterion

  # The universal law:
  #   ∀ s: State, π: Policy.
  #     pad(s, π) = free_energy(s) - entropy_coef · entropy(π)
end
```