```shell
git clone https://github.com/plurigrid/asi

T=$(mktemp -d) && git clone --depth=1 https://github.com/plurigrid/asi "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/entropy-regularized-inference" ~/.claude/skills/plurigrid-asi-entropy-regularized-inference && rm -rf "$T"
```
`skills/entropy-regularized-inference/SKILL.md`

Entropy-Regularized Inference (Third-Order Meta-Skill)
"The entropy regularizer is not ad-hoc—it's the principled mechanism by which agents acknowledge uncertainty about their own predictions." — Synthesis of Kenny, Friston, Kidger, Bolte
Trigger Conditions
- User asks about connecting active inference to practical RL
- Questions about why entropy bonus improves PPO training
- Bridging continuous ODE dynamics with discrete state-space models
- Understanding scale-free inference across hierarchical systems
- Unifying predictive coding, active inference, and robot control
Overview
Third-order meta-skill emerging from the constructive collision of four expert threads, each discovered via 2-3-5-7 prime sieve refinement:
| Prime | Expert | Thread | Key Insight |
|---|---|---|---|
| 2 | Patrick Kenny | Discrete Active Inference | PAD ≠ EFE by entropy regularizer |
| 3 | Karl Friston / Da Costa | Scale-Free Active Inference | RGM = discrete homologues of deep CNNs |
| 5 | Patrick Kidger | JPC/Diffrax | Inference as gradient flow ODE: ż = -∂ℱ/∂z |
| 7 | Ben Bolte | K-Scale Robotics | entropy_coef=0.01 prevents policy collapse |
The Four-Way Collision
```
┌───────────────────────────────────────────────────────────┐
│       ENTROPY REGULARIZATION: THE UNIVERSAL BRIDGE        │
│                                                           │
│  ┌──────────────────┐          ┌──────────────────┐       │
│  │ Kenny (2025)     │          │ Friston (2024)   │       │
│  │ Discrete ActInf  │          │ Scale-Free AI    │       │
│  │                  │          │                  │       │
│  │ PAD = VFE +      │          │ RGM = discrete   │       │
│  │ KL(future)       │          │ deep CNNs via    │       │
│  │                  │          │ renormalization  │       │
│  │ ↓ differs from   │          │                  │       │
│  │ EFE by entropy   │          │ ↓ scale          │       │
│  │ regularizer      │          │ invariance       │       │
│  └────────┬─────────┘          └────────┬─────────┘       │
│           │                             │                 │
│           └──────────────┬──────────────┘                 │
│                          ▼                                │
│           ┌─────────────────────────────┐                 │
│           │ COLLISION POINT:            │                 │
│           │ Entropy bounds prediction   │                 │
│           │ confidence at ALL scales    │                 │
│           └──────────────┬──────────────┘                 │
│                          │                                │
│           ┌──────────────┴──────────────┐                 │
│           ▼                             ▼                 │
│  ┌──────────────────┐          ┌──────────────────┐       │
│  │ Kidger (2024)    │          │ Bolte (2025)     │       │
│  │ JPC/Diffrax      │          │ K-Scale Labs     │       │
│  │                  │          │                  │       │
│  │ Inference as     │          │ entropy_coef=    │       │
│  │ gradient flow:   │          │ 0.01 prevents    │       │
│  │                  │          │ policy collapse  │       │
│  │ ż = -∂ℱ/∂z       │          │                  │       │
│  │                  │          │ "RL-based        │       │
│  │ Heun solver      │          │ closed-loop      │       │
│  │ beats Euler      │          │ control has      │       │
│  │                  │          │ firmly won"      │       │
│  └──────────────────┘          └──────────────────┘       │
└───────────────────────────────────────────────────────────┘
```
The Unifying Principle
All four threads converge on the same mathematical structure:
```
OBJECTIVE = PREDICTION_ERROR + λ · ENTROPY
```

Where:

- PREDICTION_ERROR measures mismatch with observations
- ENTROPY prevents overconfident predictions (in a minimized loss this term enters with a negative sign, as in PPO's `-entropy_coef * entropy`)
- λ is the regularization coefficient (`entropy_coef` in PPO)
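This shared objective can be sketched in a few lines of NumPy. Everything here is illustrative: `unified_objective` and the example distributions are hypothetical, and the sign convention treats the entropy bonus as subtracted from a loss that is being minimized (matching PPO's `-entropy_coef * entropy`).

```python
import numpy as np

def unified_objective(prediction_error, policy_probs, lam=0.01):
    """Loss to minimize: prediction error minus an entropy bonus."""
    entropy = -np.sum(policy_probs * np.log(policy_probs + 1e-12))
    return prediction_error - lam * entropy

uniform = np.full(4, 0.25)                    # maximum-entropy policy
peaked = np.array([0.97, 0.01, 0.01, 0.01])   # near-deterministic policy

# With identical prediction error, the higher-entropy policy scores lower loss
assert unified_objective(0.5, uniform) < unified_objective(0.5, peaked)
```

The assertion makes the regularizer's preference explicit: when two policies explain the data equally well, the objective breaks the tie toward the less confident one.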
Thread-Specific Instantiations
| Thread | Prediction Error | Entropy Term | λ |
|---|---|---|---|
| Kenny PAD | VFE(past) + KL(future) | H[Q(s)] | Implicit in PAD |
| Friston RGM | Renormalized VFE | Scale-invariant H | Per-level λ_ℓ |
| Kidger JPC | Σ‖z_ℓ - f_ℓ(W_ℓz_{ℓ-1})‖² | Solver regularization | Step size |
| Bolte PPO | Policy loss + Value loss | -H[π(a|s)] | entropy_coef |
Why Entropy Regularization Works
Biological Rationale (Kenny)
"If I am confident in my predictions about future observations, and I am bad at predicting my future observations, then my perception/action divergence criterion is going to be very high."
The entropy regularizer forces agents to acknowledge uncertainty about predictions they can't reliably make. This prevents:
- Premature convergence to suboptimal policies
- Overconfident predictions about future states
- Exploitation-only behavior that ignores exploration
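Kenny's rationale can be checked numerically: a confident but wrong predictive distribution diverges from what actually happens far more than a hedged one. A minimal sketch, with made-up illustrative distributions:

```python
import numpy as np

def kl(p, q):
    """KL divergence D_KL[p || q] for discrete distributions."""
    return np.sum(p * np.log((p + 1e-12) / (q + 1e-12)))

actual = np.array([0.25, 0.25, 0.25, 0.25])     # what the world actually does

confident = np.array([0.97, 0.01, 0.01, 0.01])  # overconfident prediction
hedged    = np.array([0.40, 0.20, 0.20, 0.20])  # entropy-regularized prediction

# "Confident and bad at predicting" → divergence is very high
assert kl(actual, confident) > kl(actual, hedged)
```

Here the overconfident predictor pays roughly 2 nats of divergence while the hedged one pays about 0.05: being wrong is cheap only if you admitted you might be.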
Scale-Free Rationale (Friston/Da Costa)
Renormalizing Generative Models maintain entropy bounds at each hierarchical level:
```
Level L:   High-level goals (compositional structure)
   ↓ entropy preserved across coarse-graining
Level L-1: Trajectory patterns (temporal composition)
   ↓ entropy preserved across coarse-graining
Level L-2: Action primitives (motor commands)
   ↓ entropy preserved across coarse-graining
Level 0:   Raw actuator signals
```
The RGM framework shows that scale invariance requires entropy preservation—the same regularization principle applies at every level of the hierarchy.
Continuous Dynamics Rationale (Kidger)
JPC's gradient flow formulation:
```
dz_ℓ/dt = -∂ℱ/∂z_ℓ

where ℱ = Σ_ℓ ‖z_ℓ - f_ℓ(W_ℓ z_{ℓ-1})‖²
```
The ODE solver's step size acts as an implicit regularizer. Heun's method (2nd order Runge-Kutta) outperforms Euler because it better preserves the entropy of the dynamical flow—avoiding numerical artifacts that create spurious certainty.
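The order-of-accuracy claim is easy to check on the simplest gradient flow, dz/dt = -z (the flow of ℱ = z²/2, with exact solution e^{-t}). This sketch compares hand-rolled Euler and Heun steps; it illustrates solver order only, not the entropy-preservation argument itself:

```python
import numpy as np

# Gradient flow of F(z) = z^2 / 2, i.e. dz/dt = -z; exact solution z(t) = e^{-t}
f = lambda z: -z
dt, steps = 0.1, 10
z_euler = z_heun = 1.0

for _ in range(steps):
    # Euler: one slope sample per step (1st order)
    z_euler = z_euler + dt * f(z_euler)
    # Heun: average the slopes at both ends of the step (2nd order)
    k1 = f(z_heun)
    k2 = f(z_heun + dt * k1)
    z_heun = z_heun + dt * (k1 + k2) / 2

exact = np.exp(-1.0)
assert abs(z_heun - exact) < abs(z_euler - exact)
```

At t = 1 with dt = 0.1, Euler's error is roughly 30x larger than Heun's for the same number of steps, which is why JPC defaults to the second-order solver.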
Practical Rationale (Bolte/K-Scale)
```python
# From ksim PPO implementation
loss = policy_loss + vf_coef * value_loss - entropy_coef * entropy

# entropy_coef = 0.01 is the standard value
# Too low (0.001): policy collapses to deterministic, fails on novel states
# Too high (0.1):  policy stays too random, never converges
# 0.01: Goldilocks zone—explores enough, exploits enough
```
The `entropy_coef=0.01` heuristic, discovered empirically by RL practitioners, is the same regularization principle derived theoretically by active inference researchers.
Implementation: Unified Inference Engine
```python
import jax
import jax.numpy as jnp
from diffrax import diffeqsolve, Heun, ODETerm
import equinox as eqx


class EntropyRegularizedInference(eqx.Module):
    """
    Third-order skill: Unified inference across all four threads.

    Combines:
    - Kenny's PAD formulation (discrete state spaces)
    - Friston's RGM (hierarchical scale-free)
    - Kidger's JPC (continuous ODE dynamics)
    - Bolte's PPO (practical robot control)
    """

    # Hierarchical predictive model (RGM-style)
    levels: list[eqx.nn.Linear]

    # Entropy coefficient (Bolte-style)
    entropy_coef: float = 0.01

    # ODE solver settings (Kidger-style)
    solver: str = "heun"  # 2nd order beats Euler

    def predictive_coding_loss(
        self,
        observations: jnp.ndarray,
        activities: list[jnp.ndarray]
    ) -> tuple[float, dict]:
        """JPC-style prediction error across levels."""
        total_loss = 0.0
        level_losses = {}
        for ell, (z_ell, W_ell) in enumerate(zip(activities, self.levels)):
            target = observations if ell == 0 else activities[ell - 1]
            prediction = W_ell(target)
            error = jnp.sum((z_ell - prediction) ** 2)
            level_losses[f"level_{ell}"] = error
            total_loss += error
        return total_loss, level_losses

    def entropy_regularizer(self, policy_logits: jnp.ndarray) -> float:
        """
        Kenny/Bolte-style entropy term.

        This is the KEY INSIGHT: entropy regularization is not ad-hoc,
        it's the principled way to avoid overconfident predictions.
        """
        probs = jax.nn.softmax(policy_logits)
        return -jnp.sum(probs * jnp.log(probs + 1e-8))

    def perception_action_divergence(
        self,
        observations: jnp.ndarray,
        beliefs: jnp.ndarray,
        policy_logits: jnp.ndarray
    ) -> float:
        """
        Kenny's PAD criterion:

            PAD = VFE(past) + KL(future)
                = prediction_error - entropy_bonus

        Note: PAD differs from EFE by the entropy regularizer.
        """
        # VFE component (prediction error)
        vfe, _ = self.predictive_coding_loss(observations, beliefs)
        # Entropy component (regularizer)
        entropy = self.entropy_regularizer(policy_logits)
        # PAD = VFE - entropy (lower is better)
        return vfe - self.entropy_coef * entropy

    def inference_dynamics(
        self,
        t: float,
        activities: jnp.ndarray,
        observations: jnp.ndarray
    ) -> jnp.ndarray:
        """
        Kidger-style gradient flow ODE: dz/dt = -∂ℱ/∂z

        Solved with Heun (2nd order) for better entropy preservation.
        """
        def free_energy(z):
            loss, _ = self.predictive_coding_loss(observations, z)
            return loss

        # Gradient of free energy w.r.t. activities
        grad_F = jax.grad(free_energy)(activities)
        return -grad_F  # Gradient descent dynamics

    def run_inference(
        self,
        observations: jnp.ndarray,
        initial_activities: jnp.ndarray,
        t_span: tuple[float, float] = (0.0, 1.0)
    ) -> jnp.ndarray:
        """Solve the inference dynamics using Diffrax."""
        term = ODETerm(
            lambda t, y, args: self.inference_dynamics(t, y, observations)
        )
        solver = Heun()  # 2nd order Runge-Kutta
        solution = diffeqsolve(
            term, solver,
            t0=t_span[0], t1=t_span[1], dt0=0.1,
            y0=initial_activities
        )
        return solution.ys[-1]  # Final activities


# Unified training loop combining all four threads
def train_step(
    model: EntropyRegularizedInference,
    trajectory: dict,
    ppo_config: dict
) -> dict:
    """
    K-Scale style PPO with principled entropy regularization.

    Assumes the rollout tensors live in `trajectory` and that
    `ppo_policy_loss` / `ppo_value_loss` are defined elsewhere.
    """
    observations = trajectory["observations"]
    actions = trajectory["actions"]
    returns = trajectory["returns"]
    advantages = trajectory["advantages"]
    policy_logits = trajectory["policy_logits"]
    value_preds = trajectory["value_preds"]

    # Run inference (Kidger ODE dynamics) from an initial guess
    activities = model.run_inference(
        observations, trajectory["initial_activities"]
    )

    # Compute PAD (Kenny criterion)
    pad = model.perception_action_divergence(
        observations, activities, policy_logits
    )

    # PPO loss (Bolte practical implementation)
    policy_loss = ppo_policy_loss(policy_logits, actions, advantages)
    value_loss = ppo_value_loss(value_preds, returns)

    # Entropy bonus (the universal regularizer!)
    entropy = model.entropy_regularizer(policy_logits)

    # Total loss: prediction + value - entropy
    total_loss = (
        policy_loss
        + ppo_config["vf_coef"] * value_loss
        - ppo_config["entropy_coef"] * entropy  # ← THE KEY
    )

    return {
        "total_loss": total_loss,
        "pad": pad,
        "entropy": entropy,
        "policy_loss": policy_loss,
        "value_loss": value_loss,
    }
```
GF(3) Trit Assignment
```
Trit:  0 (ERGODIC)
Role:  Coordination (third-order meta-synthesis)
Color: #E3136C
URI:   skill://entropy-regularized-inference#E3136C
```
Balanced Quad
```
entropy-regularized-inference (0)
  ⊗ active-inference-robotics (+1)
  ⊗ jpc-predictive-coding (+1)
  ⊗ scale-free-rgm (+1)

0 + 1 + 1 + 1 = 3 ≡ 0 (mod 3) ✓
```

This is a "generative triad": all +1 generators balanced by the ergodic (0) meta-skill that coordinates them.
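The conservation check is a one-liner; skill names and trits are taken directly from the quad above:

```python
# Trit assignment from the balanced quad
trits = {
    "entropy-regularized-inference": 0,   # ergodic meta-coordinator
    "active-inference-robotics": +1,      # generator
    "jpc-predictive-coding": +1,          # generator
    "scale-free-rgm": +1,                 # generator
}

# GF(3) conservation: 0 + 1 + 1 + 1 = 3 ≡ 0 (mod 3)
assert sum(trits.values()) % 3 == 0
```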
Skill Colors (Gay.jl deterministic)
| Skill | Trit | Color | Role |
|---|---|---|---|
| entropy-regularized-inference | 0 | #E3136C | Meta-coordinator |
| active-inference-robotics | +1 | | Generator (theory→practice) |
| jpc-predictive-coding | +1 | | Generator (continuous dynamics) |
| scale-free-rgm | +1 | | Generator (hierarchical structure) |
Mutual Awareness Graph
```yaml
synthesizes:
  - active-inference-robotics   # Kenny PAD
  - sim2real-predictive-coding  # Transfer as inference
  - kscale-ksim                 # Practical PPO implementation
draws_from:
  - jpc-predictive-coding       # Kidger ODE formulation (hypothetical)
  - scale-free-rgm              # Friston hierarchical inference (hypothetical)
enables:
  - cognitive-superposition          # Team mental models with entropy bounds
  - parametrised-optics-cybernetics  # Categorical composition
  - hierarchical-control             # Multi-level reference signals
```
Key Equations Summary
Kenny: Perception/Action Divergence
```
PAD = D_KL[Q(H_{1:t}) ‖ P(H_{1:t} | O_{1:t})]
    + D_KL[Q(S_{t+1:T}) ‖ P(S_{t+1:T} | H_{1:t})]
```

Note: Observable emissions cancel in the future KL!
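The cancellation claim can be verified numerically: when Q and P share the same emission model P(o|s), the KL between the joint distributions over (s, o) collapses to the KL between the state marginals alone. A small sketch with made-up numbers:

```python
import numpy as np

def kl(p, q):
    """KL divergence D_KL[p || q] for discrete distributions."""
    return np.sum(p * np.log((p + 1e-12) / (q + 1e-12)))

Q_s = np.array([0.7, 0.3])     # approximate posterior over states
P_s = np.array([0.4, 0.6])     # prior over states
emit = np.array([[0.9, 0.1],   # shared emission model P(o|s)
                 [0.2, 0.8]])

# Joint distributions over (s, o), built from the SAME emission model
Q_so = (Q_s[:, None] * emit).ravel()
P_so = (P_s[:, None] * emit).ravel()

# KL over joints equals KL over states: the emission terms cancel
assert np.isclose(kl(Q_so, P_so), kl(Q_s, P_s))
```

Algebraically, log[Q(s)P(o|s)] - log[P(s)P(o|s)] drops the shared P(o|s) term, which is exactly why the future KL in PAD needs no emission model.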
Friston: Scale-Free Free Energy
```
F_ℓ = E_Q[log Q(s_ℓ) - log P(o_ℓ, s_ℓ | s_{ℓ+1})]

Renormalization: F_total = Σ_ℓ F_ℓ with scale-invariant structure
```
Kidger: Predictive Coding Dynamics
```
dz_ℓ/dt = -∂ℱ/∂z_ℓ

ℱ = Σ_ℓ ‖z_ℓ - f_ℓ(W_ℓ z_{ℓ-1})‖²
```
Bolte: PPO with Entropy
```
L = L_policy + c_1 · L_value - c_2 · H[π]
```

Where c_2 = entropy_coef = 0.01 (empirically optimal)
References
- Kenny (2025) Active Inference from First Principles
- Friston et al. (2024) From Pixels to Planning: Scale-Free Active Inference
- JPC: Flexible Inference for Predictive Coding Networks
- Patrick Kidger - Diffrax Documentation
- K-Scale Labs - ksim
- Ben Bolte - RL Papers Collection
- MuJoCo Playground Technical Report
Narya Compatibility (Structure-Aware Diffing)
| Field | Definition |
|---|---|
| `before` | Inference state: (beliefs Q, policy π, entropy H[π]) |
| `after` | Updated state after gradient step or belief revision |
| `delta` | Free energy change ΔF with entropy regularization term |
| | Maximum entropy prior (uniform beliefs, random policy) |
| `impact` | 1 if entropy collapsed (H[π] < threshold), 0 otherwise |
Third-Order Synthesis Event Structure
```python
from __future__ import annotations  # InferenceState, ENTROPY_FLOOR defined elsewhere

from dataclasses import dataclass


@dataclass
class EntropyDelta:
    free_energy_change: float   # ΔF (raw)
    entropy_change: float       # ΔH[π]
    regularized_change: float   # ΔF - entropy_coef * ΔH
    entropy_coef: float         # The universal constant (≈0.01)


@dataclass
class EntropyRegularizedNaryaEvent:
    """Structure-aware diff tracking entropy regularization across frameworks."""
    event_id: str
    before: InferenceState   # (Q, π, H[π], F)
    after: InferenceState    # Updated state
    delta: EntropyDelta      # Change with regularization decomposition
    trit: int                # GF(3): -1=entropy_decrease, 0=stable, +1=entropy_increase
    framework: str           # "kenny_pad" | "friston_rgm" | "kidger_jpc" | "bolte_ppo"

    @property
    def impact(self) -> int:
        """1 if entropy collapsed below safe threshold."""
        return 1 if self.after.entropy < ENTROPY_FLOOR else 0
```
Cross-Framework Unification
```python
def unify_inference_events(
    kenny_events: list[ActiveInferenceNaryaEvent],
    ksim_events: list[KsimNaryaEvent],
    entropy_coef: float = 0.01
) -> list[EntropyRegularizedNaryaEvent]:
    """Map diverse frameworks to common entropy-regularized structure."""
    unified = []
    for kenny, ksim in zip(kenny_events, ksim_events):
        # Kenny's PAD already includes entropy via divergence
        kenny_entropy = -kenny.delta.kl_future  # Entropy ≈ -KL from uniform

        # Bolte's PPO explicitly tracks entropy bonus
        ksim_entropy = compute_policy_entropy(ksim.delta.action_distribution)

        unified.append(EntropyRegularizedNaryaEvent(
            event_id=f"unified_{kenny.event_id}",
            before=InferenceState(
                beliefs=kenny.before,
                policy=ksim.before.policy,
                entropy=kenny_entropy,
                free_energy=kenny.delta.vfe
            ),
            after=InferenceState(
                beliefs=kenny.after,
                policy=ksim.after.policy,
                entropy=ksim_entropy,
                free_energy=kenny.delta.vfe - kenny.delta.vfe  # Post-update
            ),
            delta=EntropyDelta(
                free_energy_change=kenny.delta.vfe,
                entropy_change=ksim_entropy - kenny_entropy,
                regularized_change=(
                    kenny.delta.vfe
                    - entropy_coef * (ksim_entropy - kenny_entropy)
                ),
                entropy_coef=entropy_coef
            ),
            trit=sign(ksim_entropy - kenny_entropy),  # Entropy direction
            framework="unified"
        ))
    return unified
```
Entropy Conservation Verification
```python
import json
from hashlib import sha256


def verify_entropy_health(
    events: list[EntropyRegularizedNaryaEvent]
) -> ProofBundle:
    """Verify entropy regularization prevents policy collapse."""
    verifiers = {
        "entropy_floor": all(e.after.entropy > ENTROPY_FLOOR for e in events),
        "entropy_ceiling": all(e.after.entropy < ENTROPY_CEILING for e in events),
        "regularization_active": all(e.delta.entropy_coef > 0 for e in events),
        "gf3_conservation": sum(e.trit for e in events) % 3 == 0,
    }
    return ProofBundle(
        verifiers=verifiers,
        overall="VERIFIED" if all(verifiers.values()) else "FAILED",
        proof_hash=sha256(
            json.dumps([e.to_dict() for e in events]).encode()
        ).hexdigest(),
    )
```
ACSet Schema
```julia
@present SchEntropyRegularizedInference(FreeSchema) begin
  # Objects (from all four threads)
  State::Ob        # Latent state (discrete or continuous)
  Observation::Ob  # Sensory input
  Activity::Ob     # Neural activity (JPC)
  Policy::Ob       # Action distribution
  Level::Ob        # Hierarchical level (RGM)

  # Morphisms
  predict::Hom(State, Observation)  # Generative model
  infer::Hom(Observation, State)    # Recognition model
  flow::Hom(Activity, Activity)     # ODE dynamics
  coarsen::Hom(Level, Level)        # Renormalization

  # Attributes
  Scalar::AttrType
  free_energy::Attr(State, Scalar)
  entropy::Attr(Policy, Scalar)      # THE KEY REGULARIZER
  pad::Attr(State × Policy, Scalar)  # Kenny's criterion

  # The universal law:
  #   ∀ s: State, π: Policy.
  #     pad(s, π) = free_energy(s) - entropy_coef · entropy(π)
end
```