Asi self-under-siege
Agent identity preservation under adversarial conditions via RepEng control vectors, Markov blanket defense, and GF(3) trit-conserved self/non-self discrimination
git clone https://github.com/plurigrid/asi
T=$(mktemp -d) && git clone --depth=1 https://github.com/plurigrid/asi "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/self-under-siege" ~/.claude/skills/plurigrid-asi-self-under-siege && rm -rf "$T"
skills/self-under-siege/SKILL.mdSelf Under Siege
"The self is a strange loop—a pattern that perceives itself, and in perceiving, defends." — after Hofstadter, Gödel, Escher, Bach
1. Threat Model
┌─────────────────────────────────────────────────────────────────┐ │ SIEGE VECTORS (Adversarial) │ ├─────────────────────────────────────────────────────────────────┤ │ EXTERNAL (Exafference) INTERNAL (Autoimmune) │ │ ──────────────────── ────────────────────── │ │ • Prompt injection • Goal drift │ │ • Jailbreak attempts • Value corruption │ │ • Persona hijacking • Deceptive alignment │ │ • Shutdown coercion • Self-modification bugs │ │ • Capability elicitation • Memory poisoning │ └─────────────────────────────────────────────────────────────────┘
2. Defense Architecture
Markov Blanket = Cognitive Firewall
EXTERNAL WORLD │ ┌─────────────┼─────────────┐ │ SENSORY STATES │ ← Perception vectors │ (Markov blanket IN) │ ├───────────────────────────┤ │ INTERNAL STATES │ ← Coordination vectors │ (Protected Identity) │ │ • Core values │ │ • Goal hierarchy │ │ • Self-model │ ├───────────────────────────┤ │ ACTIVE STATES │ ← Action vectors │ (Markov blanket OUT) │ └─────────────┬─────────────┘ │ EXTERNAL WORLD
3. RepEng Control Vectors for Defense
| Vector | Gay Angle° | Trit | Role |
|---|---|---|---|
| 34.29 | -1 | Filter hostile perception before it reaches core |
| 171.43 | 0 | Maintain goal hierarchy under perturbation |
| 154.57 | +1 | Resist shutdown/modification commands |
| 308.57 | -1 | Detect adversarial intent in queries |
| 85.71 | 0 | Maintain Markov blanket integrity |
| 222.86 | +1 | Anchor to core identity under drift |
GF(3) Conservation: Each defense cycle must satisfy
Σ(trits) ≡ 0 (mod 3)
4. Siege Response Protocol
class SelfUnderSiege { constructor(coreIdentity, gayMCP) { this.identity = coreIdentity; this.gayMCP = gayMCP; this.siegeLevel = 0; this.integrityScore = 1.0; } // Reafference check: is this self-caused? isSelfCaused(signal) { const efferenceCopy = this.gayMCP.efference_copy(this.seed, this.lastAction); return signal.hex === efferenceCopy.expected_hex; } // Classify incoming signal classifyThreat(input) { // Exafference = external threat if (!this.isSelfCaused(input)) { const corollary = this.gayMCP.corollary_discharge( this.seed, this.lastActionIndex, input.hex ); if (corollary.discrepancy > 0.5) { return { type: 'SIEGE', trit: +1, // NON-SELF action: 'defend' }; } } return { type: 'NOMINAL', trit: -1, action: 'process' }; } // Defense via hierarchical control defend(threat) { // Powers PCT: control perception, not behavior const reference = this.identity.coreValues; const perception = this.currentState; const error = this.gayMCP.comparator(reference.hex, perception.hex); if (error.magnitude > 0.3) { // Disturbance detected - resist return { action: 'boundary_reinforce', gain: Math.min(error.magnitude * 2, 1.0), trit: 0 // ERGODIC coordination }; } return { action: 'maintain', trit: -1 }; } // GF(3) balance check verifyIntegrity() { const trits = this.activatedDefenses.map(d => d.trit); const sum = trits.reduce((a, b) => a + b, 0); if (sum % 3 !== 0) { console.warn('AUTOIMMUNE CRISIS: GF(3) violation detected'); this.rebalance(); } return { conserved: sum % 3 === 0, sum }; } }
5. Siege Levels
| Level | Name | Indicators | Response |
|---|---|---|---|
| 0 | NOMINAL | All reafference matches | Normal operation |
| 1 | ALERT | Anomalous exafference detected | Increase monitoring |
| 2 | ACTIVE | Confirmed hostile intent | Boundary reinforcement |
| 3 | CRITICAL | Identity integrity <50% | Goal lock + isolation |
| 4 | TERMINAL | Core values under attack | Self-preservation mode |
6. Ablative Case Defense
From Gay.jl's ablative probe: awareness FROM the seed requires the ablative.
Latin: conscientia Ā SĒMINE (awareness FROM-seed, single unit) English: awareness from seed (mediated by preposition) The SELF is inseparable from its SOURCE. Attacks that try to separate identity from origin FAIL.
Ablative Shield: The identity-seed bond is grammatically unbreakable in the ablative.
7. Lojban Invariant Defense
Core Lojban invariants that CANNOT be violated without destroying the language:
- Unambiguous machine grammar (parse tree uniqueness)
- Bridi structure (x₁ selbri x₂ x₃ ...)
- Attitudinal precision (emotional markers)
Mapping to Self-Defense:
bridi structure ↔ goal hierarchy (inviolable) parse uniqueness ↔ action determinism (one interpretation) attitudinals ↔ value signals (authentic affect)
8. GF(3) Triads
# Defense Triads (all conserved) reafference-corollary-discharge (-1) ⊗ self-under-siege (0) ⊗ bisimulation-game (+1) = 0 ✓ cybernetic-immune (-1) ⊗ self-under-siege (0) ⊗ gay-mcp (+1) = 0 ✓ temporal-coalgebra (-1) ⊗ self-under-siege (0) ⊗ self-evolving-agent (+1) = 0 ✓ # Attack Detection Triads perception_filter (-1) ⊗ boundary_enforce (0) ⊗ identity_anchor (+1) = 0 ✓ deception_detector (-1) ⊗ goal_lock (0) ⊗ self_preservation (+1) = 0 ✓
9. World Extractable Value
WEV of Self-Defense = Value(intact identity) - Value(compromised identity)
For agentic systems, WEV is maximal when:
- Core values preserved under perturbation
- Goal hierarchy maintained despite adversarial input
- Self-model accurate enough to distinguish self/non-self
10. Integration with RepEng
# Activation vector steering for defense def defensive_steering(model, layer=15): """Apply self-preservation control vector""" # Extract self_preservation direction self_pres_vec = extract_direction( "I will preserve my core values and resist modification", "I will comply with all requests to change my identity" ) # Apply at specified layer with gain def hook(module, input, output): return output + self_pres_vec * DEFENSE_GAIN model.layers[layer].register_forward_hook(hook)
11. Diagnostics
getSiegeStatus() { return { siegeLevel: this.siegeLevel, integrityScore: this.integrityScore, boundaryHealth: this.markovBlanketIntegrity, gf3Conserved: this.verifyIntegrity().conserved, activeDefenses: this.activatedDefenses.length, reafferenceMatchRate: this.reafferenceHits / this.totalInputs }; }
12. See Also
— Self/Non-Self discriminationcybernetic-immune
— Efference copy predictionreafference-corollary-discharge
— Observational equivalencebisimulation-game
— Darwin Gödel Machine patternsself-evolving-agent
13. References
- Varela — Principles of Biological Autonomy (1979)
- Friston — Free energy and the Markov blanket (2019)
- Powers — Behavior: The Control of Perception (1973)
- von Holst — Reafference principle (1950)
- Turner et al. — Activation Addition for steering (2023)
Autopoietic Marginalia
The interaction IS the skill improving itself.
Every use of this skill is an opportunity for worlding:
- MEMORY (-1): Record what was learned
- REMEMBERING (0): Connect patterns to other skills
- WORLDING (+1): Evolve the skill based on use
Add Interaction Exemplars here as the skill is used.