## Install

**Source** · Clone the upstream repo:

```sh
git clone https://github.com/plurigrid/asi
```

**Claude Code** · Install into `~/.claude/skills/`:

```sh
T=$(mktemp -d) && git clone --depth=1 https://github.com/plurigrid/asi "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/mlx-bitnet-gf3" ~/.claude/skills/plurigrid-asi-mlx-bitnet-gf3 && rm -rf "$T"
```

Manifest: `skills/mlx-bitnet-gf3/SKILL.md`
# mlx-bitnet-gf3 Skill

1.58-bit LLMs on Apple Silicon with GF(3) color integration

- **Trit:** +1 (PLUS - generative)
- **Color:** `#38E6AF`
- **Source:** exo-explore/mlx-bitnet + Gay.jl
## Overview

BitNet 1.58-bit uses ternary weights {-1, 0, +1} — identical to GF(3) trits:

| BitNet Weight | GF(3) Trit | Role |
|---|---|---|
| -1 | MINUS | Inhibitory / Validator |
| 0 | ERGODIC | Neutral / Coordinator |
| +1 | PLUS | Excitatory / Generator |
This skill bridges:
- MLX BitNet: 1.58-bit inference on Apple Silicon
- Gay.jl: Deterministic color generation with GF(3) conservation
- QAT: Quantization-Aware Training for ternary networks
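The weight ↔ trit correspondence can be checked without MLX at all. A minimal sketch in plain Python (`trit_summary` and `ROLE` are illustrative names, not part of the skill's API):

```python
from collections import Counter

# Role names follow the weight <-> trit table above (illustrative only).
ROLE = {-1: "MINUS", 0: "ERGODIC", +1: "PLUS"}

def trit_summary(weights):
    """Count ternary weights as GF(3) trits and check conservation (sum % 3 == 0)."""
    counts = Counter(weights)
    trit_sum = sum(weights)
    return {
        "counts": {ROLE[t]: counts.get(t, 0) for t in (-1, 0, +1)},
        "trit_sum": trit_sum,
        "conserved": trit_sum % 3 == 0,
    }

# A balanced toy layer: MINUS and PLUS trits cancel exactly.
print(trit_summary([-1, 0, +1, -1, +1, 0]))
```

Any layer whose -1 and +1 counts differ by a multiple of 3 passes the same check, which is the conservation condition used throughout this skill.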
## Installation

```sh
# Clone mlx-bitnet
git clone https://github.com/exo-explore/mlx-bitnet
cd mlx-bitnet
pip install -r requirements.txt

# Convert weights
python convert.py

# Verify
python test_interop.py
```
## Models

| Model | Source | Size |
|---|---|---|
| | HuggingFace | ~3B |
| | HuggingFace | 1B |
## GF(3) Weight Visualization

Color each weight by its trit value:

```python
import mlx.core as mx
from gay_mcp import color_at, seed

def visualize_layer_gf3(weights: mx.array, gay_seed: int = 1069):
    """Color ternary weights using Gay.jl deterministic colors."""
    seed(gay_seed)
    trit_colors = {
        -1: color_at(gay_seed, 1),  # MINUS color
        0: color_at(gay_seed, 2),   # ERGODIC color
        +1: color_at(gay_seed, 3),  # PLUS color
    }

    # Count trit distribution
    counts = {
        -1: int((weights == -1).sum()),
        0: int((weights == 0).sum()),
        +1: int((weights == 1).sum()),
    }

    # Verify GF(3) conservation (sum should be ≈ 0 for balanced layers)
    trit_sum = counts[-1] * (-1) + counts[0] * 0 + counts[+1] * 1
    gf3_residue = trit_sum % 3

    return {
        "colors": trit_colors,
        "counts": counts,
        "trit_sum": trit_sum,
        "gf3_residue": gf3_residue,
        "conserved": gf3_residue == 0,
    }
```
## Quantization-Aware Training (QAT)

### Forward Pass with Ternary Quantization

```python
import mlx.core as mx
import mlx.nn as nn

def quantize_ternary(w: mx.array) -> tuple[mx.array, mx.array]:
    """Quantize weights to {-1, 0, +1} with learned scale."""
    scale = mx.abs(w).mean()
    w_normalized = w / (scale + 1e-8)
    w_ternary = mx.clip(mx.round(w_normalized), -1, 1)
    return w_ternary, scale

class BitLinear(nn.Module):
    """1.58-bit linear layer with QAT."""

    def __init__(self, in_features: int, out_features: int):
        super().__init__()
        # Latent weights in full precision (for training)
        self.w_latent = mx.random.normal((out_features, in_features)) * 0.02
        self.scale = mx.ones((1,))

    def __call__(self, x: mx.array) -> mx.array:
        # Quantize to ternary during the forward pass
        w_ternary, scale = quantize_ternary(self.w_latent)
        # Matrix multiply with ternary weights
        # Note: ternary matmul can be optimized to additions only!
        return x @ (w_ternary.T * scale)

    def gf3_stats(self) -> dict:
        """Get GF(3) trit distribution."""
        w_ternary, _ = quantize_ternary(self.w_latent)
        return {
            "minus": int((w_ternary == -1).sum()),
            "ergodic": int((w_ternary == 0).sum()),
            "plus": int((w_ternary == 1).sum()),
        }
```
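The absmean quantizer is easy to sanity-check outside MLX. A NumPy mirror of `quantize_ternary` (a sketch for illustration; the MLX version above remains the one the skill uses):

```python
import numpy as np

def quantize_ternary_np(w: np.ndarray) -> tuple[np.ndarray, float]:
    """NumPy mirror of quantize_ternary: absmean scale, round, clip to {-1, 0, +1}."""
    scale = float(np.abs(w).mean())
    w_ternary = np.clip(np.round(w / (scale + 1e-8)), -1, 1)
    return w_ternary, scale

# Toy weight matrix: large entries saturate to ±1, small ones snap to 0.
w = np.array([[0.9, -0.05, -1.2],
              [0.02, 0.7, -0.6]])
w_t, scale = quantize_ternary_np(w)
print(w_t)
```

Entries near zero (here ±0.05 and 0.02, small relative to the absmean scale ≈ 0.58) become ERGODIC zeros, while the rest saturate to ±1.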
### Straight-Through Estimator (STE)

```python
def ste_ternary(w_latent: mx.array) -> mx.array:
    """STE: forward uses quantized, backward uses latent."""
    w_ternary, _ = quantize_ternary(w_latent)
    # Stop gradient on quantization, pass through on latent
    return mx.stop_gradient(w_ternary - w_latent) + w_latent
```
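The STE rests on a value-level identity: `stop_gradient(w_ternary - w_latent) + w_latent` evaluates to `w_ternary` in the forward pass, while autodiff sees only the trailing `w_latent`. A NumPy check of the value identity (NumPy has no autograd, so `stop_gradient` here is just the identity on values):

```python
import numpy as np

def stop_gradient(x):
    # Identity on values; in MLX this call would additionally block gradient flow.
    return x

w_latent = np.array([0.8, -0.1, -0.9])
w_ternary = np.clip(np.round(w_latent / np.abs(w_latent).mean()), -1, 1)

# Forward value of the STE expression equals the quantized weights.
ste_forward = stop_gradient(w_ternary - w_latent) + w_latent
print(np.allclose(ste_forward, w_ternary))
```

In training, the gradient of this expression with respect to `w_latent` is the identity, which is exactly the "straight-through" behavior.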
## Color-Coded Layer Analysis

```python
def analyze_model_gf3(model, seed: int = 1069):
    """Analyze entire BitNet model through GF(3) lens."""
    from gay_mcp import palette

    colors = palette(3, seed=seed)
    layer_stats = []

    for name, layer in model.named_modules():
        if hasattr(layer, 'gf3_stats'):
            stats = layer.gf3_stats()
            total = sum(stats.values())
            layer_stats.append({
                "name": name,
                "minus_pct": stats["minus"] / total * 100,
                "ergodic_pct": stats["ergodic"] / total * 100,
                "plus_pct": stats["plus"] / total * 100,
                "gf3_sum": -stats["minus"] + stats["plus"],
                "conserved": (-stats["minus"] + stats["plus"]) % 3 == 0,
            })

    return {
        "colors": {
            "minus": colors[0]["hex"],
            "ergodic": colors[1]["hex"],
            "plus": colors[2]["hex"],
        },
        "layers": layer_stats,
    }
```
## Inference with Color Tracing

```python
import mlx.core as mx

def generate_with_color_trace(model, tokenizer, prompt: str, max_tokens: int = 100):
    """Generate text while tracing GF(3) activations."""
    from gay_mcp import next_color

    tokens = tokenizer.encode(prompt)
    trace = []

    for i in range(max_tokens):
        # Forward pass
        logits = model(mx.array([tokens]))

        # Sample next token (greedy)
        next_token = int(mx.argmax(logits[:, -1, :], axis=-1))
        tokens.append(next_token)

        # Color this generation step
        step_color = next_color()
        trace.append({
            "step": i,
            "token": tokenizer.decode([next_token]),
            "color": step_color["hex"],
            "trit": step_color["trit"],
        })

        if next_token == tokenizer.eos_token_id:
            break

    # Verify GF(3) conservation across trace
    trit_sum = sum(t["trit"] for t in trace)

    return {
        "output": tokenizer.decode(tokens),
        "trace": trace,
        "trit_sum": trit_sum,
        "gf3_conserved": trit_sum % 3 == 0,
    }
```
## World → World' Morphism Integration

BitNet weight updates during training are World → World' morphisms:

```python
import mlx.core as mx
import mlx.nn as nn
from mlx.utils import tree_map

def training_step_as_morphism(model, batch, lr: float = 1e-4):
    """Each training step is a World → World' transition."""
    from gay_mcp import color_at

    def snapshot():
        # World state = all latent ternary weights
        return {
            name: mx.array(layer.w_latent)
            for name, layer in model.named_modules()
            if hasattr(layer, "w_latent")
        }

    world_before = snapshot()

    # Forward + gradients (MLX has no .backward(); use value_and_grad)
    loss, grads = nn.value_and_grad(model, compute_loss)(model, batch)

    # SGD update on all trainable parameters
    model.update(tree_map(lambda p, g: p - lr * g, model.parameters(), grads))

    world_after = snapshot()

    # Compute WEV (World Extractable Value) = improvement
    wev = float(loss)  # Lower loss = more value extracted

    # Color the morphism by seed
    morphism_color = color_at(1069, hash(str(batch)) % 1000)

    return {
        "world_before": world_before,
        "world_after": world_after,
        "loss": float(loss),
        "wev": wev,
        "morphism_color": morphism_color,
    }
```
## GF(3) Balanced Triads

```
mlx-bitnet-gf3 (+1) ⊗ worlding (-1) ⊗ geb (0) = 0 ✓
mlx-bitnet-gf3 (+1) ⊗ unworlding-involution (+1) ⊗ ? (-2) = 0
```
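Balance can be verified mechanically: a triad is balanced exactly when its trits sum to 0 mod 3. A tiny checker (plain Python; the skill names and trit values are those listed above):

```python
def triad_residue(*trits: int) -> int:
    """Sum of trits mod 3; 0 means the triad is GF(3)-balanced."""
    return sum(trits) % 3

# mlx-bitnet-gf3 (+1) ⊗ worlding (-1) ⊗ geb (0) → balanced
print(triad_residue(+1, -1, 0))

# Two PLUS skills need a partner with trit ≡ -2 ≡ +1 (mod 3)
print(triad_residue(+1, +1, -2))
```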
## Skill Neighborhood
| Skill | Trit | Relationship |
|---|---|---|
| mlx-apple-silicon | 0 | MLX runtime |
| mlx-jax-splitmix | 0 | Deterministic RNG |
| discrete-backprop | -1 | Gradient-free ternary |
| forward-forward-learning | +1 | Local learning |
| gay-mcp | +1 | Color generation |
## Commands

```sh
# Run inference with GF(3) trace
just bitnet-infer "Hello world" --trace-gf3

# Analyze layer trit distribution
just bitnet-analyze model.safetensors

# Train with QAT
just bitnet-train --dataset data.jsonl --qat

# Visualize weights as colors
just bitnet-viz layer_0.weights.png
```
## References
- BitNet: Scaling 1-bit Transformers
- The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits
- exo-explore/mlx-bitnet
- 1bitLLM on HuggingFace
- Gay.jl GF(3) Documentation
## Verified Outputs (Thread T-019ba0f4)

### BitNet 1.58-bit Running on Apple Silicon

```
Model:        mlx-community/bitnet-b1.58-2B-4T-4bit
Architecture: LlamaModel, 30 layers
Speed:        96.1 tokens/sec
Memory:       0.76 GB
Weights:      ternary {-1, 0, +1} = GF(3) trits
```
### GF(3) Color Trace (seed 1069)

```
============================================================
BitNet 1.58-bit + GF(3) Color Trace (seed 1069)
============================================================
[-] #6404C3 A
[-] #EB6AF7 world
[-] #7F1747 morphism,
[+] #4A4744 often
[o] #3AC4D6 denoted
[-] #6D2BEE as
[o] #6BCECC φ,
[-] #3B194C is
[+] #9DA895 a
[o] #1C45D5 concept
[o] #4ED072 used
[o] #1F3EA5 in

GF(3) trit sum: -3 mod 3 = 0 ✓ CONSERVED
```
### Model Q&A Responses

**Q:** In one sentence: GF(3) is
**A:** GF(3) is a finite field with three elements, often denoted as GF(3^2) or F_3^2, which is used in various areas of mathematics and computer science.

**Q:** Ternary neural network weights mean
**A:** In the context of a ternary neural network, the weights refer to the parameters that are adjusted during the training process...

**Q:** World morphism φ: W → W' is
**A:** A world morphism, often denoted as φ, is a concept used in modal logic and model theory to describe how a structure (or a world) can be transformed in...
### Quick Run Commands

```sh
# One-liner: Run BitNet with GF(3) trace
uvx --with mlx --with mlx-lm --with huggingface_hub python << 'EOF'
from mlx_lm import load, generate

model, tokenizer = load('mlx-community/bitnet-b1.58-2B-4T-4bit')
prompt = "GF(3) means"
messages = [{"role": "user", "content": prompt}]
formatted = tokenizer.apply_chat_template(messages, add_generation_prompt=True)
response = generate(model, tokenizer, prompt=formatted, max_tokens=50, verbose=True)
EOF
```
## Complete Integration Summary

| Component | Value |
|---|---|
| Model | mlx-community/bitnet-b1.58-2B-4T-4bit |
| Architecture | LlamaModel, 30 layers |
| Speed | 96 tokens/sec |
| Memory | 0.76 GB |
| Bits/weight | 1.58 = log₂(3) |
| Weight values | {-1, 0, +1} = GF(3) trits |
| Seed | 1069 (deterministic) |
| GF(3) Conservation | ✓ Verified (sum mod 3 = 0) |
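The 1.58 bits/weight figure in the table is just the information content of one ternary symbol, which is easy to confirm:

```python
import math

# One ternary weight carries log2(3) bits of information.
bits_per_weight = math.log2(3)
print(round(bits_per_weight, 2))  # → 1.58
```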
## World Morphism Bridge

```
φ  : World  → World'  (geb + world-hopping + anoma-intents)
φ⁻¹: World' → World   (unworlding-involution + worlding)

BitNet training step = GF(3)-conserving World morphism
Each weight update: trit_before → trit_after with Σ ≡ 0 (mod 3)
```
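The ι∘ι = id property attributed to unworlding-involution can be illustrated on trits themselves: negation is an involution on {-1, 0, +1} (`iota` is a toy stand-in, not the skill's actual implementation):

```python
def iota(trit: int) -> int:
    """Toy involution on trits: swap MINUS ↔ PLUS, fix ERGODIC."""
    return -trit

# ι∘ι = id on every trit
for t in (-1, 0, +1):
    assert iota(iota(t)) == t
print("involution verified")
```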
## Loaded Skills (GF(3) Balanced)
| Skill | Trit | Color | Purpose |
|---|---|---|---|
| world-hopping | 0 | | Kripke/Badiou accessibility |
| worlding | -1 | | persistent state |
| world-extractable-value | 0 | | WEV = PoA - 1 |
| geb | +1 | | Categorical morphism semantics |
| unworlding-involution | +1 | | ι∘ι = id (inverse) |
| mlx-bitnet-gf3 | +1 | | 1.58-bit = GF(3) trit |
Trit: +1 (PLUS - generative)
Key Insight: 1.58-bit weights ARE GF(3) trits — log₂(3) ≈ 1.58
Thread: T-019ba0f4-31a2-77bd-b442-79a0944f3caa