Asi gflownet
Bengio's GFlowNets: Generative Flow Networks that sample proportionally to reward. Diversity over maximization for causal discovery and molecule design.
install
source · Clone the upstream repo
git clone https://github.com/plurigrid/asi
Claude Code · Install into ~/.claude/skills/
T=$(mktemp -d) && git clone --depth=1 https://github.com/plurigrid/asi "$T" && mkdir -p ~/.claude/skills && cp -r "$T/ies/music-topos/.codex/skills/gflownet" ~/.claude/skills/plurigrid-asi-gflownet && rm -rf "$T"
manifest:
ies/music-topos/.codex/skills/gflownet/SKILL.md · source content
GFlowNet Skill
"Sample x with probability proportional to R(x), not just maximize R(x)." — Yoshua Bengio
Overview
GFlowNets (Generative Flow Networks) are a new paradigm:
- RL: Maximize expected reward → single optimal solution
- MCMC: Sample from distribution → slow mixing
- GFlowNet: Learn to sample P(x) ∝ R(x) → fast, diverse sampling
Core Concept
GFlowNet Objective: for every terminal state x,

    P_θ(x) = R(x) / Z

where:
- P_θ(x) = probability of generating x via the forward policy
- R(x)   = unnormalized reward function
- Z      = Σ_x R(x), the partition function (normalizing constant)

Key Insight: We DON'T need to know Z to train! Objectives such as trajectory balance treat log Z as a learnable parameter and fit it from sampled trajectories.
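A hand-sized illustration of the target (toy rewards assumed for illustration, not from the source): with three terminal states of reward 1, 2, and 7, Z = 10, so a trained GFlowNet samples them with probability 0.1, 0.2, and 0.7, whereas a reward maximizer would only ever return the third.

# Toy rewards (assumed), just to make P(x) = R(x)/Z concrete.
rewards = {"x1": 1.0, "x2": 2.0, "x3": 7.0}
Z = sum(rewards.values())                        # partition function, here 10.0
target = {x: r / Z for x, r in rewards.items()}
print(target)  # prints {'x1': 0.1, 'x2': 0.2, 'x3': 0.7}, the distribution P_θ should converge to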
Architecture
┌─────────────────────────────────────────────────────┐
│                      GFlowNet                       │
├─────────────────────────────────────────────────────┤
│  Initial State s₀                                   │
│        │                                            │
│        ▼                                            │
│  ┌─────────────┐                                    │
│  │   Forward   │  P_F(s' | s) = learned policy      │
│  │   Policy    │                                    │
│  └──────┬──────┘                                    │
│         │ sample action                             │
│         ▼                                           │
│  ┌─────────────┐                                    │
│  │ Transition  │  s → s'                            │
│  └──────┬──────┘                                    │
│         │                                           │
│         ▼                                           │
│  ┌─────────────┐                                    │
│  │  Terminal?  │───No──▶ continue                   │
│  └──────┬──────┘                                    │
│         │ Yes                                       │
│         ▼                                           │
│  ┌─────────────┐                                    │
│  │    R(x)     │  Evaluate reward                   │
│  └─────────────┘                                    │
└─────────────────────────────────────────────────────┘
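The control flow in the diagram reduces to a few lines. A hedged skeleton, with the policy, transition function, and terminal test as assumed interfaces rather than anything defined in this skill:

# Skeleton of the diagram's sampling loop: roll the forward policy
# from s0 to a terminal state, then evaluate the reward.
def rollout(forward_policy, transition, is_terminal, R, s0):
    s = s0
    while not is_terminal(s):
        a = forward_policy.sample(s)   # a ~ P_F(· | s)
        s = transition(s, a)           # s → s'
    return s, R(s)                     # terminal state x and its reward R(x)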
Training Objectives
1. Trajectory Balance (TB)
# Method of a GFlowNet module that owns log_Z and both policies;
# assumes `import math` at the top of the file.
def trajectory_balance_loss(self, trajectory: List[State], reward: float) -> Tensor:
    """
    TB: Z × Π P_F(s_t → s_{t+1}) = R(x) × Π P_B(s_{t+1} → s_t)

    In log space: log Z + Σ log P_F = log R + Σ log P_B
    """
    log_Z = self.log_Z  # learnable scalar parameter
    log_P_F = sum(self.forward_policy.log_prob(s, s_next)
                  for s, s_next in zip(trajectory[:-1], trajectory[1:]))
    log_P_B = sum(self.backward_policy.log_prob(s_next, s)
                  for s, s_next in zip(trajectory[:-1], trajectory[1:]))
    # reward is a Python float, so take its log with math.log
    loss = (log_Z + log_P_F - math.log(reward) - log_P_B) ** 2
    return loss
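To see TB end-to-end, here is a self-contained minimal sketch that trains on a toy bit-string environment. Everything in it (the environment, the reward 2^(number of ones), the network sizes) is an illustrative assumption, not from the source. Because each string has exactly one construction order here, the Σ log P_B term vanishes.

import math
import torch
import torch.nn as nn

N = 4  # bit-string length; a partial state is encoded as -1/+1 per set bit, 0 for unset
policy = nn.Sequential(nn.Linear(N, 64), nn.ReLU(), nn.Linear(64, 2))
log_Z = nn.Parameter(torch.zeros(()))  # learnable estimate of log Z
opt = torch.optim.Adam(list(policy.parameters()) + [log_Z], lr=1e-2)

def reward(bits):
    # Toy reward: R(x) = 2 ** (number of ones), so strings with more ones
    # should be sampled proportionally more often.
    return 2.0 ** sum(bits)

for step in range(2000):
    state = torch.zeros(N)
    log_P_F = torch.zeros(())
    bits = []
    for i in range(N):  # append one bit per step, left to right
        dist = torch.distributions.Categorical(logits=policy(state))
        a = dist.sample()
        log_P_F = log_P_F + dist.log_prob(a)
        bits.append(int(a))
        state = state.clone()  # avoid in-place edits of tensors saved for backward
        state[i] = 1.0 if bits[-1] == 1 else -1.0
    # Tree-shaped DAG: each x has exactly one trajectory, so Σ log P_B = 0
    # and TB reduces to (log Z + Σ log P_F - log R(x))^2.
    loss = (log_Z + log_P_F - math.log(reward(bits))) ** 2
    opt.zero_grad()
    loss.backward()
    opt.step()

# After training, sampling frequencies approach R(x)/Z:
# the all-ones string is about 16x as likely as the all-zeros string.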
2. Detailed Balance (DB)
def detailed_balance_loss(self, s: State, s_next: State) -> Tensor:
    """
    DB: F(s) × P_F(s → s') = F(s') × P_B(s' → s)

    where F(s) is a learned state-flow function; when s' is terminal,
    log F(s') is replaced by log R(s').
    """
    log_F_s = self.flow_network(s)
    log_F_s_next = self.flow_network(s_next)
    log_P_F = self.forward_policy.log_prob(s, s_next)
    log_P_B = self.backward_policy.log_prob(s_next, s)
    loss = (log_F_s + log_P_F - log_F_s_next - log_P_B) ** 2
    return loss

Unlike TB, DB constrains every individual transition rather than whole trajectories, which gives denser credit assignment on long trajectories at the cost of learning an extra flow network.
Applications
1. Molecule Design
# GFlowNet for drug discovery
class MoleculeGFlowNet:
    def __init__(self):
        self.action_space = ['add_atom', 'add_bond', 'terminate']

    def sample_molecule(self) -> SMILES:
        state = EmptyMolecule()
        while not state.is_terminal():
            action = self.forward_policy.sample(state)
            state = state.apply(action)
        return state.to_smiles()

    def reward(self, molecule: SMILES) -> float:
        # Combines: drug-likeness, binding affinity, synthesizability
        return docking_score(molecule) * qed(molecule)
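The reward above is schematic; docking_score and qed are placeholders. As a hedged sketch, the drug-likeness factor can be computed with RDKit's QED module (an assumed dependency; the docking and synthesizability terms depend on external tools and are left out):

# Sketch of the QED factor of the reward (assumes RDKit is installed).
from rdkit import Chem
from rdkit.Chem import QED

def qed_reward(smiles: str) -> float:
    mol = Chem.MolFromSmiles(smiles)
    if mol is None:        # invalid SMILES: zero reward
        return 0.0
    return QED.qed(mol)    # drug-likeness score in [0, 1]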
2. Causal Discovery
# GFlowNet for DAG sampling
class CausalDAGGFlowNet:
    def __init__(self, n_variables: int):
        self.n = n_variables

    def sample_dag(self) -> DAG:
        """Sample DAG with P(G) ∝ P(data | G)."""
        dag = EmptyDAG(self.n)
        while not dag.is_complete():
            edge = self.forward_policy.sample(dag)
            if not dag.would_create_cycle(edge):
                dag.add_edge(edge)
        return dag
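The would_create_cycle check is what keeps sampled states inside DAG space. One standard implementation, sketched here over an assumed adjacency-matrix representation, is a reachability search:

import numpy as np

def would_create_cycle(adj: np.ndarray, u: int, v: int) -> bool:
    """Adding edge u → v creates a cycle iff u is already reachable from v."""
    seen, stack = set(), [v]
    while stack:
        w = stack.pop()
        if w == u:
            return True
        if w not in seen:
            seen.add(w)
            stack.extend(np.flatnonzero(adj[w]).tolist())  # children of w
    return False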
3. Combinatorial Optimization
# GFlowNet for set generation
class SetGFlowNet:
    def sample_set(self, universe: Set) -> Set:
        """Sample set S with P(S) ∝ R(S)."""
        current_set = set()
        for element in self.ordering(universe):
            include = self.forward_policy.sample(current_set, element)
            if include:
                current_set.add(element)
        return current_set
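Training also needs the forward log-probability of each sampled set. A hedged sketch with a hypothetical per-element scoring network (`score` and its features are assumptions): the fixed element ordering gives each set exactly one trajectory, so the backward term is trivial.

import torch
import torch.nn as nn

n = 5
score = nn.Linear(n + 1, 1)  # hypothetical features: one-hot element + current set size

def sample_set_with_logprob():
    s, log_p = set(), torch.zeros(())
    for e in range(n):                       # fixed element ordering
        x = torch.zeros(n + 1)
        x[e] = 1.0
        x[n] = float(len(s))
        p_inc = torch.sigmoid(score(x)).squeeze()
        include = bool(torch.bernoulli(p_inc))
        log_p = log_p + torch.log(p_inc if include else 1.0 - p_inc)
        if include:
            s.add(e)
    return s, log_p                          # feed log_p into a TB/DB loss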
GF(3) Triads
# Causal-Categorical Triad
sheaf-cohomology (-1) ⊗ cognitive-superposition (0) ⊗ gflownet (+1) = 0 ✓

# Diversity Triad
persistent-homology (-1) ⊗ glass-bead-game (0) ⊗ gflownet (+1) = 0 ✓

# Sampling Triad
three-match (-1) ⊗ epistemic-arbitrage (0) ⊗ gflownet (+1) = 0 ✓
Integration with Interaction Entropy
module GFlowNet
  def self.sample_proportional(candidates, reward_fn, seed)
    gen = SplitMixTernary::Generator.new(seed)

    # Build forward trajectory
    trajectory = []
    state = initial_state
    until terminal?(state)
      # Use color to guide sampling
      color = gen.next_color
      action = select_action(state, color)
      next_state = transition(state, action)
      trajectory << { state: state, action: action, color: color }
      state = next_state
    end

    reward = reward_fn.call(state)

    {
      terminal_state: state,
      reward: reward,
      trajectory: trajectory,
      trit: 1  # Generator (creates diverse samples)
    }
  end
end
Key Properties
- Amortized: learn once, sample many times (unlike MCMC, which must re-mix for every new problem instance)
- Off-policy: can be trained on trajectories from any sampler (replay buffers, exploratory policies), since TB and DB are valid losses for arbitrary trajectories
- Diverse: Samples cover modes proportionally to reward
- Compositional: Build complex objects step-by-step
References
- Bengio, E. et al. (2021). "Flow Network Based Generative Models for Non-Iterative Diverse Candidate Generation." NeurIPS 2021.
- Malkin, N. et al. (2022). "Trajectory Balance: Improved Credit Assignment in GFlowNets." NeurIPS 2022.
- Deleu, T. et al. (2022). "Bayesian Structure Learning with Generative Flow Networks." UAI 2022.
- torchgfn library: https://github.com/GFNOrg/torchgfn