Claude-skill-registry distribution-search
Guidance for finding probability distributions that satisfy specific statistical constraints such as KL divergence targets, entropy requirements, or moment conditions. This skill should be used when tasks involve constructing discrete or continuous probability distributions with specified divergence measures, entropy values, or other distributional properties through numerical optimization.
```sh
# Clone the whole registry, or install just this skill with the one-liner below
git clone https://github.com/majiayu000/claude-skill-registry

T=$(mktemp -d) && git clone --depth=1 https://github.com/majiayu000/claude-skill-registry "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/data/distribution-search" ~/.claude/skills/majiayu000-claude-skill-registry-distribution-search && rm -rf "$T"
```
skills/data/distribution-search/SKILL.md

Distribution Search
Overview
This skill provides systematic approaches for finding probability distributions that meet specific statistical constraints. Common tasks include constructing distributions with target KL divergence values (forward or backward), specified entropy, moment constraints, or combinations thereof. The approach emphasizes mathematical analysis before implementation, efficient parameterization, modular code structure, and rigorous verification.
When to Use This Skill
- Finding distributions with specific KL divergence values (forward or backward)
- Constructing distributions with target entropy
- Searching for distributions satisfying moment constraints
- Optimization problems involving probability mass/density functions
- Any task requiring numerical search over distribution parameters
Methodology
Phase 1: Mathematical Analysis Before Coding
Before writing any code, thoroughly analyze the mathematical constraints:
1. Constraint Feasibility
- Determine if a solution exists given the constraints
- Calculate bounds on achievable values (e.g., max entropy for given support)
- Identify necessary conditions for solution existence
2. Degrees of Freedom Analysis
- Count the number of free parameters needed
- Determine if simple parameterizations (e.g., two-group distributions) have sufficient flexibility
- Plan for more complex parameterizations if needed
3. Analytical Derivations
- Derive any closed-form relationships that constrain the search
- For KL divergence: H(P) = log(V) - D_KL(P||Q) when Q is uniform over vocabulary V
- Use analytical results to narrow the search space
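The uniform-Q identity above can be sanity-checked numerically before it is used to narrow the search. A minimal pure-Python sketch (the distribution values are arbitrary and purely illustrative):

```python
import math

# Arbitrary illustrative distribution P over V = 4 elements
p = [0.4, 0.3, 0.2, 0.1]
V = len(p)
q = 1.0 / V  # uniform Q

entropy_p = -sum(pi * math.log(pi) for pi in p)
kl_pq = sum(pi * math.log(pi / q) for pi in p)

# Identity: H(P) = log(V) - D_KL(P || Q) when Q is uniform
assert abs(entropy_p - (math.log(V) - kl_pq)) < 1e-12
```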
Phase 2: Efficient Parameterization
Start Simple, Plan for Complexity
- Two-group distributions: Divide elements into a high-probability group and a low-probability group
  - Parameters: k (number of high-prob elements), p_high, p_low
  - Constraint: k * p_high + (V - k) * p_low = 1
- Multi-group distributions: If two groups are insufficient, add more groups
  - More degrees of freedom allow satisfying more constraints
- Continuous parameterizations: For smooth optimization landscapes
  - Softmax over logits
  - Exponential family parameterizations
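A minimal two-group sketch (parameter values are hypothetical); note that p_low is derived from the normalization constraint rather than treated as a third free parameter:

```python
def two_group_distribution(k, p_high, vocab_size):
    """Given k elements at p_high, solve k*p_high + (V-k)*p_low = 1 for p_low."""
    p_low = (1.0 - k * p_high) / (vocab_size - k)
    return [p_high] * k + [p_low] * (vocab_size - k)

p = two_group_distribution(k=3, p_high=0.2, vocab_size=10)
assert abs(sum(p) - 1.0) < 1e-12  # normalization holds by construction
```

Deriving p_low this way removes one constraint from the optimizer, leaving it to search only over k and p_high.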
Computational Efficiency for Large Vocabularies
For large vocabulary sizes (e.g., V = 150,000):
- Avoid creating full arrays when closed-form calculations exist
- Use analytical formulas for group-based distributions:
  Forward KL = k * p_high * log(p_high * V) + (V - k) * p_low * log(p_low * V)
- Only create full arrays for final verification
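On a small vocabulary the closed form can be checked against the explicit per-element sum before trusting it at scale; a sketch with arbitrary illustrative parameters (at V = 150,000 only the closed form would be evaluated):

```python
import math

def forward_kl_closed_form(k, p_high, p_low, V):
    # Closed form for a two-group P against uniform Q = 1/V
    return k * p_high * math.log(p_high * V) + (V - k) * p_low * math.log(p_low * V)

# Small V so the explicit-array version is cheap to compare against
V, k, p_high = 10, 3, 0.2
p_low = (1.0 - k * p_high) / (V - k)
p = [p_high] * k + [p_low] * (V - k)

explicit = sum(pi * math.log(pi * V) for pi in p)
assert abs(forward_kl_closed_form(k, p_high, p_low, V) - explicit) < 1e-12
```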
Phase 3: Optimization Strategy
Choose Appropriate Methods
- Direct analytical solution: When constraints reduce to solvable equations
- Root-finding (fsolve): When you have equations equal to zero
- Least squares (least_squares): When minimizing squared constraint violations
- Gradient-free optimization (Nelder-Mead): When derivatives are unavailable or noisy
- Grid search over discrete parameters: For parameters like k (number of elements in a group)
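When a single continuous parameter must hit a single target, plain bisection already suffices; a dependency-free sketch (k, V, and the target value are illustrative), relying on the fact that the forward KL of a two-group distribution grows monotonically in p_high on (1/V, 1/k):

```python
import math

def fkl(k, p_high, V):
    # Forward KL of a two-group P (k elements at p_high) against uniform Q
    p_low = (1.0 - k * p_high) / (V - k)
    return k * p_high * math.log(p_high * V) + (V - k) * p_low * math.log(p_low * V)

def solve_p_high(k, V, target, iters=200):
    # Bisection: fkl is 0 at p_high = 1/V and increases toward p_high = 1/k
    lo, hi = 1.0 / V, 1.0 / k - 1e-12
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if fkl(k, mid, V) < target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

V, k, target = 1000, 5, 0.5
p_high = solve_p_high(k, V, target)
assert abs(fkl(k, p_high, V) - target) < 1e-9
```

The same pattern extends to two targets via `fsolve` or `least_squares` on a vector-valued residual, as in the implementation pattern below.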
Implementation Pattern
```python
def objective(params, target_forward_kl, target_backward_kl, vocab_size):
    # Extract parameters
    k, log_ratio = params
    k = int(round(k))

    # Compute probabilities (compute_probs is the user-defined parameterization)
    p_high, p_low = compute_probs(k, log_ratio, vocab_size)

    # Validate probabilities
    if p_high <= 0 or p_low <= 0 or p_high > 1 or p_low > 1:
        return [1e10, 1e10]  # Infeasible

    # Compute KL divergences using closed-form formulas
    forward_kl = compute_forward_kl(k, p_high, p_low, vocab_size)
    backward_kl = compute_backward_kl(k, p_high, p_low, vocab_size)

    return [forward_kl - target_forward_kl, backward_kl - target_backward_kl]
```
Grid Search for Discrete Parameters
```python
best_solution = None
best_error = float('inf')
for k in range(1, vocab_size):
    # Optimize continuous parameters for this k
    result = optimize_continuous_params(k, targets, vocab_size)
    if result.error < best_error:
        best_error = result.error
        best_solution = result
```
Phase 4: Code Organization
Modular Structure to Prevent Inconsistencies
Create separate, reusable functions for core computations:
```python
# kl_utils.py - core computation functions: define ONCE, use everywhere
import numpy as np

def forward_kl(p, q, mask=None):
    """Compute D_KL(P || Q) = sum_i p_i * log(p_i / q_i)."""
    if mask is None:
        mask = p > 1e-30
    return np.sum(p[mask] * np.log(p[mask] / q[mask]))

def backward_kl(p, q, mask=None):
    """Compute D_KL(Q || P) = sum_i q_i * log(q_i / p_i)."""
    if mask is None:
        mask = p > 1e-30
    return np.sum(q[mask] * np.log(q[mask] / p[mask]))

def entropy(p, mask=None):
    """Compute H(P) = -sum_i p_i * log(p_i)."""
    if mask is None:
        mask = p > 1e-30
    return -np.sum(p[mask] * np.log(p[mask]))
```
Import in All Scripts
```python
# In the optimization script
from kl_utils import forward_kl, backward_kl

# In the verification script - use the SAME functions
from kl_utils import forward_kl, backward_kl
```
Phase 5: Verification
Verification Checklist
For the final solution, verify:
Distribution Properties:
- [ ] All probabilities are positive
- [ ] All probabilities are <= 1
- [ ] Sum of probabilities equals 1.0 (within floating-point tolerance)
- [ ] No NaN or Inf values

Constraint Satisfaction:
- [ ] Forward KL divergence within tolerance
- [ ] Backward KL divergence within tolerance
- [ ] Other constraints (entropy, moments) within tolerance

Numerical Precision:
- [ ] Tolerance requirements are met (e.g., |error| < 1e-6)
- [ ] Floating-point sum is acceptably close to 1.0
Verification Script Structure
```python
import numpy as np
from kl_utils import forward_kl, backward_kl  # same functions as the optimizer

def verify_distribution(p, q, target_forward, target_backward, tol=1e-6):
    print(f"Sum of probabilities: {np.sum(p)}")
    print(f"Min probability: {np.min(p)}")
    print(f"Max probability: {np.max(p)}")
    print(f"Any NaN: {np.any(np.isnan(p))}")
    print(f"Any Inf: {np.any(np.isinf(p))}")

    fwd = forward_kl(p, q)
    bwd = backward_kl(p, q)
    print(f"\nForward KL: {fwd:.10f} (target: {target_forward}, error: {abs(fwd - target_forward):.2e})")
    print(f"Backward KL: {bwd:.10f} (target: {target_backward}, error: {abs(bwd - target_backward):.2e})")

    fwd_ok = abs(fwd - target_forward) < tol
    bwd_ok = abs(bwd - target_backward) < tol
    print(f"\nForward KL within tolerance: {'PASS' if fwd_ok else 'FAIL'}")
    print(f"Backward KL within tolerance: {'PASS' if bwd_ok else 'FAIL'}")

    return fwd_ok and bwd_ok
```
Common Pitfalls
Pitfall 1: Full Array Creation for Large Vocabularies
Problem: Creating full arrays of V = 150,000 elements causes memory pressure and timeouts.
Solution: Use closed-form formulas for group-based distributions; only create full arrays for final verification.
Pitfall 2: Inconsistent Formula Implementations
Problem: Different scripts implement KL divergence formulas differently, leading to discrepancies.
Solution: Define core computation functions once and import them everywhere.
Pitfall 3: Incorrect Masking in KL Divergence
Problem: Masking logic differs between forward and backward KL, or the mask sum is used incorrectly.
Solution: Use consistent masking (p > 1e-30) and sum over masked elements; do not multiply by the mask count.
Pitfall 4: Insufficient Degrees of Freedom
Problem: Simple parameterizations cannot satisfy all constraints simultaneously.
Solution: Analyze degrees of freedom before implementation; plan for more flexible parameterizations.
Pitfall 5: Syntax Errors from Truncated Writes
Problem: File writes are truncated, leaving incomplete code.
Solution: Verify file content after every write by reading it back or attempting to import/execute it.
Pitfall 6: No Feasibility Analysis
Problem: Attempting optimization without verifying that a solution exists.
Solution: Mathematically analyze the constraints to establish feasibility before coding.
Pitfall 7: Convergence to Local Minima
Problem: Optimization finds a local minimum that doesn't satisfy the constraints.
Solution: Try multiple initializations; use grid search over discrete parameters; verify the final solution.
Pitfall 8: Floating-Point Precision Issues
Problem: The probability sum is not exactly 1.0 due to floating-point arithmetic.
Solution: Use appropriate tolerances; normalize probabilities after construction; verify that the precision is acceptable for the task.
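The renormalization step from Pitfall 8 is a one-liner; a small sketch with illustrative values:

```python
# Probabilities assembled from optimized parameters may sum to 1 only approximately
p = [0.1000000001, 0.2, 0.3, 0.3999999999]
total = sum(p)
p = [pi / total for pi in p]  # renormalize before final verification
assert abs(sum(p) - 1.0) < 1e-12
```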
KL Divergence Reference
Definitions
Forward KL (information projection):
D_KL(P || Q) = sum_i P(i) * log(P(i) / Q(i))
Backward KL (moment projection):
D_KL(Q || P) = sum_i Q(i) * log(Q(i) / P(i))
Properties
- KL divergence is non-negative: D_KL >= 0
- KL divergence is asymmetric: D_KL(P || Q) != D_KL(Q || P) in general
- When Q is uniform over V elements: D_KL(P || Q) = log(V) - H(P)
- KL divergence can be infinite if P has support where Q is zero
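The non-negativity and asymmetry properties can be demonstrated numerically; a sketch with arbitrary illustrative two-point distributions:

```python
import math

p = [0.9, 0.1]
q = [0.5, 0.5]

def kl(a, b):
    # D_KL(A || B) over a shared finite support
    return sum(ai * math.log(ai / bi) for ai, bi in zip(a, b))

d_pq = kl(p, q)  # forward
d_qp = kl(q, p)  # backward
assert d_pq >= 0 and d_qp >= 0   # non-negativity
assert abs(d_pq - d_qp) > 1e-3   # asymmetric in general
```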
Closed-Form for Two-Group Distributions
For P with k elements at probability p_high and (V-k) elements at probability p_low, with Q uniform:
D_KL(P || Q) = k * p_high * log(p_high * V) + (V - k) * p_low * log(p_low * V)

D_KL(Q || P) = (1/V) * [k * log(1 / (V * p_high)) + (V - k) * log(1 / (V * p_low))]
             = -(1/V) * [k * log(V * p_high) + (V - k) * log(V * p_low)]
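Both closed forms should be verified against direct summation on a small V before being used at scale; a sketch for the backward direction with arbitrary illustrative parameters:

```python
import math

def backward_kl_closed_form(k, p_high, p_low, V):
    # D_KL(Q || P) for uniform Q against a two-group P
    return (1.0 / V) * (k * math.log(1.0 / (V * p_high))
                        + (V - k) * math.log(1.0 / (V * p_low)))

V, k, p_high = 20, 4, 0.1
p_low = (1.0 - k * p_high) / (V - k)
p = [p_high] * k + [p_low] * (V - k)
q = 1.0 / V

explicit = sum(q * math.log(q / pi) for pi in p)
assert abs(backward_kl_closed_form(k, p_high, p_low, V) - explicit) < 1e-12
```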
Iterative Refinement Pattern
When initial approaches fail:
- Diagnose the failure: Understand why constraints aren't satisfied
- Check mathematical feasibility: Re-verify that a solution exists
- Increase flexibility: Add more parameters or groups
- Adjust optimization method: Try different solvers or initialization strategies
- Verify incrementally: Test each component in isolation before integration
Avoid rewriting everything from scratch on each iteration; instead, modify the specific components that failed.