Awesome-Agent-Skills-for-Empirical-Research kolmogorov-arnold-networks-guide
Papers and tutorials on KAN learnable activation networks
install
source · Clone the upstream repo
git clone https://github.com/brycewang-stanford/Awesome-Agent-Skills-for-Empirical-Research
Claude Code · Install into ~/.claude/skills/
T=$(mktemp -d) && git clone --depth=1 https://github.com/brycewang-stanford/Awesome-Agent-Skills-for-Empirical-Research "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/43-wentorai-research-plugins/skills/domains/ai-ml/kolmogorov-arnold-networks-guide" ~/.claude/skills/brycewang-stanford-awesome-agent-skills-for-empirical-research-kolmogorov-arnold && rm -rf "$T"
manifest:
skills/43-wentorai-research-plugins/skills/domains/ai-ml/kolmogorov-arnold-networks-guide/SKILL.md
Kolmogorov-Arnold Networks (KAN) Guide
Overview
Kolmogorov-Arnold Networks (KANs) are a neural network architecture that places learnable activation functions on edges (where an MLP has weights) instead of fixed activations on nodes. Motivated by the Kolmogorov-Arnold representation theorem, KANs use B-splines as learnable edge activations; in certain domains they achieve better accuracy and interpretability than MLPs while using fewer parameters. This collection tracks the rapidly growing KAN literature.
Core Concept
Traditional MLP: x → [fixed activation(linear transform)] → y
- Activations on nodes, weights on edges

KAN: x → [learnable spline functions on edges] → sum → y
- Each edge learns its own activation function (a B-spline)

Kolmogorov-Arnold representation theorem:

f(x₁, …, xₙ) = Σ_{q=1}^{2n+1} Φ_q( Σ_{p=1}^{n} φ_{q,p}(xₚ) )

Any multivariate continuous function can be written as a finite composition of univariate continuous functions and addition.
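To make "a learnable function on each edge" concrete, here is a minimal sketch of a single edge activation, assuming the paper's silu-plus-spline parameterization. The `EdgeActivation` class is hypothetical (not pykan's internals), and Gaussian bumps stand in for true B-spline basis functions for brevity:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class EdgeActivation(nn.Module):
    """One learnable edge function: phi(x) = w_base * silu(x) + sum_i c_i * B_i(x)."""
    def __init__(self, num_basis: int = 8):
        super().__init__()
        self.register_buffer("centers", torch.linspace(-2, 2, num_basis))  # fixed grid over the input range
        self.coeffs = nn.Parameter(torch.zeros(num_basis))  # learnable basis coefficients c_i
        self.w_base = nn.Parameter(torch.ones(()))          # weight on the residual base function

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Gaussian bumps play the role of B-spline basis functions in this sketch
        basis = torch.exp(-((x.unsqueeze(-1) - self.centers) ** 2) / 0.5)
        return self.w_base * F.silu(x) + basis @ self.coeffs
```

In a full KAN layer, one such function sits on every input-output edge, and each node simply sums its incoming edge outputs.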
Key Papers
```bibtex
@article{liu2024kan,
  title={KAN: Kolmogorov-Arnold Networks},
  author={Liu, Ziming and Wang, Yixuan and Vaidya, Sachin and Ruehle, Fabian and Halverson, James and Solja{\v{c}}i{\'c}, Marin and Hou, Thomas Y. and Tegmark, Max},
  journal={arXiv preprint arXiv:2404.19756},
  year={2024}
}
```
Implementation
```python
# Using pykan (official implementation)
# pip install pykan
import torch
from kan import KAN

# Create a KAN model
model = KAN(
    width=[2, 5, 1],  # Input: 2, Hidden: 5, Output: 1
    grid=5,           # Spline grid resolution
    k=3,              # Spline order (cubic)
)

# Training data
x = torch.randn(1000, 2)
y = torch.sin(x[:, 0]) + torch.cos(x[:, 1])
y = y.unsqueeze(1)

# Train (newer pykan releases rename train() to fit())
dataset = {
    "train_input": x[:800], "train_label": y[:800],
    "test_input": x[800:], "test_label": y[800:],
}
model.train(dataset, steps=100, lr=0.01)

# Visualize learned functions
model.plot()

# Prune and simplify
model = model.prune()
model.plot()
```
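A small follow-up sketch for checking fit quality, assuming only that pykan's `KAN` behaves as a standard `torch.nn.Module` whose forward call returns predictions:

```python
# Evaluate on the held-out split (plain PyTorch, no pykan-specific API)
with torch.no_grad():
    pred = model(dataset["test_input"])
    rmse = torch.sqrt(torch.mean((pred - dataset["test_label"]) ** 2))
print(f"test RMSE: {rmse.item():.4f}")
```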
KAN vs MLP Comparison
```python
# Comparison on function approximation
import torch.nn as nn
from kan import KAN

# KAN: learnable activations on edges
kan_model = KAN(width=[2, 5, 1], grid=5, k=3)
# Parameters: ~150 (spline coefficients)

# MLP: fixed activations on nodes
class MLP(nn.Module):
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(2, 50), nn.ReLU(),
            nn.Linear(50, 50), nn.ReLU(),
            nn.Linear(50, 1),
        )

    def forward(self, x):
        return self.net(x)

mlp_model = MLP()
# Parameters: ~2,700

# KAN advantages:
# - Fewer parameters for the same accuracy
# - Interpretable (visualize the learned functions)
# - Better for scientific discovery (symbolic regression)
# - Grid refinement for progressive accuracy

# MLP advantages:
# - Faster training
# - Better scaling to high dimensions
# - More mature tooling and optimization
```
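To verify the rough parameter counts quoted in the comments above, a quick tally works on any pair of `nn.Module`s; the exact KAN count depends on the pykan version and its extra per-edge scale and bias terms:

```python
# Count trainable parameters in each model
def n_params(m):
    return sum(p.numel() for p in m.parameters() if p.requires_grad)

print(f"KAN params: {n_params(kan_model)}")  # on the order of a few hundred
print(f"MLP params: {n_params(mlp_model)}")  # 2,751 for the layer sizes above
```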
Extensions and Variants
| Variant | Innovation | Application |
|---|---|---|
| KAN 2.0 | MultKAN with multiplication nodes | Improved scaling |
| Temporal KAN | Time-series adaptation | Forecasting |
| ConvKAN | KAN + convolutions | Image processing |
| GraphKAN | KAN on graph structures | Graph learning |
| FourierKAN | Fourier basis instead of splines (sketched after this table) | Periodic functions |
| WavKAN | Wavelet-based activations | Signal processing |
| BSRBF-KAN | B-spline + radial basis | Function approximation |
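To illustrate the FourierKAN row, here is a hypothetical sketch of a single Fourier-basis edge activation; the `FourierEdge` class and its details are illustrative, not any library's API:

```python
import torch
import torch.nn as nn

class FourierEdge(nn.Module):
    """One edge activation as a truncated Fourier series instead of B-splines:
    phi(x) = sum_k a_k cos(k x) + b_k sin(k x), with learnable a_k, b_k."""
    def __init__(self, num_freqs: int = 5):
        super().__init__()
        self.a = nn.Parameter(torch.zeros(num_freqs))  # cosine coefficients
        self.b = nn.Parameter(torch.zeros(num_freqs))  # sine coefficients
        self.register_buffer("k", torch.arange(1, num_freqs + 1).float())  # frequencies 1..K

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        angles = x.unsqueeze(-1) * self.k  # (..., num_freqs)
        return torch.cos(angles) @ self.a + torch.sin(angles) @ self.b
```

A periodic basis like this suits periodic targets, at the cost of the local control that splines provide.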
Scientific Applications
```python
# KAN for symbolic regression (discovering equations)
import torch
from kan import KAN

# Generate data from an "unknown" equation: f(x, y) = x * exp(y)
x = torch.rand(1000, 2) * 2
y = x[:, 0:1] * torch.exp(x[:, 1:2])

dataset = {
    "train_input": x[:800], "train_label": y[:800],
    "test_input": x[800:], "test_label": y[800:],
}

model = KAN(width=[2, 1, 1], grid=10, k=3)
model.train(dataset, steps=200)

# Symbolic fitting: discover the equation
model.auto_symbolic()
# Output: f(x₁, x₂) = x₁ * exp(x₂)
# KAN can discover symbolic expressions from data
```
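If you want the discovered expression programmatically, pykan also exposes `symbolic_formula()`; the exact return structure varies across pykan versions, so treat this as a sketch:

```python
# Retrieve the fitted symbolic expression after auto_symbolic()
formula = model.symbolic_formula()
print(formula)  # e.g. a sympy expression equivalent to x_1 * exp(x_2)
```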
Research Landscape
### Key Research Directions

1. **Scaling**: Making KANs work at LLM scale
2. **Efficiency**: Reducing spline computation overhead
3. **Theory**: Understanding approximation guarantees
4. **Architecture search**: Finding optimal KAN topologies
5. **Hybrid models**: Combining KAN and MLP strengths (a toy hybrid is sketched after this list)
6. **Domain applications**: Physics, chemistry, biology
7. **Interpretability**: Extracting symbolic knowledge
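To ground direction 5, a toy hybrid assuming nothing beyond pykan's `KAN` being composable as an `nn.Module`; the `HybridKAN` class is hypothetical, not an established architecture:

```python
import torch.nn as nn
from kan import KAN

class HybridKAN(nn.Module):
    """A small KAN front end for interpretable features, plus a cheap linear head."""
    def __init__(self):
        super().__init__()
        self.kan = KAN(width=[2, 5, 5], grid=5, k=3)  # learnable edge activations
        self.head = nn.Linear(5, 1)                   # standard MLP-style output layer

    def forward(self, x):
        return self.head(self.kan(x))
```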
Use Cases
- Scientific discovery: Extract equations from experimental data
- Function approximation: High-accuracy low-parameter models
- Interpretable ML: Understand what the network learned
- Physics-informed: Embed physical constraints in activations
- Education: Teach alternative neural network architectures