AlterLab-Academic-Skills alterlab-pufferlib
Part of the AlterLab Academic Skills suite. High-performance reinforcement learning framework optimized for speed and scale. Use when you need fast parallel training, vectorized environments, multi-agent systems, or integration with game environments (Atari, Procgen, NetHack). Achieves 2-10x speedups over standard implementations. For quick prototyping or standard algorithm implementations with extensive documentation, use stable-baselines3 instead.
```shell
git clone https://github.com/AlterLab-IEU/AlterLab-Academic-Skills
```

To install only this skill:

```shell
T=$(mktemp -d) && git clone --depth=1 https://github.com/AlterLab-IEU/AlterLab-Academic-Skills "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/data-science/alterlab-pufferlib" ~/.claude/skills/alterlab-ieu-alterlab-academic-skills-alterlab-pufferlib && rm -rf "$T"
```
skills/data-science/alterlab-pufferlib/SKILL.md

PufferLib - High-Performance Reinforcement Learning
Overview
PufferLib is a high-performance reinforcement learning library designed for fast parallel environment simulation and training. It achieves training at millions of steps per second through optimized vectorization, native multi-agent support, and efficient PPO implementation (PuffeRL). The library provides the Ocean suite of 20+ environments and seamless integration with Gymnasium, PettingZoo, and specialized RL frameworks.
When to Use This Skill
Use this skill when:
- Training RL agents with PPO on any environment (single or multi-agent)
- Creating custom environments using the PufferEnv API
- Optimizing performance for parallel environment simulation (vectorization)
- Integrating existing environments from Gymnasium, PettingZoo, Atari, Procgen, etc.
- Developing policies with CNN, LSTM, or custom architectures
- Scaling RL to millions of steps per second for faster experimentation
- Multi-agent RL with native multi-agent environment support
Core Capabilities
1. High-Performance Training (PuffeRL)
PuffeRL is PufferLib's optimized PPO+LSTM training algorithm achieving 1M-4M steps/second.
Quick start training:
```shell
# CLI training
puffer train procgen-coinrun --train.device cuda --train.learning-rate 3e-4

# Distributed training
torchrun --nproc_per_node=4 train.py
```
Python training loop:
```python
import pufferlib
from pufferlib import PuffeRL

# Create vectorized environment
env = pufferlib.make('procgen-coinrun', num_envs=256)

# Create trainer
trainer = PuffeRL(
    env=env,
    policy=my_policy,
    device='cuda',
    learning_rate=3e-4,
    batch_size=32768,
)

# Training loop
for iteration in range(num_iterations):
    trainer.evaluate()      # Collect rollouts
    trainer.train()         # Train on batch
    trainer.mean_and_log()  # Log results
```
For comprehensive training guidance, read `references/training.md` for:
- Complete training workflow and CLI options
- Hyperparameter tuning with Protein
- Distributed multi-GPU/multi-node training
- Logger integration (Weights & Biases, Neptune)
- Checkpointing and resume training
- Performance optimization tips
- Curriculum learning patterns
2. Environment Development (PufferEnv)
Create custom high-performance environments with the PufferEnv API.
Basic environment structure:
```python
import numpy as np
from pufferlib import PufferEnv

class MyEnvironment(PufferEnv):
    def __init__(self, buf=None):
        super().__init__(buf)
        # Define spaces
        self.observation_space = self.make_space((4,))
        self.action_space = self.make_discrete(4)
        self.reset()

    def reset(self):
        # Reset state and return initial observation
        return np.zeros(4, dtype=np.float32)

    def step(self, action):
        # Execute action, compute reward, check done
        obs = self._get_observation()
        reward = self._compute_reward()
        done = self._is_done()
        info = {}
        return obs, reward, done, info
```
Use the template script:
scripts/env_template.py provides complete single-agent and multi-agent environment templates with examples of:
- Different observation space types (vector, image, dict)
- Action space variations (discrete, continuous, multi-discrete)
- Multi-agent environment structure
- Testing utilities
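As a rough sketch of those space patterns (shapes and dtypes here are hypothetical examples, shown with plain NumPy rather than PufferLib's space helpers):

```python
import numpy as np

# Vector observation: a flat float32 feature vector (hypothetical size 4)
vector_obs = np.zeros(4, dtype=np.float32)

# Image observation: channel-last uint8 frame (hypothetical 64x64 RGB)
image_obs = np.zeros((64, 64, 3), dtype=np.uint8)

# Dict observation: named sub-observations grouped in one structure
dict_obs = {
    'position': np.zeros(2, dtype=np.float32),
    'inventory': np.zeros(8, dtype=np.int32),
}

# Discrete action: an integer index into n choices (here n=4)
n_actions = 4
action = int(np.random.default_rng(0).integers(n_actions))

# Continuous action: a bounded float vector (here 2-D in [-1, 1])
cont_action = np.clip(np.random.default_rng(0).uniform(-1, 1, size=2), -1.0, 1.0)
```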
For complete environment development, read `references/environments.md` for:
- PufferEnv API details and in-place operation patterns
- Observation and action space definitions
- Multi-agent environment creation
- Ocean suite (20+ pre-built environments)
- Performance optimization (Python to C workflow)
- Environment wrappers and best practices
- Debugging and validation techniques
3. Vectorization and Performance
Achieve maximum throughput with optimized parallel simulation.
Vectorization setup:
```python
import pufferlib

# Automatic vectorization
env = pufferlib.make('environment_name', num_envs=256, num_workers=8)

# Performance benchmarks:
# - Pure Python envs: 100k-500k SPS
# - C-based envs: 100M+ SPS
# - With training: 400k-4M total SPS
```
Key optimizations:
- Shared memory buffers for zero-copy observation passing
- Busy-wait flags instead of pipes/queues
- Surplus environments for async returns
- Multiple environments per worker
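A minimal sketch of the shared-memory idea using only the standard library (this illustrates the zero-copy pattern, not PufferLib's actual implementation):

```python
import numpy as np
from multiprocessing import shared_memory

# One shared buffer holding observations for 4 envs x 3 features (float32)
shm = shared_memory.SharedMemory(create=True, size=4 * 3 * 4)
obs = np.ndarray((4, 3), dtype=np.float32, buffer=shm.buf)
obs[:] = 0.0

# A worker process would attach to the same block by name and write in place
attached = shared_memory.SharedMemory(name=shm.name)
worker_view = np.ndarray((4, 3), dtype=np.float32, buffer=attached.buf)
worker_view[0] = [1.0, 2.0, 3.0]  # observation for env 0; no pickling involved

# The trainer-side view sees the write immediately: a zero-copy handoff
result = obs[0].copy()

# Release NumPy views before closing, then free the block
del obs, worker_view
attached.close()
shm.close()
shm.unlink()
```

This is why large vectorized batches stay cheap: workers write observations directly into the buffer the trainer reads, so no serialization happens per step.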
For vectorization optimization, read `references/vectorization.md` for:
- Architecture and performance characteristics
- Worker and batch size configuration
- Serial vs multiprocessing vs async modes
- Shared memory and zero-copy patterns
- Hierarchical vectorization for large scale
- Multi-agent vectorization strategies
- Performance profiling and troubleshooting
4. Policy Development
Build policies as standard PyTorch modules with optional utilities.
Basic policy structure:
```python
import torch.nn as nn
from pufferlib.pytorch import layer_init

class Policy(nn.Module):
    def __init__(self, observation_space, action_space):
        super().__init__()
        obs_dim = observation_space.shape[0]
        num_actions = action_space.n

        # Encoder
        self.encoder = nn.Sequential(
            layer_init(nn.Linear(obs_dim, 256)),
            nn.ReLU(),
            layer_init(nn.Linear(256, 256)),
            nn.ReLU(),
        )

        # Actor and critic heads
        self.actor = layer_init(nn.Linear(256, num_actions), std=0.01)
        self.critic = layer_init(nn.Linear(256, 1), std=1.0)

    def forward(self, observations):
        features = self.encoder(observations)
        return self.actor(features), self.critic(features)
```
For complete policy development, read `references/policies.md` for:
- CNN policies for image observations
- Recurrent policies with optimized LSTM (3x faster inference)
- Multi-input policies for complex observations
- Continuous action policies
- Multi-agent policies (shared vs independent parameters)
- Advanced architectures (attention, residual)
- Observation normalization and gradient clipping
- Policy debugging and testing
5. Environment Integration
Seamlessly integrate environments from popular RL frameworks.
Gymnasium integration:
```python
import gymnasium as gym
import pufferlib

# Wrap a Gymnasium environment
gym_env = gym.make('CartPole-v1')
env = pufferlib.emulate(gym_env, num_envs=256)

# Or use make directly
env = pufferlib.make('gym-CartPole-v1', num_envs=256)
```
PettingZoo multi-agent:
```python
# Multi-agent environment
env = pufferlib.make('pettingzoo-knights-archers-zombies', num_envs=128)
```
Supported frameworks:
- Gymnasium / OpenAI Gym
- PettingZoo (parallel and AEC)
- Atari (ALE)
- Procgen
- NetHack / MiniHack
- Minigrid
- Neural MMO
- Crafter
- GPUDrive
- MicroRTS
- Griddly
- And more...
For integration details, read `references/integration.md` for:
- Complete integration examples for each framework
- Custom wrappers (observation, reward, frame stacking, action repeat)
- Space flattening and unflattening
- Environment registration
- Compatibility patterns
- Performance considerations
- Integration debugging
Quick Start Workflow
For Training Existing Environments
- Choose environment from Ocean suite or compatible framework
- Use `scripts/train_template.py` as a starting point
- Configure hyperparameters for your task
- Run training with the CLI or a Python script
- Monitor with Weights & Biases or Neptune
- Refer to `references/training.md` for optimization
For Creating Custom Environments
- Start with `scripts/env_template.py`
- Define observation and action spaces
- Implement the `reset()` and `step()` methods
- Test the environment locally
- Vectorize with `pufferlib.emulate()` or `pufferlib.make()`
- Refer to `references/environments.md` for advanced patterns
- Optimize with `references/vectorization.md` if needed
For Policy Development
- Choose an architecture based on the observations:
  - Vector observations → MLP policy
  - Image observations → CNN policy
  - Sequential tasks → LSTM policy
  - Complex observations → Multi-input policy
- Use `layer_init` for proper weight initialization
- Follow the patterns in `references/policies.md`
- Test with the environment before full training
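The architecture choices above can be sketched as a small helper function (an illustrative heuristic, not a PufferLib API):

```python
def choose_policy_type(obs_shape, sequential=False, multi_input=False):
    """Pick a policy family from the observation characteristics.

    obs_shape: tuple describing a single observation.
    sequential: the task needs memory across steps.
    multi_input: the observation mixes several named components.
    """
    if multi_input:
        return 'multi-input'   # complex / dict observations
    if sequential:
        return 'lstm'          # sequential tasks need recurrence
    if len(obs_shape) == 3:
        return 'cnn'           # HxWxC image observations
    return 'mlp'               # flat vector observations
```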
For Performance Optimization
- Profile current throughput (steps per second)
- Check vectorization configuration (num_envs, num_workers)
- Optimize environment code (in-place ops, numpy vectorization)
- Consider C implementation for critical paths
- Use `references/vectorization.md` for systematic optimization
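Measuring throughput needs only a step counter and a timer. A framework-agnostic sketch (the `DummyVecEnv` stand-in and `measure_sps` helper are hypothetical, not part of PufferLib):

```python
import time

class DummyVecEnv:
    """Stand-in for a vectorized env; one call advances all envs by one step."""
    num_envs = 256

    def step(self, actions=None):
        # A real env returns (obs, rewards, dones, infos); this is a no-op
        return None

def measure_sps(env, num_calls=1000):
    """Return environment steps per second over num_calls batched step() calls."""
    start = time.perf_counter()
    for _ in range(num_calls):
        env.step()
    elapsed = time.perf_counter() - start
    return num_calls * env.num_envs / elapsed

sps = measure_sps(DummyVecEnv())
```

Profiling like this before and after each change (num_envs, num_workers, in-place ops) tells you which optimization actually moved the bottleneck.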
Resources
scripts/
train_template.py - Complete training script template with:
- Environment creation and configuration
- Policy initialization
- Logger integration (WandB, Neptune)
- Training loop with checkpointing
- Command-line argument parsing
- Multi-GPU distributed training setup
env_template.py - Environment implementation templates:
- Single-agent PufferEnv example (grid world)
- Multi-agent PufferEnv example (cooperative navigation)
- Multiple observation/action space patterns
- Testing utilities
references/
training.md - Comprehensive training guide:
- Training workflow and CLI options
- Hyperparameter configuration
- Distributed training (multi-GPU, multi-node)
- Monitoring and logging
- Checkpointing
- Protein hyperparameter tuning
- Performance optimization
- Common training patterns
- Troubleshooting
environments.md - Environment development guide:
- PufferEnv API and characteristics
- Observation and action spaces
- Multi-agent environments
- Ocean suite environments
- Custom environment development workflow
- Python to C optimization path
- Third-party environment integration
- Wrappers and best practices
- Debugging
vectorization.md - Vectorization optimization:
- Architecture and key optimizations
- Vectorization modes (serial, multiprocessing, async)
- Worker and batch configuration
- Shared memory and zero-copy patterns
- Advanced vectorization (hierarchical, custom)
- Multi-agent vectorization
- Performance monitoring and profiling
- Troubleshooting and best practices
policies.md - Policy architecture guide:
- Basic policy structure
- CNN policies for images
- LSTM policies with optimization
- Multi-input policies
- Continuous action policies
- Multi-agent policies
- Advanced architectures (attention, residual)
- Observation processing and unflattening
- Initialization and normalization
- Debugging and testing
integration.md - Framework integration guide:
- Gymnasium integration
- PettingZoo integration (parallel and AEC)
- Third-party environments (Procgen, NetHack, Minigrid, etc.)
- Custom wrappers (observation, reward, frame stacking, etc.)
- Space conversion and unflattening
- Environment registration
- Compatibility patterns
- Performance considerations
- Debugging integration
Tips for Success
- **Start simple**: Begin with Ocean environments or Gymnasium integration before creating custom environments
- **Profile early**: Measure steps per second from the start to identify bottlenecks
- **Use templates**: `scripts/train_template.py` and `scripts/env_template.py` provide solid starting points
- **Read references as needed**: Each reference file is self-contained and focused on a specific capability
- **Optimize progressively**: Start with Python, profile, then optimize critical paths with C if needed
- **Leverage vectorization**: PufferLib's vectorization is key to achieving high throughput
- **Monitor training**: Use WandB or Neptune to track experiments and identify issues early
- **Test environments**: Validate environment logic before scaling up training
- **Check existing environments**: The Ocean suite provides 20+ pre-built environments
- **Use proper initialization**: Always use `layer_init` from `pufferlib.pytorch` for policies
Common Use Cases
Training on Standard Benchmarks
```python
# Atari
env = pufferlib.make('atari-pong', num_envs=256)

# Procgen
env = pufferlib.make('procgen-coinrun', num_envs=256)

# Minigrid
env = pufferlib.make('minigrid-empty-8x8', num_envs=256)
```
Multi-Agent Learning
```python
# PettingZoo
env = pufferlib.make('pettingzoo-pistonball', num_envs=128)

# Shared policy for all agents
policy = create_policy(env.observation_space, env.action_space)
trainer = PuffeRL(env=env, policy=policy)
```
Custom Task Development
```python
# Create a custom environment
class MyTask(PufferEnv):
    ...  # implement environment

# Vectorize and train
env = pufferlib.emulate(MyTask, num_envs=256)
trainer = PuffeRL(env=env, policy=my_policy)
```
High-Performance Optimization
```python
# Maximize throughput
env = pufferlib.make(
    'my-env',
    num_envs=1024,       # Large batch
    num_workers=16,      # Many workers
    envs_per_worker=64,  # Optimize per worker
)
```
Installation
```shell
uv pip install pufferlib
```
Documentation
- Official docs: https://puffer.ai/docs.html
- GitHub: https://github.com/PufferAI/PufferLib
- Discord: Community support available