Ruflo AgentDB Learning Plugins
Create and train AI learning plugins with AgentDB's 9 reinforcement learning algorithms. Includes Decision Transformer, Q-Learning, SARSA, Actor-Critic, and more. Use when building self-learning agents, implementing RL, or optimizing agent behavior through experience.
```shell
# Clone the repository
git clone https://github.com/ruvnet/ruflo

# Or copy the skill directly into ~/.claude/skills
T=$(mktemp -d) && git clone --depth=1 https://github.com/ruvnet/ruflo "$T" && mkdir -p ~/.claude/skills && cp -r "$T/.agents/skills/agentdb-learning" ~/.claude/skills/ruvnet-ruflo-agentdb-learning-plugins && rm -rf "$T"
```
.agents/skills/agentdb-learning/SKILL.md

AgentDB Learning Plugins
What This Skill Does
Provides access to 9 reinforcement learning algorithms via AgentDB's plugin system. Create, train, and deploy learning plugins for autonomous agents that improve through experience. Includes offline RL (Decision Transformer), value-based learning (Q-Learning), policy gradients (Actor-Critic), and advanced techniques.
Performance: Train models 10-100x faster with WASM-accelerated neural inference.
Prerequisites
- Node.js 18+
- AgentDB v1.0.7+ (via agentic-flow)
- Basic understanding of reinforcement learning (recommended)
Quick Start with CLI
Create Learning Plugin
```shell
# Interactive wizard
npx agentdb@latest create-plugin

# Use specific template
npx agentdb@latest create-plugin -t decision-transformer -n my-agent

# Preview without creating
npx agentdb@latest create-plugin -t q-learning --dry-run

# Custom output directory
npx agentdb@latest create-plugin -t actor-critic -o ./plugins
```
List Available Templates
```shell
# Show all plugin templates
npx agentdb@latest list-templates

# Available templates:
# - decision-transformer (sequence modeling RL - recommended)
# - q-learning (value-based learning)
# - sarsa (on-policy TD learning)
# - actor-critic (policy gradient with baseline)
# - curiosity-driven (exploration-based)
```
Manage Plugins
```shell
# List installed plugins
npx agentdb@latest list-plugins

# Get plugin information
npx agentdb@latest plugin-info my-agent
# Shows: algorithm, configuration, training status
```
Quick Start with API
```javascript
import { createAgentDBAdapter } from 'agentic-flow/reasoningbank';

// Initialize with learning enabled
const adapter = await createAgentDBAdapter({
  dbPath: '.agentdb/learning.db',
  enableLearning: true, // Enable learning plugins
  enableReasoning: true,
  cacheSize: 1000,
});

// Store training experience
await adapter.insertPattern({
  id: '',
  type: 'experience',
  domain: 'game-playing',
  pattern_data: JSON.stringify({
    embedding: await computeEmbedding('state-action-reward'),
    pattern: {
      state: [0.1, 0.2, 0.3],
      action: 2,
      reward: 1.0,
      next_state: [0.15, 0.25, 0.35],
      done: false,
    },
  }),
  confidence: 0.9,
  usage_count: 1,
  success_count: 1,
  created_at: Date.now(),
  last_used: Date.now(),
});

// Train learning model
const metrics = await adapter.train({
  epochs: 50,
  batchSize: 32,
});

console.log('Training Loss:', metrics.loss);
console.log('Duration:', metrics.duration, 'ms');
```
Available Learning Algorithms (9 Total)
1. Decision Transformer (Recommended)
Type: Offline Reinforcement Learning
Best For: Learning from logged experiences, imitation learning
Strengths: No online interaction needed, stable training
```shell
npx agentdb@latest create-plugin -t decision-transformer -n dt-agent
```
Use Cases:
- Learn from historical data
- Imitation learning from expert demonstrations
- Safe learning without environment interaction
- Sequence modeling tasks
Configuration:
```json
{
  "algorithm": "decision-transformer",
  "model_size": "base",
  "context_length": 20,
  "embed_dim": 128,
  "n_heads": 8,
  "n_layers": 6
}
```
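For intuition: Decision Transformers condition each timestep on its return-to-go, the sum of rewards from that step onward, within a window of `context_length` timesteps. A minimal illustrative sketch of the return-to-go computation, independent of AgentDB:

```javascript
// Compute returns-to-go for one trajectory: rtg[t] = sum of rewards[t..end].
function returnsToGo(rewards) {
  const rtg = new Array(rewards.length).fill(0);
  let running = 0;
  for (let t = rewards.length - 1; t >= 0; t--) {
    running += rewards[t];
    rtg[t] = running;
  }
  return rtg;
}

// A trajectory with rewards [1, 0, 2] yields returns-to-go [3, 2, 2].
console.log(returnsToGo([1, 0, 2]));
```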
2. Q-Learning
Type: Value-Based RL (Off-Policy)
Best For: Discrete action spaces, sample efficiency
Strengths: Proven, simple, works well for small/medium problems
```shell
npx agentdb@latest create-plugin -t q-learning -n q-agent
```
Use Cases:
- Grid worlds, board games
- Navigation tasks
- Resource allocation
- Discrete decision-making
Configuration:
```json
{
  "algorithm": "q-learning",
  "learning_rate": 0.001,
  "gamma": 0.99,
  "epsilon": 0.1,
  "epsilon_decay": 0.995
}
```
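A minimal tabular sketch of the update these hyperparameters drive (`learning_rate`, `gamma`, `epsilon`); the helper names are illustrative, not AgentDB APIs:

```javascript
// Tabular Q-learning update: off-policy, bootstraps from the *best* next action.
// Q is a 2-D array indexed as Q[state][action].
function qUpdate(Q, s, a, r, s2, lr = 0.001, gamma = 0.99, done = false) {
  const target = done ? r : r + gamma * Math.max(...Q[s2]);
  Q[s][a] += lr * (target - Q[s][a]);
}

// Epsilon-greedy action selection; decay epsilon over episodes with epsilon_decay.
function epsilonGreedy(Q, s, epsilon) {
  if (Math.random() < epsilon) return Math.floor(Math.random() * Q[s].length);
  return Q[s].indexOf(Math.max(...Q[s]));
}
```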
3. SARSA
Type: Value-Based RL (On-Policy)
Best For: Safe exploration, risk-sensitive tasks
Strengths: More conservative than Q-Learning, better for safety
```shell
npx agentdb@latest create-plugin -t sarsa -n sarsa-agent
```
Use Cases:
- Safety-critical applications
- Risk-sensitive decision-making
- Online learning with exploration
Configuration:
```json
{
  "algorithm": "sarsa",
  "learning_rate": 0.001,
  "gamma": 0.99,
  "epsilon": 0.1
}
```
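The on-policy distinction is easiest to see in code: where Q-Learning bootstraps from the best next action, SARSA bootstraps from the action `a2` the policy actually took, which is what makes it more conservative under exploration. An illustrative sketch, not the AgentDB internals:

```javascript
// Tabular SARSA update: on-policy, uses the actually-taken next action a2.
function sarsaUpdate(Q, s, a, r, s2, a2, lr = 0.001, gamma = 0.99, done = false) {
  const target = done ? r : r + gamma * Q[s2][a2];
  Q[s][a] += lr * (target - Q[s][a]);
}
```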
4. Actor-Critic
Type: Policy Gradient with Value Baseline
Best For: Continuous actions, variance reduction
Strengths: Stable, works for continuous/discrete actions
```shell
npx agentdb@latest create-plugin -t actor-critic -n ac-agent
```
Use Cases:
- Continuous control (robotics, simulations)
- Complex action spaces
- Multi-agent coordination
Configuration:
```json
{
  "algorithm": "actor-critic",
  "actor_lr": 0.001,
  "critic_lr": 0.002,
  "gamma": 0.99,
  "entropy_coef": 0.01
}
```
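The "value baseline" can be sketched in one function: the critic's one-step TD error serves as the advantage signal the actor trains on, which is what reduces gradient variance. Illustrative only:

```javascript
// One-step TD advantage: how much better the outcome was than the critic expected.
// vS and vS2 are the critic's value estimates for the current and next state.
function tdAdvantage(r, vS, vS2, gamma = 0.99, done = false) {
  const target = done ? r : r + gamma * vS2;
  return target - vS; // positive -> make the taken action more likely
}
```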
5. Active Learning
Type: Query-Based Learning
Best For: Label-efficient learning, human-in-the-loop
Strengths: Minimizes labeling cost, focuses on uncertain samples
Use Cases:
- Human feedback incorporation
- Label-efficient training
- Uncertainty sampling
- Annotation cost reduction
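Uncertainty sampling, the most common query strategy, can be sketched as picking the samples whose predicted class probabilities have the highest entropy; the helper names below are hypothetical:

```javascript
// Shannon entropy of a probability distribution (natural log).
function entropy(p) {
  return -p.reduce((s, x) => (x > 0 ? s + x * Math.log(x) : s), 0);
}

// Return the indices of the k most uncertain predictions, to send for labeling.
function mostUncertain(probs, k) {
  return probs
    .map((p, i) => ({ i, h: entropy(p) }))
    .sort((a, b) => b.h - a.h)
    .slice(0, k)
    .map((x) => x.i);
}
```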
6. Adversarial Training
Type: Robustness Enhancement
Best For: Safety, robustness to perturbations
Strengths: Improves model robustness, adversarial defense
Use Cases:
- Security applications
- Robust decision-making
- Adversarial defense
- Safety testing
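A common ingredient is an FGSM-style perturbation: nudge each input feature in the sign of the loss gradient, then train on the perturbed sample. A minimal sketch, assuming the gradient with respect to the input is available:

```javascript
// FGSM-style adversarial perturbation: x_adv = x + epsilon * sign(grad).
function fgsmPerturb(x, grad, epsilon = 0.1) {
  return x.map((xi, i) => xi + epsilon * Math.sign(grad[i]));
}
```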
7. Curriculum Learning
Type: Progressive Difficulty Training
Best For: Complex tasks, faster convergence
Strengths: Stable learning, faster convergence on hard tasks
Use Cases:
- Complex multi-stage tasks
- Hard exploration problems
- Skill composition
- Transfer learning
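A simple curriculum can be driven by recent success rate: advance to the next difficulty level only once the agent clears a threshold, otherwise keep training at the current level. Illustrative sketch:

```javascript
// Advance the curriculum one level when the agent's recent success rate
// clears the threshold; cap at maxLevel.
function nextDifficulty(current, recentSuccessRate, threshold = 0.8, maxLevel = 10) {
  return recentSuccessRate >= threshold
    ? Math.min(current + 1, maxLevel)
    : current;
}
```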
8. Federated Learning
Type: Distributed Learning
Best For: Privacy, distributed data
Strengths: Privacy-preserving, scalable
Use Cases:
- Multi-agent systems
- Privacy-sensitive data
- Distributed training
- Collaborative learning
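The core of federated averaging (FedAvg) is that only model weights travel, never raw experiences, which is what preserves privacy. A sketch, assuming each agent's weights are a flat numeric vector:

```javascript
// Federated averaging: element-wise mean of each agent's local weight vector.
function fedAvg(models) {
  const dim = models[0].length;
  const avg = new Array(dim).fill(0);
  for (const w of models) {
    for (let i = 0; i < dim; i++) avg[i] += w[i] / models.length;
  }
  return avg;
}
```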
9. Multi-Task Learning
Type: Transfer Learning
Best For: Related tasks, knowledge sharing
Strengths: Faster learning on new tasks, better generalization
Use Cases:
- Task families
- Transfer learning
- Domain adaptation
- Meta-learning
Training Workflow
1. Collect Experiences
```javascript
// Store experiences during agent execution
for (let i = 0; i < numEpisodes; i++) {
  const episode = runEpisode();
  for (const step of episode.steps) {
    await adapter.insertPattern({
      id: '',
      type: 'experience',
      domain: 'task-domain',
      pattern_data: JSON.stringify({
        embedding: await computeEmbedding(JSON.stringify(step)),
        pattern: {
          state: step.state,
          action: step.action,
          reward: step.reward,
          next_state: step.next_state,
          done: step.done,
        },
      }),
      confidence: step.reward > 0 ? 0.9 : 0.5,
      usage_count: 1,
      success_count: step.reward > 0 ? 1 : 0,
      created_at: Date.now(),
      last_used: Date.now(),
    });
  }
}
```
2. Train Model
```javascript
// Train on collected experiences
const trainingMetrics = await adapter.train({
  epochs: 100,
  batchSize: 64,
  learningRate: 0.001,
  validationSplit: 0.2,
});

console.log('Training Metrics:', trainingMetrics);
// {
//   loss: 0.023,
//   valLoss: 0.028,
//   duration: 1523,
//   epochs: 100
// }
```
3. Evaluate Performance
```javascript
// Retrieve similar successful experiences
const testQuery = await computeEmbedding(JSON.stringify(testState));
const result = await adapter.retrieveWithReasoning(testQuery, {
  domain: 'task-domain',
  k: 10,
  synthesizeContext: true,
});

// Evaluate action quality
const suggestedAction = result.memories[0].pattern.action;
const confidence = result.memories[0].similarity;
console.log('Suggested Action:', suggestedAction);
console.log('Confidence:', confidence);
```
Advanced Training Techniques
Experience Replay
```javascript
// Store experiences in buffer
const replayBuffer = [];

// Sample a random batch for training
const batch = sampleRandomBatch(replayBuffer, 32);

// Train on the batch
await adapter.train({
  data: batch,
  epochs: 1,
  batchSize: 32,
});
```
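`sampleRandomBatch` is left undefined above; one plausible implementation is uniform sampling without replacement from the replay buffer:

```javascript
// Uniformly sample up to batchSize distinct experiences from the buffer.
// Works on a copy so the buffer itself is left untouched.
function sampleRandomBatch(buffer, batchSize) {
  const copy = [...buffer];
  const batch = [];
  const n = Math.min(batchSize, copy.length);
  for (let i = 0; i < n; i++) {
    const j = Math.floor(Math.random() * copy.length);
    batch.push(copy.splice(j, 1)[0]); // remove so each item is drawn once
  }
  return batch;
}
```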
Prioritized Experience Replay
```javascript
// Store experiences with priority (TD error)
await adapter.insertPattern({
  // ... standard fields
  confidence: tdError, // Use TD error as confidence/priority
  // ...
});

// Retrieve high-priority experiences
const highPriority = await adapter.retrieveWithReasoning(queryEmbedding, {
  domain: 'task-domain',
  k: 32,
  minConfidence: 0.7, // Only high TD-error experiences
});
```
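Proportional prioritization can also be sketched directly: the chance of drawing an experience scales with its TD error raised to an exponent alpha (alpha = 0 recovers uniform sampling). Illustrative, independent of AgentDB:

```javascript
// Draw one experience with probability proportional to priority^alpha.
// Each buffer entry is expected to carry a non-negative `priority` field.
function samplePrioritized(buffer, alpha = 0.6) {
  const weights = buffer.map((e) => Math.pow(e.priority, alpha));
  const total = weights.reduce((a, b) => a + b, 0);
  let r = Math.random() * total;
  for (let i = 0; i < buffer.length; i++) {
    r -= weights[i];
    if (r < 0) return buffer[i];
  }
  return buffer[buffer.length - 1]; // floating-point fallback
}
```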
Multi-Agent Training
```javascript
// Collect experiences from multiple agents
for (const agent of agents) {
  const experience = await agent.step();
  await adapter.insertPattern({
    // ... store experience with agent ID
    domain: `multi-agent/${agent.id}`,
  });
}

// Train shared model
await adapter.train({
  epochs: 50,
  batchSize: 64,
});
```
Performance Optimization
Batch Training
```javascript
// Collect a batch of experiences
const experiences = collectBatch(1000);

// Insert the collected experiences before training
for (const exp of experiences) {
  await adapter.insertPattern({ /* ... */ });
}

// Train on the batch
await adapter.train({
  epochs: 10,
  batchSize: 128, // Larger batch for efficiency
});
```
Incremental Learning
```javascript
// Train incrementally as new data arrives
setInterval(async () => {
  const newExperiences = getNewExperiences();
  if (newExperiences.length > 100) {
    await adapter.train({
      epochs: 5,
      batchSize: 32,
    });
  }
}, 60000); // Every minute
```
Integration with Reasoning Agents
Combine learning with reasoning for better performance:
```javascript
// Train learning model
await adapter.train({ epochs: 50, batchSize: 32 });

// Use reasoning agents for inference
const result = await adapter.retrieveWithReasoning(queryEmbedding, {
  domain: 'decision-making',
  k: 10,
  useMMR: true,            // Diverse experiences
  synthesizeContext: true, // Rich context
  optimizeMemory: true,    // Consolidate patterns
});

// Make a decision based on learned experiences + reasoning
const decision = result.context.suggestedAction;
const confidence = result.memories[0].similarity;
```
CLI Operations
```shell
# Create plugin
npx agentdb@latest create-plugin -t decision-transformer -n my-plugin

# List plugins
npx agentdb@latest list-plugins

# Get plugin info
npx agentdb@latest plugin-info my-plugin

# List templates
npx agentdb@latest list-templates
```
Troubleshooting
Issue: Training not converging
```javascript
// Reduce the learning rate
await adapter.train({
  epochs: 100,
  batchSize: 32,
  learningRate: 0.0001, // Lower learning rate
});
```
Issue: Overfitting
```javascript
// Use a validation split
await adapter.train({
  epochs: 50,
  batchSize: 64,
  validationSplit: 0.2, // 20% validation
});

// Enable memory optimization
await adapter.retrieveWithReasoning(queryEmbedding, {
  optimizeMemory: true, // Consolidate patterns to reduce overfitting
});
```
Issue: Slow training
```shell
# Enable quantization for faster inference
# Use binary quantization (32x faster)
```
Learn More
- Algorithm Papers: See docs/algorithms/ for detailed papers
- GitHub: https://github.com/ruvnet/agentic-flow/tree/main/packages/agentdb
- MCP Integration: npx agentdb@latest mcp
- Website: https://agentdb.ruv.io
Category: Machine Learning / Reinforcement Learning
Difficulty: Intermediate to Advanced
Estimated Time: 30-60 minutes