Asi AgentDB Learning Plugins
Create and train reinforcement-learning (RL) plugins with AgentDB's plugin system.
install
source · Clone the upstream repo
git clone https://github.com/plurigrid/asi
Claude Code · Install into ~/.claude/skills/
T=$(mktemp -d) && git clone --depth=1 https://github.com/plurigrid/asi "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/agentdb-learning-plugins" ~/.claude/skills/plurigrid-asi-agentdb-learning-plugins && rm -rf "$T"
manifest:
skills/agentdb-learning-plugins/SKILL.md
AgentDB Learning Plugins
CLI Quick Start
# Interactive wizard
npx agentdb@latest create-plugin

# Use specific template
npx agentdb@latest create-plugin -t decision-transformer -n my-agent

# Preview without creating
npx agentdb@latest create-plugin -t q-learning --dry-run

# Custom output directory
npx agentdb@latest create-plugin -t actor-critic -o ./plugins

# List available templates
npx agentdb@latest list-templates

# List installed plugins
npx agentdb@latest list-plugins

# Get plugin info
npx agentdb@latest plugin-info my-agent
API Quick Start
import { createAgentDBAdapter } from 'agentic-flow/reasoningbank';

const adapter = await createAgentDBAdapter({
  dbPath: '.agentdb/learning.db',
  enableLearning: true,
  enableReasoning: true,
  cacheSize: 1000,
});
Algorithm Templates and Configs
1. Decision Transformer (Recommended)
Offline RL -- learns from logged experiences without online interaction.
npx agentdb@latest create-plugin -t decision-transformer -n dt-agent
{ "algorithm": "decision-transformer", "model_size": "base", "context_length": 20, "embed_dim": 128, "n_heads": 8, "n_layers": 6 }
2. Q-Learning
Off-policy, value-based. Best for discrete action spaces.
npx agentdb@latest create-plugin -t q-learning -n q-agent
{ "algorithm": "q-learning", "learning_rate": 0.001, "gamma": 0.99, "epsilon": 0.1, "epsilon_decay": 0.995 }
3. SARSA
On-policy, value-based. More conservative than Q-Learning -- better for safety-critical tasks.
npx agentdb@latest create-plugin -t sarsa -n sarsa-agent
{ "algorithm": "sarsa", "learning_rate": 0.001, "gamma": 0.99, "epsilon": 0.1 }
4. Actor-Critic
Policy gradient with value baseline. Works for continuous and discrete action spaces.
npx agentdb@latest create-plugin -t actor-critic -n ac-agent
{ "algorithm": "actor-critic", "actor_lr": 0.001, "critic_lr": 0.002, "gamma": 0.99, "entropy_coef": 0.01 }
5. Curiosity-Driven
Exploration via intrinsic rewards -- the agent is rewarded for visiting novel or hard-to-predict states, which helps in sparse-reward tasks.
npx agentdb@latest create-plugin -t curiosity-driven -n curious-agent
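No config is shown for this template; the core idea is an intrinsic bonus for transitions the agent cannot yet predict. A minimal sketch (beta and the forward-model error are illustrative, not template fields):

function curiosityReward(
  extrinsic: number,
  predictedNext: number[], // forward model's guess at the next state
  actualNext: number[],
  beta = 0.1, // weight of the intrinsic bonus
): number {
  // Intrinsic reward = prediction error of a learned forward model.
  const err = predictedNext.reduce((sum, p, i) => sum + (p - actualNext[i]) ** 2, 0);
  return extrinsic + beta * err;
}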
Templates 6-10
Also available via list-templates: active-learning, adversarial-training, curriculum-learning, federated-learning, multi-task-learning. These have no dedicated CLI template flag -- use the interactive wizard (create-plugin with no -t).
Training Workflow
Store Experiences
await adapter.insertPattern({
  id: '',
  type: 'experience',
  domain: 'task-domain',
  pattern_data: JSON.stringify({
    embedding: await computeEmbedding(JSON.stringify(step)),
    pattern: {
      state: step.state,
      action: step.action,
      reward: step.reward,
      next_state: step.next_state,
      done: step.done,
    },
  }),
  confidence: step.reward > 0 ? 0.9 : 0.5,
  usage_count: 1,
  success_count: step.reward > 0 ? 1 : 0,
  created_at: Date.now(),
  last_used: Date.now(),
});
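computeEmbedding above is supplied by the caller; any function returning a fixed-length numeric vector works. A minimal stand-in (a hashed bag-of-words, purely illustrative -- swap in a real embedding model):

async function computeEmbedding(text: string, dim = 128): Promise<number[]> {
  const v = new Array(dim).fill(0);
  // Hash each token into one of `dim` buckets and count occurrences.
  for (const token of text.toLowerCase().split(/\W+/).filter(Boolean)) {
    let h = 0;
    for (const ch of token) h = (h * 31 + ch.charCodeAt(0)) >>> 0;
    v[h % dim] += 1;
  }
  const norm = Math.hypot(...v) || 1;
  return v.map((x) => x / norm); // L2-normalize for cosine similarity
}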
Train
const metrics = await adapter.train({
  epochs: 100,
  batchSize: 64,
  learningRate: 0.001,
  validationSplit: 0.2,
});
// Returns: { loss, valLoss, duration, epochs }
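A usage sketch built on the return shape shown above: train in short rounds and stop once validation loss stops improving.

let bestValLoss = Infinity;
for (let round = 0; round < 10; round++) {
  const m = await adapter.train({
    epochs: 10,
    batchSize: 64,
    learningRate: 0.001,
    validationSplit: 0.2,
  });
  console.log(`round ${round}: loss=${m.loss}, valLoss=${m.valLoss}`);
  if (m.valLoss >= bestValLoss) break; // validation stopped improving
  bestValLoss = m.valLoss;
}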
Evaluate
const result = await adapter.retrieveWithReasoning(testQuery, {
  domain: 'task-domain',
  k: 10,
  synthesizeContext: true,
});

const suggestedAction = result.memories[0].pattern.action;
const confidence = result.memories[0].similarity;
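One way to turn this into a scalar score, as a sketch: compare the top retrieved action against a known-good action over a small held-out set (the test items here are placeholders).

const testSet = [
  { query: 'serialized observation goes here', expectedAction: 'known-good-action' },
  // ...more held-out cases
];

let hits = 0;
for (const { query, expectedAction } of testSet) {
  const r = await adapter.retrieveWithReasoning(query, { domain: 'task-domain', k: 1 });
  if (r.memories[0]?.pattern.action === expectedAction) hits++;
}
console.log(`greedy-action accuracy: ${(hits / testSet.length).toFixed(2)}`);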
Prioritized Experience Replay
// Store with TD error as priority
await adapter.insertPattern({
  // ... standard fields
  confidence: tdError, // TD error = priority
});

// Retrieve only high-priority experiences
const highPriority = await adapter.retrieveWithReasoning(queryEmbedding, {
  domain: 'task-domain',
  k: 32,
  minConfidence: 0.7,
});
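How tdError might be computed before storing, as a sketch: the absolute one-step TD error, squashed into [0, 1) so it fits the confidence field (the value estimates are yours to supply).

function tdErrorPriority(
  reward: number,
  value: number,    // current estimate, e.g. Q(s, a)
  nextBest: number, // bootstrap estimate, e.g. max over a' of Q(s', a')
  gamma = 0.99,
): number {
  const tdError = Math.abs(reward + gamma * nextBest - value);
  return tdError / (1 + tdError); // map [0, inf) into [0, 1)
}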
Multi-Agent Training
for (const agent of agents) {
  const experience = await agent.step();
  await adapter.insertPattern({
    domain: `multi-agent/${agent.id}`,
    // ... experience data
  });
}

await adapter.train({ epochs: 50, batchSize: 64 });
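After shared training, per-agent retrieval falls out of the domain naming above; a sketch reusing retrieveWithReasoning from earlier sections (agents and queryEmbedding as in the surrounding snippets):

// Query one agent's own experience by filtering on its domain.
const ownMemories = await adapter.retrieveWithReasoning(queryEmbedding, {
  domain: `multi-agent/${agents[0].id}`,
  k: 10,
});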
Combined Learning + Reasoning
await adapter.train({ epochs: 50, batchSize: 32 });

const result = await adapter.retrieveWithReasoning(queryEmbedding, {
  domain: 'decision-making',
  k: 10,
  useMMR: true,
  synthesizeContext: true,
  optimizeMemory: true,
});
Troubleshooting
Not converging: Lower learningRate (try 0.0001).
Overfitting: Add validationSplit: 0.2, enable optimizeMemory: true to consolidate patterns.
Slow training: Enable quantization (quantizationType: 'binary') -- see the sketch below.
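A sketch of where quantization would be configured, assuming quantizationType is an adapter option in your agentic-flow version (verify before relying on it):

const fastAdapter = await createAgentDBAdapter({
  dbPath: '.agentdb/learning.db',
  enableLearning: true,
  quantizationType: 'binary', // trades retrieval precision for speed
});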