# asi / duckdb-temporal-versioning

Temporal versioning and interaction history with time-travel queries.

```bash
# Clone the full repository
git clone https://github.com/plurigrid/asi

# Or install just this skill into ~/.claude/skills
T=$(mktemp -d) && git clone --depth=1 https://github.com/plurigrid/asi "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/duckdb-temporal-versioning" ~/.claude/skills/plurigrid-asi-duckdb-temporal-versioning-431cae && rm -rf "$T"
```

`skills/duckdb-temporal-versioning/SKILL.md`

# DuckDB Temporal Versioning Skill

- Status: ✅ Production Ready
- Trit: +1 (PLUS: generative/creating databases)
- Principle: Same data → Same structure (SPI guarantee)
- Implementation: DuckDB (embedded SQL) + temporal semantics

GF(3) Balanced Triad:
- duckdb-temporal-versioning (+1) [Generator: Create/write]
- clj-kondo-3color (-1) [Validator: Verify schemas]
- acsets (0) [Coordinator: Navigate relationships]
## Overview
DuckDB Temporal Versioning provides an in-process SQL database with time-travel semantics for storing and analyzing interaction history. Every database built from the same schema and initial data produces identical query results (checked in the sketch after this list), enabling:
- Time Travel: Query data as of any past transaction
- Deterministic Replay: Freeze snapshots for reproducible execution
- Causality Tracking: Track dependencies via vector clocks
- Immutable Audit Logs: Record all mutations permanently
- Multi-Layer Integration: Combine history + world databases + derived views
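The determinism claim can be verified directly: build two databases from the same schema and seed rows, then compare query output. A minimal sketch, using in-memory databases and an illustrative two-row seed:

```python
import duckdb

def build() -> duckdb.DuckDBPyConnection:
    """Create an in-memory database with a fixed schema and seed data."""
    conn = duckdb.connect(":memory:")
    conn.execute(
        "CREATE TABLE interactions (id INTEGER, user_id VARCHAR, content VARCHAR)"
    )
    conn.execute(
        "INSERT INTO interactions VALUES (1, 'alice', 'hello'), (2, 'bob', 'world')"
    )
    return conn

# Two independent builds from the same schema + data agree on every query.
query = "SELECT user_id, COUNT(*) FROM interactions GROUP BY user_id ORDER BY user_id"
assert build().execute(query).fetchall() == build().execute(query).fetchall()
```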
## Core Capabilities

### Time-Travel Semantics

```sql
-- Query data as of transaction T₁
SELECT * FROM interactions VERSION AT SYSTEM_TIME AS OF T₁;

-- Get all versions of a row
SELECT *, transaction_id FROM interactions
FOR SYSTEM_TIME ALL
WHERE id = 42;
```
### Causality Tracking with Vector Clocks

```sql
-- Track causal relationships
CREATE TABLE causality_log (
    event_id INTEGER,
    vector_clock VARCHAR,   -- e.g., "[1, 2, 3, 0]"
    event_data STRUCT,
    timestamp TIMESTAMP,
    UNIQUE(vector_clock)
);

-- Query causality constraints
SELECT * FROM causality_log
WHERE vector_clock <= '[2, 3, 1, 1]'
ORDER BY vector_clock;
```
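A caveat worth making explicit: comparing clocks with string `<=` is lexicographic, while vector-clock causality is a componentwise partial order. A minimal sketch of the proper happened-before test, applied in application code (the `happened_before` helper is illustrative):

```python
import json

def happened_before(a: str, b: str) -> bool:
    """True if clock a causally precedes clock b.

    Clocks are JSON strings like "[1, 2, 3, 0]", matching the
    causality_log.vector_clock column above. a precedes b when every
    component of a is <= the matching component of b and at least one
    is strictly smaller.
    """
    va, vb = json.loads(a), json.loads(b)
    assert len(va) == len(vb), "clocks must cover the same processes"
    return all(x <= y for x, y in zip(va, vb)) and any(x < y for x, y in zip(va, vb))

# Filter events causally before a target clock:
# rows = conn.execute("SELECT event_id, vector_clock FROM causality_log").fetchall()
# prior = [r for r in rows if happened_before(r[1], "[2, 3, 1, 1]")]
```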
### Frozen Snapshots for Determinism

```sql
-- Create checkpoint
PRAGMA freeze_table('interactions');

-- Later: all queries return identical results
SELECT COUNT(*) FROM interactions;  -- Always the same value

-- Unfreeze to add new data
PRAGMA unfreeze_table('interactions');
```
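Where these pragmas are unavailable, the same guarantee can be approximated with a plain table copy that writes never touch. A minimal sketch, assuming an existing `interactions` table (the `interactions_frozen` name is illustrative):

```python
import duckdb

conn = duckdb.connect("db.duckdb")

# Approximate a frozen snapshot by copying the table to an immutable name.
conn.execute("CREATE TABLE interactions_frozen AS SELECT * FROM interactions")

# Queries against the copy stay deterministic while writes continue on the live table.
n1 = conn.execute("SELECT COUNT(*) FROM interactions_frozen").fetchone()[0]
conn.execute("INSERT INTO interactions (user_id, content) VALUES ('alice', 'new row')")
n2 = conn.execute("SELECT COUNT(*) FROM interactions_frozen").fetchone()[0]
assert n1 == n2  # the snapshot is unaffected by new writes
```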
### Immutable Audit Logs (3-Trigger Pattern)

```sql
-- Trigger on INSERT
CREATE TRIGGER audit_insert_interactions
AFTER INSERT ON interactions
FOR EACH ROW
BEGIN
    INSERT INTO audit_log (
        table_name, operation, record_id, new_value, timestamp
    ) VALUES (
        'interactions', 'INSERT', NEW.id, NEW, NOW()
    );
END;

-- Trigger on UPDATE
CREATE TRIGGER audit_update_interactions
AFTER UPDATE ON interactions
FOR EACH ROW
BEGIN
    INSERT INTO audit_log (
        table_name, operation, record_id, old_value, new_value, timestamp
    ) VALUES (
        'interactions', 'UPDATE', NEW.id, OLD, NEW, NOW()
    );
END;

-- Trigger on DELETE
CREATE TRIGGER audit_delete_interactions
AFTER DELETE ON interactions
FOR EACH ROW
BEGIN
    INSERT INTO audit_log (
        table_name, operation, record_id, old_value, timestamp
    ) VALUES (
        'interactions', 'DELETE', OLD.id, OLD, NOW()
    );
END;
```
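In environments where triggers are unavailable, the same three-operation audit pattern can be enforced at the application layer by wrapping each mutation and its audit record in one transaction. A minimal sketch, assuming the `interactions` and `audit_log` schemas from this section:

```python
import duckdb
from datetime import datetime

conn = duckdb.connect("db.duckdb")

def audited_insert(user_id: str, content: str) -> None:
    """Insert a row and record the mutation in audit_log atomically."""
    conn.execute("BEGIN TRANSACTION")
    try:
        conn.execute(
            "INSERT INTO interactions (user_id, content, created_at) VALUES (?, ?, ?)",
            [user_id, content, datetime.now()],
        )
        conn.execute(
            "INSERT INTO audit_log (table_name, operation, new_value, timestamp) "
            "VALUES ('interactions', 'INSERT', ?, ?)",
            [content, datetime.now()],
        )
        conn.execute("COMMIT")
    except Exception:
        conn.execute("ROLLBACK")  # neither the row nor the audit record lands
        raise
```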
## Performance Optimization
### Buffer & Memory Management

```sql
-- Configure parallelism and the memory budget
PRAGMA threads = 4;
PRAGMA memory_limit = '8GB';

-- Check current database size
PRAGMA database_size;
```
### Compression Strategy

```sql
-- Use SNAPPY compression (good for typical workloads)
CREATE TABLE interactions (
    id INTEGER,
    user_id VARCHAR,
    content VARCHAR,
    created_at TIMESTAMP,
    vector_clock VARCHAR
) WITH (compression='snappy');
```
### Materialized Views for Expensive Queries

```sql
-- Cache aggregations
CREATE MATERIALIZED VIEW interaction_stats AS
SELECT
    DATE(created_at) AS date,
    COUNT(*) AS count,
    COUNT(DISTINCT user_id) AS unique_users,
    AVG(LENGTH(content)) AS avg_content_length
FROM interactions
GROUP BY DATE(created_at);

-- Query the cache instead of the raw table
SELECT * FROM interaction_stats WHERE date >= '2025-12-20';
```
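If materialized views are not supported in a given build, the same caching effect comes from a plain table rebuilt on a schedule. A sketch of that fallback (the `refresh_interaction_stats` helper is illustrative):

```python
import duckdb

conn = duckdb.connect("db.duckdb")

def refresh_interaction_stats() -> None:
    """Rebuild the cached aggregate; call on a schedule or after bulk loads."""
    conn.execute("""
        CREATE OR REPLACE TABLE interaction_stats AS
        SELECT CAST(created_at AS DATE) AS date,
               COUNT(*) AS count,
               COUNT(DISTINCT user_id) AS unique_users,
               AVG(LENGTH(content)) AS avg_content_length
        FROM interactions
        GROUP BY CAST(created_at AS DATE)
    """)
```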
### Connection Pooling (Python)

```python
from sqlalchemy import create_engine, text
from sqlalchemy.pool import QueuePool

# Use a connection pool for concurrent access
engine = create_engine(
    'duckdb:////Users/bob/ies/music-topos/db.duckdb',
    poolclass=QueuePool,
    pool_size=5,
    max_overflow=10,
    pool_recycle=3600
)

# Connections are reused efficiently
with engine.connect() as conn:
    result = conn.execute(text("SELECT COUNT(*) FROM interactions"))
```
## Multi-Layer Query Strategy
### Layer 1: History Queries (Fast)

```sql
-- Indexed queries on interaction history
SELECT * FROM interactions
WHERE user_id IN ('alice', 'bob')
  AND created_at > '2025-12-20'
LIMIT 100;
```
### Layer 2: World Transformations

```sql
-- Feature extraction from raw interactions,
-- persisted so Layer 3 can join against it
CREATE TABLE interaction_features AS
SELECT
    id,
    user_id,
    EXTRACT(HOUR FROM created_at) AS hour_of_day,
    LENGTH(content) AS content_length,
    (SELECT COUNT(*) FROM interactions i2
     WHERE i2.user_id = i1.user_id) AS user_interaction_count
FROM interactions i1;
```
### Layer 3: Unified Integration View

```sql
-- Join history + world databases
SELECT
    h.id, h.user_id, h.content, h.created_at,
    w.hour_of_day, w.content_length, w.user_interaction_count
FROM interactions h
JOIN interaction_features w ON h.id = w.id
WHERE h.created_at >= '2025-12-20'
ORDER BY h.created_at DESC;
```
## Refinement Query Pattern

```sql
-- Stage 1: BROAD query (understand scale)
SELECT COUNT(*) AS total_interactions FROM interactions;
-- Result: 10,000

-- Stage 2: TRIFURCATED (split into 3)
SELECT 'technical-innovation' AS pattern,
       COUNT(*) AS count,
       AVG(LENGTH(content)) AS avg_length
FROM interactions
WHERE content LIKE '%algorithm%' OR content LIKE '%optimization%'
UNION ALL
SELECT 'collaborative-work' AS pattern,
       COUNT(*) AS count,
       AVG(LENGTH(content)) AS avg_length
FROM interactions
WHERE content LIKE '%team%' OR content LIKE '%coordination%'
UNION ALL
SELECT 'other' AS pattern,
       COUNT(*) AS count,
       AVG(LENGTH(content)) AS avg_length
FROM interactions
WHERE content NOT LIKE '%algorithm%' AND content NOT LIKE '%team%';

-- Stage 3: REFINED (focus on the interesting subset)
SELECT * FROM interactions
WHERE content LIKE '%algorithm%'
  AND LENGTH(content) > 100
  AND user_id NOT IN (SELECT id FROM bots)
ORDER BY created_at DESC
LIMIT 50;
```
## Backup & Recovery
### Full Backup

```bash
# Create backup
duckdb /Users/bob/ies/music-topos/db.duckdb \
  ".backup /Users/bob/ies/music-topos/db.duckdb.backup"

# Verify backup size
ls -lh /Users/bob/ies/music-topos/db.duckdb*
```
### Incremental Backup (Copy Strategy)

```bash
# Copy the database file with a timestamp
cp /Users/bob/ies/music-topos/db.duckdb \
   /Users/bob/ies/music-topos/db.duckdb.backup.$(date +%Y%m%d_%H%M%S)

# Keep the last 7 backups
ls -t /Users/bob/ies/music-topos/db.duckdb.backup.* | \
  tail -n +8 | \
  xargs rm -f
```
### Integrity Check

```bash
# Verify database integrity
duckdb /Users/bob/ies/music-topos/db.duckdb \
  "PRAGMA integrity_check;"
```
### Restore from Backup

```bash
# Restore from backup
duckdb /Users/bob/ies/music-topos/db.duckdb \
  ".restore /Users/bob/ies/music-topos/db.duckdb.backup"

# Verify the restored database
duckdb /Users/bob/ies/music-topos/db.duckdb \
  "SELECT COUNT(*) FROM interactions;"
```
## Data Export for Analysis
### Export to CSV

```sql
-- Export all interactions
COPY interactions TO '/tmp/interactions.csv' WITH (FORMAT CSV);

-- Export with header
COPY interactions TO '/tmp/interactions.csv' WITH (FORMAT CSV, HEADER TRUE);

-- Export a filtered subset
COPY (
    SELECT id, user_id, LENGTH(content) AS content_length, created_at
    FROM interactions
    WHERE created_at >= '2025-12-20'
) TO '/tmp/recent_interactions.csv' WITH (FORMAT CSV, HEADER TRUE);
```
### Export to JSON

```sql
-- Export to JSON
COPY interactions TO '/tmp/interactions.json' WITH (FORMAT JSON);

-- Export a nested structure
COPY (
    SELECT
        id, user_id, content, created_at,
        struct_pack(
            vector_clock := vector_clock,
            transaction_id := transaction_id
        ) AS metadata
    FROM interactions
) TO '/tmp/interactions_with_metadata.json' WITH (FORMAT JSON);
```
### Export to Parquet

```sql
-- Export to Parquet (columnar, efficient)
COPY interactions TO '/tmp/interactions.parquet' WITH (FORMAT PARQUET);
```
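Exports can also be queried back in place: DuckDB reads Parquet and CSV files as tables, so downstream analysis needs no reload step. A quick example against the Parquet file above:

```python
import duckdb

# Query the Parquet export directly; no import step required.
df = duckdb.query(
    "SELECT user_id, COUNT(*) AS n FROM '/tmp/interactions.parquet' GROUP BY user_id"
).to_df()
```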
## API: SQL Interface
### Python

```python
import duckdb

# Connect to the database
conn = duckdb.connect('/Users/bob/ies/music-topos/db.duckdb')

# Execute a query
result = conn.execute(
    "SELECT COUNT(*) FROM interactions"
).fetchall()

# Parameterized query (safe)
rows = conn.execute(
    "SELECT * FROM interactions WHERE user_id = ? LIMIT 10",
    ['alice']
).fetchdf()  # Returns a pandas DataFrame

# Insert data
conn.execute(
    "INSERT INTO interactions (user_id, content, created_at) VALUES (?, ?, ?)",
    ['bob', 'Hello world', '2025-12-21 10:00:00']
)

conn.close()
```
### Clojure

```clojure
(require '[duckdb.core :as db])

;; Connect
(def conn (db/connect "db.duckdb"))

;; Query
(db/query conn "SELECT COUNT(*) FROM interactions")

;; Insert
(db/execute! conn
  "INSERT INTO interactions (user_id, content) VALUES (?, ?)"
  ["alice" "Hello"])

;; Time-travel query
(db/query conn
  "SELECT * FROM interactions VERSION AT SYSTEM_TIME AS OF ?1"
  [transaction-id])

;; Close
(db/close conn)
```
### Ruby

```ruby
require 'duckdb'

# Connect (queries go through a connection object)
db = DuckDB::Database.open('db.duckdb')
con = db.connect

# Query
result = con.query("SELECT COUNT(*) FROM interactions")

# Insert
con.query(
  "INSERT INTO interactions (user_id, content) VALUES (?, ?)",
  'bob', 'Hello world'
)

# Parameterized
rows = con.query(
  "SELECT * FROM interactions WHERE user_id = ?",
  'alice'
)
```
## GF(3) Balanced Integration
### Triadic Skill Loading

- duckdb-temporal-versioning (+1), Generator: Create/write databases
  - initialize_database: Set up schema
  - insert_events: Add interactions
  - snapshot_checkpoint: Freeze state
- clj-kondo-3color (-1), Validator: Verify schemas
  - validate_schema: Check table definitions
  - integrity_check: PRAGMA integrity_check
  - audit_log_verify: Verify the mutation log
- acsets (0), Coordinator: Navigate relationships
  - query_relationships: Find foreign keys
  - walk_dependency_graph: Traverse schema
  - export_schema: Generate documentation
### Example: GF(3) Conservation

```clojure
;; Load balanced triad
(load-skills [:duckdb-temporal-versioning  ;; +1 (create)
              :clj-kondo-3color            ;; -1 (validate)
              :acsets])                    ;;  0 (coordinate)

;; GF(3) sum: (+1) + (-1) + 0 = 0 ✓
```
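The conservation rule extends to any skill set; a quick check in Python (trit values taken from the triad above, the `balanced` helper is illustrative):

```python
# Trit assignments from the triad above.
TRITS = {"duckdb-temporal-versioning": +1, "clj-kondo-3color": -1, "acsets": 0}

def balanced(skills: list[str]) -> bool:
    """True when the loaded skills' trits sum to 0 in GF(3)."""
    return sum(TRITS[s] for s in skills) % 3 == 0

assert balanced(["duckdb-temporal-versioning", "clj-kondo-3color", "acsets"])
```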
## Monitoring & Maintenance
### Statistics View

```sql
-- Create a monitoring view
CREATE VIEW database_stats AS
SELECT
    COUNT(*) AS total_records,
    COUNT(DISTINCT user_id) AS unique_users,
    MIN(created_at) AS oldest_record,
    MAX(created_at) AS newest_record,
    date_diff('day', MIN(created_at), MAX(created_at)) AS days_spanned,
    AVG(LENGTH(content)) AS avg_content_length
FROM interactions;

-- Query stats
SELECT * FROM database_stats;
```
### Audit Log Analysis

```sql
-- Track mutations
SELECT
    table_name,
    operation,
    COUNT(*) AS operation_count,
    MIN(timestamp) AS first_op,
    MAX(timestamp) AS last_op,
    MAX(timestamp) - MIN(timestamp) AS duration
FROM audit_log
GROUP BY table_name, operation;
```
## Commands & Integration
### Justfile Integration

```bash
# Backup database
just backup-db

# Restore from backup
just restore-db backup_file=db.duckdb.backup.20251221

# Export for analysis
just export-db format=csv

# Check integrity
just db-integrity-check

# Run a time-travel query
just db-time-travel transaction_id=T₁
```
### CLI Usage

```bash
# Open the DuckDB shell
duckdb /Users/bob/ies/music-topos/db.duckdb

# Run a query from the command line
duckdb /Users/bob/ies/music-topos/db.duckdb \
  "SELECT COUNT(*) FROM interactions"

# Load a SQL file
duckdb /Users/bob/ies/music-topos/db.duckdb < schema.sql
```
## Use Cases in Music-Topos
### Phase 1: Data Acquisition
- Store raw interactions from Bluesky, GitHub, Zulip
- Track provenance with audit logs
- Enable time-travel to any point in history
### Phase 2: Colorable Music Topos
- Store entropy measurements per interaction
- Record leitmotif assignments with confidence scores
- Track color mappings to HSV space
### Phase 3: 5D Pattern Extraction
- Persist feature vectors (39-49 dimensions); see the sketch after this list
- Store temporal, topic, interaction, learning, network patterns
- Enable multi-layer queries (history → features → unified view)
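Feature vectors fit naturally in DuckDB list columns. A minimal sketch of the persistence step, with an illustrative table name and a 39-dimension placeholder vector:

```python
import duckdb

conn = duckdb.connect("music_topos.duckdb")

# One feature vector per interaction, stored as a DuckDB list column.
conn.execute("""
    CREATE TABLE IF NOT EXISTS interaction_features_5d (
        interaction_id INTEGER,
        features DOUBLE[]
    )
""")
conn.execute(
    "INSERT INTO interaction_features_5d VALUES (?, ?)",
    [42, [0.0] * 39],  # placeholder 39-dimensional vector
)

# List columns support per-row introspection in SQL.
dims = conn.execute(
    "SELECT interaction_id, len(features) AS n_dims FROM interaction_features_5d"
).fetchall()
```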
### Phase 4: Agent-o-rama Training
- Trace Storage: Each training epoch → database record (sketched after this list)
- Time-Travel Checkpoints: Restore to best epoch
- Audit Log: Record all LLM calls, skill loads, color assignments
- Causality: Vector clocks track dependencies between decisions
- Provenance: Full lineage from raw data to trained surrogate
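A minimal sketch of the trace-storage step, with an illustrative `training_trace` schema; restoring to the best epoch then becomes a query rather than a file hunt:

```python
import duckdb
from datetime import datetime

conn = duckdb.connect("music_topos.duckdb")

# Hypothetical trace table: one row per training epoch.
conn.execute("""
    CREATE TABLE IF NOT EXISTS training_trace (
        epoch INTEGER,
        loss DOUBLE,
        vector_clock VARCHAR,
        recorded_at TIMESTAMP
    )
""")

def log_epoch(epoch: int, loss: float, clock: str) -> None:
    """Record one epoch of training as an immutable trace row."""
    conn.execute(
        "INSERT INTO training_trace VALUES (?, ?, ?, ?)",
        [epoch, loss, clock, datetime.now()],
    )

# Find the checkpoint to restore: the epoch with the lowest loss.
best = conn.execute(
    "SELECT epoch FROM training_trace ORDER BY loss ASC LIMIT 1"
).fetchone()
```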
## Best Practices Summary
- Performance: Set memory_limit=8GB, use SNAPPY compression, index frequently-queried columns
- Temporal: Track vector clocks, use frozen snapshots, maintain immutable audit logs
- Integration: Use multi-layer queries (history → world → unified), apply refinement pattern
- Reliability: Regular backups, integrity checks, automated archival
- Analysis: Export CSV/JSON/Parquet, maintain statistics views, track mutations
## Example: Complete Workflow

```python
import duckdb
from datetime import datetime

# Initialize
db = duckdb.connect('music_topos.duckdb')

# Create the schema
db.execute("""
    CREATE TABLE IF NOT EXISTS interactions (
        id INTEGER PRIMARY KEY,
        user_id VARCHAR,
        content VARCHAR,
        created_at TIMESTAMP,
        vector_clock VARCHAR,
        transaction_id INTEGER
    ) WITH (compression='snappy')
""")

# Insert an interaction
db.execute(
    "INSERT INTO interactions (user_id, content, created_at, vector_clock) "
    "VALUES (?, ?, ?, ?)",
    ['alice', 'Technical innovation in algorithms', datetime.now(), '[1, 0, 0]']
)

# Time-travel query
historical = db.execute(
    "SELECT * FROM interactions VERSION AT SYSTEM_TIME AS OF ?",
    [1]
).fetchdf()

# Export for analysis
db.execute(
    "COPY interactions TO 'interactions.csv' WITH (FORMAT CSV, HEADER TRUE)"
)

db.close()
```
- Skill Name: duckdb-temporal-versioning
- Type: Temporal Database with Time-Travel Queries
- Trit: +1 (PLUS, generative)
- GF(3): Balanced with clj-kondo-3color (-1) + acsets (0)
- SPI: Guaranteed (same schema + data → same results)
- Applications: Data acquisition, feature storage, training trace logging, audit trails