Claude-skill-registry data-wrangler
Install
Source · Clone the upstream repo
git clone https://github.com/majiayu000/claude-skill-registry
Claude Code · Install into ~/.claude/skills/
T=$(mktemp -d) && git clone --depth=1 https://github.com/majiayu000/claude-skill-registry "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/data/data-wrangler" ~/.claude/skills/majiayu000-claude-skill-registry-data-wrangler && rm -rf "$T"
Manifest: skills/data/data-wrangler/SKILL.md
Data Wrangler
Transform and export data using DuckDB SQL.
Contents
- Usage - Command syntax and Windows escaping
- Explore Mode - Quick data profiling
- Query Mode - Return results to Claude
- Write Mode - Export to files
- Request/Response Format - JSON structure
- Source Types - File, database, and cloud sources
- Transformations - SQL patterns reference
- Secrets - Secure credential handling
Usage
IMPORTANT - Windows Shell Escaping:
- Always `cd` to the skill directory first
- Use double quotes for echo with escaped inner quotes (`\"`)
- Use forward slashes in file paths
cd "<skill_directory>" && echo "{\"query\": \"SELECT * FROM 'D:/path/to/file.csv'\"}" | uv run scripts/query_duckdb.py
Explore Mode
Get schema, statistics, and sample in one call. Use before writing queries to understand data structure.
{"mode": "explore", "path": "D:/data/sales.csv"}
Response:
{ "file": "D:/data/sales.csv", "format": "csv", "row_count": 15234, "columns": [ {"name": "order_id", "type": "BIGINT", "null_count": 0, "null_percent": 0.0}, {"name": "customer", "type": "VARCHAR", "null_count": 45, "null_percent": 0.3} ], "sample": "| order_id | customer | ... |\\n|----------|----------|-----|\\n| 1001 | Alice | ... |" }
Options:
- `sample_rows`: Number of sample rows (default: 10, max: 100)
- `sources`: For database tables (same as query mode)
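For example, profiling with a larger sample (the file path is illustrative, and this sketch assumes `sample_rows` sits at the top level of the request alongside `mode` and `path`):
{"mode": "explore", "path": "D:/data/sales.csv", "sample_rows": 25}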
Query Mode
Return results directly to Claude for analysis.
Direct File Queries
{"query": "SELECT * FROM 'data.csv' LIMIT 10"}
Multi-Source Joins
{ "query": "SELECT s.*, p.category FROM sales s JOIN products p ON s.product_id = p.id", "sources": [ {"type": "file", "alias": "sales", "path": "/data/sales.parquet"}, {"type": "file", "alias": "products", "path": "/data/products.csv"} ] }
Write Mode
Export query results to files. Add an `output` object to write results to disk instead of returning them.
Basic Write
{ "query": "SELECT * FROM 'raw.csv' WHERE status = 'active'", "output": { "path": "D:/output/filtered.parquet", "format": "parquet" } }
Write with Options
{ "query": "SELECT *, YEAR(date) as year, MONTH(date) as month FROM 'events.csv'", "output": { "path": "D:/output/events/", "format": "parquet", "options": { "compression": "zstd", "partition_by": ["year", "month"], "overwrite": true } } }
Output Formats
| Format | Options |
|---|---|
| parquet | compression (zstd/snappy/gzip/lz4), partition_by, overwrite |
| csv | header (default: true), delimiter, compression, overwrite |
| json | array (true=JSON array, false=newline-delimited) |
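As a sketch of a JSON export, assuming the array/newline-delimited toggle is the `array` option listed above (paths illustrative):
{"query": "SELECT * FROM 'events.csv'", "output": {"path": "D:/output/events.jsonl", "format": "json", "options": {"array": false}}}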
Write Response
Response includes verification info - no need for follow-up queries:
{ "success": true, "output_path": "D:/output/events/", "format": "parquet", "rows_written": 15234, "files_created": ["D:/output/events/year=2023/data_0.parquet", "..."], "total_size_bytes": 5678901, "duration_ms": 1234 }
Overwrite Protection
By default, existing files are not overwritten. Set `options.overwrite: true` to allow.
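For example, re-running the basic write over an existing file (path illustrative):
{"query": "SELECT * FROM 'raw.csv' WHERE status = 'active'", "output": {"path": "D:/output/filtered.parquet", "format": "parquet", "options": {"overwrite": true}}}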
Request/Response Format
Request
{ "query": "SQL statement", "sources": [...], "output": {"path": "...", "format": "..."}, "options": {"max_rows": 200, "format": "markdown"}, "secrets_file": "path/to/secrets.yaml" }
Query Mode Options
- `max_rows`: Maximum rows to return (default: 200)
- `max_bytes`: Maximum response size (default: 200000)
- `format`: `markdown` (default), `json`, `records`, or `csv`
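A request combining these options might look like this (file path illustrative):
{"query": "SELECT customer, SUM(amount) AS total FROM 'sales.csv' GROUP BY customer", "options": {"max_rows": 50, "format": "json"}}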
Query Mode Response (markdown)
| column1 | column2 |
|---|---|
| value1 | value2 |
Query Mode Response (json)
{ "schema": [{"name": "col1", "type": "INTEGER"}], "rows": [[1, "value"]], "truncated": false, "warnings": [], "error": null }
Source Types
File (auto-detects CSV, Parquet, JSON, Excel)
{"type": "file", "alias": "data", "path": "/path/to/file.csv"}
Glob patterns:
{"path": "/logs/**/*.parquet"}
Custom delimiter:
{"path": "/data/file.csv", "delimiter": "|"}
PostgreSQL
{ "type": "postgres", "alias": "users", "host": "host", "port": 5432, "database": "db", "user": "user", "password": "pass", "schema": "public", "table": "users" }
MySQL
{ "type": "mysql", "alias": "orders", "host": "host", "port": 3306, "database": "db", "user": "user", "password": "pass", "table": "orders" }
SQLite
{"type": "sqlite", "alias": "data", "path": "/path/to/db.sqlite", "table": "tablename"}
S3
{ "type": "s3", "alias": "logs", "url": "s3://bucket/path/*.parquet", "aws_region": "us-east-1", "aws_access_key_id": "...", "aws_secret_access_key": "..." }
Transformations
See TRANSFORMS.md for advanced patterns including:
- PIVOT/UNPIVOT - Reshape data between wide and long formats
- Sampling - Random subsets with `USING SAMPLE n ROWS` or `SAMPLE 10%`
- Dynamic columns - `EXCLUDE`, `REPLACE`, `COLUMNS('pattern')`
- Window functions - Running totals, rankings, moving averages
- Date/time operations - Extraction, arithmetic, formatting
Quick Examples
-- PIVOT: Convert rows to columns
PIVOT sales ON quarter USING SUM(revenue) GROUP BY region

-- UNPIVOT: Convert columns to rows
UNPIVOT data ON q1, q2, q3, q4 INTO NAME quarter VALUE amount

-- Sampling: Random 10% with reproducible seed
SELECT * FROM large_table USING SAMPLE 10% REPEATABLE(42)

-- Dynamic columns: Exclude sensitive, transform email
SELECT * EXCLUDE (ssn) REPLACE (LOWER(email) AS email) FROM users
Workflow
- Inspect schema: `DESCRIBE SELECT * FROM 'file.csv'`
- Preview data: `SELECT * FROM 'file.csv' LIMIT 5`
- Transform: Apply filters, joins, aggregations
- Export (optional): Add `output` to write results (see the example after this list)
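Put together, a minimal end-to-end run could be two requests, inspect then export (paths illustrative):
{"query": "DESCRIBE SELECT * FROM 'D:/data/sales.csv'"}
{"query": "SELECT customer, SUM(amount) AS total FROM 'D:/data/sales.csv' GROUP BY customer", "output": {"path": "D:/output/totals.parquet", "format": "parquet"}}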
Error Handling
- If `error` is non-null: Check column names, verify paths
- If `truncated` is true: Use more aggregation or filters
- If write fails with "exists": Set `options.overwrite: true`
Secrets
Store credentials securely in YAML. See SECRETS.md for complete documentation.
{ "query": "SELECT * FROM customers LIMIT 10", "secrets_file": "D:/path/to/secrets.yaml", "sources": [{ "type": "postgres", "alias": "customers", "secret": "my_postgres", "table": "customers" }] }
Supported: PostgreSQL, MySQL, S3, GCS, Azure, R2, HTTP, HuggingFace, Iceberg, DuckLake.
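As a sketch, the named-secret pattern above presumably extends to the other backends, e.g. an S3 source referencing a secret (bucket and secret name illustrative):
{"query": "SELECT * FROM logs LIMIT 10", "secrets_file": "D:/path/to/secrets.yaml", "sources": [{"type": "s3", "alias": "logs", "url": "s3://bucket/path/*.parquet", "secret": "my_s3"}]}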