Claude-skill-registry data-wrangler
Install
Source · Clone the upstream repo
git clone https://github.com/majiayu000/claude-skill-registry
Claude Code · Install into ~/.claude/skills/
T=$(mktemp -d) && git clone --depth=1 https://github.com/majiayu000/claude-skill-registry "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/data/data-wrangler" ~/.claude/skills/majiayu000-claude-skill-registry-data-wrangler && rm -rf "$T"
Manifest: skills/data/data-wrangler/SKILL.md
Data Wrangler
Transform and export data using DuckDB SQL.
Contents
- Usage - Command syntax and Windows escaping
- Explore Mode - Quick data profiling
- Query Mode - Return results to Claude
- Write Mode - Export to files
- Request/Response Format - JSON structure
- Source Types - File, database, and cloud sources
- Transformations - SQL patterns reference
- Secrets - Secure credential handling
Usage
IMPORTANT - Windows Shell Escaping:
- Always `cd` to the skill directory first
- Use double quotes for echo with escaped inner quotes (`\"`)
- Use forward slashes in file paths
cd "<skill_directory>" && echo "{\"query\": \"SELECT * FROM 'D:/path/to/file.csv'\"}" | uv run scripts/query_duckdb.py
Explore Mode
Get schema, statistics, and sample in one call. Use before writing queries to understand data structure.
{"mode": "explore", "path": "D:/data/sales.csv"}
Response:
{ "file": "D:/data/sales.csv", "format": "csv", "row_count": 15234, "columns": [ {"name": "order_id", "type": "BIGINT", "null_count": 0, "null_percent": 0.0}, {"name": "customer", "type": "VARCHAR", "null_count": 45, "null_percent": 0.3} ], "sample": "| order_id | customer | ... |\\n|----------|----------|-----|\\n| 1001 | Alice | ... |" }
Options:
- `sample_rows`: Number of sample rows (default: 10, max: 100)
- `sources`: For database tables (same as query mode)
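For example, profiling with a larger sample (the file path is illustrative, and this sketch assumes `sample_rows` sits at the top level of the request alongside `mode` and `path`):
{"mode": "explore", "path": "D:/data/sales.csv", "sample_rows": 25}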
Query Mode
Return results directly to Claude for analysis.
Direct File Queries
{"query": "SELECT * FROM 'data.csv' LIMIT 10"}
Multi-Source Joins
{ "query": "SELECT s.*, p.category FROM sales s JOIN products p ON s.product_id = p.id", "sources": [ {"type": "file", "alias": "sales", "path": "/data/sales.parquet"}, {"type": "file", "alias": "products", "path": "/data/products.csv"} ] }
Write Mode
Export query results to files. Add an `output` object to write results to disk instead of returning them.
Basic Write
{ "query": "SELECT * FROM 'raw.csv' WHERE status = 'active'", "output": { "path": "D:/output/filtered.parquet", "format": "parquet" } }
Write with Options
{ "query": "SELECT *, YEAR(date) as year, MONTH(date) as month FROM 'events.csv'", "output": { "path": "D:/output/events/", "format": "parquet", "options": { "compression": "zstd", "partition_by": ["year", "month"], "overwrite": true } } }
Output Formats
| Format | Options |
|---|---|
| parquet | compression (zstd/snappy/gzip/lz4), partition_by, overwrite |
| csv | header (default: true), delimiter, compression, overwrite |
| json | array (true=JSON array, false=newline-delimited) |
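As a sketch of a JSON export, assuming the array/newline-delimited toggle is the `array` option listed above (paths illustrative):
{"query": "SELECT * FROM 'events.csv'", "output": {"path": "D:/output/events.jsonl", "format": "json", "options": {"array": false}}}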
Write Response
Response includes verification info - no need for follow-up queries:
{ "success": true, "output_path": "D:/output/events/", "format": "parquet", "rows_written": 15234, "files_created": ["D:/output/events/year=2023/data_0.parquet", "..."], "total_size_bytes": 5678901, "duration_ms": 1234 }
Overwrite Protection
By default, existing files are not overwritten. Set `options.overwrite: true` to allow.
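For example, re-running the basic write over an existing file (path illustrative):
{"query": "SELECT * FROM 'raw.csv' WHERE status = 'active'", "output": {"path": "D:/output/filtered.parquet", "format": "parquet", "options": {"overwrite": true}}}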
Request/Response Format
Request
{ "query": "SQL statement", "sources": [...], "output": {"path": "...", "format": "..."}, "options": {"max_rows": 200, "format": "markdown"}, "secrets_file": "path/to/secrets.yaml" }
Query Mode Options
- `max_rows`: Maximum rows to return (default: 200)
- `max_bytes`: Maximum response size (default: 200000)
- `format`: `markdown` (default), `json`, `records`, or `csv`
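A request combining these options might look like this (file path illustrative):
{"query": "SELECT customer, SUM(amount) AS total FROM 'sales.csv' GROUP BY customer", "options": {"max_rows": 50, "format": "json"}}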
Query Mode Response (markdown)
| column1 | column2 |
|---|---|
| value1 | value2 |
Query Mode Response (json)
{ "schema": [{"name": "col1", "type": "INTEGER"}], "rows": [[1, "value"]], "truncated": false, "warnings": [], "error": null }
Source Types
File (auto-detects CSV, Parquet, JSON, Excel)
{"type": "file", "alias": "data", "path": "/path/to/file.csv"}
Glob patterns:
{"path": "/logs/**/*.parquet"}
Custom delimiter:
{"path": "/data/file.csv", "delimiter": "|"}
PostgreSQL
{ "type": "postgres", "alias": "users", "host": "host", "port": 5432, "database": "db", "user": "user", "password": "pass", "schema": "public", "table": "users" }
MySQL
{ "type": "mysql", "alias": "orders", "host": "host", "port": 3306, "database": "db", "user": "user", "password": "pass", "table": "orders" }
SQLite
{"type": "sqlite", "alias": "data", "path": "/path/to/db.sqlite", "table": "tablename"}
S3
{ "type": "s3", "alias": "logs", "url": "s3://bucket/path/*.parquet", "aws_region": "us-east-1", "aws_access_key_id": "...", "aws_secret_access_key": "..." }
Transformations
See TRANSFORMS.md for advanced patterns including:
- PIVOT/UNPIVOT - Reshape data between wide and long formats
- Sampling - Random subsets with `USING SAMPLE n ROWS` or `SAMPLE 10%`
- Dynamic columns - `EXCLUDE`, `REPLACE`, `COLUMNS('pattern')`
- Window functions - Running totals, rankings, moving averages
- Date/time operations - Extraction, arithmetic, formatting
Quick Examples
-- PIVOT: Convert rows to columns
PIVOT sales ON quarter USING SUM(revenue) GROUP BY region

-- UNPIVOT: Convert columns to rows
UNPIVOT data ON q1, q2, q3, q4 INTO NAME quarter VALUE amount

-- Sampling: Random 10% with reproducible seed
SELECT * FROM large_table USING SAMPLE 10% REPEATABLE(42)

-- Dynamic columns: Exclude sensitive, transform email
SELECT * EXCLUDE (ssn) REPLACE (LOWER(email) AS email) FROM users
Workflow
- Inspect schema: `DESCRIBE SELECT * FROM 'file.csv'`
- Preview data: `SELECT * FROM 'file.csv' LIMIT 5`
- Transform: Apply filters, joins, aggregations
- Export (optional): Add `output` to write results (see the example after this list)
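Put together, a minimal end-to-end run could be two requests, inspect then export (paths illustrative):
{"query": "DESCRIBE SELECT * FROM 'D:/data/sales.csv'"}
{"query": "SELECT customer, SUM(amount) AS total FROM 'D:/data/sales.csv' GROUP BY customer", "output": {"path": "D:/output/totals.parquet", "format": "parquet"}}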
Error Handling
- If `error` is non-null: Check column names, verify paths
- If `truncated` is true: Use more aggregation or filters
- If write fails with "exists": Set `options.overwrite: true`
Secrets
Store credentials securely in YAML. See SECRETS.md for complete documentation.
{ "query": "SELECT * FROM customers LIMIT 10", "secrets_file": "D:/path/to/secrets.yaml", "sources": [{ "type": "postgres", "alias": "customers", "secret": "my_postgres", "table": "customers" }] }
Supported: PostgreSQL, MySQL, S3, GCS, Azure, R2, HTTP, HuggingFace, Iceberg, DuckLake.
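As a sketch, the named-secret pattern above presumably extends to the other backends, e.g. an S3 source referencing a secret (bucket and secret name illustrative):
{"query": "SELECT * FROM logs LIMIT 10", "secrets_file": "D:/path/to/secrets.yaml", "sources": [{"type": "s3", "alias": "logs", "url": "s3://bucket/path/*.parquet", "secret": "my_s3"}]}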