Qsv data-describe
Generate AI-powered Data Dictionary, Description, and Tags for a CSV/TSV/Excel file
install
source · Clone the upstream repo
git clone https://github.com/dathere/qsv
Claude Code · Install into ~/.claude/skills/
T=$(mktemp -d) && git clone --depth=1 https://github.com/dathere/qsv "$T" && mkdir -p ~/.claude/skills && cp -r "$T/.claude/skills/skills/data-describe" ~/.claude/skills/dathere-qsv-data-describe && rm -rf "$T"
manifest: .claude/skills/skills/data-describe/SKILL.md
Data Describe
Generate AI-powered documentation for a tabular data file using `describegpt`. It produces a Data Dictionary (column labels, descriptions, types), a natural-language Description of the dataset, and semantic Tags, all via the connected LLM (no API key needed in MCP mode).
Cowork note: If relative paths don't resolve, call `mcp__qsv__qsv_get_working_dir` and `mcp__qsv__qsv_set_working_dir` to sync the working directory.
Steps
1. Index: Run `mcp__qsv__qsv_index` on the file for fast random access.
2. Profile: Run `mcp__qsv__qsv_stats` with `cardinality: true, stats_jsonl: true` to generate the stats cache. describegpt reads this cache for column metadata, so it must exist first.
3. Describe: Run `mcp__qsv__qsv_describegpt` with the requested options (recommend `all: true` for comprehensive output). At least one inference option (`dictionary`, `description`, `tags`, or `all`) is required. Output defaults to `<filestem>.describegpt.md`.
4. Present: Display the generated Data Dictionary table, Description, and Tags to the user.
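
The ordering of the steps above can be sketched as a small plan builder. This is only an illustration of the sequence, not the MCP server's actual interface: the tool names and the `cardinality`, `stats_jsonl`, and `all` arguments come from the steps, while the `input_file` argument name and the `describe_workflow` helper are hypothetical.

```python
def describe_workflow(path):
    """Return the tool calls for the index -> profile -> describe sequence, in order."""
    # NOTE: "input_file" is a hypothetical argument name used for illustration;
    # check each tool's schema for the real parameter name.
    return [
        # Step 1: index the file for fast random access.
        {"tool": "mcp__qsv__qsv_index", "arguments": {"input_file": path}},
        # Step 2: build the stats cache that describegpt reads.
        {
            "tool": "mcp__qsv__qsv_stats",
            "arguments": {"input_file": path, "cardinality": True, "stats_jsonl": True},
        },
        # Step 3: generate Dictionary + Description + Tags in one pass.
        {"tool": "mcp__qsv__qsv_describegpt", "arguments": {"input_file": path, "all": True}},
    ]


if __name__ == "__main__":
    for call in describe_workflow("sales.csv"):
        print(call["tool"], call["arguments"])
```

Keeping the plan as plain data makes it easy to check that the stats step always precedes the describe step before anything is executed.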
Options
| Option | Effect |
|---|---|
| `--all` (recommended) | Generate Dictionary + Description + Tags in one pass |
| `--dictionary` | Data Dictionary only: column labels, descriptions, types |
| `--description` | Natural-language dataset Description only |
| `--tags` | Semantic Tags only |
| `--format` | Output format; defaults to Markdown, with `JSON` available for machine-readable output |
| `--language` | Generate output in a non-English language |
| | Enrich the dictionary with extra columns |
| | Constrain tags to a controlled vocabulary (comma-separated) |
| | Number of tags to generate (default: 5) |
| | Number of example values per column in the dictionary |
| | Max cardinality to treat a column as an enum in the dictionary |
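
The rule that at least one inference option must be set can be checked before the describe call is made. The following is a minimal sketch under that assumption; `describegpt_arguments` is a hypothetical helper, not part of qsv, and it only knows the four inference option names listed in this document.

```python
INFERENCE_OPTIONS = ("all", "dictionary", "description", "tags")


def describegpt_arguments(**options):
    """Keep only the options that are switched on; fail if no inference option is."""
    enabled = {name: value for name, value in options.items() if value}
    if not any(name in enabled for name in INFERENCE_OPTIONS):
        raise ValueError("set at least one of: " + ", ".join(INFERENCE_OPTIONS))
    return enabled


if __name__ == "__main__":
    print(describegpt_arguments(dictionary=True, tags=False))  # {'dictionary': True}
```

Failing early with a clear message is cheaper than discovering the missing option after the index and stats steps have already run.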
Notes
- No API key needed in MCP mode — uses the connected LLM automatically via MCP sampling
- The stats cache must exist first for best results (step 2 creates it)
- Output defaults to `<filestem>.describegpt.md`
- For Excel/JSONL files, the MCP server auto-converts to CSV first
- Use `--format JSON` when you need machine-readable output for downstream processing
- Use `--language` to generate documentation in the user's preferred language
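
To locate the generated file when presenting results, the default output name can be derived from the input path. This sketch assumes `<filestem>` means the file name without its final extension; the directory the file is written to is not stated here, so only the file name is computed. `default_output_name` is a hypothetical helper.

```python
from pathlib import Path


def default_output_name(input_path):
    """Build the `<filestem>.describegpt.md` name for a given input file."""
    # Assumption: the filestem is the file name minus its last extension.
    return f"{Path(input_path).stem}.describegpt.md"


if __name__ == "__main__":
    print(default_output_name("data/sales.csv"))  # sales.describegpt.md
```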