Qsv data-describe

Generate AI-powered Data Dictionary, Description, and Tags for a CSV/TSV/Excel file

install
source · Clone the upstream repo
git clone https://github.com/dathere/qsv
Claude Code · Install into ~/.claude/skills/
T=$(mktemp -d) && git clone --depth=1 https://github.com/dathere/qsv "$T" && mkdir -p ~/.claude/skills && cp -r "$T/.claude/skills/skills/data-describe" ~/.claude/skills/dathere-qsv-data-describe && rm -rf "$T"
manifest: .claude/skills/skills/data-describe/SKILL.md
source content

Data Describe

Generate AI-powered documentation for a tabular data file using

describegpt
. Produces a Data Dictionary (column labels, descriptions, types), a natural-language Description of the dataset, and semantic Tags — all via the connected LLM (no API key needed in MCP mode).

Cowork note: If relative paths don't resolve, call

mcp__qsv__qsv_get_working_dir
and
mcp__qsv__qsv_set_working_dir
to sync the working directory.

Steps

  1. Index: Run

    mcp__qsv__qsv_index
    on the file for fast random access.

  2. Profile: Run

    mcp__qsv__qsv_stats
    with
    cardinality: true, stats_jsonl: true
    to generate the stats cache. describegpt reads this cache for column metadata, so it must exist first.

  3. Describe: Run

    mcp__qsv__qsv_describegpt
    with the requested options (recommend
    all: true
    for comprehensive output). At least one inference option (
    dictionary
    ,
    description
    ,
    tags
    , or
    all
    ) is required. Output defaults to
    <filestem>.describegpt.md
    .

  4. Present: Display the generated Data Dictionary table, Description, and Tags to the user.

Options

OptionEffect
--all
(recommended)
Generate Dictionary + Description + Tags in one pass
--dictionary
Data Dictionary only — column labels, descriptions, types
--description
Natural-language dataset Description only
--tags
Semantic Tags only
--format
Output format:
Markdown
(default),
JSON
,
TSV
,
TOON
--language
Generate output in a non-English language (e.g.
Spanish
,
French
)
--addl-cols-list
Enrich the dictionary with extra columns (e.g.
"everything"
,
"moar!"
)
--tag-vocab
Constrain tags to a controlled vocabulary (comma-separated)
--num-tags
Number of tags to generate (default: 5)
--num-examples
Number of example values per column in the dictionary
--enum-threshold
Max cardinality to treat a column as an enum in the dictionary

Notes

  • No API key needed in MCP mode — uses the connected LLM automatically via MCP sampling
  • The stats cache must exist first for best results (step 2 creates it)
  • Output defaults to
    <filestem>.describegpt.md
  • For Excel/JSONL files, the MCP server auto-converts to CSV first
  • Use
    --format JSON
    when you need machine-readable output for downstream processing
  • Use
    --language
    to generate documentation in the user's preferred language