Awesome-Agent-Skills-for-Empirical-Research codebook
Auto-generates a Markdown codebook from a dataset (CSV, DTA, Excel, Parquet) with types and summary statistics. Use when documenting variables.
install
source · Clone the upstream repo
git clone https://github.com/brycewang-stanford/Awesome-Agent-Skills-for-Empirical-Research
Claude Code · Install into ~/.claude/skills/
T=$(mktemp -d) && git clone --depth=1 https://github.com/brycewang-stanford/Awesome-Agent-Skills-for-Empirical-Research "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/29-quarcs-lab-project20XXy/dot-claude/skills/codebook" ~/.claude/skills/brycewang-stanford-awesome-agent-skills-for-empirical-research-codebook && rm -rf "$T"
manifest: skills/29-quarcs-lab-project20XXy/dot-claude/skills/codebook/SKILL.md
source content
Generate Variable Codebook
Auto-generate a Markdown codebook documenting all variables in a dataset.
Arguments
$ARGUMENTS: path to a dataset file (e.g., data/rawData/sample_data.csv, data/panel.dta)
Steps
- Determine the file format from the extension:
  - .csv: read with pandas read_csv
  - .dta: read with pandas read_stata
  - .xlsx / .xls: read with pandas read_excel
  - .parquet: read with pandas read_parquet
  - Other formats: ask the user how to load it
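The extension-to-reader mapping above can be sketched as a small lookup table. This is a minimal illustration, not the skill's own code: the names READERS, reader_for, and load_dataset are hypothetical, and matching extensions case-insensitively is an assumption.

```python
from pathlib import Path

# Extension -> name of the pandas reader function, as listed in the step above.
READERS = {
    ".csv": "read_csv",
    ".dta": "read_stata",
    ".xlsx": "read_excel",
    ".xls": "read_excel",
    ".parquet": "read_parquet",
}


def reader_for(path):
    """Return the pandas reader name for a path, or None if the extension is unknown."""
    return READERS.get(Path(path).suffix.lower())


def load_dataset(path):
    """Load a dataset with the matching pandas reader (hypothetical helper)."""
    name = reader_for(path)
    if name is None:
        # The step says to ask the user; here the caller is expected to catch this.
        raise ValueError(f"Unsupported format {Path(path).suffix!r}: ask the user how to load it")
    import pandas as pd  # imported lazily so the lookup itself does not need pandas

    return getattr(pd, name)(path)
```

Note that pandas read_excel and read_parquet depend on optional engines (for example openpyxl and pyarrow), so those need to be available in the environment.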
- Load the dataset using uv run python and extract metadata for each variable:
  - Variable name
  - Data type (numeric, string, categorical, datetime)
  - Non-missing count and missing count
  - Number of unique values
  - For numeric variables: min, max, mean, median, standard deviation
  - For categorical/string variables: top 5 most frequent values with counts
  - For datetime variables: min and max date
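One way the per-variable metadata might be collected with pandas is sketched below. The function names classify and describe_variable and the dictionary keys are hypothetical. Treating boolean columns as categorical is an assumption, and unique counts exclude missing values (the pandas default).

```python
import pandas as pd
from pandas.api import types as ptypes


def classify(series):
    """Map a pandas dtype onto the four codebook type labels."""
    if ptypes.is_bool_dtype(series):
        return "categorical"  # assumption: booleans are documented as categories
    if ptypes.is_datetime64_any_dtype(series):
        return "datetime"
    if ptypes.is_numeric_dtype(series):
        return "numeric"
    if isinstance(series.dtype, pd.CategoricalDtype):
        return "categorical"
    return "string"


def describe_variable(series):
    """Collect the metadata listed in the step above for one column."""
    kind = classify(series)
    info = {
        "name": series.name,
        "type": kind,
        "non_missing": int(series.notna().sum()),
        "missing": int(series.isna().sum()),
        "unique": int(series.nunique()),  # missing values are not counted
    }
    if kind == "numeric":
        info.update(
            min=series.min(),
            max=series.max(),
            mean=series.mean(),
            median=series.median(),
            std=series.std(),
        )
    elif kind == "datetime":
        info.update(min=series.min(), max=series.max())
    else:
        # Top 5 most frequent values with their counts.
        info["top_values"] = series.value_counts().head(5).to_dict()
    return info
```

Calling describe_variable(df[col]) for each col in df.columns yields one dictionary per variable, which is enough input for the Markdown rendering step.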
- Generate a Markdown codebook with:
  - Header: Dataset name, file path, number of observations, number of variables, date generated
  - Summary table: Variable name | Type | Non-missing | Unique | Description (placeholder)
  - Detailed sections per variable: Full statistics and a [FILL: description] placeholder for the user to add a human-readable description
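The codebook layout described above could be rendered roughly as follows. This is a sketch under assumptions: the heading levels, the label wording, and the name render_codebook are illustrative choices, and each variable is assumed to be a dictionary with at least name, type, non_missing, and unique keys.

```python
from datetime import date


def render_codebook(name, path, n_obs, variables, generated=None):
    """Build the Markdown text: header, summary table, then one section per variable."""
    generated = generated or date.today().isoformat()
    lines = [
        f"# Codebook: {name}",
        "",
        f"- File: {path}",
        f"- Observations: {n_obs}",
        f"- Variables: {len(variables)}",
        f"- Generated: {generated}",
        "",
        "| Variable | Type | Non-missing | Unique | Description |",
        "| --- | --- | --- | --- | --- |",
    ]
    for v in variables:
        lines.append(
            f"| {v['name']} | {v['type']} | {v['non_missing']} | {v['unique']} | [FILL: description] |"
        )
    for v in variables:
        # Detailed section: placeholder first, then every collected statistic.
        lines += ["", f"## {v['name']}", "", "[FILL: description]", ""]
        for key, value in v.items():
            if key != "name":
                lines.append(f"- {key}: {value}")
    return "\n".join(lines) + "\n"
```

The [FILL: description] marker is kept verbatim so that unfinished descriptions are easy to search for later.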
- Derive the output filename from the dataset name: data/rawData/sample_data.csv → references/sample-data-codebook.md
- Save to references/<dataset-name>-codebook.md
- Report the file path and the number of variables documented.
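The last three steps (derive the filename, save, report) might look like the sketch below. The helper names codebook_path and save_codebook are hypothetical. The only normalization applied is underscore to hyphen, inferred from the sample_data example above; any further handling of spaces or case would be an additional assumption.

```python
from pathlib import Path


def codebook_path(dataset_path, out_dir="references"):
    """Map data/rawData/sample_data.csv to references/sample-data-codebook.md."""
    stem = Path(dataset_path).stem.replace("_", "-")
    return Path(out_dir) / f"{stem}-codebook.md"


def save_codebook(markdown, dataset_path, n_variables, out_dir="references"):
    """Write the codebook next to other references and return a one-line report."""
    target = codebook_path(dataset_path, out_dir)
    target.parent.mkdir(parents=True, exist_ok=True)
    # Only the codebook file is written; the dataset itself is never touched.
    target.write_text(markdown, encoding="utf-8")
    return f"Wrote {target.as_posix()} ({n_variables} variables documented)"
```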
Error handling
- If the file does not exist, report the error and suggest checking the path.
- If the file cannot be read (corrupt, unsupported format), report the error and ask for guidance.
- Never modify the source data file. This command is read-only with respect to data.
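A guarded load along these lines could cover the first two error cases. This is a sketch only: safe_load is a hypothetical name, the loader argument stands for whatever read function was chosen for the format, and the message wording is illustrative.

```python
from pathlib import Path


def safe_load(path, loader):
    """Return (data, None) on success or (None, message) on failure.

    The file is only ever passed to a reader, so the source data is not modified.
    """
    path = Path(path)
    if not path.is_file():
        return None, f"File not found: {path}. Check the path and try again."
    try:
        return loader(path), None
    except Exception as exc:  # corrupt file, unsupported format, missing engine, ...
        return None, (
            f"Could not read {path} ({type(exc).__name__}: {exc}). "
            "How should this file be loaded?"
        )
```

Returning a message instead of raising keeps the decision with the caller, who can relay the question to the user.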