Awesome-Agent-Skills-for-Empirical-Research codebook

Auto-generates a Markdown codebook from a dataset (CSV, DTA, Excel, Parquet) with types and summary statistics. Use when documenting variables.

install

source · Clone the upstream repo

git clone https://github.com/brycewang-stanford/Awesome-Agent-Skills-for-Empirical-Research

Claude Code · Install into ~/.claude/skills/

T=$(mktemp -d) && git clone --depth=1 https://github.com/brycewang-stanford/Awesome-Agent-Skills-for-Empirical-Research "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/29-quarcs-lab-project20XXy/dot-claude/skills/codebook" ~/.claude/skills/brycewang-stanford-awesome-agent-skills-for-empirical-research-codebook && rm -rf "$T"

manifest: skills/29-quarcs-lab-project20XXy/dot-claude/skills/codebook/SKILL.md

source content

Generate Variable Codebook

Auto-generate a Markdown codebook documenting all variables in a dataset.

Arguments

`$ARGUMENTS`: path to a dataset file (e.g., `data/rawData/sample_data.csv`, `data/panel.dta`)

Steps

Determine the file format from the extension:
- `.csv`: read with pandas `read_csv`
- `.dta`: read with pandas `read_stata`
- `.xlsx` / `.xls`: read with pandas `read_excel`
- `.parquet`: read with pandas `read_parquet`
- Other formats: ask the user how to load it
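
The extension-to-reader mapping above can be sketched as a small lookup table. This is only an illustration, not the skill's own code: `READERS`, `reader_name`, and `load_dataset` are hypothetical names. Note that `read_excel` and `read_parquet` rely on optional engines (for example openpyxl or pyarrow) being installed.

```python
from pathlib import Path

# Extension -> name of the pandas reader listed in the step above.
READERS = {
    ".csv": "read_csv",
    ".dta": "read_stata",
    ".xlsx": "read_excel",
    ".xls": "read_excel",
    ".parquet": "read_parquet",
}


def reader_name(path):
    """Return the pandas reader name for a dataset path (hypothetical helper)."""
    ext = Path(path).suffix.lower()
    if ext not in READERS:
        # Other formats: the skill asks the user instead of guessing.
        raise ValueError(f"Unsupported format {ext!r}: ask the user how to load it")
    return READERS[ext]


def load_dataset(path):
    """Load the dataset with the matching pandas reader."""
    import pandas as pd  # imported here so reader_name() works without pandas

    return getattr(pd, reader_name(path))(path)
```

Matching on the lowercased suffix means `panel.DTA` and `panel.dta` are treated alike; whether the skill itself is case-insensitive is not stated in the source.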
Load the dataset using `uv run python` and extract metadata for each variable:
- Variable name
- Data type (numeric, string, categorical, datetime)
- Non-missing count and missing count
- Number of unique values
- For numeric variables: min, max, mean, median, standard deviation
- For categorical/string variables: top 5 most frequent values with counts
- For datetime variables: min and max date
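
A minimal sketch of the per-variable metadata listed above, assuming the dataset is already loaded into a pandas DataFrame. The function name `describe_variable` and the dictionary keys are hypothetical, and the type rules (for instance, treating booleans as non-numeric) are one possible reading of the four categories.

```python
import pandas as pd
from pandas.api import types as ptypes


def describe_variable(s: pd.Series) -> dict:
    """Collect codebook metadata for one column (illustrative helper)."""
    info = {
        "name": s.name,
        "non_missing": int(s.notna().sum()),
        "missing": int(s.isna().sum()),
        "unique": int(s.nunique(dropna=True)),  # missing values are not counted
    }
    if ptypes.is_datetime64_any_dtype(s):
        info["type"] = "datetime"
        info.update(min=s.min(), max=s.max())
    elif ptypes.is_numeric_dtype(s) and not ptypes.is_bool_dtype(s):
        info["type"] = "numeric"
        info.update(
            min=float(s.min()),
            max=float(s.max()),
            mean=float(s.mean()),
            median=float(s.median()),
            std=float(s.std()),  # sample standard deviation (ddof=1)
        )
    else:
        is_cat = isinstance(s.dtype, pd.CategoricalDtype)
        info["type"] = "categorical" if is_cat else "string"
        # Top 5 most frequent values with counts; missing values are excluded.
        info["top_values"] = s.value_counts().head(5).to_dict()
    return info
```

Applied to a whole dataset, this would be something like `[describe_variable(df[col]) for col in df.columns]`.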
Generate a Markdown codebook with:
- Header: Dataset name, file path, number of observations, number of variables, date generated
- Summary table: Variable name | Type | Non-missing | Unique | Description (placeholder)
- Detailed sections per variable: Full statistics and a `[FILL: description]` placeholder for the user to add a human-readable description
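
The layout described above could be rendered along these lines. This is a sketch under assumptions: `render_codebook` is a hypothetical helper, the heading levels are a guess, and each variable is assumed to be a dict with `name`, `type`, `non_missing`, `unique`, and an optional `stats` mapping.

```python
from datetime import date


def render_codebook(name, path, n_obs, variables, generated=None):
    """Render a Markdown codebook from per-variable metadata dicts."""
    generated = generated or date.today().isoformat()
    lines = [
        f"# Codebook: {name}",
        "",
        f"- File path: {path}",
        f"- Observations: {n_obs}",
        f"- Variables: {len(variables)}",
        f"- Date generated: {generated}",
        "",
        "| Variable | Type | Non-missing | Unique | Description |",
        "| --- | --- | --- | --- | --- |",
    ]
    # Summary table: one row per variable with a description placeholder.
    for v in variables:
        lines.append(
            f"| {v['name']} | {v['type']} | {v['non_missing']} "
            f"| {v['unique']} | [FILL: description] |"
        )
    # Detailed sections: placeholder first, then whatever statistics exist.
    for v in variables:
        lines += ["", f"## {v['name']}", "", "[FILL: description]", ""]
        lines += [f"- {key}: {value}" for key, value in v.get("stats", {}).items()]
    return "\n".join(lines) + "\n"
```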

Derive the output filename from the dataset name: `data/rawData/sample_data.csv` → `references/sample-data-codebook.md`
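
One way to reproduce the mapping in that example is shown below. The only rule inferred here is that underscores in the file stem become hyphens; the source gives a single example, so any further normalization (such as lowercasing) is left out. `codebook_path` is a hypothetical name.

```python
from pathlib import Path


def codebook_path(dataset_path, out_dir="references"):
    """Map a dataset path to its codebook path (rule inferred from one example)."""
    stem = Path(dataset_path).stem.replace("_", "-")  # sample_data -> sample-data
    return Path(out_dir) / f"{stem}-codebook.md"
```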

Save to `references/<dataset-name>-codebook.md`.
Report the file path and the number of variables documented.

Error handling

If the file does not exist, report the error and suggest checking the path.
If the file cannot be read (corrupt, unsupported format), report the error and ask for guidance.
Never modify the source data file. This command is read-only with respect to data.
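
These rules might look like the following in code. It is a sketch, not the skill's implementation: `try_load` is a hypothetical wrapper, `loader` stands for whichever reader was chosen, and the messages are illustrative. The source file is only ever opened for reading.

```python
from pathlib import Path


def try_load(path, loader):
    """Return (data, None) on success or (None, message) on failure."""
    p = Path(path)
    if not p.is_file():
        # Missing file: report it and suggest checking the path.
        return None, f"File not found: {p}. Check that the path is correct."
    try:
        return loader(p), None
    except Exception as exc:  # corrupt file, unsupported format, missing engine
        # Unreadable file: report the error and ask for guidance.
        return None, f"Could not read {p}: {exc}. How should this file be loaded?"
```

Returning a message instead of raising keeps the decision with the user, which matches the "ask for guidance" behaviour described above.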