Awesome-Agent-Skills-for-Empirical-Research data-audit
Scans notebooks for data file references and verifies each file exists on disk. Use when checking for broken data paths.
install
source · Clone the upstream repo
git clone https://github.com/brycewang-stanford/Awesome-Agent-Skills-for-Empirical-Research
Claude Code · Install into ~/.claude/skills/
T=$(mktemp -d) && git clone --depth=1 https://github.com/brycewang-stanford/Awesome-Agent-Skills-for-Empirical-Research "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/29-quarcs-lab-project20XXy/dot-claude/skills/data-audit" ~/.claude/skills/brycewang-stanford-awesome-agent-skills-for-empirical-research-data-audit && rm -rf "$T"
manifest:
skills/29-quarcs-lab-project20XXy/dot-claude/skills/data-audit/SKILL.md
Audit Data References
Scan all notebooks for data file references and verify they exist on disk.
Steps
- Scan all .ipynb files in notebooks/ for data loading patterns:
  - Python: pd.read_csv(...), pd.read_stata(...), pd.read_excel(...), pd.read_parquet(...), open(...), np.loadtxt(...)
  - R: read.csv(...), read_csv(...), read.dta(...), haven::read_dta(...), readxl::read_excel(...), load(...)
  - Stata: use "...", import delimited "...", import excel "...", insheet using "..."
  - Also check the .md Jupytext pairs for the same patterns.
- Extract every referenced file path and normalize it:
  - Resolve relative paths from the notebook's directory (notebooks/).
  - Resolve paths built from DATA_DIR or RAW_DATA_DIR using the values defined in config.py or config.R.
- Check that each referenced file exists in data/rawData/ or data/.
- Scan data/rawData/ and data/ for all data files present on disk.
- Report three categories:
  - Resolved (referenced and found): the file path, the notebook that references it, and the line/cell number.
  - Broken (referenced but not found): the file path as written in the code, the notebook, and a suggested fix (the closest matching file, or a note that the file may need to be downloaded).
  - Undocumented (on disk but never referenced by any notebook): any file in data/rawData/ or data/ that no notebook loads.
- Print a summary: total references, resolved, broken, and undocumented files.
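Steps 1 and 2 can be sketched in Python with regular expressions over the notebook JSON. This is a minimal illustration under stated assumptions, not the skill's own implementation: it only catches paths written as string literals (optionally as DATA_DIR / "..." or RAW_DATA_DIR / "..." in Python), it takes the DATA_DIR and RAW_DATA_DIR values as a dict instead of parsing config.py or config.R, and the names CALL_RE, STATA_RE, iter_sources and extract_references are hypothetical.

```python
import json
import os
import re
from pathlib import Path

# Illustrative sketch; all names here are hypothetical.
# Python and R loaders from step 1. "path" captures a quoted first argument,
# optionally written as DATA_DIR / "..." or RAW_DATA_DIR / "..." (Python style).
CALL_RE = re.compile(
    r"""\b(?:pd\.read_(?:csv|stata|excel|parquet)|np\.loadtxt|open
          |read\.csv|read_csv|read\.dta|haven::read_dta|readxl::read_excel|load)
        \(\s*(?:(?P<var>RAW_DATA_DIR|DATA_DIR)\s*/\s*)?
        ["'](?P<path>[^"']+)["']""",
    re.VERBOSE,
)
# Stata commands from step 1, one per line, with a double-quoted file name.
STATA_RE = re.compile(
    r'^[ \t]*(?:use|import[ \t]+(?:delimited|excel)(?:[ \t]+using)?|insheet[ \t]+using)'
    r'[ \t]+"(?P<path>[^"]+)"',
    re.MULTILINE,
)


def iter_sources(path):
    """Yield (cell_number, code); cell_number is None for Jupytext .md files."""
    if path.suffix == ".ipynb":
        cells = json.loads(path.read_text(encoding="utf-8")).get("cells", [])
        for number, cell in enumerate(cells, start=1):
            if cell.get("cell_type") == "code":
                source = cell.get("source", "")
                code = "".join(source) if isinstance(source, list) else source
                yield number, code
    else:
        yield None, path.read_text(encoding="utf-8")


def extract_references(notebooks_dir, config_dirs):
    """Steps 1-2: every data path referenced in notebooks_dir, normalized.

    config_dirs maps "DATA_DIR" / "RAW_DATA_DIR" to directories (an assumption:
    reading those values out of config.py / config.R is left to the caller).
    """
    notebooks_dir = Path(notebooks_dir)
    sources = sorted(notebooks_dir.glob("*.ipynb")) + sorted(notebooks_dir.glob("*.md"))
    refs = []
    for notebook in sources:
        for cell, code in iter_sources(notebook):
            for regex in (CALL_RE, STATA_RE):
                for m in regex.finditer(code):
                    var = m.groupdict().get("var")
                    base = Path(config_dirs[var]) if var in config_dirs else notebooks_dir
                    refs.append({
                        "written": m.group("path"),
                        # normpath folds "notebooks/../data/x.csv" into "data/x.csv".
                        "resolved": Path(os.path.normpath(base / m.group("path"))),
                        "notebook": notebook.name,
                        "cell": cell,
                        "line": code.count("\n", 0, m.start("path")) + 1,
                    })
    return refs
```

Run from the project root, extract_references("notebooks", {"DATA_DIR": "data", "RAW_DATA_DIR": "data/rawData"}) would return paths relative to that root. Paths assembled with f-strings, os.path.join or R's file.path() slip through, and open(...) also matches files opened for writing, so a fuller audit would likely need more patterns or a proper parser.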
Error handling
- If no notebooks exist, report "No notebooks found" and stop.
- If data/rawData/ does not exist, warn but continue checking data/.
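The checking, reporting and error-handling rules can be sketched in Python as follows. It is a hedged sketch, not the skill's code: it assumes each reference is a dict holding the path as written, its resolved location relative to the project root, the notebook name, and the cell and line; DATA_SUFFIXES is a hypothetical list of extensions treated as data files; and the closest-match suggestion uses difflib.get_close_matches on file names, which is one plausible choice rather than a documented part of the skill. The functions files_on_disk, suggest_fix and run_audit are illustrative names.

```python
import difflib
from pathlib import Path

# Hypothetical list of extensions counted as data files (compared lower-case).
DATA_SUFFIXES = {".csv", ".dta", ".xls", ".xlsx", ".parquet", ".txt", ".rdata", ".rds"}


def files_on_disk(root="."):
    """Step 4: data files under data/ (which includes data/rawData/), relative to root."""
    root = Path(root)
    raw_dir, data_dir = root / "data" / "rawData", root / "data"
    if not raw_dir.is_dir():
        # Error-handling rule: warn, then keep checking data/.
        print(f"Warning: {raw_dir} does not exist; checking {data_dir} only")
    if not data_dir.is_dir():
        return set()
    return {
        p.relative_to(root)
        for p in data_dir.rglob("*")
        if p.is_file() and p.suffix.lower() in DATA_SUFFIXES
    }


def suggest_fix(missing, on_disk):
    """Closest on-disk file by name, or a note that the file may need downloading."""
    by_name = {p.name: p for p in on_disk}
    match = difflib.get_close_matches(Path(missing).name, list(by_name), n=1)
    return str(by_name[match[0]]) if match else "no close match; may need to be downloaded"


def run_audit(refs, root="."):
    """Steps 3, 5 and 6 plus the error-handling rules.

    refs: dicts with "written", "resolved" (path relative to root),
    "notebook", "cell" and "line" keys, as produced by steps 1-2.
    Returns (resolved, broken, undocumented), or None if there are no notebooks.
    """
    root = Path(root)
    nb_dir = root / "notebooks"
    if not (nb_dir.is_dir() and any(nb_dir.glob("*.ipynb"))):
        print("No notebooks found")
        return None

    on_disk = files_on_disk(root)
    resolved, broken = [], []
    for ref in refs:
        target = Path(ref["resolved"])
        # Step 3: the file must exist and live under data/ (incl. data/rawData/).
        if target.parts[:1] == ("data",) and (root / target).is_file():
            resolved.append(ref)
        else:
            broken.append({**ref, "suggestion": suggest_fix(ref["written"], on_disk)})
    referenced = {Path(r["resolved"]) for r in resolved}
    undocumented = sorted(on_disk - referenced)

    # Step 5: the three report categories.
    for r in resolved:
        print(f"RESOLVED      {r['resolved']}  ({r['notebook']}, cell {r['cell']}, line {r['line']})")
    for r in broken:
        print(f"BROKEN        {r['written']}  ({r['notebook']}) -> {r['suggestion']}")
    for p in undocumented:
        print(f"UNDOCUMENTED  {p}")
    # Step 6: summary counts.
    print(f"Total references: {len(refs)}, resolved: {len(resolved)}, "
          f"broken: {len(broken)}, undocumented: {len(undocumented)}")
    return resolved, broken, undocumented
```

The no-notebooks check runs first so the audit stops early, while a missing data/rawData/ only produces a warning because data/ is still scanned.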