Awesome-Agent-Skills-for-Empirical-Research identify

Design or review identification strategy for the sewage-house-prices project. Produces strategy memos with estimand, assumptions, pseudo-code, robustness plan, falsification tests, and referee objection anticipation. This skill should be used when asked to "design the strategy", "identify the effect", "write a strategy memo", or "think through identification".

install
source · Clone the upstream repo
git clone https://github.com/brycewang-stanford/Awesome-Agent-Skills-for-Empirical-Research
Claude Code · Install into ~/.claude/skills/
T=$(mktemp -d) && git clone --depth=1 https://github.com/brycewang-stanford/Awesome-Agent-Skills-for-Empirical-Research "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/41-sticerd-eee-sewage-econometrics-check/skills/identify" ~/.claude/skills/brycewang-stanford-awesome-agent-skills-for-empirical-research-identify && rm -rf "$T"
manifest: skills/41-sticerd-eee-sewage-econometrics-check/skills/identify/SKILL.md
source content

Identification Strategy

Design or review an identification strategy for the sewage-house-prices project.

Input:

$ARGUMENTS
— a research question, approach name (e.g. "hedonic", "dry spills"), or "review existing" to audit all current strategies.


Project-Specific Context

Existing Strategies

  1. Hedonic — Cross-sectional:
    log(price) ~ spill_metrics + controls | lsoa + year_quarter
    . Assumption: spill exposure is conditionally exogenous given LSOA FE.
  2. Repeat sales — Within-property:
    Δlog(price) ~ Δspill_metrics | house_id
    . Eliminates time-invariant unobservables.
  3. Long difference — Grid-level: changes in average prices within 250m grids. Eliminates level differences.
  4. News/media DiD — Treatment = post-media-coverage × exposure. Tests whether information matters for capitalisation.
  5. Upstream/downstream — River network topology via PostGIS. Downstream sites receive upstream pollution. Tests directionality.
  6. Dry spills — Spills without rainfall. If dry spills affect prices, suggests awareness/stigma channel over physical damage.
  7. Hydraulic capacity instrument — Planned IV using sewer capacity as instrument for spill frequency.

Key Data Features

  • EDM data: 2021-2024+, high-frequency (event-level)
  • Land Registry: universe of transactions
  • Zoopla: rental listings
  • Met Office: daily rainfall at LSOA level
  • River networks: PostGIS topology
  • Treatment radii: 250m, 500m, 1000m, 2000m, 5000m, 10000m

Workflow

Step 1: Context Gathering

  1. Read existing manuscript sections in
    docs/overleaf/
    for how strategies are currently described
  2. Read relevant analysis scripts in
    scripts/R/09_analysis/
  3. Read
    scripts/R/utils/spill_aggregation_utils.R
    for treatment construction
  4. Check
    docs/overleaf/refs.bib
    for methodological references

Step 2: Strategy Development

For a new or revised strategy, produce:

  1. Strategy memo — Design choice, estimand (ATT/ATE/LATE), key assumptions, comparison group
  2. Estimating equation — LaTeX-formatted with clear variable definitions
  3. Pseudo-code — Implementation sketch (what the R code will do)
  4. Robustness plan — Ordered list with rationale:
    • Radius sensitivity (250m → 10km)
    • Time period variation (prior period vs full period)
    • Alternative treatment measures (count vs hours vs binary)
    • Subsample analysis (sales vs rentals, urban vs rural)
  5. Falsification tests — What SHOULD NOT show effects and why
  6. Referee objection anticipation — Top 5 objections with pre-emptive responses

Step 3: Strategy Review

If reviewing an existing strategy:

Phase 1: Claim Identification

  • What is the claimed design?
  • What is the estimand?
  • What is the treatment / comparison?

Phase 2: Core Design Validity

  • Are identifying assumptions stated and defensible?
  • Are the biggest threats acknowledged?
  • Does the specification match the stated design?

Phase 3: Robustness Assessment

  • Does the robustness plan address the right concerns?
  • Are falsification tests well-chosen?
  • Is there radius sensitivity analysis?

Step 4: Present Results

# Identification Strategy: [Approach]
**Date:** YYYY-MM-DD
**Design:** [Hedonic / Repeat Sales / Long Diff / DiD / IV / etc.]
**Estimand:** [ATT / ATE / LATE]

## Strategy Summary
[2-3 sentence description]

## Estimating Equation
$$\log(p_{it}) = \alpha + \beta \cdot \text{SpillMetric}_{it} + \gamma X_{it} + \mu_i + \delta_t + \varepsilon_{it}$$

## Key Assumptions
1. [Assumption 1] — [defense]
2. [Assumption 2] — [defense]

## Assessment: [SOUND / CONCERNS / CRITICAL ISSUES]

## Robustness Plan (ordered)
1. [Most important check]
2. [Second check]
...

## Falsification Tests
1. [Test 1] — [expected null and why]

## Anticipated Referee Objections
1. [Objection] — [Response]

## Next Steps
- [ ] Implement main specification
- [ ] Run falsification tests
- [ ] Generate pre-trend evidence

Save to

output/log/strategy_memo_[approach].md
.


Principles

  • Catch problems before coding. A flawed strategy caught now saves weeks of wasted analysis.
  • Multiple strategies are strength. This paper uses 6+ approaches — consistency across them is the key argument.
  • Cross-reference approaches. Each strategy should address threats the others cannot.
  • The user decides. Present trade-offs, don't make choices unilaterally.
  • Strategy memo is the contract. Once approved, analysis scripts implement it faithfully.