# HarnessML: Domain Research

`packages/harness-plugin/skills/domain-research/SKILL.md`

Install the full repo:

```sh
git clone https://github.com/msilverblatt/harness-ml
```

Or copy just this skill into `~/.claude/skills`:

```sh
T=$(mktemp -d) && git clone --depth=1 https://github.com/msilverblatt/harness-ml "$T" && mkdir -p ~/.claude/skills && cp -r "$T/packages/harness-plugin/skills/domain-research" ~/.claude/skills/msilverblatt-harness-ml-domain-research && rm -rf "$T"
```
Use when generating feature hypotheses from domain knowledge. This is not a one-time pre-work step — return here whenever results surprise you, progress stalls, or a new data source becomes available.
## The Principle
The biggest ML gains come from features that capture real phenomena, not from hyperparameter tuning or model architecture changes. A feature that encodes domain knowledge — even imperfectly — gives the model information it cannot learn from the raw data alone.
Domain research is how you generate those features.
## Generating Hypotheses

### Start with the Domain, Not the Data
Before looking at correlations, ask: what does domain expertise say should matter?
- Use web search, Wikipedia, academic literature, practitioner knowledge
- Identify known predictive factors and the mechanisms behind them
- Look for phenomena that are well-understood in the domain but not yet captured in your features
### For Each Hypothesis, Document
- What it is: The phenomenon in plain language
- Why it should be predictive: The causal or correlational mechanism — not "it might help" but "it should help because..."
- What data would capture it: Raw columns, derived signals, or external sources
- Expected signal strength: Strong / Medium / Weak, with reasoning
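The four fields above can be kept as a small record so hypotheses stay comparable across a project. A minimal Python sketch; the class and every field value are illustrative, not part of the skill's tooling:

```python
from dataclasses import dataclass

@dataclass
class Hypothesis:
    """One domain hypothesis, mirroring the documentation fields above."""
    name: str
    phenomenon: str       # what it is, in plain language
    mechanism: str        # why it should be predictive ("it should help because...")
    data_needed: str      # raw columns, derived signals, or external sources
    expected_signal: str  # "strong" | "medium" | "weak"
    reasoning: str        # why you expect that signal strength

h = Hypothesis(
    name="comorbidity_burden",
    phenomenon="Patients with many chronic conditions are readmitted more often",
    mechanism="Comorbidities complicate recovery and raise relapse risk",
    data_needed="diagnosis codes per admission",
    expected_signal="strong",
    reasoning="Well established in the readmission literature",
)
```

Forcing yourself to fill in `mechanism` and `reasoning` is the point: a hypothesis you cannot articulate this way is usually blind feature generation.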
## Types of Domain Signals
**Direct predictors** — Features that directly measure the outcome driver.
- Hospital readmission: `count_of_comorbidities`
- House price: `square_footage`
**Proxy signals** — Indirect indicators when direct measurement is unavailable.
- Financial distress: `days_payable_outstanding` (when cash flow data is missing)
- Health risk: `pharmacy_visit_frequency` (when medical records are incomplete)
**Interaction effects** — Two features weak alone, strong together.
- `high_leverage * rising_rates` — leverage is fine until rates move
- `is_diabetic * high_bmi` — captures a specific high-risk population
**Conditional effects** — A feature that only matters in certain contexts.
- `marketing_spend` only predicts sales for products with existing brand awareness
- `rainfall` only affects crop yield during the growing season
**Regime indicators** — Signals that relationships change under different conditions.
- `vix_above_30` — volatility regime where correlations break down
- `product_lifecycle_stage` — growth vs maturity dynamics differ
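The interaction, conditional, and regime examples above can be sketched in pandas. All column names and thresholds here are illustrative, not prescribed by the skill:

```python
import pandas as pd

# Hypothetical loan/marketing data; values chosen only to exercise each case.
df = pd.DataFrame({
    "leverage_ratio": [0.9, 0.3, 0.8],
    "rate_change_bp": [50, 50, -25],
    "marketing_spend": [1000.0, 1000.0, 0.0],
    "brand_awareness": [0.7, 0.0, 0.9],
    "vix": [35.0, 12.0, 28.0],
})

# Interaction effect: each condition is weak alone, strong together.
df["high_leverage_x_rising_rates"] = (
    (df["leverage_ratio"] > 0.7) & (df["rate_change_bp"] > 0)
).astype(int)

# Conditional effect: spend only counts where brand awareness exists.
df["effective_marketing_spend"] = df["marketing_spend"] * (df["brand_awareness"] > 0)

# Regime indicator: flag the volatility regime where correlations break down.
df["vix_above_30"] = (df["vix"] > 30).astype(int)
```

Note that each derived column encodes the *mechanism* (leverage plus rising rates, spend conditional on awareness), not just a statistical combination.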
**Contrarian signals** — Counter-intuitive predictive direction. Often the most valuable.
- Higher satisfaction surveys sometimes predict turnover (dissatisfied employees don't respond)
- More safety incidents can predict fewer fatalities (reporting culture catches problems early)
## Mapping Hypotheses to Features
For each hypothesis:
- Raw columns needed — what must exist or be sourced
- Transformation — the formula or logic
- Feature type: instance, grouped, interaction, ratio, indicator, regime
- Novelty check — does an existing feature already capture this signal? Run `features(action="discover")` to see what already exists.
If an existing feature correlates >0.8 with your proposed feature, yours is likely redundant. Either skip it or refine the hypothesis to capture what the existing feature misses.
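The >0.8 redundancy check can be automated. A minimal pandas sketch; `is_redundant` and the example data are purely illustrative:

```python
import pandas as pd

def is_redundant(proposed: pd.Series, existing: pd.DataFrame,
                 threshold: float = 0.8) -> pd.Series:
    """Existing features whose |correlation| with the proposed feature
    exceeds the threshold; an empty result suggests the feature is novel."""
    corrs = existing.corrwith(proposed).abs()
    return corrs[corrs > threshold].sort_values(ascending=False)

# Illustrative data: the proposed feature nearly duplicates `sqft`.
existing = pd.DataFrame({"sqft": [800, 1200, 1500, 2000],
                         "rooms": [2, 3, 4, 5]})
proposed = pd.Series([850, 1180, 1520, 1990], name="adj_sqft")
overlap = is_redundant(proposed, existing)
```

`corrwith` uses Pearson correlation by default; passing `method="spearman"` is a reasonable alternative when the overlap is monotonic but nonlinear.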
## The Research Log
Maintain a running log. This is the connective tissue between domain knowledge and experiment results.
### Hypothesis: [Name]
- **Domain reasoning**: Why this should be predictive (the mechanism)
- **Source**: Where you found evidence
- **Feature(s)**: Name and formula
- **Expected signal**: Strong / Medium / Weak
- **Result**: What happened when tested
- **Learning**: What this tells us about the domain
- **Follow-up**: Next hypothesis generated by this result
The Follow-up field is the most important. Every tested hypothesis should generate at least one new question.
## When to Return Here
- Project start: Generate initial hypotheses before modeling
- After surprising results: A model that fails where you expected it to succeed means your domain understanding is incomplete
- When progress stalls: If experiments aren't teaching you anything new, you need new hypotheses from a different angle
- New data source: Map new data through domain reasoning before feature engineering
- After diagnosis reveals error patterns: If the model consistently fails on a subgroup, ask what domain knowledge explains that subgroup's behavior
## Anti-Patterns
- Blind feature generation: Adding features without a hypothesis for why they should work
- Over-relying on auto-search: `features(action="auto_search")` finds statistical artifacts; domain reasoning finds real signals. Use auto-search as a supplement, not a replacement.
- Stopping after initial research: Domain research is continuous. Your best hypotheses often come after seeing what the model gets wrong.
- Ignoring negative results: A feature that doesn't work is information. Why didn't it work? Was the hypothesis wrong, or is the signal already captured?