# HarnessML: Domain Research

`packages/harness-plugin/skills/domain-research/SKILL.md`

Install the full repo:

```sh
git clone https://github.com/msilverblatt/harness-ml
```

Or copy just this skill into `~/.claude/skills`:

```sh
T=$(mktemp -d) && git clone --depth=1 https://github.com/msilverblatt/harness-ml "$T" && mkdir -p ~/.claude/skills && cp -r "$T/packages/harness-plugin/skills/domain-research" ~/.claude/skills/msilverblatt-harness-ml-domain-research && rm -rf "$T"
```
Use when generating feature hypotheses from domain knowledge. This is not a one-time pre-work step — return here whenever results surprise you, progress stalls, or a new data source becomes available.
## The Principle
The biggest ML gains come from features that capture real phenomena, not from hyperparameter tuning or model architecture changes. A feature that encodes domain knowledge — even imperfectly — gives the model information it cannot learn from the raw data alone.
Domain research is how you generate those features.
## Generating Hypotheses

### Start with the Domain, Not the Data
Before looking at correlations, ask: what does domain expertise say should matter?
- Use web search, Wikipedia, academic literature, practitioner knowledge
- Identify known predictive factors and the mechanisms behind them
- Look for phenomena that are well-understood in the domain but not yet captured in your features
### For Each Hypothesis, Document
- What it is: The phenomenon in plain language
- Why it should be predictive: The causal or correlational mechanism — not "it might help" but "it should help because..."
- What data would capture it: Raw columns, derived signals, or external sources
- Expected signal strength: Strong / Medium / Weak, with reasoning
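The four fields above can be kept as a small record so hypotheses stay comparable across a project. A minimal Python sketch; the class and every field value are illustrative, not part of the skill's tooling:

```python
from dataclasses import dataclass

@dataclass
class Hypothesis:
    """One domain hypothesis, mirroring the documentation fields above."""
    name: str
    phenomenon: str       # what it is, in plain language
    mechanism: str        # why it should be predictive ("it should help because...")
    data_needed: str      # raw columns, derived signals, or external sources
    expected_signal: str  # "strong" | "medium" | "weak"
    reasoning: str        # why you expect that signal strength

h = Hypothesis(
    name="comorbidity_burden",
    phenomenon="Patients with many chronic conditions are readmitted more often",
    mechanism="Comorbidities complicate recovery and raise relapse risk",
    data_needed="diagnosis codes per admission",
    expected_signal="strong",
    reasoning="Well established in the readmission literature",
)
```

Forcing yourself to fill in `mechanism` and `reasoning` is the point: a hypothesis you cannot articulate this way is usually blind feature generation.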
## Types of Domain Signals
**Direct predictors** — Features that directly measure the outcome driver.
- Hospital readmission: `count_of_comorbidities`
- House price: `square_footage`
**Proxy signals** — Indirect indicators when direct measurement is unavailable.
- Financial distress: `days_payable_outstanding` (when cash flow data is missing)
- Health risk: `pharmacy_visit_frequency` (when medical records are incomplete)
**Interaction effects** — Two features weak alone, strong together.
- `high_leverage * rising_rates` — leverage is fine until rates move
- `is_diabetic * high_bmi` — captures a specific high-risk population
**Conditional effects** — A feature that only matters in certain contexts.
- `marketing_spend` only predicts sales for products with existing brand awareness
- `rainfall` only affects crop yield during the growing season
**Regime indicators** — Signals that relationships change under different conditions.
- `vix_above_30` — volatility regime where correlations break down
- `product_lifecycle_stage` — growth vs maturity dynamics differ
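The interaction, conditional, and regime examples above can be sketched in pandas. All column names and thresholds here are illustrative, not prescribed by the skill:

```python
import pandas as pd

# Hypothetical loan/marketing data; values chosen only to exercise each case.
df = pd.DataFrame({
    "leverage_ratio": [0.9, 0.3, 0.8],
    "rate_change_bp": [50, 50, -25],
    "marketing_spend": [1000.0, 1000.0, 0.0],
    "brand_awareness": [0.7, 0.0, 0.9],
    "vix": [35.0, 12.0, 28.0],
})

# Interaction effect: each condition is weak alone, strong together.
df["high_leverage_x_rising_rates"] = (
    (df["leverage_ratio"] > 0.7) & (df["rate_change_bp"] > 0)
).astype(int)

# Conditional effect: spend only counts where brand awareness exists.
df["effective_marketing_spend"] = df["marketing_spend"] * (df["brand_awareness"] > 0)

# Regime indicator: flag the volatility regime where correlations break down.
df["vix_above_30"] = (df["vix"] > 30).astype(int)
```

Note that each derived column encodes the *mechanism* (leverage plus rising rates, spend conditional on awareness), not just a statistical combination.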
**Contrarian signals** — Counter-intuitive predictive direction. Often the most valuable.
- Higher satisfaction surveys sometimes predict turnover (dissatisfied employees don't respond)
- More safety incidents can predict fewer fatalities (reporting culture catches problems early)
## Mapping Hypotheses to Features
For each hypothesis:
- Raw columns needed — what must exist or be sourced
- Transformation — the formula or logic
- Feature type: instance, grouped, interaction, ratio, indicator, regime
- Novelty check — does an existing feature already capture this signal? Run `features(action="discover")` to see what already exists.
If an existing feature correlates >0.8 with your proposed feature, yours is likely redundant. Either skip it or refine the hypothesis to capture what the existing feature misses.
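The >0.8 redundancy check can be automated. A minimal pandas sketch; `is_redundant` and the example data are purely illustrative:

```python
import pandas as pd

def is_redundant(proposed: pd.Series, existing: pd.DataFrame,
                 threshold: float = 0.8) -> pd.Series:
    """Existing features whose |correlation| with the proposed feature
    exceeds the threshold; an empty result suggests the feature is novel."""
    corrs = existing.corrwith(proposed).abs()
    return corrs[corrs > threshold].sort_values(ascending=False)

# Illustrative data: the proposed feature nearly duplicates `sqft`.
existing = pd.DataFrame({"sqft": [800, 1200, 1500, 2000],
                         "rooms": [2, 3, 4, 5]})
proposed = pd.Series([850, 1180, 1520, 1990], name="adj_sqft")
overlap = is_redundant(proposed, existing)
```

`corrwith` uses Pearson correlation by default; passing `method="spearman"` is a reasonable alternative when the overlap is monotonic but nonlinear.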
## The Research Log
Maintain a running log. This is the connective tissue between domain knowledge and experiment results.
### Hypothesis: [Name]
- **Domain reasoning**: Why this should be predictive (the mechanism)
- **Source**: Where you found evidence
- **Feature(s)**: Name and formula
- **Expected signal**: Strong / Medium / Weak
- **Result**: What happened when tested
- **Learning**: What this tells us about the domain
- **Follow-up**: Next hypothesis generated by this result
The Follow-up field is the most important. Every tested hypothesis should generate at least one new question.
## When to Return Here
- Project start: Generate initial hypotheses before modeling
- After surprising results: A model that fails where you expected it to succeed means your domain understanding is incomplete
- When progress stalls: If experiments aren't teaching you anything new, you need new hypotheses from a different angle
- New data source: Map new data through domain reasoning before feature engineering
- After diagnosis reveals error patterns: If the model consistently fails on a subgroup, ask what domain knowledge explains that subgroup's behavior
## Anti-Patterns
- Blind feature generation: Adding features without a hypothesis for why they should work
- Over-relying on auto-search: `features(action="auto_search")` finds statistical artifacts; domain reasoning finds real signals. Use auto-search as a supplement, not a replacement.
- Stopping after initial research: Domain research is continuous. Your best hypotheses often come after seeing what the model gets wrong.
- Ignoring negative results: A feature that doesn't work is information. Why didn't it work? Was the hypothesis wrong, or is the signal already captured?