Autosearch autosearch:trace-harvest
Distill reusable knowledge from successful session tool-call traces. Reads runtime tool-call logs + Evidence outputs + user acceptance signals, filters successful paths, and writes compact patterns to per-skill experience/patterns.jsonl. Differs from outcome-tracker (which only records downstream acceptance) by parsing the full trace structure — inputs, tool choices, branches, recoveries.
install
source · Clone the upstream repo
git clone https://github.com/0xmariowu/Autosearch
Claude Code · Install into ~/.claude/skills/
T=$(mktemp -d) && git clone --depth=1 https://github.com/0xmariowu/Autosearch "$T" && mkdir -p ~/.claude/skills && cp -r "$T/autosearch/skills/meta/trace-harvest" ~/.claude/skills/0xmariowu-autosearch-autosearch-trace-harvest && rm -rf "$T"
manifest:
autosearch/skills/meta/trace-harvest/SKILL.md
Trace Harvest — Distill Successful Sessions
MiroThinker's collect-trace + DeepResearchAgent's general_memory_system patterns, adapted for autosearch. Analyzes a session's tool-call trace, identifies what worked, and turns it into promote-candidate patterns for experience-capture to store.
Input
A completed autosearch session's trace:
```
trace:
  session_id: str
  query: str
  clarify_result: ClarifyResult   # from run_clarify
  tool_calls: list[{name, args, result_summary, success, latency_ms, cost}]
  final_answer: str | null
  user_feedback: "accepted" | "rejected" | "unknown" | null
  rubrics_passed: list[str]
  rubrics_failed: list[str]
```
Selection Rules
Only harvest traces where ANY of:
- user_feedback == "accepted" (explicit positive).
- len(rubrics_passed) >= 0.6 * (len(rubrics_passed) + len(rubrics_failed)) (objective quality gate).
- Session finished within budget AND generated >= 5 unique evidence citations.
Reject traces where:
- User rejected the final answer.
- Session hit a budget-exhausted error.
- More than 3 tool calls failed consecutively.
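The selection and rejection rules above can be combined into a single predicate. A minimal sketch: field names follow the trace schema, while `budget_exhausted`, `within_budget`, and `unique_citations` are assumed flags not spelled out in the schema, and `max_consecutive_failures` is a hypothetical helper.

```python
def max_consecutive_failures(tool_calls):
    """Length of the longest run of consecutive failed tool calls."""
    worst = run = 0
    for call in tool_calls:
        run = 0 if call["success"] else run + 1
        worst = max(worst, run)
    return worst

def should_harvest(trace):
    """Apply the rejection rules first, then the selection rules."""
    # Hard rejections.
    if trace.get("user_feedback") == "rejected":
        return False
    if trace.get("budget_exhausted"):
        return False
    if max_consecutive_failures(trace["tool_calls"]) > 3:
        return False
    # Harvest if ANY positive signal holds.
    if trace.get("user_feedback") == "accepted":
        return True
    passed = len(trace["rubrics_passed"])
    failed = len(trace["rubrics_failed"])
    if (passed + failed) and passed >= 0.6 * (passed + failed):
        return True
    return trace.get("within_budget", False) and trace.get("unique_citations", 0) >= 5
```

Rejections are checked before acceptances so that, for example, a trace with four consecutive tool failures is dropped even if the user accepted the answer.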
Harvest Output
Per successful trace, emit patterns keyed by leaf skill:
```json
{
  "ts": "...",
  "session_id": "...",
  "skill": "search-xiaohongshu",
  "winning_pattern": "brand + pain_word + recent_date_window",
  "good_query": "品牌 痛点词 近30天",
  "context": {
    "task_domain": "product-research",
    "clarify_had_rubrics": 4,
    "subsequent_channels_used": ["search-douyin", "search-zhihu"]
  },
  "metrics": {
    "relevant_out_of_returned": "9/18",
    "led_to_user_acceptance": true
  },
  "promote_candidate": true,
  "trace_ref": "traces/<session_id>.jsonl"
}
```
Each emitted pattern is appended via
experience-capture to the relevant skill's patterns.jsonl. Trace-harvest does NOT write experience.md directly — that's experience-compact's job.
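The append step can be sketched as a plain JSONL write. This assumes experience-capture ultimately lands each event in an `experience/patterns.jsonl` file under the target skill's directory; the exact path layout and the `skills_root` parameter are assumptions, not confirmed by the manifest.

```python
import json
from pathlib import Path

def append_pattern(skills_root: str, pattern: dict) -> Path:
    """Append one harvested pattern event to the target skill's patterns.jsonl."""
    # Pattern events are keyed by the leaf skill they apply to.
    dest = Path(skills_root) / pattern["skill"] / "experience" / "patterns.jsonl"
    dest.parent.mkdir(parents=True, exist_ok=True)
    with dest.open("a", encoding="utf-8") as f:
        # ensure_ascii=False keeps Chinese query text readable in the file.
        f.write(json.dumps(pattern, ensure_ascii=False) + "\n")
    return dest
```

Append-only JSONL keeps harvesting idempotent to retry and leaves compaction of the file to experience-compact.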
When Run
- At session end (if the session had a user_feedback signal).
- Batch mode: process the trace archive nightly, harvesting successful sessions from the last 24h.
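The nightly batch pass can be sketched as a generator over the trace archive. This assumes one JSONL file per session in a flat archive directory and uses file mtime as the recency signal; both are assumptions about the archive layout.

```python
import json
import time
from pathlib import Path

def recent_traces(archive_dir: str, window_hours: int = 24):
    """Yield trace dicts from archive files modified within the window."""
    cutoff = time.time() - window_hours * 3600
    for path in sorted(Path(archive_dir).glob("*.jsonl")):
        if path.stat().st_mtime < cutoff:
            continue  # outside the last-24h window
        with path.open(encoding="utf-8") as f:
            for line in f:
                if line.strip():
                    yield json.loads(line)
```

A batch run would then filter the yielded traces through the selection rules before emitting patterns.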
Anti-Pattern Filters
- Single tool call leading to answer → not a pattern, do not harvest (too narrow).
- Tool call sequence exactly matched the router's default recommendation → not interesting; harvest only when the trace diverges from defaults and still succeeds.
- Query was trivially answerable by Claude's parametric knowledge → reject (we are trying to learn autosearch patterns, not Claude patterns).
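The three filters can be sketched as a second predicate applied after the selection rules. `router_default` (the router's recommended tool sequence) and the `parametric_answerable` flag (an upstream classifier's verdict) are assumed inputs not defined in the trace schema.

```python
def is_worth_harvesting(trace, router_default):
    """Apply the anti-pattern filters to an already-successful trace."""
    calls = [c["name"] for c in trace["tool_calls"]]
    if len(calls) <= 1:
        return False  # single call → too narrow to be a pattern
    if calls == router_default:
        return False  # no divergence from the router's defaults
    if trace.get("parametric_answerable"):
        return False  # learn autosearch patterns, not Claude patterns
    return True
```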
Privacy / Hygiene
- Sanitize query text before storing: replace specific identifiers (names, emails, IDs) with placeholders ({user_name}, {issue_id}).
- Never record cookies / API keys, even transitively from tool-call args.
- Truncate tool-call result snippets to the 500-char summary; long text belongs in session trace archives, not in patterns.jsonl.
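A minimal sanitization sketch for the two rules that are regex-friendly. Name redaction ({user_name}) would need NER and is omitted here; the {user_email} placeholder and the ID shapes matched (#123, PROJ-42) are assumptions added for illustration, alongside the doc's own {issue_id}.

```python
import re

EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")
ISSUE = re.compile(r"(?:#\d+|\b[A-Z]{2,}-\d+\b)")  # assumed shapes: #123, PROJ-42

def sanitize_query(text: str) -> str:
    """Replace emails and issue IDs with placeholders before storing."""
    text = EMAIL.sub("{user_email}", text)
    text = ISSUE.sub("{issue_id}", text)
    return text

def truncate_snippet(snippet: str, limit: int = 500) -> str:
    """Keep only the 500-char summary; full text stays in the trace archive."""
    return snippet if len(snippet) <= limit else snippet[:limit]
```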
Related Skills
- Consumes ← session trace files + user feedback events.
- Feeds → experience-capture (append pattern events).
- Complements ← outcome-tracker (downstream acceptance) and experience-compact (promotion). Does NOT replace them: trace-harvest is trace-structure analysis; outcome-tracker is result-use analysis.
Quality Bar
- Evidence items have non-empty title and url.
- No crash on empty or malformed API response.
- Source channel field matches the channel name.
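The quality bar can be sketched as a checker that reports violations instead of raising, so an empty or malformed evidence list never crashes the harvest. The evidence field names (`title`, `url`, `source_channel`) follow the bar above; the function shape is an assumption.

```python
def check_pattern_quality(evidence, channel):
    """Return a list of quality-bar violations; an empty list means pass."""
    problems = []
    # `evidence or []` tolerates None / empty API responses without crashing.
    for i, item in enumerate(evidence or []):
        if not item.get("title"):
            problems.append(f"evidence[{i}]: empty title")
        if not item.get("url"):
            problems.append(f"evidence[{i}]: empty url")
        if item.get("source_channel") != channel:
            problems.append(f"evidence[{i}]: channel mismatch")
    return problems
```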