research-assist

A lightweight arXiv literature digest skill for OpenClaw, with Zotero-driven interest profiling, research-map plus Zotero-semantic ranking, agent-enriched digest cards, and non-destructive feedback writeback.

install

source · Clone the upstream repo

git clone https://github.com/zhanglg12/research-assist

Claude Code · Install into ~/.claude/skills/

git clone --depth=1 https://github.com/zhanglg12/research-assist ~/.claude/skills/zhanglg12-research-assist-research-assist

manifest: SKILL.md


Research Assist Skill

An OpenClaw skill that turns Zotero evidence into a profile-driven literature digest, using arXiv by default and optionally expanding recall with OpenAlex and Semantic Scholar before deduplication.

CLI Usage

# Full digest: profile check → literature retrieval → rank → markdown output
uv run --project ~/.openclaw/skills/research-assist \
  research-assist --action digest --config ~/.openclaw/skills/research-assist/config.json

# Ad-hoc literature search
uv run --project ~/.openclaw/skills/research-assist \
  research-assist --action search --query "gaussian process" --top 5

# Check profile refresh status
uv run --project ~/.openclaw/skills/research-assist \
  research-assist --action profile-refresh --config ~/.openclaw/skills/research-assist/config.json

# Zotero MCP server (for profile evidence + feedback writeback)
uv run --project ~/.openclaw/skills/research-assist research-assist-zotero-mcp

Or via Python module:

uv run --project ~/.openclaw/skills/research-assist \
  python -m codex_research_assist --action digest --config ~/.openclaw/skills/research-assist/config.json

Default config path:

~/.openclaw/skills/research-assist/config.json

The full setup questionnaire is intentionally not embedded in this file. During install or reconfiguration, use `references/setup-routing.md` as the install-time interaction guide, ask only the questions relevant to the user's goal, then edit `config.json` directly.

Install-Time Behavior

Installation and reconfiguration are one-time operations.

Hard rules for the host agent:

- use `references/setup-routing.md` only when the user is installing or reconfiguring, or when required config is missing
- once `config.json` is valid, normal digest/search/render/feedback runs must not reopen setup questions
- do not restate dormant install options during regular literature work
- if the user asks for normal runtime work, prefer using the existing config over discussing installation
- when setup selects optional backends or delivery routes, execute the required install/setup commands instead of only listing them
- before leaving setup, run a minimal verification for the selected backend or route and report the result
- fall back to manual instructions only when a step is blocked by missing secrets, missing local services, missing permissions, or a platform limitation

Config Format

{
  "profile_path": "~/.openclaw/skills/research-assist/profiles/research-interest.json",
  "output_root": "~/.openclaw/skills/research-assist/reports",
  "retrieval_defaults": {
    "max_results_per_interest": 20,
    "since_days": 7,
    "max_age_days": 7
  }
}
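A minimal Python sketch of reading this file, assuming plain JSON with `~`-relative paths as shown. The helper names `normalize_config` and `load_config`, the choice of `profile_path` and `output_root` as the "required" keys, and the fallback values (copied from the example above) are illustrative assumptions; the runner's real loading code may differ, and the exact semantics of `since_days` versus `max_age_days` are defined by the pipeline, not here.

```python
import json
from pathlib import Path

# Fallbacks copied from the example config above (illustrative, not authoritative).
DEFAULT_RETRIEVAL = {"max_results_per_interest": 20, "since_days": 7, "max_age_days": 7}
# Assumption: these keys are what "required config" means when deciding to reopen setup.
REQUIRED_KEYS = ("profile_path", "output_root")


def normalize_config(raw: dict) -> dict:
    """Validate required keys, expand ~ in path fields, and fill retrieval defaults."""
    missing = [key for key in REQUIRED_KEYS if key not in raw]
    if missing:
        raise ValueError(f"config is missing required keys: {missing}")
    config = dict(raw)
    for key in REQUIRED_KEYS:
        config[key] = Path(raw[key]).expanduser()
    # Values present in the file override the fallbacks.
    config["retrieval_defaults"] = {**DEFAULT_RETRIEVAL, **raw.get("retrieval_defaults", {})}
    return config


def load_config(path: str) -> dict:
    """Read config.json from disk and normalize it."""
    with Path(path).expanduser().open(encoding="utf-8") as fh:
        return normalize_config(json.load(fh))
```

In this sketch a `ValueError` is the "required config is missing" case that would justify routing back to setup; any other run proceeds with the loaded config.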

Architecture

config.json (OpenClaw skill config)
    ↓
openclaw_runner.py (CLI entry, markdown to stdout)
    ├── profile_refresh_policy  → check if profile needs update
    ├── pipeline.py             → multi-source literature retrieval (arXiv default)
    ├── ranker.py               → two-signal scoring (map_match + zotero_semantic)
    └── format_*_markdown()     → structured markdown output

No LLM calls are made inside the packaged Python pipeline: retrieval, ranking, and formatting are pure data operations. Judgment (profile refresh, review text, and final digest selection) comes from the calling agent (OpenClaw / Claude Code / Codex CLI).

Profile refresh should be handled by the OpenClaw controller or agent layer, using live Zotero evidence via the bundled Zotero MCP.

Workflow Stages

1. `profile_update`

- read the current Zotero evidence base when a refresh is required
- maintain `profiles/research-interest.json`
- preserve the compact contract: `method_keywords`, `query_aliases`, `exclude_keywords`
- keep method labels short and retrieval-friendly
- prefer `zotero_semantic_search` for discovery, then `zotero_search_items` for exact resolution
- use `research-assist-zotero-mcp` for live Zotero reads (no direct API calls)

OpenClaw generation rule:

- treat Zotero like a studio palette, not a flat folder dump
- use collection structure as the sketch of the research map
- use representative papers as the main evidence for what each region actually contains
- use semantic search as the blending layer that connects nearby themes across collections
- write interests that feel like stable method axes, not loose keyword bags
- aim for about 6 interests by default; usually stay in the 4-8 range unless the evidence strongly says otherwise
- if the draft has too few interests, split mixed regions by real method differences; if it has too many, merge nearby regions that share one stable method axis
- if collection names and paper content disagree, trust the papers more than the folder label
- if summary terms are frequent but too generic, use them only to refine wording, not to define the map
- the final profile should read like a compact map of the user's research territory: a few clear regions, each with short labels and retrieval-friendly aliases
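The sizing and label rules above can be checked mechanically before a draft profile is saved. This is a minimal sketch under stated assumptions: it treats the profile as a list of interests that each carry the three contract fields, treats each `method_keywords` entry as a method label, and uses a hypothetical five-word cutoff for "short"; the `Interest` class and `review_profile` helper are illustrative, not part of the packaged code.

```python
from dataclasses import dataclass, field


@dataclass
class Interest:
    # Hypothetical per-interest shape built from the three contract fields.
    method_keywords: list[str]
    query_aliases: list[str] = field(default_factory=list)
    exclude_keywords: list[str] = field(default_factory=list)


MIN_INTERESTS, MAX_INTERESTS = 4, 8  # the usual range from the rule above
MAX_LABEL_WORDS = 5                  # assumption: what "short" means for a method label


def review_profile(interests: list[Interest]) -> list[str]:
    """Return human-readable warnings; an empty list means the draft looks compact."""
    warnings = []
    if len(interests) < MIN_INTERESTS:
        warnings.append(f"only {len(interests)} interests: consider splitting mixed regions")
    elif len(interests) > MAX_INTERESTS:
        warnings.append(f"{len(interests)} interests: consider merging nearby regions")
    for interest in interests:
        if not interest.method_keywords:
            warnings.append("interest without method_keywords")
        for label in interest.method_keywords:
            if len(label.split()) > MAX_LABEL_WORDS:
                warnings.append(f"method label too long: {label!r}")
    return warnings
```

Warnings are advisory: the rule allows leaving the 4-8 range when the evidence strongly supports it, so the agent decides whether to split or merge.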

2. `retrieval`

- query arXiv by default; optionally add OpenAlex and Semantic Scholar per interest
- expand the paper pool before ranking when multiple sources are enabled
- generate structured candidate JSON with full provenance
- deduplicate across interests and across enabled sources
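Deduplicating across sources needs a shared identity for the same paper seen via arXiv, OpenAlex, or Semantic Scholar. The sketch below is one plausible approach, not the pipeline's actual rule: it assumes each candidate is a dict with optional `doi`, `arxiv_id`, and `title` plus `source` and `interest` fields (all hypothetical names), merges records that share any normalized identifier, and keeps every source and interest as provenance.

```python
import re


def candidate_keys(paper: dict) -> list[str]:
    """Every identifier this record can be matched on, strongest first."""
    keys = []
    if paper.get("doi"):
        keys.append("doi:" + paper["doi"].strip().lower())
    if paper.get("arxiv_id"):
        # Strip a version suffix such as "v2" so arXiv versions collapse together.
        keys.append("arxiv:" + re.sub(r"v\d+$", "", paper["arxiv_id"].strip()))
    title = re.sub(r"\W+", " ", paper.get("title", "")).strip().lower()
    if title:
        keys.append("title:" + title)
    return keys


def deduplicate(papers: list[dict]) -> list[dict]:
    """Merge records that share any key, keeping provenance from every copy."""
    merged = []
    index = {}
    for paper in papers:
        keys = candidate_keys(paper)
        record = next((index[k] for k in keys if k in index), None)
        if record is None:
            record = {**paper, "sources": {paper["source"]}, "interests": {paper["interest"]}}
            merged.append(record)
        else:
            record["sources"].add(paper["source"])
            record["interests"].add(paper["interest"])
            # Fill identifiers the first copy lacked, e.g. a DOI found only via OpenAlex.
            for id_field in ("doi", "arxiv_id"):
                if not record.get(id_field) and paper.get(id_field):
                    record[id_field] = paper[id_field]
        for key in keys:
            index.setdefault(key, record)
    return merged
```

Two simplifications to keep in mind: exact normalized titles can collide for different papers, and this single pass does not merge two records that were already kept separate before a later copy links them.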

3. `review`

- rank candidates with two-signal scoring:
  - `map_match` (weight 0.30): how well the paper fits the current research-map slices
  - `zotero_semantic` (weight 0.70): how close the paper is to nearby Zotero literature
- apply the low-map guard: if `map_match < 0.30`, apply the configured penalty to avoid semantic-only false positives
- output ranked markdown to stdout for agent review
- prefer a smaller, sharper set over a noisy dump
- stay abstract-first: review from abstracts rather than full text
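The scoring rule above, sketched in Python. The weights and the 0.30 threshold come from this section; the assumptions are that both signals are normalized to [0, 1] and that the "configured penalty" is a multiplicative factor, shown here with a hypothetical default of 0.5. See `src/codex_research_assist/ranker.py` for the actual implementation.

```python
MAP_WEIGHT = 0.30         # weight of map_match
SEMANTIC_WEIGHT = 0.70    # weight of zotero_semantic
LOW_MAP_THRESHOLD = 0.30  # below this map_match, the low-map guard applies


def score(map_match: float, zotero_semantic: float, low_map_penalty: float = 0.5) -> float:
    """Weighted two-signal score with a multiplicative low-map penalty (assumed form)."""
    base = MAP_WEIGHT * map_match + SEMANTIC_WEIGHT * zotero_semantic
    if map_match < LOW_MAP_THRESHOLD:
        # Damp papers that only look relevant through semantic similarity.
        base *= low_map_penalty
    return base


def rank(candidates: list[dict], low_map_penalty: float = 0.5) -> list[dict]:
    """Sort candidates by descending score; ties keep input order."""
    return sorted(
        candidates,
        key=lambda c: score(c["map_match"], c["zotero_semantic"], low_map_penalty),
        reverse=True,
    )
```

For example, a paper with `map_match=0.1` and `zotero_semantic=0.95` would score 0.695 without the guard and outrank an on-map paper at 0.66 (`map_match=0.8`, `zotero_semantic=0.6`); with the assumed 0.5 penalty it drops to about 0.35.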

Digest Enrichment

- OpenClaw should treat agent-filled review as the default digest-enrichment path
- after retrieval, let the host agent enrich the top-ranked candidate JSON files with review patches
- use `review_generation.agent_top_n` to cap how many ranked candidates the host agent needs to inspect
- let the host agent decide the final visible subset by setting `review.selected_for_digest`
- use `review_generation.final_top_n` as the hard upper bound for the final rendered digest
- keep `fallback_to_system` enabled unless the user explicitly wants a hard failure instead of fallback text
- after patches are applied, re-render the digest so HTML / Telegram outputs use the enriched review text
- `why_it_matters` should sound like a recommendation, not a provenance report
- `caveats` should capture real uncertainty or scope boundaries, not generic hedging
- the host agent should also fill `review.zotero_comparison`, including a nearest-neighbor fallback when candidate-level evidence is missing
- keep nearest-neighbor output compact: usually 1-2 items
- the host agent is not responsible for email / Telegram wrapper copy, subjects, or routing
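How the three knobs interact can be sketched as follows. This is an illustration under assumptions, not the renderer's code: candidates are taken to be dicts with a nested `review` object (mirroring the dotted `review.selected_for_digest` path), and `system_text` stands in for whatever system-generated text the fallback uses. Both function names are hypothetical.

```python
from typing import Optional


def final_digest(ranked: list[dict], agent_top_n: int, final_top_n: int) -> list[dict]:
    """Agent-selected candidates among the first agent_top_n, capped at final_top_n."""
    inspected = ranked[:agent_top_n]
    selected = [c for c in inspected if c.get("review", {}).get("selected_for_digest")]
    return selected[:final_top_n]


def review_field(
    candidate: dict,
    name: str,
    system_text: Optional[str],
    fallback_to_system: bool = True,
) -> str:
    """Prefer agent-filled review text; otherwise fall back to system text or fail."""
    text = candidate.get("review", {}).get(name)
    if text:
        return text
    if fallback_to_system and system_text:
        return system_text
    raise ValueError(f"no agent-filled {name!r} for candidate {candidate.get('id')!r}")
```

Keeping `final_top_n` as a slice after the agent's selection means the agent can pick freely among inspected candidates while the rendered digest still has a fixed ceiling.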

Delivery Routing

- use one shared delivery path and branch at the end on `delivery.primary_channel`
- the default primary channel is `email`; `telegram` is the backup or an alternate primary
- channel wrappers are system-owned:
  - email subject/body/profile card/stat cards
  - Telegram compact message shell
- do not ask the host agent to generate channel-specific wrappers
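A minimal sketch of the "one shared path, branch at the end" shape. It assumes that "backup" means trying the other channel when the primary fails, and that each sender is a hypothetical callable that builds its own system-owned wrapper and returns `True` on success; none of these names come from the packaged code.

```python
from typing import Callable

DEFAULT_CHANNEL = "email"
KNOWN_CHANNELS = ("email", "telegram")


def channel_order(config: dict) -> list[str]:
    """Primary channel first (email by default), then the remaining channel as backup."""
    primary = config.get("delivery", {}).get("primary_channel", DEFAULT_CHANNEL)
    if primary not in KNOWN_CHANNELS:
        raise ValueError(f"unknown primary channel: {primary!r}")
    return [primary] + [c for c in KNOWN_CHANNELS if c != primary]


def deliver(digest: str, config: dict, senders: dict[str, Callable[[str], bool]]) -> str:
    """Send the same digest through the first channel that succeeds; return its name."""
    for channel in channel_order(config):
        # Each sender owns its wrapper (email subject/cards or Telegram shell).
        if senders[channel](digest):
            return channel
    raise RuntimeError("all delivery channels failed")
```

Because the digest content is identical up to the final call, the host agent never has to write channel-specific copy.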

Stage 6: `feedback_sync`

After the digest is reviewed and delivered, the host agent may push non-destructive feedback back into Zotero.

Workflow:

- collect the user's explicit feedback on each digest candidate (keep, drop, archive, watch, etc.)
- encode the feedback according to the schema in `reports/schema/zotero-feedback.schema.json`, using the allowed decisions below
- call `zotero_apply_feedback` through the bundled Zotero MCP with `dry_run=true` first
- show the dry-run plan to the user and ask for confirmation before applying
- only after confirmation, re-run with `dry_run=false`
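The dry-run-then-confirm sequence can be sketched as below. `apply_feedback` is a hypothetical stand-in for however the host invokes the `zotero_apply_feedback` MCP tool, `confirm` stands in for asking the user, and the `{"decisions": [...]}` payload shape is an assumption; the authoritative shape is the feedback schema file. The decision names and match fields are taken from this section.

```python
from typing import Callable, Optional

ALLOWED_DECISIONS = {
    "read_first", "skim", "watch", "skip_for_now",
    "archive", "watchlist", "ignore", "unset",
}
MATCH_FIELDS = ("item_key", "doi", "title_contains")


def validate(decisions: list[dict]) -> None:
    """Reject unknown decisions and decisions with no match field."""
    for d in decisions:
        if d.get("decision") not in ALLOWED_DECISIONS:
            raise ValueError(f"unknown decision: {d.get('decision')!r}")
        if not any(d.get(f) for f in MATCH_FIELDS):
            raise ValueError("each decision needs item_key, doi, or title_contains")


def sync_feedback(
    decisions: list[dict],
    apply_feedback: Callable[..., dict],
    confirm: Callable[[dict], bool],
) -> Optional[dict]:
    """Validate, dry-run, ask for confirmation, and only then write back."""
    validate(decisions)
    payload = {"decisions": decisions}
    plan = apply_feedback(payload, dry_run=True)
    if not confirm(plan):
        return None  # user declined: nothing is written
    return apply_feedback(payload, dry_run=False)
```

Validating before the dry run keeps malformed decisions from ever reaching Zotero, even in planning mode.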

Allowed feedback decisions:

- `read_first`: high-priority paper; tag and promote in the library
- `skim`: worth scanning; tag for later review
- `watch`: track this topic area; add to the watchlist collection
- `skip_for_now`: not relevant now; mark but do not remove
- `archive`: reviewed and filed; move to the archive collection
- `watchlist`: add to a standing watchlist for periodic check-in
- `ignore`: not relevant; tag to suppress in future runs
- `unset`: no decision yet; skip writeback for this item

What feedback can do (non-destructive only):

- add tags (including `ra-status:*` decision tags)
- add or change collection membership
- append notes to items
- create new collections if needed for organization

What feedback must never do:

- delete Zotero items or collections
- modify item metadata (title, authors, abstract, DOI)
- move or delete attachment files
- rewrite top-level taxonomy without explicit user instruction
- apply changes without showing the dry-run plan first

Matching behavior:

- match items by `item_key` (preferred), `doi`, or `title_contains`
- at least one match field must be provided per decision
- DOI matching is case-insensitive
- `title_contains` uses substring matching (not exact)
- if no match is found, the decision is recorded as `not_found` in the plan
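The matching rules can be sketched as a single lookup that tries each provided field in priority order. Assumptions: library items are flattened dicts with `key`, `doi`, and `title` fields (a hypothetical shape), and the `title_contains` substring test is case-insensitive, which this section does not specify. The helper names are illustrative.

```python
from typing import Optional

MATCH_FIELDS = ("item_key", "doi", "title_contains")


def find_item(decision: dict, items: list[dict]) -> Optional[dict]:
    """Try item_key, then DOI (case-insensitive), then title substring."""
    if not any(decision.get(f) for f in MATCH_FIELDS):
        raise ValueError("each decision needs item_key, doi, or title_contains")
    if decision.get("item_key"):
        for item in items:
            if item["key"] == decision["item_key"]:
                return item
    if decision.get("doi"):
        wanted = decision["doi"].strip().lower()
        for item in items:
            if (item.get("doi") or "").strip().lower() == wanted:
                return item
    if decision.get("title_contains"):
        # Assumption: substring match ignores case.
        needle = decision["title_contains"].lower()
        for item in items:
            if needle in item.get("title", "").lower():
                return item
    return None


def plan_entry(decision: dict, items: list[dict]) -> dict:
    """One dry-run plan line: matched item key, or not_found."""
    item = find_item(decision, items)
    if item is None:
        return {"decision": decision.get("decision"), "status": "not_found"}
    return {"decision": decision.get("decision"), "status": "matched", "item_key": item["key"]}
```

Substring matching can hit several items; this sketch takes the first, which is one reason to prefer `item_key` whenever it is available.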

Edge cases:

- duplicate tags are deduplicated (case-insensitive)
- previous `ra-status:*` tags are replaced when a new decision is applied
- the `research-assist` system tag is always preserved
- `unset` decisions produce no status tag and no writeback
- empty `add_tags`, `remove_tags`, `add_collections`, and `remove_collections` are allowed (no-op for that field)
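The tag edge cases above combine into one update function. This sketch assumes status tags take the form `ra-status:<decision>` (the section only specifies the `ra-status:*` prefix), and `updated_tags` is a hypothetical helper rather than the MCP server's code; collection changes are left out.

```python
STATUS_PREFIX = "ra-status:"
SYSTEM_TAG = "research-assist"


def updated_tags(current, decision, add_tags=(), remove_tags=()):
    """Compute the new tag list for one item under the edge-case rules."""
    if decision == "unset":
        return list(current)  # no status tag and no writeback
    removed = {t.lower() for t in remove_tags}
    # Drop any previous status tag; the new decision replaces it.
    candidates = [t for t in current if not t.lower().startswith(STATUS_PREFIX)]
    candidates += list(add_tags) + [STATUS_PREFIX + decision]
    result, seen = [], set()
    for tag in candidates:
        low = tag.lower()
        if low in seen:
            continue  # case-insensitive deduplication keeps the first spelling
        if low in removed and low != SYSTEM_TAG:
            continue  # the system tag survives even if listed for removal
        seen.add(low)
        result.append(tag)
    return result
```

Empty `add_tags` and `remove_tags` default to no-ops, matching the last rule.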

Hard Rules

- do not expand concise method labels into long topic sentences
- do not make full text the default review mode
- do not delete Zotero items or collections automatically
- prefer `dry_run=true` for any Zotero writeback
- do not treat scheduler wiring as part of the skill

Key Runtime Files

- OpenClaw runner: `src/codex_research_assist/openclaw_runner.py`
- Ranker: `src/codex_research_assist/ranker.py`
- Pipeline: `src/codex_research_assist/arxiv_profile_pipeline/pipeline.py`
- Example config: `config.example.json`
- Example profile: `profiles/research-interest.example.json`

Reference Documents

- `references/workflow.md`: stage order and controller boundary
- `references/contracts.md`: profile contract and review policy
- `references/distribution.md`: packaging include/exclude rules
- `references/setup-routing.md`: install-time route selection and option questions
- `references/review-generation.md`: `system` / `agent_fill` review contract
- `references/profile-map-generation.md`: how to turn Zotero evidence into a research-map-style profile

Packaging Boundary

Include in distributable skill:

- `SKILL.md`
- `config.example.json`
- `pyproject.toml`
- `uv.lock`
- `src/`
- `references/`
- `profiles/research-interest.example.json`
- `automation/arxiv-profile-digest.example.toml`
- `automation/prompts/`
- `reports/schema/`
- generated package-root `install.sh`

Exclude:

- generated reports and temporary state
- local secret config
- scheduler wrappers
- repository planning documents (`NEXT_PLAN.md`, `CODEMAP.md`)
