research-assist
A lightweight arXiv literature digest skill for OpenClaw, with Zotero-driven interest profiling, research-map plus Zotero-semantic ranking, agent-enriched digest cards, and non-destructive feedback writeback.
git clone https://github.com/zhanglg12/research-assist
git clone --depth=1 https://github.com/zhanglg12/research-assist ~/.claude/skills/zhanglg12-research-assist-research-assist
SKILL.md
Research Assist Skill
An OpenClaw skill that turns Zotero evidence into a profile-driven literature digest, using arXiv by default and optionally expanding recall with OpenAlex and Semantic Scholar before deduplication.
CLI Usage
```bash
# Full digest: profile check → literature retrieval → rank → markdown output
uv run --project ~/.openclaw/skills/research-assist \
  research-assist --action digest --config ~/.openclaw/skills/research-assist/config.json

# Ad-hoc literature search
uv run --project ~/.openclaw/skills/research-assist \
  research-assist --action search --query "gaussian process" --top 5

# Check profile refresh status
uv run --project ~/.openclaw/skills/research-assist \
  research-assist --action profile-refresh --config ~/.openclaw/skills/research-assist/config.json

# Zotero MCP server (for profile evidence + feedback writeback)
uv run --project ~/.openclaw/skills/research-assist research-assist-zotero-mcp
```
Or via Python module:
```bash
uv run --project ~/.openclaw/skills/research-assist \
  python -m codex_research_assist --action digest --config ~/.openclaw/skills/research-assist/config.json
```
Default config path:
~/.openclaw/skills/research-assist/config.json
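When a host script needs the digest as a string rather than on a terminal, the same CLI can be driven from Python. This is a minimal sketch, assuming `uv` is on `PATH` and the skill is installed at the path used above; `build_command` and `run_action` are hypothetical helpers, not part of the package.

```python
import os
import subprocess
from typing import Optional

# Install path used in the commands above; expanded because no shell is involved.
PROJECT = os.path.expanduser("~/.openclaw/skills/research-assist")


def build_command(action: str, config: Optional[str] = None,
                  query: Optional[str] = None, top: Optional[int] = None) -> list[str]:
    """Assemble the same argv as the documented `uv run ... research-assist` calls."""
    cmd = ["uv", "run", "--project", PROJECT, "research-assist", "--action", action]
    if config:
        cmd += ["--config", os.path.expanduser(config)]
    if query:
        cmd += ["--query", query]
    if top is not None:
        cmd += ["--top", str(top)]
    return cmd


def run_action(action: str, **kwargs) -> str:
    """Run the CLI and return its markdown stdout; raises CalledProcessError on failure."""
    result = subprocess.run(build_command(action, **kwargs),
                            capture_output=True, text=True, check=True)
    return result.stdout
```

For example, `run_action("search", query="gaussian process", top=5)` mirrors the ad-hoc search command.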
During install or reconfiguration, do not embed the full setup questionnaire in this file. Use `references/setup-routing.md` as the install-time interaction guide, ask only the questions relevant to the user's goal, then edit `config.json` directly.
Install-Time Behavior
Installation and reconfiguration are one-time operations.
Hard rules for the host agent:
- use `references/setup-routing.md` only when the user is installing or reconfiguring, or when required config is missing
- once `config.json` is valid, normal digest/search/render/feedback runs must not reopen setup questions
- do not restate dormant install options during regular literature work
- if the user asks for normal runtime work, prefer using the existing config over discussing installation
- when setup selects optional backends or delivery routes, execute the required install/setup commands instead of only listing them
- before leaving setup, run a minimal verification for the selected backend or route and report the result
- only fall back to manual instructions when a step is blocked by missing secrets, missing local services, missing permissions, or a platform limitation
Config Format
```json
{
  "profile_path": "~/.openclaw/skills/research-assist/profiles/research-interest.json",
  "output_root": "~/.openclaw/skills/research-assist/reports",
  "retrieval_defaults": {
    "max_results_per_interest": 20,
    "since_days": 7,
    "max_age_days": 7
  }
}
```
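A minimal sketch of reading this config, assuming it is plain JSON with the keys shown. `resolve_config`, `load_config`, and the default-merging behavior are hypothetical illustrations, not the package's actual loader.

```python
import json
import os
from typing import Any

# Defaults copied from the example config; whether the real loader applies them is an assumption.
DEFAULT_RETRIEVAL = {"max_results_per_interest": 20, "since_days": 7, "max_age_days": 7}
PATH_KEYS = ("profile_path", "output_root")


def resolve_config(raw: dict[str, Any]) -> dict[str, Any]:
    """Expand '~' in path fields and fill any missing retrieval defaults."""
    config = dict(raw)
    for key in PATH_KEYS:
        if key in config:
            config[key] = os.path.expanduser(config[key])
    # Shallow merge: values present in the file override the defaults.
    config["retrieval_defaults"] = {**DEFAULT_RETRIEVAL, **raw.get("retrieval_defaults", {})}
    return config


def load_config(path: str = "~/.openclaw/skills/research-assist/config.json") -> dict[str, Any]:
    """Read config.json from disk and resolve it."""
    with open(os.path.expanduser(path), encoding="utf-8") as fh:
        return resolve_config(json.load(fh))
```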
Architecture
```text
config.json (OpenClaw skill config)
  ↓
openclaw_runner.py (CLI entry, markdown to stdout)
  ├── profile_refresh_policy → check if profile needs update
  ├── pipeline.py → multi-source literature retrieval (arXiv default)
  ├── ranker.py → two-signal scoring (map_match + zotero_semantic)
  └── format_*_markdown() → structured markdown output
```
No LLM calls inside the packaged Python pipeline. Retrieval, ranking, and formatting are pure data operations. Intelligence comes from the calling agent (OpenClaw / Claude Code / Codex CLI).
Profile refresh should be handled by the OpenClaw controller or agent layer, using live Zotero evidence via the bundled Zotero MCP.
Workflow Stages
1. profile_update
- read the current Zotero evidence base when refresh is required
- maintain `profiles/research-interest.json`
- preserve the compact contract: `method_keywords`, `query_aliases`, `exclude_keywords`
- keep method labels short and retrieval-friendly
- prefer `zotero_semantic_search` for discovery, then `zotero_search_items` for exact resolution
- use `research-assist-zotero-mcp` for live Zotero reads (no direct API calls)
OpenClaw generation rule:
- treat Zotero like a studio palette, not a flat folder dump
- use collection structure as the sketch of the research map
- use representative papers as the main evidence for what each region actually contains
- use semantic search as the blending layer that connects nearby themes across collections
- write interests that feel like stable method axes, not loose keyword bags
- aim for about 6 interests by default; usually stay in the 4-8 range unless the evidence strongly says otherwise
- if the draft has too few interests, split mixed regions by real method differences; if it has too many, merge nearby regions that share one stable method axis
- if collection names and paper content disagree, trust the papers more than the folder label
- if summary terms are frequent but too generic, use them only to refine wording, not to define the map
- the final profile should read like a compact map of the user's research territory: a few clear regions, each with short labels and retrieval-friendly aliases
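The sizing and labeling rules above can be checked mechanically before a draft profile is saved. This is a minimal sketch, assuming a hypothetical profile shape of `{"interests": [{"label": ..., "method_keywords": [...], ...}]}`; only the three keyword fields come from the documented contract, and the `label` field and six-word limit are illustrative assumptions.

```python
from typing import Any

# Only these three fields come from the documented compact contract.
REQUIRED_FIELDS = ("method_keywords", "query_aliases", "exclude_keywords")


def review_profile_draft(profile: dict[str, Any], max_label_words: int = 6) -> list[str]:
    """Return human-readable warnings for a draft research-map profile."""
    warnings: list[str] = []
    interests = profile.get("interests", [])
    n = len(interests)
    if n < 4:
        warnings.append(f"{n} interest(s) is below the usual 4-8 range: split mixed regions by method")
    elif n > 8:
        warnings.append(f"{n} interests is above the usual 4-8 range: merge regions sharing one method axis")
    for i, interest in enumerate(interests):
        name = interest.get("label", f"interest[{i}]")  # "label" is a hypothetical field
        for field in REQUIRED_FIELDS:
            if field not in interest:
                warnings.append(f"{name}: missing {field}")
        for kw in interest.get("method_keywords", []):
            if len(kw.split()) > max_label_words:
                warnings.append(f"{name}: method keyword too long: {kw!r}")
    return warnings
```

The check only flags problems; deciding how to split or merge regions stays with the agent, which reads the papers.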
2. retrieval
- query arXiv by default; optionally add OpenAlex and Semantic Scholar per interest
- expand the paper pool before ranking when multiple sources are enabled
- generate structured candidate JSON with full provenance
- deduplicate across interests and across enabled sources
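A minimal sketch of cross-interest, cross-source deduplication that keeps provenance. The candidate fields (`arxiv_id`, `doi`, `title`, `source`, `interest`) and the identity priority are assumptions for illustration, not the pipeline's actual schema.

```python
import re
from typing import Any


def dedup_key(paper: dict[str, Any]) -> str:
    """Pick a stable identity: arXiv ID (version stripped), then DOI, then normalized title."""
    if paper.get("arxiv_id"):
        return "arxiv:" + re.sub(r"v\d+$", "", paper["arxiv_id"].strip())
    if paper.get("doi"):
        return "doi:" + paper["doi"].strip().lower()
    title = re.sub(r"[^a-z0-9]+", " ", paper.get("title", "").lower()).strip()
    return "title:" + title


def deduplicate(candidates: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """Merge duplicates found under several interests/sources, keeping provenance as sets."""
    merged: dict[str, dict[str, Any]] = {}
    for paper in candidates:
        key = dedup_key(paper)
        if key not in merged:
            # First occurrence wins for metadata; provenance accumulates below.
            merged[key] = {**paper, "sources": set(), "interests": set()}
        merged[key]["sources"].add(paper.get("source", "unknown"))
        merged[key]["interests"].add(paper.get("interest", "unknown"))
    return list(merged.values())
```

One limitation of this sketch: a paper that arrives with only an arXiv ID from one source and only a DOI from another gets two different keys, so a fuller implementation would need cross-ID resolution.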
3. review
- rank candidates with two-signal scoring:
- map_match (0.30): how well the paper fits the current research-map slices
- zotero_semantic (0.70): how close the paper is to nearby Zotero literature
- apply the low-map guard: if `map_match < 0.30`, apply the configured penalty to avoid semantic-only false positives
- output ranked markdown to stdout for agent review
- prefer a smaller, sharper set over a noisy dump
- stay abstract-first
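The two-signal score and low-map guard can be sketched as a weighted sum with a conditional penalty. The weights (0.30 / 0.70) and threshold (0.30) come from the text above; the multiplicative penalty form, its default of 0.5, and the `Scored` type are assumptions, since the actual configured penalty is not specified here.

```python
from dataclasses import dataclass

MAP_WEIGHT = 0.30
SEMANTIC_WEIGHT = 0.70
LOW_MAP_THRESHOLD = 0.30


@dataclass
class Scored:
    paper_id: str
    map_match: float        # fit to research-map slices, assumed in [0, 1]
    zotero_semantic: float  # closeness to nearby Zotero items, assumed in [0, 1]


def score(s: Scored, low_map_penalty: float = 0.5) -> float:
    """Weighted two-signal score; the multiplicative low-map penalty is an assumed form."""
    base = MAP_WEIGHT * s.map_match + SEMANTIC_WEIGHT * s.zotero_semantic
    if s.map_match < LOW_MAP_THRESHOLD:
        base *= low_map_penalty  # damp semantic-only matches that sit off the research map
    return base


def rank(items: list[Scored], top: int) -> list[Scored]:
    """Return the `top` highest-scoring candidates, best first."""
    return sorted(items, key=score, reverse=True)[:top]
```

For example, a paper with `map_match=0.1, zotero_semantic=0.95` would score 0.695 unguarded and outrank one at `0.8 / 0.6` (0.66); with the guard it drops to about 0.35, which is the semantic-only false positive the rule targets.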
Digest Enrichment
- OpenClaw should treat agent-filled review as the default digest-enrichment path
- after retrieval, let the host agent enrich the top-ranked candidate JSON files with review patches
- use `review_generation.agent_top_n` to cap how many ranked candidates the host agent needs to inspect
- let the host agent decide the final visible subset by setting `review.selected_for_digest`
- use `review_generation.final_top_n` as the hard upper bound for the final rendered digest
- keep `fallback_to_system` enabled unless the user explicitly wants hard failure instead of fallback text
- after patches are applied, re-render the digest so HTML / Telegram outputs use the enriched review text
- `why_it_matters` should sound like a recommendation, not a provenance report
- `caveats` should capture real uncertainty or scope boundaries, not generic hedging
- the host agent should also fill `review.zotero_comparison`, including nearest-neighbor fallback when candidate-level evidence is missing
- keep nearest-neighbor output compact: usually 1-2 items
- the host agent is not responsible for email / telegram wrapper copy, subjects, or routing
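How `agent_top_n`, `selected_for_digest`, `final_top_n`, and `fallback_to_system` interact can be sketched as below. The candidate dict shape, the `system_summary` field used as fallback text, and the `select_for_digest` helper are hypothetical; the real renderer's fallback wording is not described here.

```python
from typing import Any


def select_for_digest(
    ranked: list[dict[str, Any]],
    agent_top_n: int,
    final_top_n: int,
    fallback_to_system: bool = True,
) -> list[dict[str, Any]]:
    """Keep agent-selected candidates from the inspected window, capped at final_top_n."""
    inspected = ranked[:agent_top_n]  # the host agent only reviews this window
    chosen = [c for c in inspected if c.get("review", {}).get("selected_for_digest")]
    digest = []
    for cand in chosen[:final_top_n]:  # hard upper bound on the rendered digest
        review = cand.get("review", {})
        if not review.get("why_it_matters"):
            if not fallback_to_system:
                raise ValueError(f"missing agent review for {cand.get('id')}")
            # Hypothetical system fallback text standing in for the renderer's own.
            review = {**review, "why_it_matters": cand.get("system_summary", "")}
        digest.append({**cand, "review": review})
    return digest
```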
Delivery Routing
- use one shared delivery path and branch at the end with `delivery.primary_channel`
- default primary channel is `email`; `telegram` is backup or alternate primary
- channel wrappers are system-owned:
  - email subject/body/profile card/stat cards
  - Telegram compact message shell
- do not ask the host agent to generate channel-specific wrappers
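A minimal sketch of "one shared path, branch at the end": the digest markdown is produced once, and only the final step picks a system-owned wrapper from `delivery.primary_channel`. The wrapper functions, the subject line, and the five-line Telegram truncation are hypothetical stand-ins for the real system wrappers.

```python
from typing import Callable


def wrap_email(digest_md: str) -> dict:
    # Hypothetical system-owned email shell; the real one adds profile and stat cards.
    return {"channel": "email", "subject": "Research digest", "body": digest_md}


def wrap_telegram(digest_md: str) -> dict:
    # Hypothetical compact shell: keep only the first few lines (truncation rule assumed).
    return {"channel": "telegram", "text": "\n".join(digest_md.splitlines()[:5])}


WRAPPERS: dict[str, Callable[[str], dict]] = {"email": wrap_email, "telegram": wrap_telegram}


def route(digest_md: str, config: dict) -> dict:
    """Branch only at the end: pick a wrapper by delivery.primary_channel (default email)."""
    channel = config.get("delivery", {}).get("primary_channel", "email")
    if channel not in WRAPPERS:
        raise ValueError(f"unknown delivery channel: {channel}")
    return WRAPPERS[channel](digest_md)
```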
Stage 6: feedback_sync
After the digest is reviewed and delivered, the host agent may push non-destructive feedback back into Zotero.
Workflow:
- collect the user's explicit feedback on each digest candidate (keep, drop, archive, watch, etc.)
- encode feedback as JSON matching `reports/schema/zotero-feedback.schema.json`
- call `zotero_apply_feedback` through the bundled Zotero MCP with `dry_run=true` first
- show the dry-run plan to the user and ask for confirmation before applying
- only after confirmation, re-run with `dry_run=false`
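The confirm-before-write gate can be expressed as a small control-flow sketch. Here `apply_feedback` is a hypothetical callable standing in for the `zotero_apply_feedback` MCP tool call, and `confirm` stands in for however the host agent asks the user; neither is the bundled MCP's actual Python API.

```python
from typing import Any, Callable, Optional

ApplyFn = Callable[[dict[str, Any], bool], dict[str, Any]]  # (payload, dry_run) -> result


def sync_feedback(
    payload: dict[str, Any],
    apply_feedback: ApplyFn,
    confirm: Callable[[dict[str, Any]], bool],
) -> Optional[dict[str, Any]]:
    """Always dry-run first; write only if the user confirms the shown plan."""
    plan = apply_feedback(payload, True)   # dry_run=True: nothing is written
    if not confirm(plan):                  # show the plan, wait for explicit confirmation
        return None
    return apply_feedback(payload, False)  # dry_run=False: perform the writeback
```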
Allowed feedback decisions:
- `read_first`: high-priority paper; tag and promote in library
- `skim`: worth scanning; tag for later review
- `watch`: track this topic area; add to watchlist collection
- `skip_for_now`: not relevant now; mark but do not remove
- `archive`: reviewed and filed; move to archive collection
- `watchlist`: add to a standing watchlist for periodic check-in
- `ignore`: not relevant; tag to suppress in future runs
- `unset`: no decision yet; skip writeback for this item
What feedback can do (non-destructive only):
- add tags (including `ra-status:*` decision tags)
- add or change collection membership
- append notes to items
- create new collections if needed for organization
What feedback must never do:
- delete Zotero items or collections
- modify item metadata (title, authors, abstract, DOI)
- move or delete attachment files
- rewrite top-level taxonomy without explicit user instruction
- apply changes without showing the dry-run plan first
Matching behavior:
- match items by `item_key` (preferred), `doi`, or `title_contains`
- at least one match field must be provided per decision
- DOI matching is case-insensitive
- `title_contains` uses substring match (not exact)
- if no match is found, the decision is recorded as `not_found` in the plan
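The matching rules can be sketched as an ordered list of predicates. The item dict shape (`key`, `DOI`, `title`, loosely following Zotero item data), the fall-through from one field to the next when a higher-priority field misses, and the case-sensitive title match are assumptions; `find_item` and `plan_entry` are hypothetical names.

```python
from typing import Any, Callable, Optional


def find_item(decision: dict[str, Any], items: list[dict[str, Any]]) -> Optional[dict[str, Any]]:
    """Resolve a decision by item_key, then DOI (case-insensitive), then title substring."""
    key = decision.get("item_key")
    doi = decision.get("doi")
    title_part = decision.get("title_contains")
    if not (key or doi or title_part):
        raise ValueError("each decision needs item_key, doi, or title_contains")
    preds: list[Callable[[dict[str, Any]], bool]] = []
    if key:
        preds.append(lambda it: it.get("key") == key)
    if doi:
        preds.append(lambda it: (it.get("DOI") or "").lower() == doi.lower())
    if title_part:
        preds.append(lambda it: title_part in (it.get("title") or ""))
    for pred in preds:  # try fields in priority order
        hit = next((it for it in items if pred(it)), None)
        if hit is not None:
            return hit
    return None


def plan_entry(decision: dict[str, Any], items: list[dict[str, Any]]) -> dict[str, Any]:
    """One row of the dry-run plan: matched item key, or not_found."""
    hit = find_item(decision, items)
    if hit is None:
        return {"status": "not_found", "decision": decision.get("decision")}
    return {"status": "matched", "item_key": hit["key"], "decision": decision.get("decision")}
```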
Edge cases:
- duplicate tags are deduplicated (case-insensitive)
- previous `ra-status:*` tags are replaced when a new decision is applied
- the `research-assist` system tag is always preserved
- `unset` decisions produce no status tag and no writeback
- empty `add_tags`, `remove_tags`, `add_collections`, and `remove_collections` are allowed (no-op for that field)
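The tag edge cases combine into one small update function. This sketch assumes status tags are spelled `ra-status:<decision>` and that "always preserved" means the `research-assist` tag can never be removed once present; `updated_tags` is a hypothetical helper, not the MCP's implementation.

```python
from typing import Optional, Sequence

STATUS_PREFIX = "ra-status:"  # assumed spelling of decision tags
SYSTEM_TAG = "research-assist"


def updated_tags(
    current: Sequence[str],
    decision: str,
    add_tags: Sequence[str] = (),
    remove_tags: Sequence[str] = (),
) -> Optional[list[str]]:
    """Compute an item's new tag list for one decision; None means no writeback."""
    if decision == "unset":
        return None  # no status tag, nothing written
    removals = {t.lower() for t in remove_tags} - {SYSTEM_TAG}  # system tag is never removed
    kept = [
        t for t in current
        if t.lower() not in removals and not t.lower().startswith(STATUS_PREFIX)
    ]  # old ra-status:* tags are dropped so the new decision replaces them
    result: list[str] = []
    seen: set[str] = set()
    for tag in [*kept, *add_tags, STATUS_PREFIX + decision]:
        if tag.lower() not in seen:  # case-insensitive dedup; first spelling wins
            seen.add(tag.lower())
            result.append(tag)
    return result
```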
Hard Rules
- do not expand concise method labels into long topic sentences
- do not make full text the default review mode
- do not delete Zotero items or collections automatically
- prefer `dry_run=true` for any Zotero writeback
- do not treat scheduler wiring as part of the skill
Key Runtime Files
- OpenClaw runner: `src/codex_research_assist/openclaw_runner.py`
- Ranker: `src/codex_research_assist/ranker.py`
- Pipeline: `src/codex_research_assist/arxiv_profile_pipeline/pipeline.py`
- Example config: `config.example.json`
- Example profile: `profiles/research-interest.example.json`
Reference Documents
- `references/workflow.md`: stage order and controller boundary
- `references/contracts.md`: profile contract and review policy
- `references/distribution.md`: packaging include/exclude rules
- `references/setup-routing.md`: install-time route selection and option questions
- `references/review-generation.md`: `agent_fill` vs `system` review contract
- `references/profile-map-generation.md`: how to turn Zotero evidence into a research-map-style profile
Packaging Boundary
Include in distributable skill:
- `SKILL.md`, `config.example.json`, `pyproject.toml`, `uv.lock`
- `src/`, `references/`
- `profiles/research-interest.example.json`
- `automation/arxiv-profile-digest.example.toml`, `automation/prompts/`
- `reports/schema/`
- generated package-root `install.sh`
Exclude:
- generated reports, temporary state
- local secret config
- scheduler wrappers
- repository planning documents (`NEXT_PLAN.md`, `CODEMAP.md`)