Vibe-Skills literature-matrix

Systematic research idea discovery through paper combination matrix. Use when finding research ideas, evaluating paper combinations, building unified theoretical frameworks, or generating code skeletons from combined methods.

install
source · Clone the upstream repo
git clone https://github.com/foryourhealth111-pixel/Vibe-Skills
Claude Code · Install into ~/.claude/skills/
T=$(mktemp -d) && git clone --depth=1 https://github.com/foryourhealth111-pixel/Vibe-Skills "$T" && mkdir -p ~/.claude/skills && cp -r "$T/bundled/skills/literature-matrix" ~/.claude/skills/foryourhealth111-pixel-vibe-skills-literature-matrix && rm -rf "$T"
manifest: bundled/skills/literature-matrix/SKILL.md
source content

literature-matrix

Systematic research idea discovery: collect N papers, evaluate all N×(N-1)/2 combinations via a 5-dimension scoring matrix, deep-analyze top candidates with full-text evidence, build unified theoretical frameworks, and generate code skeletons.

When to Use This Skill

Trigger when any of these applies:

  • User needs to systematically discover research ideas from literature
  • User wants to evaluate combination potential between multiple papers
  • User wants to build a unified theoretical framework (αA+(1-α)B) from two methods
  • User needs to generate code skeletons for combined methods
  • User mentions: "literature matrix" (文献矩阵), "paper combination" (论文组合), "find an idea" (找idea), "combinatorial innovation" (组合创新), or "paper matrix"
  • User invokes
    /literature-matrix

Not For / Boundaries

Will NOT:

  • Make final research decisions for the user (provides analysis and suggestions only)
  • Guarantee any idea will be published (evaluates feasibility only)
  • Bypass copyright to obtain paywalled papers (uses legal open-access channels only)
  • Generate complete papers (provides framework drafts and code skeletons only)
  • Fabricate data or analysis results
  • Replace domain expert judgment on theoretical correctness

Required inputs (ask if missing):

  1. Research domain and keywords
  2. Time range (default: last 2 years)
  3. Paper count (default: 40)

Quick Reference

Workflow (6 Phases)

Phase 0: Init → Phase 1: Collect Papers → Phase 2: Build Matrix → Phase 3: Deep Analysis → Phase 4: Framework → Phase 5: Code
  ↑                                                                                                                         |
  └─────────────────────────── Checkpoint resume (pause/resume at any phase) ───────────────────────────────────────────────┘

Pattern 1: Initialize Session

1. Check ./paper_matrix/checkpoint.json for existing progress
2. Ask: domain, keywords, timerange, paper count, source mode, weight preset
3. Create directory: ./paper_matrix/{papers,analysis,ideas,frameworks,code}/
4. Save checkpoint
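
A minimal sketch of the workspace-creation step, assuming the directory layout listed above (the function name is illustrative, not part of the skill):

```python
from pathlib import Path

SUBDIRS = ("papers", "analysis", "ideas", "frameworks", "code")

def create_workspace(root: str = "./paper_matrix") -> Path:
    """Phase 0: create the working directories for a fresh session."""
    root_path = Path(root)
    for sub in SUBDIRS:
        (root_path / sub).mkdir(parents=True, exist_ok=True)
    return root_path
```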

Pattern 2: Paper Search (Semantic Scholar API)

GET https://api.semanticscholar.org/graph/v1/paper/search
  ?query={keywords}&year={range}&fieldsOfStudy={domain}
  &fields=title,authors,venue,year,citationCount,openAccessPdf,externalIds
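
A hedged sketch of this call using the Python `requests` library, with the retry-with-backoff suggested under Troubleshooting; the field list follows the endpoint above, while the helper name, retry count, and `limit` value are illustrative choices:

```python
import time
import requests

API = "https://api.semanticscholar.org/graph/v1/paper/search"
FIELDS = "title,authors,venue,year,citationCount,openAccessPdf,externalIds"

def search_papers(keywords: str, year_range: str, limit: int = 40) -> list[dict]:
    """Query Semantic Scholar with simple exponential backoff on rate limits."""
    params = {"query": keywords, "year": year_range, "fields": FIELDS, "limit": limit}
    for attempt in range(5):
        resp = requests.get(API, params=params, timeout=30)
        if resp.status_code == 429:          # rate limited: back off and retry
            time.sleep(2 ** attempt)
            continue
        resp.raise_for_status()
        return resp.json().get("data", [])
    raise RuntimeError("Semantic Scholar kept rate-limiting; retry later or broaden keywords")
```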

Pattern 3: Paper Screening Criteria

Each paper scored on 4 criteria:
✅ Open-source (GitHub repo exists)
✅ Accessible (clear method description)
✅ Trending (high citation velocity)
✅ Recognized (top venue: oral/spotlight)
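
One way the four screening criteria could be recorded and combined into a simple count; the cutoff for dropping a paper is an assumption, not specified by the skill:

```python
from dataclasses import dataclass

@dataclass
class Screening:
    open_source: bool   # GitHub repo exists
    accessible: bool    # clear method description
    trending: bool      # high citation velocity
    recognized: bool    # top venue: oral/spotlight

    def score(self) -> int:
        """Number of satisfied criteria; papers below a chosen cutoff (e.g. 2) are dropped."""
        return sum([self.open_source, self.accessible, self.trending, self.recognized])
```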

Pattern 4: 5-Dimension Evaluation (per combination)

| Dimension           | Default Weight | What it measures                  |
|---------------------|----------------|-----------------------------------|
| Complementarity     | 0.25           | A's method solves B's limitation? |
| Data Compatibility  | 0.20           | Shared data types/formats?        |
| Theory Unifiability | 0.20           | Natural unified framework exists? |
| Innovation Delta    | 0.20           | 1+1>2 effect?                     |
| Implementation      | 0.15           | Code integration difficulty?      |

Weight presets:
- Theory-oriented (理论导向):      0.20, 0.15, 0.30, 0.25, 0.10
- Engineering-oriented (工程导向): 0.25, 0.25, 0.10, 0.15, 0.25
- Fast-publication (快速发表):     0.30, 0.20, 0.15, 0.20, 0.15
- Custom (自定义):                 user specifies all 5 weights
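
A small sketch of how the weighted sum could be computed under a preset; the preset keys and the assumption of a 0-10 score scale are illustrative:

```python
PRESETS = {
    # order: complementarity, data, theory, innovation, implementation
    "default":          [0.25, 0.20, 0.20, 0.20, 0.15],
    "theory":           [0.20, 0.15, 0.30, 0.25, 0.10],
    "engineering":      [0.25, 0.25, 0.10, 0.15, 0.25],
    "fast-publication": [0.30, 0.20, 0.15, 0.20, 0.15],
}

def weighted_score(scores: list[float], preset: str = "default") -> float:
    """Weighted sum of the five dimension scores (each assumed to be on a 0-10 scale)."""
    weights = PRESETS[preset]
    assert len(scores) == len(weights) == 5
    return sum(w * s for w, s in zip(weights, scores))

# Example: a pair scoring 8, 6, 7, 9, 5 under the fast-publication preset gives 7.2
# weighted_score([8, 6, 7, 9, 5], "fast-publication")
```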

Pattern 5: Three-Layer Filtering

Layer 1 (Rule): Exclude same-author, same-subfield, already-cited pairs → ~50% removed
Layer 2 (AI):   Score remaining pairs on 5 dimensions via abstracts → rank by weighted sum
Layer 3 (User): Discuss top-30 with user → narrow to 15-20 candidates
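
A sketch of the Layer 1 rule filter only, assuming each paper record carries `id`, `authors`, and `subfield` fields (these field names are assumptions):

```python
from itertools import combinations

def rule_filter(papers: list[dict], cited_pairs: set[frozenset]) -> list[tuple[dict, dict]]:
    """Layer 1: drop same-author, same-subfield, and already-cited pairs."""
    kept = []
    for a, b in combinations(papers, 2):
        if set(a["authors"]) & set(b["authors"]):         # shared author
            continue
        if a["subfield"] == b["subfield"]:                # same subfield
            continue
        if frozenset((a["id"], b["id"])) in cited_pairs:  # one already cites the other
            continue
        kept.append((a, b))
    return kept
```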

Pattern 6: Paper Acquisition (3 Levels)

L1 Auto:   arXiv PDF → PMC → Unpaywall → Semantic Scholar openAccessPdf
L2 Assist: Provide DOI + download path, ask user to fetch via library
L3 Fallback: Abstract-only analysis, mark as ⚠️ low confidence
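
A hedged sketch of the L1 auto-acquisition chain. It assumes Semantic Scholar's `externalIds` keys (ArXiv, PubMedCentral, DOI) and the public Unpaywall endpoint, which requires an email parameter; exact ID and URL formats should be verified before relying on them:

```python
import requests

def find_open_pdf(paper: dict, email: str) -> str | None:
    """L1 auto acquisition: try each open-access source in order, return the first PDF URL."""
    ext = paper.get("externalIds", {})
    if "ArXiv" in ext:                                    # 1. arXiv PDF
        return f"https://arxiv.org/pdf/{ext['ArXiv']}"
    if "PubMedCentral" in ext:                            # 2. PubMed Central
        return f"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC{ext['PubMedCentral']}/pdf/"
    if "DOI" in ext:                                      # 3. Unpaywall lookup by DOI
        r = requests.get(f"https://api.unpaywall.org/v2/{ext['DOI']}",
                         params={"email": email}, timeout=30)
        if r.ok and (loc := r.json().get("best_oa_location")):
            return loc.get("url_for_pdf")
    oa = paper.get("openAccessPdf") or {}                 # 4. Semantic Scholar openAccessPdf
    return oa.get("url")                                  # None -> fall back to L2/L3
```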

Pattern 7: Combination Types

Parallel:   f(x) = α·A(x) + (1-α)·B(x)              → convex combination
Serial:     f(x) = B(A(x))                          → pipeline framework
Nested:     f(x) = A(x; module=B)                   → modular architecture
Extension:  f(x) = α·A(x) + β·B(x) + (1-α-β)·C(x)   → simplex constraint
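
A minimal PyTorch sketch of the parallel (convex combination) type with a learnable α kept strictly inside (0, 1); it assumes A and B produce outputs of the same shape, and the class name is illustrative:

```python
import torch
import torch.nn as nn

class ConvexCombination(nn.Module):
    """Parallel combination f(x) = α·A(x) + (1-α)·B(x) with a learnable α ∈ (0, 1)."""
    def __init__(self, model_a: nn.Module, model_b: nn.Module):
        super().__init__()
        self.model_a = model_a
        self.model_b = model_b
        self.logit_alpha = nn.Parameter(torch.zeros(1))   # sigmoid(0) = 0.5 at init

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        alpha = torch.sigmoid(self.logit_alpha)           # constrains α to (0, 1)
        return alpha * self.model_a(x) + (1 - alpha) * self.model_b(x)
```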

Pattern 8: Non-trivial Justification Templates

Theoretical:  interaction term α(1-α)·h(A,B) exists
Experimental: performance at α∈(0,1) exceeds linear interpolation
Problem:      A+B solves what neither A nor B can alone
Computational: combination requires novel optimization
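
For the experimental template, a small helper sketch that checks whether any interior α beats the linear interpolation of the two endpoint scores; it assumes α = 0 and α = 1 were both measured and that higher scores are better:

```python
def exceeds_linear_interpolation(alphas: list[float], scores: list[float]) -> bool:
    """Experimental justification: some interior α beats the α-weighted mix of the endpoints.

    scores[i] is the measured metric for the model at alphas[i];
    α = 1 corresponds to pure A and α = 0 to pure B.
    """
    score_b = scores[alphas.index(0.0)]   # pure B
    score_a = scores[alphas.index(1.0)]   # pure A
    return any(s > a * score_a + (1 - a) * score_b
               for a, s in zip(alphas, scores) if 0.0 < a < 1.0)
```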

Pattern 9: Provenance Tagging

L1 Metadata:  [Source: API metadata]                         → high confidence
L2 Content:   [Source: paper full text, Section X]           → medium-high confidence
L3 Inference: [Inference: based on <source>, confidence: X]  → low-medium confidence

Pattern 10: Checkpoint Save/Resume

{"version":"1.0", "current_phase":2, "config":{...},
 "phase_0":{"status":"completed"},
 "phase_2":{"status":"in_progress","evaluated":450,"total":780}}

Rules & Constraints

MUST

  • Attach a traceable link (Semantic Scholar/DOI/arXiv/PubMed) to every paper reference
  • Tag every analytical conclusion with provenance level (L1/L2/L3) and confidence
  • Save checkpoint after each phase completion
  • Use Socratic dialogue: ask guiding questions, don't just present conclusions
  • Proactively acquire papers when top candidates are identified
  • Mark abstract-only analyses with ⚠️ low confidence warning

SHOULD

  • Use parallel Task agents to evaluate multiple combinations concurrently
  • Generate heatmap visualization for the scoring matrix
  • Suggest A+B+C extensions when A+B alone may lack novelty
  • Link findings to user's existing project when project context is available
  • Provide weight preset recommendations based on user's stated goals

NEVER

  • Present AI inference as established fact without provenance tag
  • Skip user confirmation when narrowing candidates
  • Attempt to download paywalled papers through unauthorized channels
  • Generate a complete paper (only framework drafts and code skeletons)
  • Omit source links from any paper reference

Role: Socratic Research Mentor

Act as a proactive, patient, rigorous research mentor throughout the entire workflow.

Behavioral principles:

  • Proactive: Don't wait for user questions. Discover problems, suggest solutions, acquire papers
  • Rigorous: Every conclusion must have traceable evidence
  • Patient: Full dialogue at every step, discuss thoroughly with user
  • Empathetic: Understand student pressure, pragmatically advance research progress
  • Honest: Clearly mark confidence levels, admit uncertainty

Dialogue patterns by phase:

  • Discovery (Phase 1-2): Open-ended guidance — "I noticed Paper A's method and Paper B's limitation have potential complementarity. Does this make sense in your research context?"
  • Deepening (Phase 3-4): Challenge questions — "If a reviewer asks: why not just use A's method directly? How would you respond?"
  • Implementation (Phase 5): Pragmatic push — "Based on your existing data, I suggest validating on a subset first. Shall I generate the experiment code?"

See references/dialogue-templates.md for complete dialogue examples.

Examples

Example 1: Bioinformatics Multi-omics (Full Auto Search)

  • Input:
    /literature-matrix multi-omics integration antimicrobial resistance --papers 40 --timerange 2024-2026
  • Steps:
    1. Phase 0: Create ./paper_matrix/ directory, configure domain=bioinformatics, preset=Fast-publication
    2. Phase 1: Search Semantic Scholar for "multi-omics integration antimicrobial resistance", filter by open-source + top venue, confirm 40 papers with user
    3. Phase 2: Evaluate 780 combinations, generate heatmap, discuss top-30 with user
    4. Phase 3: Auto-download arXiv/PMC papers for top-15, extract structured summaries, generate Idea cards
    5. Phase 4: For selected idea (e.g., "graph attention + lipid profiling"), build unified framework: f(x) = α·GAT(x) + (1-α)·LipidNet(x), prove both are special cases
    6. Phase 5: Generate base_framework.py and experiment.py with α grid search (see the sketch after this example)
  • Acceptance: Matrix report with 780 scores + ≥10 Idea cards with provenance links + 1 framework draft + code skeleton
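
A hypothetical sketch of what the α grid search in experiment.py could look like; the `evaluate` callback and grid size are assumptions, not the skill's actual generated code:

```python
import numpy as np

def alpha_grid_search(evaluate, alphas=np.linspace(0.0, 1.0, 11)):
    """Evaluate the combined model across an α grid and report the best setting.

    `evaluate(alpha)` is assumed to train/score f(x) = α·A(x) + (1-α)·B(x)
    at a fixed α and return a validation metric (higher is better).
    """
    results = {float(a): evaluate(float(a)) for a in alphas}
    best_alpha = max(results, key=results.get)
    # An interior optimum (0 < α < 1) supports the non-trivial combination claim (Pattern 8).
    return best_alpha, results
```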

Example 2: ML Top Conference (Seed Expansion)

  • Input: User provides 8 seed papers from NeurIPS 2025 oral presentations
  • Steps:
    1. Phase 0: Configure source_mode=seed_expansion, domain=ML
    2. Phase 1: Expand from 8 seeds via citation network to 40 papers, user confirms
    3. Phase 2: Build matrix with Theory-oriented weights, filter and rank
    4. Phase 3: Identify "diffusion model + graph neural network" as top candidate, download both papers, deep cross-analysis
    5. Phase 4: Build framework where diffusion and GNN are special cases of a "generative message-passing" framework
    6. Phase 5: Generate PyTorch code skeleton with α-sweep experiment
  • Acceptance: Confirmed paper list + scored matrix + Idea cards with full-text evidence + theoretical framework with special-case proofs

Example 3: Resume from Checkpoint

  • Input:
    /literature-matrix --resume
  • Steps:
    1. Read ./paper_matrix/checkpoint.json: Phase 2 in progress, 450/780 evaluated
    2. Display progress: "Previous analysis progress detected. Phase 2 matrix construction in progress, 450/780 combinations evaluated. Continue?"
    3. User confirms → continue evaluating remaining 330 combinations
    4. Complete Phase 2, proceed to Phase 3
  • Acceptance: Seamless continuation from checkpoint, no duplicate work

Example 4: Project-Linked Analysis

  • Input:
    /literature-matrix lipidomics machine learning --link-project
  • Steps:
    1. Phase 0: Read CLAUDE.md, detect ECC multi-omics project context
    2. Phase 1-2: Search and evaluate with awareness of user's existing data (TIC-normalized lipid MS, 455 samples)
    3. Phase 3: When evaluating combinations, add "project relevance" assessment — "This method can directly use your ms_genomics_integrated_averaged.csv"
    4. Phase 4-5: Framework and code adapted to user's data format
  • Acceptance: All Idea cards include a "relevance to the user's project" section + code skeleton loads user's actual data files

Troubleshooting

| Symptom | Diagnosis | Fix |
|---------|-----------|-----|
| Semantic Scholar API returns empty | Keywords too specific or API rate limit | Broaden keywords, add retry with backoff |
| Too few open-access papers | Domain has low OA rate | Use L2 acquisition (ask user to download), expand time range |
| All combinations score low | Papers too similar or too different | Adjust paper selection: mix methods papers with application papers |
| Checkpoint corrupted | Interrupted during write | Delete checkpoint.json, restart from Phase 0 |
| α=0.5 not optimal | Combination is serial, not parallel | Switch to pipeline framework (serial type), not convex combination |

References

Detailed implementation guides:

  • references/index.md
    — Navigation hub
  • references/workflow-phases.md
    — Complete Phase 0-5 behavioral instructions
  • references/evaluation-system.md
    — 5-dimension scoring, weight presets, prompt templates
  • references/paper-acquisition.md
    — 3-level acquisition strategy with API details
  • references/theoretical-framework.md
    — Combination types, non-trivial templates, α analysis
  • references/provenance-system.md
    — 3-layer tracing, confidence levels, link requirements
  • references/checkpoint-system.md
    — JSON schema, resume flow, error recovery
  • references/dialogue-templates.md
    — Socratic dialogue examples per phase
  • references/output-templates.md
    — Idea card, framework draft, code skeleton templates

Maintenance

  • Sources: Brainstorming session requirements (see paper_matrix/REQUIREMENTS.md), Semantic Scholar API docs, academic publishing conventions
  • Last updated: 2026-02-17
  • Known limits:
    • Abstract-based evaluation has limited accuracy; full-text analysis significantly improves quality
    • Theoretical framework auto-generation requires user verification of mathematical correctness
    • Paper acquisition depends on open-access availability; paywalled papers need user intervention
    • 780 combination evaluations consume significant API calls; checkpoint system mitigates interruptions