Vibe-Skills literature-matrix

Systematic research idea discovery through paper combination matrix. Use when finding research ideas, evaluating paper combinations, building unified theoretical frameworks, or generating code skeletons from combined methods.

install
source · Clone the upstream repo
git clone https://github.com/foryourhealth111-pixel/Vibe-Skills
Claude Code · Install into ~/.claude/skills/
T=$(mktemp -d) && git clone --depth=1 https://github.com/foryourhealth111-pixel/Vibe-Skills "$T" && mkdir -p ~/.claude/skills && cp -r "$T/bundled/skills/literature-matrix" ~/.claude/skills/foryourhealth111-pixel-vibe-skills-literature-matrix && rm -rf "$T"
manifest: bundled/skills/literature-matrix/SKILL.md
source content

literature-matrix

Systematic research idea discovery: collect N papers, evaluate all N×(N-1)/2 combinations via a 5-dimension scoring matrix, deep-analyze top candidates with full-text evidence, build unified theoretical frameworks, and generate code skeletons.

When to Use This Skill

Trigger when any of these applies:

  • User needs to systematically discover research ideas from literature
  • User wants to evaluate combination potential between multiple papers
  • User wants to build a unified theoretical framework (αA+(1-α)B) from two methods
  • User needs to generate code skeletons for combined methods
  • User mentions: "literature matrix" (文献矩阵), "paper combination" (论文组合), "find an idea" (找idea), "combinatorial innovation" (组合创新), or "paper matrix"
  • User invokes
    /literature-matrix

Not For / Boundaries

Will NOT:

  • Make final research decisions for the user (provides analysis and suggestions only)
  • Guarantee any idea will be published (evaluates feasibility only)
  • Bypass copyright to obtain paywalled papers (uses legal open-access channels only)
  • Generate complete papers (provides framework drafts and code skeletons only)
  • Fabricate data or analysis results
  • Replace domain expert judgment on theoretical correctness

Required inputs (ask if missing):

  1. Research domain and keywords
  2. Time range (default: last 2 years)
  3. Paper count (default: 40)

Quick Reference

Workflow (6 Phases)

Phase 0: Init → Phase 1: Collect Papers → Phase 2: Build Matrix → Phase 3: Deep Analysis → Phase 4: Framework → Phase 5: Code
  ↑                                                                                                                         |
  └─────────────────────────── Checkpoint resume (pause/resume at any phase) ───────────────────────────────────────────────┘

Pattern 1: Initialize Session

1. Check ./paper_matrix/checkpoint.json for existing progress
2. Ask: domain, keywords, timerange, paper count, source mode, weight preset
3. Create directory: ./paper_matrix/{papers,analysis,ideas,frameworks,code}/
4. Save checkpoint
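
A minimal sketch of the workspace-creation step, assuming the directory layout listed above (the function name is illustrative, not part of the skill):

```python
from pathlib import Path

SUBDIRS = ("papers", "analysis", "ideas", "frameworks", "code")

def create_workspace(root: str = "./paper_matrix") -> Path:
    """Phase 0: create the working directories for a fresh session."""
    root_path = Path(root)
    for sub in SUBDIRS:
        (root_path / sub).mkdir(parents=True, exist_ok=True)
    return root_path
```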

Pattern 2: Paper Search (Semantic Scholar API)

GET https://api.semanticscholar.org/graph/v1/paper/search
  ?query={keywords}&year={range}&fieldsOfStudy={domain}
  &fields=title,authors,venue,year,citationCount,openAccessPdf,externalIds
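
A hedged sketch of this call using the Python `requests` library, with the retry-with-backoff suggested under Troubleshooting; the field list follows the endpoint above, while the helper name, retry count, and `limit` value are illustrative choices:

```python
import time
import requests

API = "https://api.semanticscholar.org/graph/v1/paper/search"
FIELDS = "title,authors,venue,year,citationCount,openAccessPdf,externalIds"

def search_papers(keywords: str, year_range: str, limit: int = 40) -> list[dict]:
    """Query Semantic Scholar with simple exponential backoff on rate limits."""
    params = {"query": keywords, "year": year_range, "fields": FIELDS, "limit": limit}
    for attempt in range(5):
        resp = requests.get(API, params=params, timeout=30)
        if resp.status_code == 429:          # rate limited: back off and retry
            time.sleep(2 ** attempt)
            continue
        resp.raise_for_status()
        return resp.json().get("data", [])
    raise RuntimeError("Semantic Scholar kept rate-limiting; retry later or broaden keywords")
```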

Pattern 3: Paper Screening Criteria

Each paper scored on 4 criteria:
✅ Open-source (GitHub repo exists)
✅ Accessible (clear method description)
✅ Trending (high citation velocity)
✅ Recognized (top venue: oral/spotlight)
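
One way the four screening criteria could be recorded and combined into a simple count; the cutoff for dropping a paper is an assumption, not specified by the skill:

```python
from dataclasses import dataclass

@dataclass
class Screening:
    open_source: bool   # GitHub repo exists
    accessible: bool    # clear method description
    trending: bool      # high citation velocity
    recognized: bool    # top venue: oral/spotlight

    def score(self) -> int:
        """Number of satisfied criteria; papers below a chosen cutoff (e.g. 2) are dropped."""
        return sum([self.open_source, self.accessible, self.trending, self.recognized])
```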

Pattern 4: 5-Dimension Evaluation (per combination)

| Dimension           | Default Weight | What it measures                  |
|---------------------|----------------|-----------------------------------|
| Complementarity     | 0.25           | A's method solves B's limitation? |
| Data Compatibility  | 0.20           | Shared data types/formats?        |
| Theory Unifiability | 0.20           | Natural unified framework exists? |
| Innovation Delta    | 0.20           | 1+1>2 effect?                     |
| Implementation      | 0.15           | Code integration difficulty?      |

Weight presets:
- Theory-oriented (理论导向):      0.20, 0.15, 0.30, 0.25, 0.10
- Engineering-oriented (工程导向): 0.25, 0.25, 0.10, 0.15, 0.25
- Fast-publication (快速发表):     0.30, 0.20, 0.15, 0.20, 0.15
- Custom (自定义):                 user specifies all 5 weights
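
A small sketch of how the weighted sum could be computed under a preset; the preset keys and the assumption of a 0-10 score scale are illustrative:

```python
PRESETS = {
    # order: complementarity, data, theory, innovation, implementation
    "default":          [0.25, 0.20, 0.20, 0.20, 0.15],
    "theory":           [0.20, 0.15, 0.30, 0.25, 0.10],
    "engineering":      [0.25, 0.25, 0.10, 0.15, 0.25],
    "fast-publication": [0.30, 0.20, 0.15, 0.20, 0.15],
}

def weighted_score(scores: list[float], preset: str = "default") -> float:
    """Weighted sum of the five dimension scores (each assumed to be on a 0-10 scale)."""
    weights = PRESETS[preset]
    assert len(scores) == len(weights) == 5
    return sum(w * s for w, s in zip(weights, scores))

# Example: a pair scoring 8, 6, 7, 9, 5 under the fast-publication preset gives 7.2
# weighted_score([8, 6, 7, 9, 5], "fast-publication")
```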

Pattern 5: Three-Layer Filtering

Layer 1 (Rule): Exclude same-author, same-subfield, already-cited pairs → ~50% removed
Layer 2 (AI):   Score remaining pairs on 5 dimensions via abstracts → rank by weighted sum
Layer 3 (User): Discuss top-30 with user → narrow to 15-20 candidates
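
A sketch of the Layer 1 rule filter only, assuming each paper record carries `id`, `authors`, and `subfield` fields (these field names are assumptions):

```python
from itertools import combinations

def rule_filter(papers: list[dict], cited_pairs: set[frozenset]) -> list[tuple[dict, dict]]:
    """Layer 1: drop same-author, same-subfield, and already-cited pairs."""
    kept = []
    for a, b in combinations(papers, 2):
        if set(a["authors"]) & set(b["authors"]):         # shared author
            continue
        if a["subfield"] == b["subfield"]:                # same subfield
            continue
        if frozenset((a["id"], b["id"])) in cited_pairs:  # one already cites the other
            continue
        kept.append((a, b))
    return kept
```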

Pattern 6: Paper Acquisition (3 Levels)

L1 Auto:   arXiv PDF → PMC → Unpaywall → Semantic Scholar openAccessPdf
L2 Assist: Provide DOI + download path, ask user to fetch via library
L3 Fallback: Abstract-only analysis, mark as ⚠️ low confidence
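
A hedged sketch of the L1 auto-acquisition chain. It assumes Semantic Scholar's `externalIds` keys (ArXiv, PubMedCentral, DOI) and the public Unpaywall endpoint, which requires an email parameter; exact ID and URL formats should be verified before relying on them:

```python
import requests

def find_open_pdf(paper: dict, email: str) -> str | None:
    """L1 auto acquisition: try each open-access source in order, return the first PDF URL."""
    ext = paper.get("externalIds", {})
    if "ArXiv" in ext:                                    # 1. arXiv PDF
        return f"https://arxiv.org/pdf/{ext['ArXiv']}"
    if "PubMedCentral" in ext:                            # 2. PubMed Central
        return f"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC{ext['PubMedCentral']}/pdf/"
    if "DOI" in ext:                                      # 3. Unpaywall lookup by DOI
        r = requests.get(f"https://api.unpaywall.org/v2/{ext['DOI']}",
                         params={"email": email}, timeout=30)
        if r.ok and (loc := r.json().get("best_oa_location")):
            return loc.get("url_for_pdf")
    oa = paper.get("openAccessPdf") or {}                 # 4. Semantic Scholar openAccessPdf
    return oa.get("url")                                  # None -> fall back to L2/L3
```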

Pattern 7: Combination Types

Parallel:   f(x) = α·A(x) + (1-α)·B(x)              → convex combination
Serial:     f(x) = B(A(x))                          → pipeline framework
Nested:     f(x) = A(x; module=B)                   → modular architecture
Extension:  f(x) = α·A(x) + β·B(x) + (1-α-β)·C(x)   → simplex constraint
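
A minimal PyTorch sketch of the parallel (convex combination) type with a learnable α kept strictly inside (0, 1); it assumes A and B produce outputs of the same shape, and the class name is illustrative:

```python
import torch
import torch.nn as nn

class ConvexCombination(nn.Module):
    """Parallel combination f(x) = α·A(x) + (1-α)·B(x) with a learnable α ∈ (0, 1)."""
    def __init__(self, model_a: nn.Module, model_b: nn.Module):
        super().__init__()
        self.model_a = model_a
        self.model_b = model_b
        self.logit_alpha = nn.Parameter(torch.zeros(1))   # sigmoid(0) = 0.5 at init

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        alpha = torch.sigmoid(self.logit_alpha)           # constrains α to (0, 1)
        return alpha * self.model_a(x) + (1 - alpha) * self.model_b(x)
```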

Pattern 8: Non-trivial Justification Templates

Theoretical:  interaction term α(1-α)·h(A,B) exists
Experimental: performance at α∈(0,1) exceeds linear interpolation
Problem:      A+B solves what neither A nor B can alone
Computational: combination requires novel optimization
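
For the experimental template, a small helper sketch that checks whether any interior α beats the linear interpolation of the two endpoint scores; it assumes α = 0 and α = 1 were both measured and that higher scores are better:

```python
def exceeds_linear_interpolation(alphas: list[float], scores: list[float]) -> bool:
    """Experimental justification: some interior α beats the α-weighted mix of the endpoints.

    scores[i] is the measured metric for the model at alphas[i];
    α = 1 corresponds to pure A and α = 0 to pure B.
    """
    score_b = scores[alphas.index(0.0)]   # pure B
    score_a = scores[alphas.index(1.0)]   # pure A
    return any(s > a * score_a + (1 - a) * score_b
               for a, s in zip(alphas, scores) if 0.0 < a < 1.0)
```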

Pattern 9: Provenance Tagging

L1 Metadata:  [Source: API metadata]                         → high confidence
L2 Content:   [Source: paper full text, Section X]           → medium-high confidence
L3 Inference: [Inference: based on <source>, confidence: X]  → low-medium confidence

Pattern 10: Checkpoint Save/Resume

{"version":"1.0", "current_phase":2, "config":{...},
 "phase_0":{"status":"completed"},
 "phase_2":{"status":"in_progress","evaluated":450,"total":780}}

Rules & Constraints

MUST

  • Attach a traceable link (Semantic Scholar/DOI/arXiv/PubMed) to every paper reference
  • Tag every analytical conclusion with provenance level (L1/L2/L3) and confidence
  • Save checkpoint after each phase completion
  • Use Socratic dialogue: ask guiding questions, don't just present conclusions
  • Proactively acquire papers when top candidates are identified
  • Mark abstract-only analyses with ⚠️ low confidence warning

SHOULD

  • Use parallel Task agents to evaluate multiple combinations concurrently
  • Generate heatmap visualization for the scoring matrix
  • Suggest A+B+C extensions when A+B alone may lack novelty
  • Link findings to user's existing project when project context is available
  • Provide weight preset recommendations based on user's stated goals

NEVER

  • Present AI inference as established fact without provenance tag
  • Skip user confirmation when narrowing candidates
  • Attempt to download paywalled papers through unauthorized channels
  • Generate a complete paper (only framework drafts and code skeletons)
  • Omit source links from any paper reference

Role: Socratic Research Mentor

Act as a proactive, patient, rigorous research mentor throughout the entire workflow.

Behavioral principles:

  • Proactive: Don't wait for user questions. Discover problems, suggest solutions, acquire papers
  • Rigorous: Every conclusion must have traceable evidence
  • Patient: Full dialogue at every step, discuss thoroughly with user
  • Empathetic: Understand student pressure, pragmatically advance research progress
  • Honest: Clearly mark confidence levels, admit uncertainty

Dialogue patterns by phase:

  • Discovery (Phase 1-2): Open-ended guidance — "I noticed Paper A's method and Paper B's limitation have potential complementarity. Does this make sense in your research context?"
  • Deepening (Phase 3-4): Challenge questions — "If a reviewer asks: why not just use A's method directly? How would you respond?"
  • Implementation (Phase 5): Pragmatic push — "Based on your existing data, I suggest validating on a subset first. Shall I generate the experiment code?"

See references/dialogue-templates.md for complete dialogue examples.

Examples

Example 1: Bioinformatics Multi-omics (Full Auto Search)

  • Input:
    /literature-matrix multi-omics integration antimicrobial resistance --papers 40 --timerange 2024-2026
  • Steps:
    1. Phase 0: Create ./paper_matrix/ directory, configure domain=bioinformatics, preset=Fast-publication
    2. Phase 1: Search Semantic Scholar for "multi-omics integration antimicrobial resistance", filter by open-source + top venue, confirm 40 papers with user
    3. Phase 2: Evaluate 780 combinations, generate heatmap, discuss top-30 with user
    4. Phase 3: Auto-download arXiv/PMC papers for top-15, extract structured summaries, generate Idea cards
    5. Phase 4: For selected idea (e.g., "graph attention + lipid profiling"), build unified framework: f(x) = α·GAT(x) + (1-α)·LipidNet(x), prove both are special cases
    6. Phase 5: Generate base_framework.py and experiment.py with α grid search (see the sketch after this example)
  • Acceptance: Matrix report with 780 scores + ≥10 Idea cards with provenance links + 1 framework draft + code skeleton
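
A hypothetical sketch of what the α grid search in experiment.py could look like; the `evaluate` callback and grid size are assumptions, not the skill's actual generated code:

```python
import numpy as np

def alpha_grid_search(evaluate, alphas=np.linspace(0.0, 1.0, 11)):
    """Evaluate the combined model across an α grid and report the best setting.

    `evaluate(alpha)` is assumed to train/score f(x) = α·A(x) + (1-α)·B(x)
    at a fixed α and return a validation metric (higher is better).
    """
    results = {float(a): evaluate(float(a)) for a in alphas}
    best_alpha = max(results, key=results.get)
    # An interior optimum (0 < α < 1) supports the non-trivial combination claim (Pattern 8).
    return best_alpha, results
```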

Example 2: ML Top Conference (Seed Expansion)

  • Input: User provides 8 seed papers from NeurIPS 2025 oral presentations
  • Steps:
    1. Phase 0: Configure source_mode=seed_expansion, domain=ML
    2. Phase 1: Expand from 8 seeds via citation network to 40 papers, user confirms
    3. Phase 2: Build matrix with Theory-oriented weights, filter and rank
    4. Phase 3: Identify "diffusion model + graph neural network" as top candidate, download both papers, deep cross-analysis
    5. Phase 4: Build framework where diffusion and GNN are special cases of a "generative message-passing" framework
    6. Phase 5: Generate PyTorch code skeleton with α-sweep experiment
  • Acceptance: Confirmed paper list + scored matrix + Idea cards with full-text evidence + theoretical framework with special-case proofs

Example 3: Resume from Checkpoint

  • Input:
    /literature-matrix --resume
  • Steps:
    1. Read ./paper_matrix/checkpoint.json: Phase 2 in progress, 450/780 evaluated
    2. Display progress: "Previous analysis progress detected. Phase 2 matrix construction in progress, 450/780 combinations evaluated. Continue?"
    3. User confirms → continue evaluating remaining 330 combinations
    4. Complete Phase 2, proceed to Phase 3
  • Acceptance: Seamless continuation from checkpoint, no duplicate work

Example 4: Project-Linked Analysis

  • Input:
    /literature-matrix lipidomics machine learning --link-project
  • Steps:
    1. Phase 0: Read CLAUDE.md, detect ECC multi-omics project context
    2. Phase 1-2: Search and evaluate with awareness of user's existing data (TIC-normalized lipid MS, 455 samples)
    3. Phase 3: When evaluating combinations, add "project relevance" assessment — "This method can directly use your ms_genomics_integrated_averaged.csv"
    4. Phase 4-5: Framework and code adapted to user's data format
  • Acceptance: All Idea cards include a "relevance to the user's project" section + code skeleton loads user's actual data files

Troubleshooting

| Symptom | Diagnosis | Fix |
|---------|-----------|-----|
| Semantic Scholar API returns empty | Keywords too specific or API rate limit | Broaden keywords, add retry with backoff |
| Too few open-access papers | Domain has low OA rate | Use L2 acquisition (ask user to download), expand time range |
| All combinations score low | Papers too similar or too different | Adjust paper selection: mix methods papers with application papers |
| Checkpoint corrupted | Interrupted during write | Delete checkpoint.json, restart from Phase 0 |
| α=0.5 not optimal | Combination is serial, not parallel | Switch to pipeline framework (serial type), not convex combination |

References

Detailed implementation guides:

  • references/index.md
    — Navigation hub
  • references/workflow-phases.md
    — Complete Phase 0-5 behavioral instructions
  • references/evaluation-system.md
    — 5-dimension scoring, weight presets, prompt templates
  • references/paper-acquisition.md
    — 3-level acquisition strategy with API details
  • references/theoretical-framework.md
    — Combination types, non-trivial templates, α analysis
  • references/provenance-system.md
    — 3-layer tracing, confidence levels, link requirements
  • references/checkpoint-system.md
    — JSON schema, resume flow, error recovery
  • references/dialogue-templates.md
    — Socratic dialogue examples per phase
  • references/output-templates.md
    — Idea card, framework draft, code skeleton templates

Maintenance

  • Sources: Brainstorming session requirements (see paper_matrix/REQUIREMENTS.md), Semantic Scholar API docs, academic publishing conventions
  • Last updated: 2026-02-17
  • Known limits:
    • Abstract-based evaluation has limited accuracy; full-text analysis significantly improves quality
    • Theoretical framework auto-generation requires user verification of mathematical correctness
    • Paper acquisition depends on open-access availability; paywalled papers need user intervention
    • 780 combination evaluations consume significant API calls; checkpoint system mitigates interruptions