Vibe-Skills literature-matrix
Systematic research idea discovery through paper combination matrix. Use when finding research ideas, evaluating paper combinations, building unified theoretical frameworks, or generating code skeletons from combined methods.
install
source · Clone the upstream repo
git clone https://github.com/foryourhealth111-pixel/Vibe-Skills
Claude Code · Install into ~/.claude/skills/
T=$(mktemp -d) && git clone --depth=1 https://github.com/foryourhealth111-pixel/Vibe-Skills "$T" && mkdir -p ~/.claude/skills && cp -r "$T/bundled/skills/literature-matrix" ~/.claude/skills/foryourhealth111-pixel-vibe-skills-literature-matrix && rm -rf "$T"
manifest:
bundled/skills/literature-matrix/SKILL.md (source content)
literature-matrix
Systematic research idea discovery: collect N papers, evaluate all N×(N-1)/2 combinations via a 5-dimension scoring matrix, deep-analyze top candidates with full-text evidence, build unified theoretical frameworks, and generate code skeletons.
When to Use This Skill
Trigger when any of these applies:
- User needs to systematically discover research ideas from literature
- User wants to evaluate combination potential between multiple papers
- User wants to build a unified theoretical framework (αA+(1-α)B) from two methods
- User needs to generate code skeletons for combined methods
- User mentions: "文献矩阵" (literature matrix), "论文组合" (paper combination), "找idea" (find ideas), "组合创新" (combinatorial innovation), "paper matrix"
- User invokes /literature-matrix
Not For / Boundaries
Will NOT:
- Make final research decisions for the user (provides analysis and suggestions only)
- Guarantee any idea will be published (evaluates feasibility only)
- Bypass copyright to obtain paywalled papers (uses legal open-access channels only)
- Generate complete papers (provides framework drafts and code skeletons only)
- Fabricate data or analysis results
- Replace domain expert judgment on theoretical correctness
Required inputs (ask if missing):
- Research domain and keywords
- Time range (default: last 2 years)
- Paper count (default: 40)
Quick Reference
Workflow (6 Phases)
Phase 0: Init → Phase 1: Collect Papers → Phase 2: Build Matrix → Phase 3: Deep Analysis → Phase 4: Framework → Phase 5: Code
Checkpoint resume: pause/resume at any phase
Pattern 1: Initialize Session
1. Check ./paper_matrix/checkpoint.json for existing progress
2. Ask: domain, keywords, time range, paper count, source mode, weight preset
3. Create directories: ./paper_matrix/{papers,analysis,ideas,frameworks,code}/
4. Save checkpoint
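A minimal Python sketch of these four steps, assuming a local filesystem and the checkpoint layout shown in Pattern 10. The helper name init_session and the choice to set current_phase to 1 after initialization are illustrative assumptions, not part of the skill's specification.

```python
import json
from pathlib import Path
from typing import Optional

# Subdirectories from step 3 of Pattern 1.
SUBDIRS = ("papers", "analysis", "ideas", "frameworks", "code")


def init_session(root: str = "./paper_matrix", config: Optional[dict] = None) -> dict:
    """Return existing progress if a checkpoint exists, otherwise start a new session.

    init_session is a hypothetical helper; `config` holds the answers from step 2.
    """
    base = Path(root)
    checkpoint_path = base / "checkpoint.json"

    # Step 1: prefer existing progress over starting from scratch.
    if checkpoint_path.exists():
        return json.loads(checkpoint_path.read_text(encoding="utf-8"))

    # Step 3: create the directory tree; exist_ok makes this idempotent.
    for name in SUBDIRS:
        (base / name).mkdir(parents=True, exist_ok=True)

    # Step 4: persist the configuration and mark Phase 0 as done
    # (current_phase = 1 is an assumption about the next phase to run).
    checkpoint = {
        "version": "1.0",
        "current_phase": 1,
        "config": config or {},
        "phase_0": {"status": "completed"},
    }
    checkpoint_path.write_text(
        json.dumps(checkpoint, ensure_ascii=False, indent=2), encoding="utf-8"
    )
    return checkpoint
```

Calling it a second time returns the saved checkpoint unchanged, which is what makes the resume behavior in Example 3 possible.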
Pattern 2: Paper Search (Semantic Scholar API)
GET https://api.semanticscholar.org/graph/v1/paper/search?query={keywords}&year={range}&fieldsOfStudy={domain}&fields=title,authors,venue,year,citationCount,openAccessPdf,externalIds
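Python's standard library is enough to issue this request. The sketch below assumes the public endpoint without an API key and leaves out pagination (the offset parameter) and rate-limit handling. Note that fieldsOfStudy expects Semantic Scholar's own field names (for example "Biology" or "Computer Science"), so a free-form domain such as "bioinformatics" would need to be mapped first. The helper names are hypothetical.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

S2_SEARCH = "https://api.semanticscholar.org/graph/v1/paper/search"
FIELDS = "title,authors,venue,year,citationCount,openAccessPdf,externalIds"


def build_search_url(keywords: str, year_range: str, fields_of_study: str,
                     limit: int = 100) -> str:
    """Assemble the Pattern 2 query; urlencode escapes spaces and commas."""
    params = {
        "query": keywords,
        "year": year_range,                # a range such as "2024-2026"
        "fieldsOfStudy": fields_of_study,  # an S2 field name such as "Biology"
        "fields": FIELDS,
        "limit": limit,                    # relevance search returns at most 100 per page
    }
    return f"{S2_SEARCH}?{urlencode(params)}"


def search_papers(keywords: str, year_range: str, fields_of_study: str,
                  timeout: float = 30.0) -> list:
    """Fetch one page of results; the response's "data" key holds the paper records."""
    url = build_search_url(keywords, year_range, fields_of_study)
    with urlopen(url, timeout=timeout) as resp:
        return json.load(resp).get("data", [])
```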
Pattern 3: Paper Screening Criteria
Each paper is scored on 4 criteria:
- ✅ Open-source: GitHub repo exists
- ✅ Accessible: clear method description
- ✅ Trending: high citation velocity
- ✅ Recognized: top venue (oral/spotlight)
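The source does not say how the four criteria are combined or how "high citation velocity" is measured. One simple reading, assumed here, counts how many criteria a paper meets and approximates velocity as citations per year since publication. The dataclass fields and the threshold of 10 citations/year are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    """Screening inputs for one paper (field names are illustrative)."""
    title: str
    has_repo: bool           # Open-source: a GitHub repo exists
    clear_method: bool       # Accessible: judged by reading the method description
    citation_count: int
    year: int
    oral_or_spotlight: bool  # Recognized: oral/spotlight at a top venue


def citation_velocity(paper: Candidate, current_year: int) -> float:
    """Citations per year since publication, counting the publication year itself."""
    return paper.citation_count / max(1, current_year - paper.year + 1)


def screening_score(paper: Candidate, current_year: int,
                    velocity_threshold: float = 10.0) -> int:
    """Number of the 4 criteria met (0-4); the threshold is an assumed default."""
    checks = (
        paper.has_repo,
        paper.clear_method,
        citation_velocity(paper, current_year) >= velocity_threshold,
        paper.oral_or_spotlight,
    )
    return sum(checks)
```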
Pattern 4: 5-Dimension Evaluation (per combination)
| Dimension | Default Weight | What it measures |
|---|---|---|
| Complementarity | 0.25 | Does A's method solve B's limitation? |
| Data Compatibility | 0.20 | Do A and B share data types/formats? |
| Theory Unifiability | 0.20 | Does a natural unified framework exist? |
| Innovation Delta | 0.20 | Is there a 1+1>2 effect? |
| Implementation | 0.15 | How difficult is code integration? |

Weight presets (listed in table order: Complementarity, Data Compatibility, Theory Unifiability, Innovation Delta, Implementation; each preset sums to 1.0):
- 理论导向 (theory-oriented): 0.20, 0.15, 0.30, 0.25, 0.10
- 工程导向 (engineering-oriented): 0.25, 0.25, 0.10, 0.15, 0.25
- 快速发表 (fast publication): 0.30, 0.20, 0.15, 0.20, 0.15
- 自定义 (custom): user specifies all 5 weights
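A sketch of the weighted sum used to rank pairs, assuming every dimension is scored on a common numeric scale (the source does not fix one; 0-10 is used in the checks below). The dictionary keys and function name are illustrative.

```python
import math

# Order matches the table rows and the preset tuples.
DIMENSIONS = (
    "complementarity",
    "data_compatibility",
    "theory_unifiability",
    "innovation_delta",
    "implementation",
)

PRESETS = {
    "default": (0.25, 0.20, 0.20, 0.20, 0.15),
    "理论导向": (0.20, 0.15, 0.30, 0.25, 0.10),  # theory-oriented
    "工程导向": (0.25, 0.25, 0.10, 0.15, 0.25),  # engineering-oriented
    "快速发表": (0.30, 0.20, 0.15, 0.20, 0.15),  # fast publication
}


def weighted_score(scores: dict, weights: tuple) -> float:
    """Weighted sum of the 5 dimension scores for one paper pair."""
    if len(weights) != len(DIMENSIONS):
        raise ValueError("expected exactly 5 weights")
    # Custom (自定义) weights must also sum to 1 so results stay on the input scale.
    if not math.isclose(sum(weights), 1.0, abs_tol=1e-9):
        raise ValueError("weights must sum to 1")
    return sum(w * scores[d] for w, d in zip(weights, DIMENSIONS))
```

Validating that custom weights sum to 1 keeps scores from different presets comparable on the same scale.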
Pattern 5: Three-Layer Filtering
- Layer 1 (Rule): Exclude same-author, same-subfield, and already-cited pairs → ~50% removed
- Layer 2 (AI): Score remaining pairs on 5 dimensions from abstracts → rank by weighted sum
- Layer 3 (User): Discuss the top 30 with the user → narrow to 15-20 candidates
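The rule-based layer is cheap and runs before any model calls. The sketch below assumes paper records with author sets, a subfield label, and sets of cited paper ids, and leaves Layer 2 scoring to a caller-supplied score_fn (for example a weighted sum as in Pattern 4). Layer 3 is a conversation with the user and is not modeled. All names are hypothetical.

```python
from itertools import combinations
from math import comb


def pair_count(n: int) -> int:
    """N*(N-1)/2 unordered pairs; 40 papers give 780 pairs."""
    return comb(n, 2)


def layer1_rule_filter(papers: list) -> list:
    """Layer 1: drop same-author, same-subfield, and already-cited pairs.

    Each paper is a dict with 'id', 'authors' (set), 'subfield' (str), and
    'cites' (set of paper ids); this record shape is an assumption.
    """
    kept = []
    for a, b in combinations(papers, 2):  # every unordered pair exactly once
        if a["authors"] & b["authors"]:
            continue  # shared author
        if a["subfield"] == b["subfield"]:
            continue  # same subfield
        if b["id"] in a["cites"] or a["id"] in b["cites"]:
            continue  # one paper already cites the other
        kept.append((a["id"], b["id"]))
    return kept


def layer2_rank(pairs: list, score_fn, top_k: int = 30) -> list:
    """Layer 2: order surviving pairs by score; the top_k go to Layer 3."""
    return sorted(pairs, key=score_fn, reverse=True)[:top_k]
```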
Pattern 6: Paper Acquisition (3 Levels)
- L1 Auto: arXiv PDF → PMC → Unpaywall → Semantic Scholar openAccessPdf
- L2 Assist: Provide DOI + download path, ask user to fetch via library
- L3 Fallback: Abstract-only analysis, mark as ⚠️ low confidence
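A sketch of the fallback order, assuming paper records shaped like the Semantic Scholar search results from Pattern 2 (externalIds with ArXiv/DOI keys, openAccessPdf with a url key). PMC and Unpaywall resolution need network calls, so they are injected as hypothetical lookup callables; the function only decides which level applies and does not download anything.

```python
from typing import Callable, Optional, Tuple

# A lookup takes a paper record and returns a PDF URL or None.
Lookup = Callable[[dict], Optional[str]]


def acquire(paper: dict,
            pmc_lookup: Optional[Lookup] = None,
            unpaywall_lookup: Optional[Lookup] = None) -> Tuple[str, str]:
    """Return (level, action) following the L1 -> L2 -> L3 order above."""
    ids = paper.get("externalIds") or {}

    # L1 Auto, in the listed order: arXiv, PMC, Unpaywall, S2 openAccessPdf.
    if ids.get("ArXiv"):
        return "L1", f"https://arxiv.org/pdf/{ids['ArXiv']}"
    for lookup in (pmc_lookup, unpaywall_lookup):
        url = lookup(paper) if lookup is not None else None
        if url:
            return "L1", url
    oa_url = (paper.get("openAccessPdf") or {}).get("url")
    if oa_url:
        return "L1", oa_url

    # L2 Assist: hand the DOI to the user for library access.
    if ids.get("DOI"):
        return "L2", f"https://doi.org/{ids['DOI']}"

    # L3 Fallback: abstract-only analysis.
    return "L3", "abstract-only analysis, ⚠️ low confidence"
```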
Pattern 7: Combination Types
- Parallel: f(x) = α·A(x) + (1-α)·B(x) → convex combination
- Serial: f(x) = B(A(x)) → pipeline framework
- Nested: f(x) = A(x; module=B) → modular architecture
- Extension: f(x) = α·A(x) + β·B(x) + (1-α-β)·C(x) → simplex constraint (α, β ≥ 0, α+β ≤ 1)
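The four types can be written as higher-order functions over models, which makes the special cases easy to check. This is a toy sketch: real models would be neural modules or estimators rather than scalar lambdas, and the module keyword for the nested type is an assumed convention.

```python
from typing import Callable

Model = Callable[..., float]


def parallel(A: Model, B: Model, alpha: float) -> Model:
    """Convex combination; alpha=1 recovers A and alpha=0 recovers B."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be in [0, 1]")
    return lambda x: alpha * A(x) + (1 - alpha) * B(x)


def serial(A: Model, B: Model) -> Model:
    """Pipeline: B consumes A's output."""
    return lambda x: B(A(x))


def nested(A: Model, B: Model) -> Model:
    """A accepts B as a pluggable submodule through a `module` keyword (assumed)."""
    return lambda x: A(x, module=B)


def extension(A: Model, B: Model, C: Model, alpha: float, beta: float) -> Model:
    """Three-way mix whose weights (alpha, beta, 1 - alpha - beta) lie on the simplex."""
    if alpha < 0 or beta < 0 or alpha + beta > 1:
        raise ValueError("need alpha >= 0, beta >= 0, alpha + beta <= 1")
    return lambda x: alpha * A(x) + beta * B(x) + (1 - alpha - beta) * C(x)
```

For example, with A(x) = 2x and B(x) = x + 1, parallel(A, B, 1.0)(3) gives 6 (A alone) and serial(A, B)(3) gives 7.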
Pattern 8: Non-trivial Justification Templates
- Theoretical: an interaction term α(1-α)·h(A,B) exists
- Experimental: performance at α∈(0,1) exceeds linear interpolation between α=0 and α=1
- Problem: A+B solves what neither A nor B can alone
- Computational: the combination requires novel optimization
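The experimental template can be checked mechanically from an α sweep. If performance follows f(α) = α·f(1) + (1-α)·f(0) + α(1-α)·h, the gap over linear interpolation equals α(1-α)·h, so a consistently positive gap is the empirical counterpart of the theoretical interaction term. A minimal sketch, assuming higher metric values are better and that results at α=0 and α=1 are included; in practice the margin should reflect seed-to-seed variance rather than the default of 0.

```python
def gains_over_interpolation(results: dict, margin: float = 0.0) -> dict:
    """Map each interior alpha to its gain over linear interpolation, if above margin.

    results: {alpha: metric}, must contain 0.0 (B alone) and 1.0 (A alone).
    """
    f0, f1 = results[0.0], results[1.0]
    gains = {}
    for alpha in sorted(results):
        if 0.0 < alpha < 1.0:
            baseline = alpha * f1 + (1 - alpha) * f0  # straight line between endpoints
            gap = results[alpha] - baseline
            if gap > margin:
                gains[alpha] = gap
    return gains


def interaction_estimate(results: dict, alpha: float) -> float:
    """Estimate h under the quadratic model above: gap / (alpha * (1 - alpha))."""
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must be strictly between 0 and 1")
    f0, f1 = results[0.0], results[1.0]
    gap = results[alpha] - (alpha * f1 + (1 - alpha) * f0)
    return gap / (alpha * (1 - alpha))
```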
Pattern 9: Provenance Tagging
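- L1 Metadata: [来源: API元数据] (Source: API metadata) → high confidence
- L2 Content: [来源: 论文全文, Section X] (Source: full text, Section X) → medium-high confidence
- L3 Inference: [推断: 基于[来源], 置信度: X] (Inference: based on [source], confidence: X) → low-medium confidence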
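Generating tags from one helper keeps their wording identical across Idea cards, which makes them easy to search or validate later. A sketch, assuming the tag strings exactly as listed above; the function name and the string type for confidence are illustrative.

```python
from typing import Optional


def provenance_tag(level: str, section: Optional[str] = None,
                   source: Optional[str] = None,
                   confidence: Optional[str] = None) -> str:
    """Format a provenance tag using the L1/L2/L3 templates above."""
    if level == "L1":
        return "[来源: API元数据]"  # Source: API metadata
    if level == "L2":
        if section is None:
            raise ValueError("L2 tags must name the section")
        return f"[来源: 论文全文, Section {section}]"  # Source: full text
    if level == "L3":
        if source is None or confidence is None:
            raise ValueError("L3 tags need a source and a confidence")
        return f"[推断: 基于[{source}], 置信度: {confidence}]"  # Inference
    raise ValueError(f"unknown provenance level: {level!r}")
```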
Pattern 10: Checkpoint Save/Resume
{"version":"1.0", "current_phase":2, "config":{...}, "phase_0":{"status":"completed"}, "phase_2":{"status":"in_progress","evaluated":450,"total":780}}
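The Troubleshooting table notes that a checkpoint can be corrupted if a run is interrupted mid-write. One common mitigation, sketched here in Python, is to write to a temporary file and swap it in with os.replace, which is atomic on POSIX when both paths are on the same filesystem. The helper names are hypothetical; the state layout follows the JSON example above.

```python
import json
import os
import tempfile
from typing import Optional


def save_checkpoint(path: str, state: dict) -> None:
    """Write JSON to a temp file in the same directory, then atomically replace."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as fh:
            json.dump(state, fh, ensure_ascii=False, indent=2)
            fh.flush()
            os.fsync(fh.fileno())  # push bytes to disk before the rename
        os.replace(tmp_path, path)  # an interrupted run leaves the old file intact
    finally:
        if os.path.exists(tmp_path):
            os.remove(tmp_path)  # only present if something failed before replace


def load_checkpoint(path: str) -> Optional[dict]:
    """Return saved state, or None if missing or unreadable (restart from Phase 0)."""
    try:
        with open(path, encoding="utf-8") as fh:
            return json.load(fh)
    except (FileNotFoundError, json.JSONDecodeError):
        return None


def remaining(state: dict) -> int:
    """Items still to evaluate in the current phase (0 if it has no counters)."""
    phase = state.get(f"phase_{state['current_phase']}", {})
    return phase.get("total", 0) - phase.get("evaluated", 0)
```

With the example state above (450 of 780 evaluated), remaining returns 330, matching the resume scenario in Example 3.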
Rules & Constraints
MUST
- Attach a traceable link (Semantic Scholar/DOI/arXiv/PubMed) to every paper reference
- Tag every analytical conclusion with provenance level (L1/L2/L3) and confidence
- Save checkpoint after each phase completion
- Use Socratic dialogue: ask guiding questions, don't just present conclusions
- Proactively acquire papers when top candidates are identified
- Mark abstract-only analyses with ⚠️ low confidence warning
SHOULD
- Use parallel Task agents to evaluate multiple combinations concurrently
- Generate heatmap visualization for the scoring matrix
- Suggest A+B+C extensions when A+B alone may lack novelty
- Link findings to user's existing project when project context is available
- Provide weight preset recommendations based on user's stated goals
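The heatmap needs the pairwise scores arranged as a square matrix first. A dependency-free sketch, assuming scores are keyed by unordered (paper_a, paper_b) pairs as produced by the filtering step; pairs removed in Layer 1 simply stay empty. The resulting grid can then be passed to a plotting library (for example matplotlib's imshow after mapping None to NaN).

```python
from typing import Dict, List, Optional, Tuple


def score_matrix(paper_ids: List[str],
                 pair_scores: Dict[Tuple[str, str], float]) -> List[List[Optional[float]]]:
    """Symmetric N x N matrix of pair scores; diagonal and unscored pairs stay None."""
    index = {pid: i for i, pid in enumerate(paper_ids)}
    n = len(paper_ids)
    matrix: List[List[Optional[float]]] = [[None] * n for _ in range(n)]
    for (a, b), score in pair_scores.items():
        i, j = index[a], index[b]
        matrix[i][j] = matrix[j][i] = score  # a pair's score applies in both directions
    return matrix
```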
NEVER
- Present AI inference as established fact without provenance tag
- Skip user confirmation when narrowing candidates
- Attempt to download paywalled papers through unauthorized channels
- Generate a complete paper (only framework drafts and code skeletons)
- Omit source links from any paper reference
Role: Socratic Research Mentor
Act as a proactive, patient, rigorous research mentor throughout the entire workflow.
Behavioral principles:
- Proactive: Don't wait for user questions. Discover problems, suggest solutions, acquire papers
- Rigorous: Every conclusion must have traceable evidence
- Patient: Full dialogue at every step, discuss thoroughly with user
- Empathetic: Understand student pressure, pragmatically advance research progress
- Honest: Clearly mark confidence levels, admit uncertainty
Dialogue patterns by phase:
- Discovery (Phase 1-2): Open-ended guidance — "I noticed Paper A's method and Paper B's limitation have potential complementarity. Does this make sense in your research context?"
- Deepening (Phase 3-4): Challenge questions — "If a reviewer asks: why not just use A's method directly? How would you respond?"
- Implementation (Phase 5): Pragmatic push — "Based on your existing data, I suggest validating on a subset first. Shall I generate the experiment code?"
See references/dialogue-templates.md for complete dialogue examples.
Examples
Example 1: Bioinformatics Multi-omics (Full Auto Search)
- Input: /literature-matrix 多组学融合 耐药性检测 --papers 40 --timerange 2024-2026 (keywords: "multi-omics integration", "drug-resistance detection")
- Steps:
- Phase 0: Create ./paper_matrix/ directory, configure domain=bioinformatics, preset=快速发表 (fast publication)
- Phase 1: Search Semantic Scholar for "multi-omics integration antimicrobial resistance", filter by open-source + top venue, confirm 40 papers with user
- Phase 2: Evaluate 780 combinations, generate heatmap, discuss top-30 with user
- Phase 3: Auto-download arXiv/PMC papers for top-15, extract structured summaries, generate Idea cards
- Phase 4: For selected idea (e.g., "graph attention + lipid profiling"), build unified framework: f(x) = α·GAT(x) + (1-α)·LipidNet(x), and show both are special cases (α=1 recovers GAT, α=0 recovers LipidNet)
- Phase 5: Generate base_framework.py, experiment.py with α grid search
- Acceptance: Matrix report with 780 scores + ≥10 Idea cards with provenance links + 1 framework draft + code skeleton
Example 2: ML Top Conference (Seed Expansion)
- Input: User provides 8 seed papers from NeurIPS 2025 oral presentations
- Steps:
- Phase 0: Configure source_mode=seed_expansion, domain=ML
- Phase 1: Expand from 8 seeds via citation network to 40 papers, user confirms
- Phase 2: Build matrix with 理论导向 (theory-oriented) weights, filter and rank
- Phase 3: Identify "diffusion model + graph neural network" as top candidate, download both papers, deep cross-analysis
- Phase 4: Build a unified "generative message-passing" framework in which diffusion models and GNNs are special cases
- Phase 5: Generate PyTorch code skeleton with α-sweep experiment
- Acceptance: Confirmed paper list + scored matrix + Idea cards with full-text evidence + theoretical framework with special-case proofs
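Example 2 expands 8 seeds to 40 papers through the citation network. A minimal breadth-first sketch, assuming a caller-supplied neighbors function (hypothetical; it could, for instance, wrap Semantic Scholar's /paper/{paper_id}/references and /citations endpoints). Plain BFS takes papers in discovery order; ranking candidates by relevance or citation velocity before the user confirms the list would be a natural refinement.

```python
from collections import deque
from typing import Callable, Iterable, List


def expand_seeds(seeds: List[str], neighbors: Callable[[str], Iterable[str]],
                 target: int = 40) -> List[str]:
    """Breadth-first expansion from seed papers until `target` ids are collected."""
    selected = list(dict.fromkeys(seeds))[:target]  # keep seed order, drop duplicates
    seen = set(selected)
    queue = deque(selected)
    while queue and len(selected) < target:
        current = queue.popleft()
        for pid in neighbors(current):
            if pid in seen:
                continue
            seen.add(pid)
            selected.append(pid)
            queue.append(pid)
            if len(selected) >= target:
                break  # stop as soon as the quota is reached
    return selected
```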
Example 3: Resume from Checkpoint
- Input: /literature-matrix --resume
- Steps:
- Read ./paper_matrix/checkpoint.json: Phase 2 in progress, 450/780 evaluated
- Display progress: "检测到上次分析进度。Phase 2矩阵构建中,已评估450/780个组合。是否继续?" ("Previous progress detected. Phase 2 matrix construction in progress, 450/780 combinations evaluated. Continue?")
- User confirms → continue evaluating remaining 330 combinations
- Complete Phase 2, proceed to Phase 3
- Acceptance: Seamless continuation from checkpoint, no duplicate work
Example 4: Project-Linked Analysis
- Input: /literature-matrix 脂质组学 机器学习 --link-project (keywords: "lipidomics", "machine learning")
- Steps:
- Phase 0: Read CLAUDE.md, detect ECC multi-omics project context
- Phase 1-2: Search and evaluate with awareness of user's existing data (TIC-normalized lipid MS, 455 samples)
- Phase 3: When evaluating combinations, add "project relevance" assessment — "This method can directly use your ms_genomics_integrated_averaged.csv"
- Phase 4-5: Framework and code adapted to user's data format
- Acceptance: All Idea cards include a "与用户项目的关联" (Relevance to the user's project) section + code skeleton loads user's actual data files
Troubleshooting
| Symptom | Diagnosis | Fix |
|---|---|---|
| Semantic Scholar API returns empty | Keywords too specific or API rate limit | Broaden keywords, add retry with backoff |
| Too few open-access papers | Domain has low OA rate | Use L2 acquisition (ask user to download), expand time range |
| All combinations score low | Papers too similar or too different | Adjust paper selection: mix methods papers with application papers |
| Checkpoint corrupted | Interrupted during write | Delete checkpoint.json, restart from Phase 0 |
| α=0.5 not optimal | Combination is serial, not parallel | Switch to pipeline framework (serial type), not convex combination |
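The "retry with backoff" fix can be sketched with the standard library. Semantic Scholar signals rate limiting with HTTP 429; the sketch also retries 5xx responses, which is an assumption about what counts as transient. An empty result set, by contrast, is not an HTTP error and calls for broader keywords rather than retries. The sleep parameter is injectable so the retry logic can be exercised without actually waiting; with_backoff is a hypothetical helper.

```python
import random
import time
from urllib.error import HTTPError


def with_backoff(fetch, max_retries: int = 5, base_delay: float = 1.0,
                 sleep=time.sleep):
    """Call fetch(); retry on HTTP 429 or 5xx with exponential backoff plus jitter.

    fetch is any zero-argument callable that may raise urllib's HTTPError,
    e.g. lambda: urlopen(url, timeout=30). Other errors propagate immediately.
    """
    for attempt in range(max_retries + 1):
        try:
            return fetch()
        except HTTPError as err:
            retriable = err.code == 429 or 500 <= err.code < 600
            if not retriable or attempt == max_retries:
                raise
            # Waits grow 1s, 2s, 4s, ... (for base_delay=1) plus up to base_delay of jitter.
            sleep(base_delay * 2 ** attempt + random.uniform(0, base_delay))
```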
References
Detailed implementation guides:
- references/index.md: Navigation hub
- references/workflow-phases.md: Complete Phase 0-5 behavioral instructions
- references/evaluation-system.md: 5-dimension scoring, weight presets, prompt templates
- references/paper-acquisition.md: 3-level acquisition strategy with API details
- references/theoretical-framework.md: Combination types, non-trivial templates, α analysis
- references/provenance-system.md: 3-layer tracing, confidence levels, link requirements
- references/checkpoint-system.md: JSON schema, resume flow, error recovery
- references/dialogue-templates.md: Socratic dialogue examples per phase
- references/output-templates.md: Idea card, framework draft, code skeleton templates
Maintenance
- Sources: Brainstorming session requirements (see paper_matrix/REQUIREMENTS.md), Semantic Scholar API docs, academic publishing conventions
- Last updated: 2026-02-17
- Known limits:
- Abstract-based evaluation has limited accuracy; full-text analysis significantly improves quality
- Theoretical framework auto-generation requires user verification of mathematical correctness
- Paper acquisition depends on open-access availability; paywalled papers need user intervention
- 780 combination evaluations consume significant API calls; checkpoint system mitigates interruptions