Claude-skill-registry academic-benchmark-researcher
Use when the user requests information about academic benchmarks, datasets, or research papers, particularly in the machine learning, deep learning, or logical reasoning domains. This skill enables systematic research of academic benchmarks: searching web sources, downloading and analyzing arXiv papers, extracting key metadata (number of tasks, training-set availability, difficulty levels), and compiling comparative summaries. It triggers on requests involving dataset comparisons, benchmark analysis, or academic paper research for table creation.
git clone https://github.com/majiayu000/claude-skill-registry
T=$(mktemp -d) && git clone --depth=1 https://github.com/majiayu000/claude-skill-registry "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/data/academic-benchmark-researcher" ~/.claude/skills/majiayu000-claude-skill-registry-academic-benchmark-researcher && rm -rf "$T"
skills/data/academic-benchmark-researcher/SKILL.md
Instructions
Primary Objective
Systematically research academic benchmarks, datasets, or research papers to extract and compile comparative information (e.g., into a summary table). The core workflow involves: 1) Identifying relevant sources, 2) Extracting key metadata, 3) Synthesizing findings into a structured output (like a LaTeX table).
Core Workflow
- Clarify & Parse Request: Identify the specific benchmarks/datasets/papers mentioned by the user. Note any required output format (e.g., LaTeX table with specific columns) and constraints (e.g., "no commented lines").
- Initial Information Gathering: For each identified entity (dataset/paper):
  - Use `local-web_search` to find general information, official pages (GitHub, project sites), and relevant arXiv IDs.
  - For arXiv papers, use `arxiv_local-download_paper` or `fetch-fetch_markdown` to obtain the paper content.
  - Search for specific attributes requested by the user (e.g., "number of tasks," "training set," "difficulty levels").
- Deep Dive & Verification: Read paper abstracts, introductions, and methodology sections (using `arxiv_local-read_paper` or parsed markdown) to confirm key details. Cross-reference information from multiple sources (official site, paper, blog posts) for accuracy.
- Information Synthesis: Compile the extracted metadata into a structured format aligned with the user's request. Resolve any ambiguities (e.g., whether a "task" count refers to broad categories or individual instances) based on the most authoritative source (typically the original paper).
- Output Generation: Create the final deliverable (e.g., a `.tex` file). Ensure it strictly adheres to the user's formatting specifications. Optionally, provide a concise textual summary of the findings.
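The synthesis rule of preferring the most authoritative source can be sketched as a small resolver. This is a minimal illustration, not part of the skill's tooling: the source labels (`paper`, `official_site`, `blog`) and their ranking are assumptions that mirror the cross-referencing order described in the workflow, and the function also returns any disagreeing values so they can be mentioned in the summary.

```python
from __future__ import annotations

from typing import Optional

# Assumed authority ranking (hypothetical labels): lower number = more authoritative.
SOURCE_PRIORITY = {"paper": 0, "official_site": 1, "blog": 2}


def resolve_attribute(findings: list[tuple[str, str]]) -> tuple[Optional[str], list[str]]:
    """Pick the value reported by the most authoritative source.

    `findings` holds (source_type, value) pairs gathered while cross-referencing.
    Unknown source types rank below every known one. Returns the chosen value
    (None if nothing was found) and a sorted list of conflicting values.
    """
    if not findings:
        return None, []
    unknown_rank = len(SOURCE_PRIORITY)
    # sorted() is stable, so equally ranked sources keep their original order.
    ranked = sorted(findings, key=lambda f: SOURCE_PRIORITY.get(f[0], unknown_rank))
    best = ranked[0][1]
    conflicts = sorted({value for _, value in findings if value != best})
    return best, conflicts
```

A non-empty conflict list is a signal to state in the final summary which source was trusted and why.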
Key Metadata to Extract
When researching a benchmark/dataset, prioritize finding:
- Full Name & Acronym
- Number of Tasks/Categories: Distinguish between broad task categories and individual task instances.
- Training Data Availability: Does it include a dedicated training set, or is it for evaluation only?
- Difficulty Levels: Does it feature adjustable or tiered difficulty levels?
- Core Purpose/Description
- Primary Source (arXiv ID, GitHub repo)
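One way to keep these attributes consistent across benchmarks is a small record type. The class and field names below are hypothetical; `None` marks an attribute that no source has stated yet, which makes gaps easy to list before the table is written.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional


@dataclass
class BenchmarkRecord:
    """Metadata gathered for one benchmark (illustrative field names)."""

    full_name: str
    acronym: str
    num_tasks: Optional[int] = None          # count as reported by the primary source
    task_unit: str = "categories"            # "categories" vs. "instances"
    has_training_set: Optional[bool] = None  # None = not stated in any source
    has_difficulty_levels: Optional[bool] = None
    description: str = ""
    arxiv_id: Optional[str] = None
    github_url: Optional[str] = None
    notes: list[str] = field(default_factory=list)  # assumptions to repeat in the summary

    def missing_fields(self) -> list[str]:
        """Names of key attributes that are still unknown."""
        checks = {
            "num_tasks": self.num_tasks,
            "has_training_set": self.has_training_set,
            "has_difficulty_levels": self.has_difficulty_levels,
            "primary_source": self.arxiv_id or self.github_url,
        }
        return [name for name, value in checks.items() if value is None]
```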
Tool Usage Guidelines
- `local-web_search`: Use for initial discovery and finding high-level descriptions. Employ specific queries combining the dataset name and target attributes (e.g., "BBH training set few-shot examples").
- `arxiv_local-download_paper` / `fetch-fetch_markdown`: Use to access the canonical source for detailed information. Prefer `arxiv_local-download_paper` for full-text analysis when needed.
- `filesystem-write_file` / `filesystem-read_file`: Use for creating and verifying final output files in the workspace.
- `local-claim_done`: Use only after successfully delivering the requested output and providing a final summary.
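Queries of the form recommended for `local-web_search` can be generated mechanically. The helper below only builds query strings and does not call any tool; its name and the extra "arXiv paper" discovery query are illustrative assumptions, not something the skill prescribes.

```python
from __future__ import annotations


def build_queries(dataset: str, attributes: list[str]) -> list[str]:
    """Build web-search query strings pairing a dataset name with each target attribute."""
    queries = [f"{dataset} arXiv paper"]  # assumed discovery query for the canonical source
    for attr in attributes:
        query = " ".join(f"{dataset} {attr}".split())  # collapse stray whitespace
        if query not in queries:  # skip duplicates
            queries.append(query)
    return queries
```

For example, `build_queries("BBH", ["training set"])` returns `["BBH arXiv paper", "BBH training set"]`.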
Output Standards
- LaTeX Tables: Ensure the output contains only the specified table content, without extra comments, document headers, or unrelated text.
- Summaries: Be concise but complete, highlighting the sourced information for each dataset.
- Accuracy: Base conclusions on the original paper or official project documentation where possible. Acknowledge if information is not explicitly stated.
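A sketch of emitting only the table itself, with no comments or document preamble, follows. It assumes the user's document loads the `pifont` package (which provides `\ding{51}`, a check mark, and `\ding{55}`, a cross) and uses plain `\hline` rules; the function names are hypothetical. Cells are passed in already formatted so the `\ding` marks are not escaped, while free text such as benchmark names should go through `escape_latex` first.

```python
from __future__ import annotations

from typing import Optional

# pifont symbols (assumes \usepackage{pifont} in the user's document).
YES, NO = r"\ding{51}", r"\ding{55}"

# Characters with special meaning in LaTeX and their escaped forms.
_LATEX_SPECIALS = {
    "\\": r"\textbackslash{}",
    "&": r"\&",
    "%": r"\%",
    "$": r"\$",
    "#": r"\#",
    "_": r"\_",
    "{": r"\{",
    "}": r"\}",
    "~": r"\textasciitilde{}",
    "^": r"\textasciicircum{}",
}


def escape_latex(text: str) -> str:
    """Escape free text (names, descriptions) for use inside a table cell."""
    return "".join(_LATEX_SPECIALS.get(ch, ch) for ch in text)


def mark(flag: Optional[bool]) -> str:
    """Render a yes/no attribute; unknown (None) is shown as a cross."""
    return YES if flag else NO


def render_table(header: list[str], rows: list[list[str]]) -> str:
    """Return a bare tabular environment: no comments, preamble, or document wrapper."""
    lines = [r"\begin{tabular}{" + "l" * len(header) + "}", r"\hline"]
    lines.append(" & ".join(header) + r" \\")
    lines.append(r"\hline")
    for row in rows:
        lines.append(" & ".join(row) + r" \\")
    lines += [r"\hline", r"\end{tabular}"]
    return "\n".join(lines)
```

Because `mark(None)` falls back to the cross, any attribute rendered that way should also be recorded as an assumption for the summary.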
Common Pitfalls & Resolutions
- Ambiguous Task Counts: If a paper mentions "5 task categories" (like KOR-Bench), report that as the task count unless the user specifies otherwise. Clarify in the summary if needed.
- Missing Information: If a key attribute (e.g., training set) is not mentioned in primary sources, infer it from the benchmark type (e.g., many evaluation benchmarks lack training sets) and denote it with `\ding{55}`. State the assumption in your summary.
- arXiv Paper Processing: If `arxiv_local-download_paper` returns a "converting" status, use `fetch-fetch_markdown` on the arXiv abstract page as a reliable fallback to get the paper's metadata and abstract.
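For that fallback, the abstract-page URL can be derived from any arXiv ID found in search results or download links. The regular expression below is a heuristic sketch with hypothetical helper names: it recognizes only new-style identifiers (`YYMM.NNNN` or `YYMM.NNNNN`, with an optional `vN` version suffix), ignores old-style IDs that use an archive prefix such as `hep-th/`, and may also match unrelated numbers of the same shape.

```python
from __future__ import annotations

import re
from typing import Optional

# New-style arXiv identifiers: YYMM.NNNN or YYMM.NNNNN, optionally followed by "vN".
ARXIV_ID = re.compile(r"\b(\d{4}\.\d{4,5})(v\d+)?\b")


def extract_arxiv_id(text: str) -> Optional[str]:
    """Return the first new-style arXiv ID in `text`, without its version suffix."""
    match = ARXIV_ID.search(text)
    return match.group(1) if match else None


def abs_page_url(arxiv_id: str) -> str:
    """Abstract-page URL to hand to fetch-fetch_markdown when a download stalls."""
    return f"https://arxiv.org/abs/{arxiv_id}"
```

Dropping the version suffix points the fallback at the latest version's abstract page, which is usually what a metadata lookup needs.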