Awesome-Agent-Skills-for-Empirical-Research software-engineering-research

Guide to software engineering research topics and methodologies

Install

Source · Clone the upstream repo:
git clone https://github.com/brycewang-stanford/Awesome-Agent-Skills-for-Empirical-Research
Claude Code · Install into ~/.claude/skills/:
T=$(mktemp -d) && git clone --depth=1 https://github.com/brycewang-stanford/Awesome-Agent-Skills-for-Empirical-Research "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/43-wentorai-research-plugins/skills/domains/cs/software-engineering-research" ~/.claude/skills/brycewang-stanford-awesome-agent-skills-for-empirical-research-software-engineer && rm -rf "$T"
manifest: skills/43-wentorai-research-plugins/skills/domains/cs/software-engineering-research/SKILL.md
source content

Software Engineering Research Guide

Navigate the landscape of software engineering research, including key subfields, methodologies, datasets, benchmarks, and top venues.

SE Research Subfields

| Subfield | Key Topics | Major Venues |
|---|---|---|
| Software Testing | Test generation, fuzzing, mutation testing, flaky tests | ISSTA, ICST, ASE |
| Program Analysis | Static analysis, abstract interpretation, symbolic execution | PLDI, POPL, OOPSLA |
| Software Maintenance | Code refactoring, technical debt, code smells, evolution | ICSME, MSR, SANER |
| SE for AI/ML | ML pipeline testing, data quality, model debugging | ICSE-SEIP, FSE |
| AI for SE | Code generation, bug detection, program repair | ICSE, FSE, ASE |
| Distributed Systems | Consensus, fault tolerance, scalability, microservices | SOSP, OSDI, EuroSys |
| Cybersecurity | Vulnerability detection, malware analysis, privacy | IEEE S&P, CCS, USENIX Security |
| HCI in SE | Developer tools, IDE usability, code comprehension | CHI, CSCW, VL/HCC |
| Empirical SE | Mining repositories, developer surveys, controlled experiments | ESEM, MSR, TOSEM |

Research Methodologies in SE

Controlled Experiments

Testing a specific hypothesis with treatment and control groups:

Example: Does AI code completion improve developer productivity?

Design:
- Participants: 60 professional developers
- Treatment: IDE with AI code completion enabled
- Control: IDE with AI code completion disabled
- Task: Complete 5 programming tasks of varying difficulty
- Metrics: Task completion time, code correctness, lines of code
- Analysis: Mixed-effects linear model with participant as random effect

Threats to validity:
- Internal: Learning effect (counterbalance task order)
- External: Lab setting may not reflect real development
- Construct: "Productivity" operationalized as speed + correctness
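The mixed-effects analysis above can be sketched with statsmodels (a sketch on simulated data; the variable names such as completion_time and all effect sizes are hypothetical):

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(42)

# Simulated data: 60 developers, 5 tasks each; half use AI completion
rows = []
for dev in range(60):
    treated = int(dev < 30)
    dev_effect = rng.normal(0, 3)  # per-participant random intercept
    for task in range(5):
        time = 30 + 2 * task + dev_effect - 5 * treated + rng.normal(0, 2)
        rows.append({"participant": dev, "treatment": treated,
                     "task": task, "completion_time": time})
df = pd.DataFrame(rows)

# Fixed effects: treatment and task; random intercept per participant
model = smf.mixedlm("completion_time ~ treatment + task", df,
                    groups="participant")
result = model.fit()
print(result.summary().tables[1])  # treatment coefficient should be near -5
```

The random intercept absorbs stable between-developer speed differences, so the treatment effect is estimated against within-study noise rather than raw participant variance.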

Mining Software Repositories (MSR)

Analyzing data from version control, issue trackers, and code review systems:

# Example: Analyze commit patterns using PyDriller
from datetime import datetime

import pandas as pd
from pydriller import Repository

repo_url = "https://github.com/apache/kafka"

commit_data = []
for commit in Repository(repo_url, since=datetime(2023, 1, 1),
                          to=datetime(2023, 12, 31)).traverse_commits():
    commit_data.append({
        "hash": commit.hash[:8],
        "author": commit.author.name,
        "date": commit.committer_date,
        "files_changed": commit.files,
        "insertions": commit.insertions,
        "deletions": commit.deletions,
        "message": commit.msg[:100]
    })

df = pd.DataFrame(commit_data)
print(f"Total commits in 2023: {len(df)}")
print(f"Unique contributors: {df['author'].nunique()}")
print(f"Avg files per commit: {df['files_changed'].mean():.1f}")
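Commit metadata like this also supports simple derived indicators. A sketch (with a hypothetical author list) of contributor concentration, a common MSR proxy for knowledge ownership and bus-factor risk:

```python
from collections import Counter

def top_contributor_share(authors, top_n=3):
    """Fraction of commits made by the top_n most active authors."""
    counts = Counter(authors)
    total = sum(counts.values())
    top = sum(c for _, c in counts.most_common(top_n))
    return top / total

# Hypothetical author list, e.g. df["author"].tolist() from the example above
authors = ["alice"] * 50 + ["bob"] * 30 + ["carol"] * 15 + ["dan"] * 5
print(top_contributor_share(authors, top_n=2))  # 0.8
```

A share close to 1.0 for a small top_n suggests the project's knowledge is concentrated in very few people.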

Case Studies

In-depth investigation of a phenomenon in its real-world context:

Case Study Protocol (based on Yin, 2018):
1. Research questions: How do teams adopt microservices?
2. Unit of analysis: Development teams at 3 companies
3. Data sources:
   - Semi-structured interviews (8-12 per company)
   - Architecture documentation review
   - Commit history and deployment logs
   - Meeting observations
4. Analysis: Thematic analysis with cross-case comparison
5. Validity: Triangulation across data sources, member checking

Key Datasets and Benchmarks

Code Understanding and Generation

| Benchmark | Task | Languages | Size |
|---|---|---|---|
| HumanEval | Code generation from docstrings | Python | 164 problems |
| MBPP | Code generation from descriptions | Python | 974 problems |
| SWE-bench | Real-world GitHub issue resolution | Python | 2,294 instances |
| CodeXGLUE | Multiple code tasks | 6 languages | Varies by task |
| BigCloneBench | Clone detection | Java | 6M clone pairs |
| Defects4J | Bug localization and repair | Java | 835 real bugs |
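Generation benchmarks like HumanEval and MBPP are usually scored with pass@k. The commonly used unbiased estimator, given n sampled generations of which c pass the tests:

```python
import math

def pass_at_k(n: int, c: int, k: int) -> float:
    """Probability that at least one of k samples, drawn without
    replacement from n generations (c of which are correct), passes."""
    if n - c < k:
        return 1.0  # fewer than k failures exist, so some draw must succeed
    return 1.0 - math.comb(n - c, k) / math.comb(n, k)

print(pass_at_k(n=20, c=5, k=1))  # 0.25
```

Averaging this quantity over all benchmark problems gives the reported pass@k score; it avoids the bias of naively taking the best of k fixed samples.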

Software Engineering Process

| Dataset | Content | Use Cases |
|---|---|---|
| GHTorrent | GitHub event data (commits, issues, PRs) | MSR studies |
| Software Heritage | Universal source code archive | Code evolution, provenance |
| Stack Overflow Data Dump | Q&A posts, tags, votes | Developer knowledge, NLP |
| CVE Database | Vulnerability records | Security research |
| Chrome/Firefox Bug Trackers | Bug reports, patches | Bug triage, severity prediction |

Static Analysis Tools for Research

# Example: Using tree-sitter for AST-level code analysis
from tree_sitter import Language, Parser
import tree_sitter_python as tspython

PYTHON_LANGUAGE = Language(tspython.language())
parser = Parser(PYTHON_LANGUAGE)

source_code = b"""
def fibonacci(n):
    if n <= 1:
        return n
    return fibonacci(n-1) + fibonacci(n-2)
"""

tree = parser.parse(source_code)
root = tree.root_node

def count_nodes(node, node_type):
    """Count AST nodes of a given type."""
    count = 1 if node.type == node_type else 0
    for child in node.children:
        count += count_nodes(child, node_type)
    return count

print(f"Function definitions: {count_nodes(root, 'function_definition')}")
print(f"If statements: {count_nodes(root, 'if_statement')}")
print(f"Return statements: {count_nodes(root, 'return_statement')}")
print(f"Function calls: {count_nodes(root, 'call')}")

Code Metrics

# Common software metrics
metrics = {
    "Lines of Code (LOC)": "Total lines (including blanks and comments)",
    "Cyclomatic Complexity": "Number of independent paths (McCabe, 1976)",
    "Halstead Volume": "Based on operators and operands count",
    "Maintainability Index": "Composite of LOC, CC, and Halstead",
    "Coupling Between Objects": "Number of other classes referenced",
    "Depth of Inheritance": "Levels in class hierarchy",
    "Code Churn": "Lines added + modified + deleted per period",
    "Comment Density": "Ratio of comment lines to total lines"
}
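Some of these metrics need nothing beyond the standard library. A sketch of comment density using Python's tokenize module, which correctly ignores `#` characters inside string literals:

```python
import io
import tokenize

def comment_density(source: str) -> float:
    """Fraction of source lines that contain a comment."""
    comment_lines = set()
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type == tokenize.COMMENT:
            comment_lines.add(tok.start[0])  # 1-based line of the comment
    total = len(source.splitlines()) or 1
    return len(comment_lines) / total

code = "x = 1  # init\ny = x + 1\n# done\nprint(y)\n"
print(comment_density(code))  # 0.5 (2 commented lines out of 4)
```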

# Calculate cyclomatic complexity using radon
# pip install radon
import subprocess
result = subprocess.run(
    ["radon", "cc", "my_module.py", "-s", "-j"],
    capture_output=True, text=True
)
print(result.stdout)
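If installing radon is not an option, cyclomatic complexity can be roughly approximated over Python's built-in ast module (a sketch: it counts common decision points, not a complete McCabe implementation):

```python
import ast

def approx_cyclomatic_complexity(source: str) -> int:
    """CC ~= 1 + number of decision points in the parsed module."""
    decisions = 0
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, (ast.If, ast.IfExp, ast.For, ast.While,
                             ast.ExceptHandler, ast.comprehension)):
            decisions += 1
        elif isinstance(node, ast.BoolOp):
            # each extra and/or operand adds one short-circuit path
            decisions += len(node.values) - 1
    return decisions + 1

src = """
def classify(x):
    if x < 0:
        return "neg"
    elif x == 0:
        return "zero"
    return "pos"
"""
print(approx_cyclomatic_complexity(src))  # 3: two branches + 1
```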

Top Venues and Impact

Tier-1 SE Venues

| Venue | Type | Acceptance Rate | Focus |
|---|---|---|---|
| ICSE | Conference | ~22% | Broad SE |
| FSE/ESEC | Conference | ~24% | Broad SE |
| ASE | Conference | ~22% | Automated SE |
| ISSTA | Conference | ~25% | Software testing |
| MSR | Conference | ~30% | Mining repositories |
| TOSEM | Journal | n/a | Broad SE (ACM) |
| TSE | Journal | n/a | Broad SE (IEEE) |
| EMSE | Journal | n/a | Empirical SE (Springer) |

Systems and Security Venues

| Venue | Type | Focus |
|---|---|---|
| SOSP/OSDI | Conference | Operating systems, distributed systems |
| EuroSys | Conference | Systems (Europe) |
| NSDI | Conference | Networked systems design |
| IEEE S&P (Oakland) | Conference | Security and privacy |
| USENIX Security | Conference | Security |
| CCS | Conference | Computer and communications security |
| NDSS | Conference | Network and distributed systems security |

Research Tools Ecosystem

| Tool | Purpose | URL |
|---|---|---|
| PyDriller | Git repository mining (Python) | github.com/ishepard/pydriller |
| Radon | Python code metrics | github.com/rubik/radon |
| SonarQube | Multi-language static analysis | sonarqube.org |
| Understand | Code analysis and metrics | scitools.com |
| Joern | Code analysis platform (CPG) | joern.io |
| CodeQL | Semantic code analysis | codeql.github.com |
| tree-sitter | Incremental parsing library | tree-sitter.github.io |