Awesome-Agent-Skills-for-Empirical-Research software-engineering-research

Guide to software engineering research topics and methodologies

Install

Source · Clone the upstream repo:
git clone https://github.com/brycewang-stanford/Awesome-Agent-Skills-for-Empirical-Research
Claude Code · Install into ~/.claude/skills/:
T=$(mktemp -d) && git clone --depth=1 https://github.com/brycewang-stanford/Awesome-Agent-Skills-for-Empirical-Research "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/43-wentorai-research-plugins/skills/domains/cs/software-engineering-research" ~/.claude/skills/brycewang-stanford-awesome-agent-skills-for-empirical-research-software-engineer && rm -rf "$T"
manifest: skills/43-wentorai-research-plugins/skills/domains/cs/software-engineering-research/SKILL.md
source content

Software Engineering Research Guide

Navigate the landscape of software engineering research, including key subfields, methodologies, datasets, benchmarks, and top venues.

SE Research Subfields

| Subfield | Key Topics | Major Venues |
|---|---|---|
| Software Testing | Test generation, fuzzing, mutation testing, flaky tests | ISSTA, ICST, ASE |
| Program Analysis | Static analysis, abstract interpretation, symbolic execution | PLDI, POPL, OOPSLA |
| Software Maintenance | Code refactoring, technical debt, code smells, evolution | ICSME, MSR, SANER |
| SE for AI/ML | ML pipeline testing, data quality, model debugging | ICSE-SEIP, FSE |
| AI for SE | Code generation, bug detection, program repair | ICSE, FSE, ASE |
| Distributed Systems | Consensus, fault tolerance, scalability, microservices | SOSP, OSDI, EuroSys |
| Cybersecurity | Vulnerability detection, malware analysis, privacy | IEEE S&P, CCS, USENIX Security |
| HCI in SE | Developer tools, IDE usability, code comprehension | CHI, CSCW, VL/HCC |
| Empirical SE | Mining repositories, developer surveys, controlled experiments | ESEM, MSR, TOSEM |

Research Methodologies in SE

Controlled Experiments

Testing a specific hypothesis with treatment and control groups:

Example: Does AI code completion improve developer productivity?

Design:
- Participants: 60 professional developers
- Treatment: IDE with AI code completion enabled
- Control: IDE with AI code completion disabled
- Task: Complete 5 programming tasks of varying difficulty
- Metrics: Task completion time, code correctness, lines of code
- Analysis: Mixed-effects linear model with participant as random effect

Threats to validity:
- Internal: Learning effect (counterbalance task order)
- External: Lab setting may not reflect real development
- Construct: "Productivity" operationalized as speed + correctness
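The mixed-effects analysis above can be sketched with statsmodels (a sketch on simulated data; the variable names such as completion_time and all effect sizes are hypothetical):

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(42)

# Simulated data: 60 developers, 5 tasks each; half use AI completion
rows = []
for dev in range(60):
    treated = int(dev < 30)
    dev_effect = rng.normal(0, 3)  # per-participant random intercept
    for task in range(5):
        time = 30 + 2 * task + dev_effect - 5 * treated + rng.normal(0, 2)
        rows.append({"participant": dev, "treatment": treated,
                     "task": task, "completion_time": time})
df = pd.DataFrame(rows)

# Fixed effects: treatment and task; random intercept per participant
model = smf.mixedlm("completion_time ~ treatment + task", df,
                    groups="participant")
result = model.fit()
print(result.summary().tables[1])  # treatment coefficient should be near -5
```

The random intercept absorbs stable between-developer speed differences, so the treatment effect is estimated against within-study noise rather than raw participant variance.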

Mining Software Repositories (MSR)

Analyzing data from version control, issue trackers, and code review systems:

# Example: Analyze commit patterns using PyDriller
from datetime import datetime

import pandas as pd
from pydriller import Repository

repo_url = "https://github.com/apache/kafka"

commit_data = []
for commit in Repository(repo_url, since=datetime(2023, 1, 1),
                          to=datetime(2023, 12, 31)).traverse_commits():
    commit_data.append({
        "hash": commit.hash[:8],
        "author": commit.author.name,
        "date": commit.committer_date,
        "files_changed": commit.files,
        "insertions": commit.insertions,
        "deletions": commit.deletions,
        "message": commit.msg[:100]
    })

df = pd.DataFrame(commit_data)
print(f"Total commits in 2023: {len(df)}")
print(f"Unique contributors: {df['author'].nunique()}")
print(f"Avg files per commit: {df['files_changed'].mean():.1f}")
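Commit metadata like this also supports simple derived indicators. A sketch (with a hypothetical author list) of contributor concentration, a common MSR proxy for knowledge ownership and bus-factor risk:

```python
from collections import Counter

def top_contributor_share(authors, top_n=3):
    """Fraction of commits made by the top_n most active authors."""
    counts = Counter(authors)
    total = sum(counts.values())
    top = sum(c for _, c in counts.most_common(top_n))
    return top / total

# Hypothetical author list, e.g. df["author"].tolist() from the example above
authors = ["alice"] * 50 + ["bob"] * 30 + ["carol"] * 15 + ["dan"] * 5
print(top_contributor_share(authors, top_n=2))  # 0.8
```

A share close to 1.0 for a small top_n suggests the project's knowledge is concentrated in very few people.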

Case Studies

In-depth investigation of a phenomenon in its real-world context:

Case Study Protocol (based on Yin, 2018):
1. Research questions: How do teams adopt microservices?
2. Unit of analysis: Development teams at 3 companies
3. Data sources:
   - Semi-structured interviews (8-12 per company)
   - Architecture documentation review
   - Commit history and deployment logs
   - Meeting observations
4. Analysis: Thematic analysis with cross-case comparison
5. Validity: Triangulation across data sources, member checking

Key Datasets and Benchmarks

Code Understanding and Generation

| Benchmark | Task | Languages | Size |
|---|---|---|---|
| HumanEval | Code generation from docstrings | Python | 164 problems |
| MBPP | Code generation from descriptions | Python | 974 problems |
| SWE-bench | Real-world GitHub issue resolution | Python | 2,294 instances |
| CodeXGLUE | Multiple code tasks | 6 languages | Varies by task |
| BigCloneBench | Clone detection | Java | 6M clone pairs |
| Defects4J | Bug localization and repair | Java | 835 real bugs |
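Generation benchmarks like HumanEval and MBPP are usually scored with pass@k. The commonly used unbiased estimator, given n sampled generations of which c pass the tests:

```python
import math

def pass_at_k(n: int, c: int, k: int) -> float:
    """Probability that at least one of k samples, drawn without
    replacement from n generations (c of which are correct), passes."""
    if n - c < k:
        return 1.0  # fewer than k failures exist, so some draw must succeed
    return 1.0 - math.comb(n - c, k) / math.comb(n, k)

print(pass_at_k(n=20, c=5, k=1))  # 0.25
```

Averaging this quantity over all benchmark problems gives the reported pass@k score; it avoids the bias of naively taking the best of k fixed samples.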

Software Engineering Process

| Dataset | Content | Use Cases |
|---|---|---|
| GHTorrent | GitHub event data (commits, issues, PRs) | MSR studies |
| Software Heritage | Universal source code archive | Code evolution, provenance |
| Stack Overflow Data Dump | Q&A posts, tags, votes | Developer knowledge, NLP |
| CVE Database | Vulnerability records | Security research |
| Chrome/Firefox Bug Trackers | Bug reports, patches | Bug triage, severity prediction |

Static Analysis Tools for Research

# Example: Using tree-sitter for AST-level code analysis
from tree_sitter import Language, Parser
import tree_sitter_python as tspython

PYTHON_LANGUAGE = Language(tspython.language())
parser = Parser(PYTHON_LANGUAGE)

source_code = b"""
def fibonacci(n):
    if n <= 1:
        return n
    return fibonacci(n-1) + fibonacci(n-2)
"""

tree = parser.parse(source_code)
root = tree.root_node

def count_nodes(node, node_type):
    """Count AST nodes of a given type."""
    count = 1 if node.type == node_type else 0
    for child in node.children:
        count += count_nodes(child, node_type)
    return count

print(f"Function definitions: {count_nodes(root, 'function_definition')}")
print(f"If statements: {count_nodes(root, 'if_statement')}")
print(f"Return statements: {count_nodes(root, 'return_statement')}")
print(f"Function calls: {count_nodes(root, 'call')}")

Code Metrics

# Common software metrics
metrics = {
    "Lines of Code (LOC)": "Total lines (including blanks and comments)",
    "Cyclomatic Complexity": "Number of independent paths (McCabe, 1976)",
    "Halstead Volume": "Based on operators and operands count",
    "Maintainability Index": "Composite of LOC, CC, and Halstead",
    "Coupling Between Objects": "Number of other classes referenced",
    "Depth of Inheritance": "Levels in class hierarchy",
    "Code Churn": "Lines added + modified + deleted per period",
    "Comment Density": "Ratio of comment lines to total lines"
}
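Some of these metrics need nothing beyond the standard library. A sketch of comment density using Python's tokenize module, which correctly ignores `#` characters inside string literals:

```python
import io
import tokenize

def comment_density(source: str) -> float:
    """Fraction of source lines that contain a comment."""
    comment_lines = set()
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type == tokenize.COMMENT:
            comment_lines.add(tok.start[0])  # 1-based line of the comment
    total = len(source.splitlines()) or 1
    return len(comment_lines) / total

code = "x = 1  # init\ny = x + 1\n# done\nprint(y)\n"
print(comment_density(code))  # 0.5 (2 commented lines out of 4)
```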

# Calculate cyclomatic complexity using radon
# pip install radon
import subprocess
result = subprocess.run(
    ["radon", "cc", "my_module.py", "-s", "-j"],
    capture_output=True, text=True
)
print(result.stdout)
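If installing radon is not an option, cyclomatic complexity can be roughly approximated over Python's built-in ast module (a sketch: it counts common decision points, not a complete McCabe implementation):

```python
import ast

def approx_cyclomatic_complexity(source: str) -> int:
    """CC ~= 1 + number of decision points in the parsed module."""
    decisions = 0
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, (ast.If, ast.IfExp, ast.For, ast.While,
                             ast.ExceptHandler, ast.comprehension)):
            decisions += 1
        elif isinstance(node, ast.BoolOp):
            # each extra and/or operand adds one short-circuit path
            decisions += len(node.values) - 1
    return decisions + 1

src = """
def classify(x):
    if x < 0:
        return "neg"
    elif x == 0:
        return "zero"
    return "pos"
"""
print(approx_cyclomatic_complexity(src))  # 3: two branches + 1
```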

Top Venues and Impact

Tier-1 SE Venues

| Venue | Type | Acceptance Rate | Focus |
|---|---|---|---|
| ICSE | Conference | ~22% | Broad SE |
| FSE/ESEC | Conference | ~24% | Broad SE |
| ASE | Conference | ~22% | Automated SE |
| ISSTA | Conference | ~25% | Software testing |
| MSR | Conference | ~30% | Mining repositories |
| TOSEM | Journal | n/a | Broad SE (ACM) |
| TSE | Journal | n/a | Broad SE (IEEE) |
| EMSE | Journal | n/a | Empirical SE (Springer) |

Systems and Security Venues

| Venue | Type | Focus |
|---|---|---|
| SOSP/OSDI | Conference | Operating systems, distributed systems |
| EuroSys | Conference | Systems (Europe) |
| NSDI | Conference | Networked systems design |
| IEEE S&P (Oakland) | Conference | Security and privacy |
| USENIX Security | Conference | Security |
| CCS | Conference | Computer and communications security |
| NDSS | Conference | Network and distributed systems security |

Research Tools Ecosystem

| Tool | Purpose | URL |
|---|---|---|
| PyDriller | Git repository mining (Python) | github.com/ishepard/pydriller |
| Radon | Python code metrics | github.com/rubik/radon |
| SonarQube | Multi-language static analysis | sonarqube.org |
| Understand | Code analysis and metrics | scitools.com |
| Joern | Code analysis platform (CPG) | joern.io |
| CodeQL | Semantic code analysis | codeql.github.com |
| tree-sitter | Incremental parsing library | tree-sitter.github.io |