Awesome-Agent-Skills-for-Empirical-Research software-engineering-research
Guide to software engineering research topics and methodologies
install
source · Clone the upstream repo
git clone https://github.com/brycewang-stanford/Awesome-Agent-Skills-for-Empirical-Research
Claude Code · Install into ~/.claude/skills/
T=$(mktemp -d) && git clone --depth=1 https://github.com/brycewang-stanford/Awesome-Agent-Skills-for-Empirical-Research "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/43-wentorai-research-plugins/skills/domains/cs/software-engineering-research" ~/.claude/skills/brycewang-stanford-awesome-agent-skills-for-empirical-research-software-engineer && rm -rf "$T"
manifest:
skills/43-wentorai-research-plugins/skills/domains/cs/software-engineering-research/SKILL.mdsource content
Software Engineering Research Guide
Navigate the landscape of software engineering research, including key subfields, methodologies, datasets, benchmarks, and top venues.
SE Research Subfields
| Subfield | Key Topics | Major Venues |
|---|---|---|
| Software Testing | Test generation, fuzzing, mutation testing, flaky tests | ISSTA, ICST, ASE |
| Program Analysis | Static analysis, abstract interpretation, symbolic execution | PLDI, POPL, OOPSLA |
| Software Maintenance | Code refactoring, technical debt, code smells, evolution | ICSME, MSR, SANER |
| SE for AI/ML | ML pipeline testing, data quality, model debugging | ICSE-SEIP, FSE |
| AI for SE | Code generation, bug detection, program repair | ICSE, FSE, ASE |
| Distributed Systems | Consensus, fault tolerance, scalability, microservices | SOSP, OSDI, EuroSys |
| Cybersecurity | Vulnerability detection, malware analysis, privacy | IEEE S&P, CCS, USENIX Security |
| HCI in SE | Developer tools, IDE usability, code comprehension | CHI, CSCW, VL/HCC |
| Empirical SE | Mining repositories, developer surveys, controlled experiments | ESEM, MSR, TOSEM |
Research Methodologies in SE
Controlled Experiments
Testing a specific hypothesis with treatment and control groups:
Example: Does AI code completion improve developer productivity? Design: - Participants: 60 professional developers - Treatment: IDE with AI code completion enabled - Control: IDE with AI code completion disabled - Task: Complete 5 programming tasks of varying difficulty - Metrics: Task completion time, code correctness, lines of code - Analysis: Mixed-effects linear model with participant as random effect Threats to validity: - Internal: Learning effect (counterbalance task order) - External: Lab setting may not reflect real development - Construct: "Productivity" operationalized as speed + correctness
Mining Software Repositories (MSR)
Analyzing data from version control, issue trackers, code review systems:
# Example: Analyze commit patterns using PyDriller from pydriller import Repository repo_url = "https://github.com/apache/kafka" commit_data = [] for commit in Repository(repo_url, since=datetime(2023, 1, 1), to=datetime(2023, 12, 31)).traverse_commits(): commit_data.append({ "hash": commit.hash[:8], "author": commit.author.name, "date": commit.committer_date, "files_changed": commit.files, "insertions": commit.insertions, "deletions": commit.deletions, "message": commit.msg[:100] }) df = pd.DataFrame(commit_data) print(f"Total commits in 2023: {len(df)}") print(f"Unique contributors: {df['author'].nunique()}") print(f"Avg files per commit: {df['files_changed'].mean():.1f}")
Case Studies
In-depth investigation of a phenomenon in its real-world context:
Case Study Protocol (based on Yin, 2018): 1. Research questions: How do teams adopt microservices? 2. Unit of analysis: Development teams at 3 companies 3. Data sources: - Semi-structured interviews (8-12 per company) - Architecture documentation review - Commit history and deployment logs - Meeting observations 4. Analysis: Thematic analysis with cross-case comparison 5. Validity: Triangulation across data sources, member checking
Key Datasets and Benchmarks
Code Understanding and Generation
| Benchmark | Task | Languages | Size |
|---|---|---|---|
| HumanEval | Code generation from docstrings | Python | 164 problems |
| MBPP | Code generation from descriptions | Python | 974 problems |
| SWE-bench | Real-world GitHub issue resolution | Python | 2,294 instances |
| CodeXGLUE | Multiple code tasks | 6 languages | Varies by task |
| BigCloneBench | Clone detection | Java | 6M clone pairs |
| Defects4J | Bug localization and repair | Java | 835 real bugs |
Software Engineering Process
| Dataset | Content | Use Cases |
|---|---|---|
| GHTorrent | GitHub event data (commits, issues, PRs) | MSR studies |
| Software Heritage | Universal source code archive | Code evolution, provenance |
| Stack Overflow Data Dump | Q&A posts, tags, votes | Developer knowledge, NLP |
| CVE Database | Vulnerability records | Security research |
| Chrome/Firefox Bug Trackers | Bug reports, patches | Bug triage, severity prediction |
Static Analysis Tools for Research
# Example: Using tree-sitter for AST-level code analysis from tree_sitter import Language, Parser import tree_sitter_python as tspython PYTHON_LANGUAGE = Language(tspython.language()) parser = Parser(PYTHON_LANGUAGE) source_code = b""" def fibonacci(n): if n <= 1: return n return fibonacci(n-1) + fibonacci(n-2) """ tree = parser.parse(source_code) root = tree.root_node def count_nodes(node, node_type): """Count AST nodes of a given type.""" count = 1 if node.type == node_type else 0 for child in node.children: count += count_nodes(child, node_type) return count print(f"Function definitions: {count_nodes(root, 'function_definition')}") print(f"If statements: {count_nodes(root, 'if_statement')}") print(f"Return statements: {count_nodes(root, 'return_statement')}") print(f"Function calls: {count_nodes(root, 'call')}")
Code Metrics
# Common software metrics metrics = { "Lines of Code (LOC)": "Total lines (including blanks and comments)", "Cyclomatic Complexity": "Number of independent paths (McCabe, 1976)", "Halstead Volume": "Based on operators and operands count", "Maintainability Index": "Composite of LOC, CC, and Halstead", "Coupling Between Objects": "Number of other classes referenced", "Depth of Inheritance": "Levels in class hierarchy", "Code Churn": "Lines added + modified + deleted per period", "Comment Density": "Ratio of comment lines to total lines" } # Calculate cyclomatic complexity using radon # pip install radon import subprocess result = subprocess.run( ["radon", "cc", "my_module.py", "-s", "-j"], capture_output=True, text=True ) print(result.stdout)
Top Venues and Impact
Tier-1 SE Venues
| Venue | Type | Acceptance Rate | Focus |
|---|---|---|---|
| ICSE | Conference | ~22% | Broad SE |
| FSE/ESEC | Conference | ~24% | Broad SE |
| ASE | Conference | ~22% | Automated SE |
| ISSTA | Conference | ~25% | Software testing |
| MSR | Conference | ~30% | Mining repositories |
| TOSEM | Journal | -- | Broad SE (ACM) |
| TSE | Journal | -- | Broad SE (IEEE) |
| EMSE | Journal | -- | Empirical SE (Springer) |
Systems and Security Venues
| Venue | Type | Focus |
|---|---|---|
| SOSP/OSDI | Conference | Operating systems, distributed systems |
| EuroSys | Conference | Systems (Europe) |
| NSDI | Conference | Networked systems design |
| IEEE S&P (Oakland) | Conference | Security and privacy |
| USENIX Security | Conference | Security |
| CCS | Conference | Computer and communications security |
| NDSS | Conference | Network and distributed systems security |
Research Tools Ecosystem
| Tool | Purpose | URL |
|---|---|---|
| PyDriller | Git repository mining (Python) | github.com/ishepard/pydriller |
| Radon | Python code metrics | github.com/rubik/radon |
| SonarQube | Multi-language static analysis | sonarqube.org |
| Understand | Code analysis and metrics | scitools.com |
| Joern | Code analysis platform (CPG) | joern.io |
| CodeQL | Semantic code analysis | codeql.github.com |
| tree-sitter | Incremental parsing library | tree-sitter.github.io |