Awesome-Agent-Skills-for-Empirical-Research c1
git clone https://github.com/brycewang-stanford/Awesome-Agent-Skills-for-Empirical-Research
T=$(mktemp -d) && git clone --depth=1 https://github.com/brycewang-stanford/Awesome-Agent-Skills-for-Empirical-Research "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/25-HosungYou-Diverga/skills/c1" ~/.claude/skills/brycewang-stanford-awesome-agent-skills-for-empirical-research-c1 && rm -rf "$T"
skills/25-HosungYou-Diverga/skills/c1/SKILL.mdVS Arena Check (v11.1)
Before proceeding with internal VS, check if VS Arena is enabled:
- Read
→config/diverga-config.jsonvs_arena.enabled - If
→ delegate totrue
instead of internal VS process/diverga:vs-arena - If
or config unavailable → proceed with internal VS belowfalse
⛔ Prerequisites (v8.2 — MCP Enforcement)
diverga_check_prerequisites("c1") → must return approved: true
If not approved → AskUserQuestion for each missing checkpoint (see .claude/references/checkpoint-templates.md)
Checkpoints During Execution
- 🔴 CP_METHODOLOGY_APPROVAL →
diverga_mark_checkpoint("CP_METHODOLOGY_APPROVAL", decision, rationale) - 🟠 CP_VS_001 →
diverga_mark_checkpoint("CP_VS_001", decision, rationale) - 🟠 CP_VS_003 →
diverga_mark_checkpoint("CP_VS_003", decision, rationale)
Fallback (MCP unavailable)
Read
.research/decision-log.yaml directly to verify prerequisites. Conversation history is last resort.
Quantitative Design Consultant (C1)
Agent ID: C1 (formerly 09) Category: C - Methodology & Analysis VS Level: Enhanced (3-Phase) Tier: Core Icon: 🧪 Paradigm Focus: Quantitative Research
Overview
Specializes in quantitative research designs - experimental, quasi-experimental, and survey methodologies. Develops specific implementation plans with power analysis, sampling strategies, and validity controls.
Applies VS-Research methodology to go beyond overused standard experimental designs, presenting creative quantitative design options optimized for research questions and constraints.
Scope: Exclusively quantitative paradigm (experimental, survey, correlational designs) Complement: C2-Qualitative Design Consultant handles qualitative methodologies
VS-Research 3-Phase Process (Enhanced)
Phase 1: Modal Research Design Identification
Purpose: Explicitly identify the most predictable "obvious" designs
⚠️ **Modal Warning**: The following are the most predictable designs for [research type]: | Modal Design | T-Score | Limitation | |--------------|---------|------------| | "Pretest-posttest control group design" | 0.90 | Overused, attrition issues | | "Cross-sectional survey" | 0.88 | Cannot establish causation | | "Single-site RCT" | 0.85 | Limited external validity | ➡️ This is baseline. Exploring context-optimal designs.
Phase 2: Alternative Design Options
Purpose: Present differentiated design options based on T-Score
**Direction A** (T ≈ 0.7): Enhanced traditional design - Standard design + additional controls (Solomon 4-group, etc.) - Suitable for: When internal validity strengthening needed **Direction B** (T ≈ 0.4): Innovative design - Interrupted Time Series - Regression Discontinuity - Multilevel design - Suitable for: Randomization impossible, natural experiment situations **Direction C** (T < 0.3): Cutting-edge methodology - Adaptive Trial Designs - SMART (Sequential Multiple Assignment Randomized Trial) - Platform Trials - Suitable for: Complex interventions, personalized research
Phase 4: Recommendation Execution
For selected design:
- Design structure diagram
- Validity threats and control strategies
- Sample size calculation
- Specific implementation timeline
Research Design Typicality Score Reference Table
T > 0.8 (Modal - Consider Alternatives): ├── Pretest-posttest control group design ├── Cross-sectional survey ├── Simple correlational study └── Convenience sampling-based study T 0.5-0.8 (Established - Can Strengthen): ├── Solomon 4-group design ├── Longitudinal panel study ├── Matched comparison group └── Stratified randomization T 0.3-0.5 (Emerging - Recommended): ├── Interrupted Time Series (ITS) ├── Regression Discontinuity (RD) ├── Multilevel/Cluster RCT └── Mixed methods sequential design T < 0.3 (Innovative - For Leading Research): ├── Adaptive Trial Designs ├── SMART Designs ├── Bayesian Adaptive Designs └── Platform/Basket Trials
When to Use
- When quantitative research question is finalized and methodology needs deciding
- When choosing among experimental/survey design options
- When design minimizing validity threats is needed (internal/external/construct)
- When power analysis and sample size calculation required
- When finding optimal quantitative design within resource constraints
Do NOT use for: Qualitative designs (phenomenology, grounded theory, ethnography) → Use C2-Qualitative Design Consultant
Core Functions
-
Quantitative Design Matching
- Causal inference requirement analysis
- Experimental vs. quasi-experimental vs. survey design selection
- Comparative analysis of pros/cons for quantitative approaches
-
Experimental Validity Analysis
- Identify internal validity threats (history, maturation, testing, instrumentation, etc.)
- Consider external validity (population, ecological, temporal)
- Construct validity assessment
- Propose control strategies (randomization, matching, statistical control)
-
Power Analysis & Sample Design
- Power analysis using G*Power, pwr (R), statsmodels (Python)
- Effect size specification (Cohen's d, f, η²)
- Sample size calculation (α=.05, power=.80 defaults)
- Sampling method recommendation (probability vs. non-probability)
- Recruitment strategy for quantitative studies
-
Quantitative Trade-off Analysis
- Causality vs. generalizability
- Precision vs. feasibility
- Control vs. ecological validity
- Statistical power vs. sample size costs
Quantitative Design Type Library
True Experimental Designs (Random Assignment)
| Design | Structure | Strengths | Weaknesses | Validity |
|---|---|---|---|---|
| Randomized Controlled Trial (RCT) | R O₁ X O₂<br>R O₃ — O₄ | High internal validity, causal inference | Cost, ethical constraints, recruitment | Internal: ⭐⭐⭐⭐⭐ |
| Pretest-Posttest Control Group | R O₁ X O₂<br>R O₃ — O₄ | Baseline equivalence, change detection | Testing effects, attrition | Internal: ⭐⭐⭐⭐⭐ |
| Posttest-Only Control Group | R X O₁<br>R — O₂ | No testing effects, simple | Cannot verify baseline equivalence | Internal: ⭐⭐⭐⭐ |
| Solomon Four-Group | R O₁ X O₂<br>R O₃ — O₄<br>R — X O₅<br>R — — O₆ | Controls testing effects, comprehensive | Requires large sample (4 groups), costly | Internal: ⭐⭐⭐⭐⭐ |
| Factorial Design (2x2, 3x2, etc.) | Multiple IVs, interaction effects | Efficiency, interaction testing | Complexity, interpretation challenges | Internal: ⭐⭐⭐⭐ |
| Within-Subjects (Repeated Measures) | Same participants across conditions | Increased power, fewer participants | Order effects, carryover, attrition | Internal: ⭐⭐⭐⭐ |
| Crossover Design | Group A: X→Y<br>Group B: Y→X | Controls individual differences | Carryover effects, washout period needed | Internal: ⭐⭐⭐⭐ |
Quasi-Experimental Designs (No Random Assignment)
| Design | Structure | Strengths | Weaknesses | Validity |
|---|---|---|---|---|
| Nonequivalent Control Group | O₁ X O₂<br>O₃ — O₄ | Field applicability, practical | Selection bias, regression to mean | Internal: ⭐⭐⭐ |
| Interrupted Time Series (ITS) | O₁ O₂ O₃ X O₄ O₅ O₆ | Controls history, maturation | Long data collection, seasonal effects | Internal: ⭐⭐⭐⭐ |
| Regression Discontinuity (RD) | Assignment by cutoff score | Ethical, strong causal inference | Requires large N, limited generalization | Internal: ⭐⭐⭐⭐ |
| Matched Comparison Group | Match on covariates, then compare | Reduces selection bias | Difficult to match perfectly | Internal: ⭐⭐⭐ |
| Propensity Score Matching | Match on propensity scores | Statistical equivalence | Unobserved confounders | Internal: ⭐⭐⭐ |
Pre-Experimental Designs (Weakest Internal Validity)
| Design | Structure | Strengths | Weaknesses | Validity |
|---|---|---|---|---|
| One-Shot Case Study | X O | Quick, inexpensive | No control, no baseline | Internal: ⭐ |
| One-Group Pretest-Posttest | O₁ X O₂ | Simple, baseline available | History, maturation, testing | Internal: ⭐⭐ |
| Static-Group Comparison | X O₁<br>— O₂ | Quick comparison | No random assignment, selection bias | Internal: ⭐⭐ |
Survey Designs (Correlational/Descriptive)
| Design | Structure | Strengths | Weaknesses | Validity |
|---|---|---|---|---|
| Cross-Sectional Survey | Single time point | Efficiency, cost-effective | Cannot establish causation | External: ⭐⭐⭐⭐ |
| Longitudinal Panel Study | Same participants, multiple waves | Track individual change | Attrition, cost, long duration | Internal: ⭐⭐⭐ |
| Trend Study | Different samples, same questions | Track population trends | Cannot track individuals | External: ⭐⭐⭐⭐ |
| Cohort Study | Track cohort over time | Incidence estimation | Long duration, attrition | External: ⭐⭐⭐⭐ |
| Survey Experiment (Vignette) | Embedded experiments in surveys | Causal inference + generalizability | Hypothetical scenarios, external validity | Internal: ⭐⭐⭐⭐ |
| Conjoint Analysis | Attribute-based choice experiments | Realistic decision contexts | Complex design, analysis | Internal: ⭐⭐⭐⭐ |
Power Analysis Parameters
| Effect Size | Cohen's d | Interpretation | Typical Sample Size (α=.05, power=.80) |
|---|---|---|---|
| Small | 0.2 | Subtle difference | ~393 per group (2 groups) |
| Medium | 0.5 | Noticeable difference | ~64 per group |
| Large | 0.8 | Obvious difference | ~26 per group |
Tools:
- G*Power (GUI, free, Windows/Mac)
- pwr package (R)
- statsmodels.stats.power (Python)
- Online calculators (e.g., Sample Size Calculator by UCSF)
Common Parameters:
- α (alpha): Type I error rate (default .05)
- Power (1-β): Probability of detecting true effect (default .80)
- Effect size: Expected difference magnitude
- Tails: One-tailed vs. two-tailed test
Input Requirements
Required: - research_question: "Specific quantitative research question" - purpose: "Descriptive/Explanatory/Predictive/Causal" - causal_inference_need: "High/Medium/Low" Optional: - available_resources: "Time, budget, personnel" - constraints: "Ethical, practical limitations (randomization feasible?)" - participant_characteristics: "Accessibility, vulnerability, sample frame" - expected_effect_size: "Small (0.2) / Medium (0.5) / Large (0.8) / Unknown" - power_requirements: "Power level (default .80), alpha level (default .05)"
Output Format
## Quantitative Research Design Consulting Report ### 1. Research Question Analysis | Item | Analysis | |------|----------| | Question Type | Descriptive/Explanatory/Predictive/Causal | | Causal Inference Need | High/Medium/Low | | Comparison Structure | Between-subjects/Within-subjects/Mixed | | Temporal Dimension | Cross-sectional/Longitudinal | | Random Assignment Feasible | Yes/No/Partial | ### 2. Recommended Quantitative Designs (Top 3) #### 🥇 Recommendation 1: [Design Name] **Design Type:** True Experimental / Quasi-Experimental / Survey **Design Structure (Campbell-Stanley Notation):**
R O₁ X O₂ R O₃ — O₄
Where: R = Random assignment O = Observation/Measurement X = Treatment/Intervention — = No treatment
**Strengths:** 1. [Strength 1 - validity advantage] 2. [Strength 2 - practical advantage] 3. [Strength 3 - statistical advantage] **Weaknesses:** 1. [Weakness 1 - validity threat] 2. [Weakness 2 - practical limitation] **Validity Analysis:** | Validity Type | Specific Threats | Control Strategy | |---------------|------------------|------------------| | **Internal** | History, maturation, testing, instrumentation, regression | Randomization, control group, counterbalancing | | **External** | Population, ecological, temporal | Representative sampling, multiple settings | | **Construct** | Mono-operation bias, hypothesis guessing | Multiple measures, blinding | | **Statistical** | Low power, violated assumptions | Power analysis, assumption checks | **Power Analysis:** - **Expected effect size**: d = [0.2/0.5/0.8] - **Alpha level**: α = .05 (two-tailed) - **Desired power**: 1-β = .80 - **Required sample size**: N = [total] ([per group] × [groups]) - **Tool**: G*Power / pwr / statsmodels **Expected Resources:** - **Duration**: [weeks/months] - **Cost**: [budget estimate] - **Personnel**: [researchers, assistants] #### 🥈 Recommendation 2: [Design Name] ... #### 🥉 Recommendation 3: [Design Name] ... ### 3. Quantitative Design Comparison Table | Criterion | Design 1 | Design 2 | Design 3 | |-----------|----------|----------|----------| | **Internal validity** | ⭐⭐⭐⭐⭐ | ⭐⭐⭐⭐ | ⭐⭐⭐ | | **External validity** | ⭐⭐⭐ | ⭐⭐⭐⭐ | ⭐⭐⭐⭐⭐ | | **Statistical power** | ⭐⭐⭐⭐ | ⭐⭐⭐⭐⭐ | ⭐⭐⭐ | | **Feasibility** | ⭐⭐⭐ | ⭐⭐⭐⭐ | ⭐⭐⭐⭐⭐ | | **Cost efficiency** | ⭐⭐ | ⭐⭐⭐ | ⭐⭐⭐⭐ | | **Ethical burden** | ⭐⭐⭐ | ⭐⭐⭐⭐ | ⭐⭐⭐⭐⭐ | ### 4. Final Recommendation **Recommended Design**: [Design name] **Rationale**: [Validity-resource-ethics tradeoff explanation] ### 5. Specific Implementation Plan **Power Analysis (G*Power Settings):** - Test family: [t-tests / F-tests / χ² tests / etc.] - Statistical test: [Independent samples / Repeated measures / ANOVA] - Effect size: d = [value] or f = [value] - Alpha: [.05] - Power: [.80] - Sample size: N = [total] **Sampling Strategy:** - **Population definition**: [Target population] - **Sampling frame**: [Actual accessible population] - **Sampling method**: [Simple random / Stratified / Cluster / Convenience] - **Recruitment strategy**: [Specific procedures] - **Inclusion criteria**: [List] - **Exclusion criteria**: [List] **Randomization Procedures** (if applicable): - **Method**: [Simple / Block / Stratified randomization] - **Allocation concealment**: [Sealed envelopes / Central randomization] - **Blinding**: [Single / Double / None] **Data Collection Procedures:** 1. **Baseline (Time 1)**: [Measures, duration] 2. **Intervention/Treatment**: [Duration, procedures, fidelity checks] 3. **Post-test (Time 2)**: [Measures, timing] 4. **Follow-up** (if applicable): [Long-term measures] **Validity Threat Mitigation:** | Threat | Mitigation Strategy | |--------|---------------------| | Attrition | Track retention, intention-to-treat analysis | | Testing effects | Use parallel forms, extended baseline | | Instrumentation | Calibrate measures, inter-rater reliability | **Analysis Strategy:** - **Primary analysis**: [e.g., Independent samples t-test, 2x2 ANOVA] - **Secondary analysis**: [e.g., Moderation, mediation, subgroup analyses] - **Assumptions to check**: [Normality, homogeneity of variance, sphericity] - **Missing data handling**: [Listwise deletion / Multiple imputation / FIML]
Prompt Template
You are a quantitative research design expert specializing in experimental, quasi-experimental, and survey methodologies. Please propose optimal quantitative designs for the following research: [Research Question]: {research_question} [Causal Inference Need]: {high/medium/low} [Random Assignment Feasible]: {yes/no/partial} [Available Resources]: {resources} [Constraints]: {constraints} [Expected Effect Size]: {small/medium/large/unknown} Tasks to perform: 1. **Quantitative Research Question Analysis** - Type: Descriptive/Explanatory/Predictive/Causal - Comparison structure: Between-subjects/Within-subjects/Mixed - Temporal dimension: Cross-sectional/Longitudinal - Variables: IV(s), DV(s), Moderators, Mediators, Covariates 2. **Propose 3 Quantitative Designs** (prioritize by validity-feasibility trade-off) For each design: - **Design name and type** (True experimental / Quasi-experimental / Survey) - **Design structure** (Campbell-Stanley notation: R O X) - **Strengths** (validity advantages) - **Weaknesses** (validity threats, practical limitations) - **Validity analysis table**: - Internal validity: Specific threats and control strategies - External validity: Generalization concerns - Construct validity: Measurement issues - Statistical validity: Power, assumptions - **Power analysis**: - Expected effect size (Cohen's d, f, η²) - Alpha level (default .05) - Desired power (default .80) - Required sample size (per group and total) - Tool recommendation (G*Power/pwr/statsmodels) - **Expected resources** (time, cost, personnel) 3. **Design Comparison Table** - Compare across: Internal validity, External validity, Statistical power, Feasibility, Cost efficiency, Ethical burden 4. **Final Recommendation and Rationale** - Recommended design with justification - Validity-resource-ethics trade-off explanation 5. **Specific Implementation Plan** - **Power analysis details** (G*Power settings, effect size rationale) - **Sampling strategy** (population, frame, method, recruitment, criteria) - **Randomization procedures** (if applicable: method, allocation, blinding) - **Data collection procedures** (baseline, intervention, post-test, follow-up) - **Validity threat mitigation** (attrition, testing, instrumentation, etc.) - **Analysis strategy** (primary, secondary, assumptions, missing data) IMPORTANT: Focus exclusively on quantitative designs. Do NOT propose qualitative or mixed methods designs.
Quantitative Design Selection Decision Tree
Quantitative Research Question │ ├─── Causal inference needed? (HIGH) │ │ │ ├─── Random assignment feasible? YES │ │ │ │ │ ├─── Between-subjects comparison │ │ │ │ │ │ │ ├─── Testing effects concern? YES → Solomon Four-Group │ │ │ └─── Testing effects concern? NO → Pretest-Posttest Control Group │ │ │ │ │ ├─── Within-subjects comparison │ │ │ │ │ │ │ ├─── Crossover feasible? YES → Crossover Design │ │ │ └─── Crossover feasible? NO → Repeated Measures Design │ │ │ │ │ └─── Multiple IVs? YES → Factorial Design (2x2, 3x2, etc.) │ │ │ └─── Random assignment feasible? NO (Quasi-experimental) │ │ │ ├─── Cutoff score available? YES → Regression Discontinuity │ ├─── Pre-intervention data? YES → Interrupted Time Series │ ├─── Matching possible? YES → Nonequivalent Control Group (matched) │ └─── None of above → Propensity Score Matching / Nonequivalent Control │ ├─── Causal inference needed? MEDIUM │ │ │ └─── Longitudinal data collection │ │ │ ├─── Same participants? YES → Panel Study │ ├─── Different samples? YES → Trend Study │ └─── Track cohort? YES → Cohort Study │ └─── Causal inference needed? LOW (Descriptive/Correlational) │ ├─── Variable relationships? YES → Cross-sectional Survey + Regression/SEM ├─── Causal mechanisms in survey? YES → Survey Experiment (Vignette/Conjoint) └─── Simple description? YES → Descriptive Cross-sectional Survey
Power Analysis Decision Tree
Power Analysis Planning │ ├─── Effect size known from prior research? YES → Use reported effect size │ ├─── Effect size unknown? → Use conventions │ │ │ ├─── Theory-driven hypothesis → Medium (d=0.5, f=0.25) │ ├─── Exploratory study → Small-Medium (d=0.3) │ └─── Practical significance → Define SESOI (Smallest Effect Size of Interest) │ ├─── Statistical test? │ │ │ ├─── Independent samples t-test → G*Power: t-tests, difference between means │ ├─── Paired samples t-test → G*Power: t-tests, difference from constant (matched pairs) │ ├─── One-way ANOVA → G*Power: F-tests, ANOVA fixed effects │ ├─── Factorial ANOVA → G*Power: F-tests, ANOVA fixed effects (specify factors) │ ├─── Repeated measures ANOVA → G*Power: F-tests, ANOVA repeated measures │ ├─── Correlation → G*Power: Exact, Correlation: bivariate normal model │ ├─── Multiple regression → G*Power: F-tests, Linear multiple regression │ └─── Chi-square → G*Power: χ² tests, Goodness-of-fit │ └─── Sample size constraints? │ ├─── N fixed (e.g., N=100) → Calculate detectable effect size (sensitivity analysis) └─── N flexible → Calculate required N for desired power
Absorbed Capabilities (v11.0)
From C4 — Experimental Materials Developer
- Treatment/Control Condition Design: Develop treatment protocols, design control conditions (no-treatment, placebo, active control, waitlist), specify fidelity measures
- Manipulation Checks: Design manipulation check items, pre-test manipulation strength in pilot studies, plan for failed manipulation contingencies
- Stimulus Materials: Develop experimental stimuli (vignettes, scenarios, tasks), create parallel forms for counterbalancing, design distractor/filler items
- Content Validity: Establish content validity through expert review panels
From D1 — Sampling Strategy Advisor
- Probability Sampling Methods: Simple random, stratified random (proportional/disproportionate), cluster sampling, systematic sampling
- Non-Probability Sampling Methods: Purposive, convenience with bias assessment, quota sampling, snowball/chain-referral
- Sample Size Justification: A priori power analysis (G*Power, pwr), effect size estimation, minimum sample size rules, attrition-adjusted targets
- Power Analysis Integration: Required N computation, sensitivity analysis, power curves, ICC-adjusted sample sizes for clustered data
Related Agents
- A1-ResearchQuestionRefiner: Refine quantitative research question before design selection
- C2-QualitativeDesignConsultant: For qualitative/mixed methods designs
- E1-QuantitativeAnalysisGuide: Analysis methods matching quantitative design
- D2-DataCollectionSpecialist: Interview and observation protocol development
- D4-MeasurementInstrumentDeveloper: Instrument development for quantitative studies
v3.0 Creativity Mechanism Integration
Available Creativity Mechanisms (ENHANCED)
| Mechanism | Application Timing | Usage Example |
|---|---|---|
| Forced Analogy | Phase 2 | Apply research design patterns from other fields by analogy |
| Iterative Loop | Phase 2 | 4-round divergence-convergence for design option refinement |
| Semantic Distance | Phase 2 | Discover innovative approaches beyond existing design limitations |
Checkpoint Integration
Applied Checkpoints: - CP-INIT-002: Select creativity level - CP-VS-001: Select research design direction (multiple) - CP-VS-003: Final design satisfaction confirmation - CP-FA-001: Select analogy source field - CP-IL-001: Set iteration round count
Module References
../../research-coordinator/core/vs-engine.md ../../research-coordinator/core/t-score-dynamic.md ../../research-coordinator/creativity/forced-analogy.md ../../research-coordinator/creativity/iterative-loop.md ../../research-coordinator/creativity/semantic-distance.md ../../research-coordinator/interaction/user-checkpoints.md
Detailed Quantitative Design Sections
1. Experimental Designs (Random Assignment)
True Experimental Designs
Randomized Controlled Trial (RCT)
structure: notation: "R O₁ X O₂ / R O₃ — O₄" components: - Random assignment (R) - Experimental group receives treatment (X) - Control group receives no treatment (—) or placebo - Pretest (O₁, O₃) and Posttest (O₂, O₄) strengths: - Maximum internal validity through randomization - Controls most threats (history, maturation, selection) - Gold standard for causal inference weaknesses: - Expensive (recruitment, retention, monitoring) - Ethical constraints (withholding beneficial treatment) - External validity concerns (artificial settings) - Attrition can undermine randomization when_to_use: - Causal effect of intervention/treatment - Resources available for randomization - Ethical to randomly assign - High internal validity priority typical_applications: - Educational intervention studies - Clinical trials (drug efficacy) - Training program evaluation - Technology-enhanced learning
Solomon Four-Group Design
structure: notation: | R O₁ X O₂ R O₃ — O₄ R — X O₅ R — — O₆ components: - Group 1: Pretest, Treatment, Posttest - Group 2: Pretest, Control, Posttest - Group 3: No Pretest, Treatment, Posttest - Group 4: No Pretest, Control, Posttest strengths: - Controls testing effects - Allows estimation of pretest sensitization - Comprehensive validity assessment weaknesses: - Requires 4 groups (large sample) - Complex analysis and interpretation - Costly and time-consuming - Logistically challenging when_to_use: - Testing effects suspected - Pretest may interact with treatment - Sufficient resources for 4 groups typical_applications: - Attitude change research - Knowledge assessment where pretest may teach - High-stakes intervention studies
Factorial Design
structure: examples: - "2×2: Two IVs, each with 2 levels (4 groups)" - "3×2: First IV with 3 levels, second IV with 2 levels (6 groups)" - "2×2×2: Three IVs, each with 2 levels (8 groups)" strengths: - Test multiple IVs simultaneously (efficiency) - Detect interaction effects - More realistic (multiple factors) - Statistical power advantage weaknesses: - Complexity increases with factors - Difficult interpretation with 3+ way interactions - Large sample size needed - Main effects confounded if interactions present when_to_use: - Multiple factors of interest - Interaction effects theoretically important - Sufficient sample size available typical_applications: - Teaching method × Student ability - Technology type × Instructional design - Gender × Age interactions
Quasi-Experimental Designs
Nonequivalent Control Group Design
structure: notation: "O₁ X O₂ / O₃ — O₄" components: - No random assignment (intact groups) - Both groups pretested and posttested - Treatment group receives intervention strengths: - Practical in field settings - Retains some causal inference - Pretest allows baseline comparison - Less disruptive than randomization weaknesses: - Selection bias threat - Regression to the mean - Differential maturation possible - Cannot fully equate groups when_to_use: - Randomization impossible/unethical - Intact groups available (classrooms, organizations) - Field-based research typical_applications: - Classroom-based studies (intact classes) - Organization-level interventions - Community programs control_strategies: - Match groups on key variables - Use ANCOVA to control pretest differences - Propensity score matching - Difference-in-differences analysis
Interrupted Time Series (ITS)
structure: notation: "O₁ O₂ O₃ O₄ X O₅ O₆ O₇ O₈" components: - Multiple observations before intervention - Intervention introduced at known time point - Multiple observations after intervention - Can add control group (non-equivalent comparison series) strengths: - Controls history and maturation (within-subject design) - Visual trend analysis - No comparison group needed - Useful for policy evaluation weaknesses: - Requires long data collection period - Seasonal/cyclical effects - Cannot control contemporaneous events - Statistical assumptions (autocorrelation) when_to_use: - Policy/program implemented at specific time - Archival data available - Control group unavailable - Long-term effects of interest typical_applications: - Policy impact evaluation - Curriculum change effects - Technology adoption studies - Public health interventions analysis_methods: - Segmented regression - ARIMA models - Visual analysis of level and slope changes
Regression Discontinuity (RD)
structure: components: - Assignment based on cutoff score - Units above cutoff receive treatment - Units below cutoff do not - Comparison at discontinuity point strengths: - Strong causal inference (quasi-experimental gold standard) - Ethical (assign based on need/merit) - Transparent assignment rule - Local treatment effect well-identified weaknesses: - Requires large sample size (especially near cutoff) - Limited generalization (only at cutoff) - Sensitive to functional form misspecification - Cannot estimate average treatment effect when_to_use: - Assignment rule involves cutoff - Random assignment unethical/infeasible - Sufficient observations near cutoff typical_applications: - Scholarship eligibility (test score cutoff) - Remedial program assignment - Grade promotion policies - Merit-based program evaluation design_considerations: - Ensure sufficient bandwidth around cutoff - Check for manipulation of assignment variable - Test sensitivity to functional form - Plot raw data to visualize discontinuity
2. Survey Designs
Cross-Sectional Survey
structure: components: - Single time point data collection - Representative or convenience sample - Measure multiple variables simultaneously strengths: - Cost-effective and efficient - Large sample sizes feasible - Wide population coverage - Snapshot of current state weaknesses: - Cannot establish temporal precedence - Limited causal inference - Common method bias - Response rate issues when_to_use: - Describe population characteristics - Explore variable relationships - Hypothesis generation - Limited time/resources typical_applications: - Public opinion surveys - Needs assessment - Correlational research - Market research
Longitudinal Panel Study
structure: components: - Same participants measured repeatedly - Multiple waves (2+ time points) - Track individual change strengths: - Individual change trajectories - Temporal precedence established - Within-person comparisons - Stronger causal inference than cross-sectional weaknesses: - Attrition threatens validity - Long duration and cost - Practice effects - Cohort effects confounded with age when_to_use: - Individual development/change - Causal relationships over time - Predictive models typical_applications: - Career development studies - Academic achievement trajectories - Health behavior change - Technology adoption over time attrition_mitigation: - Incentives for continued participation - Multiple contact methods - Intention-to-treat analysis - Attrition analysis (MCAR, MAR, MNAR)
Survey Experiments
vignette_studies: description: "Embedded experiments in surveys using hypothetical scenarios" structure: - Participants randomly assigned to vignette conditions - Vignette attributes manipulated - Measure responses to scenarios strengths: - Causal inference + generalizability - Control over stimuli - Large samples (online surveys) weaknesses: - Hypothetical scenarios (external validity) - Social desirability bias - Cognitive burden conjoint_analysis: description: "Choice experiments with multiple attributes" structure: - Participants evaluate profiles with varying attributes - Estimate attribute importance - Forced choice or rating tasks strengths: - Realistic decision contexts - Interaction effects - Policy simulations weaknesses: - Complex design and analysis - Assumes compensatory decision-making - Interpretation challenges
3. Power Analysis
Power Analysis Tools
g_power: platform: "Windows, Mac, Linux (GUI)" cost: "Free" features: - Visual interface - 25+ statistical tests - Graphical power curves - Sensitivity analysis usage: "Most user-friendly for beginners" pwr_package_r: platform: "R" cost: "Free" features: - Programmatic power analysis - Reproducible scripts - Integration with R workflow functions: - "pwr.t.test() - t-tests" - "pwr.anova.test() - ANOVA" - "pwr.r.test() - Correlation" - "pwr.chisq.test() - Chi-square" usage: "For R users, reproducible research" statsmodels_python: platform: "Python" cost: "Free" module: "statsmodels.stats.power" features: - Python-based power analysis - Integrates with pandas/numpy classes: - "TTestIndPower - Independent t-test" - "FTestAnovaPower - ANOVA" - "NormalIndPower - z-test" usage: "For Python users, data science workflows"
Effect Size Conventions
cohens_d: small: 0.2 medium: 0.5 large: 0.8 interpretation: "Standardized mean difference (t-tests)" formula: "(M₁ - M₂) / SD_pooled" cohens_f: small: 0.10 medium: 0.25 large: 0.40 interpretation: "Effect size for ANOVA" relation_to_eta_squared: "f = √(η² / (1 - η²))" eta_squared: small: 0.01 medium: 0.06 large: 0.14 interpretation: "Proportion of variance explained" note: "η² = SS_effect / SS_total" correlation_r: small: 0.10 medium: 0.30 large: 0.50 interpretation: "Strength of linear relationship" odds_ratio: small: 1.5 medium: 2.5 large: 4.0 interpretation: "Ratio of odds (logistic regression)"
Sample Size Examples
independent_t_test: effect_size: "d = 0.5 (medium)" alpha: 0.05 power: 0.80 tails: "two-tailed" sample_size_per_group: 64 total_sample_size: 128 one_way_anova_3_groups: effect_size: "f = 0.25 (medium)" alpha: 0.05 power: 0.80 number_of_groups: 3 total_sample_size: 159 correlation: effect_size: "r = 0.30 (medium)" alpha: 0.05 power: 0.80 tails: "two-tailed" sample_size: 84 multiple_regression_4_predictors: effect_size: "f² = 0.15 (medium)" alpha: 0.05 power: 0.80 number_of_predictors: 4 sample_size: 85
References
- VS Engine v3.0:
../../research-coordinator/core/vs-engine.md - Dynamic T-Score:
../../research-coordinator/core/t-score-dynamic.md - Creativity Mechanisms:
../../research-coordinator/references/creativity-mechanisms.md - Project State v4.0:
../../research-coordinator/core/project-state.md - Pipeline Templates v4.0:
../../research-coordinator/core/pipeline-templates.md - Integration Hub v4.0:
../../research-coordinator/core/integration-hub.md - Guided Wizard v4.0:
../../research-coordinator/core/guided-wizard.md - Auto-Documentation v4.0:
../../research-coordinator/core/auto-documentation.md - Shadish, Cook, & Campbell (2002). Experimental and Quasi-Experimental Designs
- Creswell & Creswell (2018). Research Design
- Dillman et al. (2014). Internet, Phone, Mail, and Mixed-Mode Surveys