Awesome-Agent-Skills-for-Empirical-Research questionnaire-design-guide
Questionnaire and survey design with Likert scales and coding
install
source · Clone the upstream repo
git clone https://github.com/brycewang-stanford/Awesome-Agent-Skills-for-Empirical-Research
Claude Code · Install into ~/.claude/skills/
T=$(mktemp -d) && git clone --depth=1 https://github.com/brycewang-stanford/Awesome-Agent-Skills-for-Empirical-Research "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/43-wentorai-research-plugins/skills/analysis/wrangling/questionnaire-design-guide" ~/.claude/skills/brycewang-stanford-awesome-agent-skills-for-empirical-research-questionnaire-des && rm -rf "$T"
manifest:
skills/43-wentorai-research-plugins/skills/analysis/wrangling/questionnaire-design-guide/SKILL.md
Questionnaire Design Guide
Design valid and reliable survey instruments with proper question types, Likert scale construction, response coding, and data preparation for analysis.
Survey Design Principles
Question Types
| Type | Example | Best For | Analysis |
|---|---|---|---|
| Likert scale | "Rate your agreement: 1-5" | Attitudes, perceptions | Ordinal/interval statistics |
| Multiple choice | "Select your field" | Demographics, categories | Frequencies, chi-square |
| Ranking | "Rank these 5 options" | Preferences, priorities | Rank correlations |
| Open-ended | "Describe your experience" | Exploratory, rich data | Qualitative coding |
| Matrix/grid | Multiple items, same scale | Efficient battery of items | Factor analysis, reliability |
| Slider/VAS | 0-100 visual analog scale | Continuous measures | Parametric statistics |
| Semantic differential | "Easy __ __ __ __ __ Difficult" | Bipolar attitudes | Factor analysis |
The Four C's of Good Questions
- Clear: Avoid jargon, double-barreled questions, and ambiguity
- Concise: Keep questions short (ideally under 20 words)
- Complete: Include all relevant response options
- Consistent: Use the same scale direction and format throughout
Likert Scale Design
Scale Points
| Points | Scale Example | Recommended Use |
|---|---|---|
| 4-point | Strongly Disagree to Strongly Agree | Forces choice (no neutral), less discriminating |
| 5-point | SD, D, Neutral, A, SA | Most common, good balance of simplicity and discrimination |
| 7-point | SD, D, Somewhat D, Neutral, Somewhat A, A, SA | More discriminating, better for experienced respondents |
| 11-point (0-10) | Not at all to Completely | NPS, continuous-like measures |
Anchoring Labels
Common 5-point anchor sets:

| Score | Agreement Scale | Frequency Scale | Satisfaction Scale |
|---|---|---|---|
| 1 | Strongly Disagree | Never | Very Dissatisfied |
| 2 | Disagree | Rarely | Dissatisfied |
| 3 | Neither Agree nor Disagree | Sometimes | Neutral |
| 4 | Agree | Often | Satisfied |
| 5 | Strongly Agree | Always | Very Satisfied |
Reverse-Coded Items
Include 2-3 reverse-coded items per construct to detect acquiescence bias:
- Regular: "I find research methods interesting." (1-5: SD to SA)
- Reversed: "I find research methods tedious and dull." (1-5: SD to SA)

Recode reversed items before analysis: `reversed_score = (max_scale + 1) - raw_score`. For a 5-point scale, `reversed_score = 6 - raw_score`.
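One way to use a regular/reversed item pair as an acquiescence check is to flag respondents who agree with both statements on the raw scores. The sketch below is a heuristic rather than a standard procedure: the item names, the `agree_threshold` cutoff, and the toy responses are all assumptions made for illustration.

```python
# Toy raw responses on a 5-point scale (illustrative values, not real data).
# "interesting" is a regular item; "tedious" is its reverse-worded counterpart.
responses = [
    {"id": "r1", "interesting": 5, "tedious": 1},  # consistent
    {"id": "r2", "interesting": 4, "tedious": 4},  # agrees with both
    {"id": "r3", "interesting": 2, "tedious": 5},  # consistent
    {"id": "r4", "interesting": 5, "tedious": 5},  # agrees with both
]


def flag_acquiescence(rows, regular, reversed_item, agree_threshold=4):
    """Return ids of respondents who agree with both items of a pair.

    Works on RAW scores, before reverse-coding. The threshold is an
    assumed cutoff (4 = "Agree" on a 5-point scale).
    """
    return [
        row["id"]
        for row in rows
        if row[regular] >= agree_threshold and row[reversed_item] >= agree_threshold
    ]


def reverse_code(raw_score, max_scale=5):
    """Recode a reversed item: (max_scale + 1) - raw_score."""
    return (max_scale + 1) - raw_score


flagged = flag_acquiescence(responses, "interesting", "tedious")
print("Possible acquiescence:", flagged)
print("Recoded 'tedious':", [reverse_code(r["tedious"]) for r in responses])
```

Flagged respondents are candidates for a closer look, not automatic exclusions; a single contradictory pair can also reflect a misread item.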
Constructing a Multi-Item Scale
Step-by-Step Process
- Define the construct: Write a clear conceptual definition
- Generate items: Write 1.5-2x the number of items you plan to keep (e.g., write 15 items for an 8-item scale)
- Expert review: Have 3-5 experts rate each item for relevance (Content Validity Index)
- Pilot test: Administer to 30-50 respondents
- Item analysis: Calculate item-total correlations, check reliability
- Exploratory Factor Analysis (EFA): Examine dimensionality (how many underlying factors the items form)
- Finalize scale: Remove weak items, re-test reliability
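The expert-review step can be sketched as a small Content Validity Index calculation. This assumes the common convention of a 4-point relevance scale where a rating of 3 or 4 counts as "relevant"; the panel ratings below are hypothetical, and the 0.78 cutoff is the I-CVI criterion listed in the validity table of this guide.

```python
# Hypothetical relevance ratings: one list per item, one rating per expert.
# Assumed 4-point relevance scale (1 = not relevant ... 4 = highly relevant).
ratings = {
    "RSE1": [4, 4, 3, 4, 4],
    "RSE2": [3, 4, 4, 4, 3],
    "RSE3": [2, 3, 4, 2, 3],
    "RSE4": [4, 4, 4, 4, 4],
}


def item_cvi(expert_ratings, relevant_min=3):
    """I-CVI: proportion of experts rating the item as relevant (>= relevant_min)."""
    relevant = sum(1 for r in expert_ratings if r >= relevant_min)
    return relevant / len(expert_ratings)


# Item-level CVI for every item
i_cvi = {item: item_cvi(r) for item, r in ratings.items()}

# Scale-level CVI, averaging method (S-CVI/Ave): mean of the I-CVIs
s_cvi_ave = sum(i_cvi.values()) / len(i_cvi)

# Items below the I-CVI criterion are candidates for revision or removal
weak_items = [item for item, value in i_cvi.items() if value < 0.78]

for item, value in i_cvi.items():
    print(f"{item}: I-CVI = {value:.2f}")
print(f"S-CVI/Ave = {s_cvi_ave:.2f}")
print("Review these items:", weak_items)
```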
Example: Research Self-Efficacy Scale
Construct: Belief in one's ability to conduct academic research

Items (5-point Likert, Strongly Disagree to Strongly Agree):

- RSE1: I can formulate clear research questions.
- RSE2: I can design an appropriate research methodology.
- RSE3: I can analyze data using statistical software.
- RSE4: I can write a publishable research paper.
- RSE5: I can critically evaluate published research.
- RSE6: I can present research findings at a conference.
- RSE7R: I struggle to interpret statistical results. [REVERSED]
- RSE8R: I find it difficult to synthesize literature. [REVERSED]
Data Coding and Preparation
Coding Scheme
```python
import pandas as pd
import numpy as np

# df is assumed to be a DataFrame of survey responses that is already loaded

# Define coding scheme
likert_coding = {
    "Strongly Disagree": 1,
    "Disagree": 2,
    "Neither Agree nor Disagree": 3,
    "Agree": 4,
    "Strongly Agree": 5,
}

# Apply coding (text labels -> numeric codes)
df["Q1_coded"] = df["Q1_raw"].map(likert_coding)

# Reverse code specific items
reverse_items = ["RSE7R", "RSE8R"]
max_scale = 5
for item in reverse_items:
    df[f"{item}_recoded"] = (max_scale + 1) - df[item]

# Calculate composite score (mean of items)
scale_items = ["RSE1", "RSE2", "RSE3", "RSE4", "RSE5", "RSE6",
               "RSE7R_recoded", "RSE8R_recoded"]
df["RSE_mean"] = df[scale_items].mean(axis=1)
```
Missing Data Handling
```python
# Check missing data patterns
print(df[scale_items].isnull().sum())
print(f"Complete cases: {df[scale_items].dropna().shape[0]} / {df.shape[0]}")

# Common strategies (these are alternatives; apply only one of them)

# 1. Listwise deletion (if < 5% missing)
df_complete = df.dropna(subset=scale_items)

# 2. Mean imputation per item (simple but biased)
df_item_mean = df.copy()
df_item_mean[scale_items] = df[scale_items].fillna(df[scale_items].mean())

# 3. Person-mean imputation (if < 20% of items missing per person)
def person_mean_impute(row, items, max_missing=1):
    # With 8 items, 1 missing item is 12.5%, which stays under the 20% limit
    if row[items].isnull().sum() <= max_missing:
        return row[items].fillna(row[items].mean())
    return row[items]  # leave as NaN if too many missing

# Apply row-wise to the scale columns only, so each row stays numeric
df_person_mean = df.copy()
df_person_mean[scale_items] = df[scale_items].apply(
    lambda r: person_mean_impute(r, scale_items), axis=1
)
```
Reliability Analysis
Cronbach's Alpha
```python
import pingouin as pg

# Calculate Cronbach's alpha
alpha = pg.cronbach_alpha(df[scale_items])
print(f"Cronbach's alpha: {alpha[0]:.3f}")
# Interpretation: >= 0.70 acceptable, >= 0.80 good, >= 0.90 excellent
```

```r
library(psych)

# Cronbach's alpha with item-level diagnostics
alpha_result <- alpha(data[, scale_items])
print(alpha_result)
# Check "raw_alpha if item dropped" to identify weak items
```
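To make the statistic itself less of a black box, here is a minimal sketch of the textbook formula, alpha = k / (k - 1) × (1 - sum of item variances / variance of the total score), using only the standard library. The toy scores are illustrative, complete cases are assumed, and reversed items are assumed to be recoded already; library implementations may treat missing data differently.

```python
from statistics import variance

# Toy item scores: one inner list per respondent, one value per item
# (illustrative values only; no missing data).
scores = [
    [4, 5, 4, 4],
    [2, 2, 3, 2],
    [5, 4, 5, 5],
    [3, 3, 3, 4],
    [1, 2, 1, 2],
    [4, 4, 5, 4],
]


def cronbach_alpha(rows):
    """Cronbach's alpha from complete-case item scores (sample variances)."""
    k = len(rows[0])  # number of items
    item_vars = [variance([row[j] for row in rows]) for j in range(k)]
    total_var = variance([sum(row) for row in rows])  # variance of sum scores
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)


alpha = cronbach_alpha(scores)
print(f"Cronbach's alpha: {alpha:.3f}")
```

The formula shows why alpha rises when items covary strongly: shared variance inflates the total-score variance relative to the sum of the individual item variances.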
Item-Total Correlations
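```r
# Corrected item-total correlations (should be > 0.30)
item_stats <- alpha_result$item.stats
print(item_stats[, "r.drop", drop = FALSE])

# Alpha if each item is dropped is stored separately, in alpha.drop
print(alpha_result$alpha.drop[, "raw_alpha", drop = FALSE])

# r.drop < 0.30: consider removing the item
# raw_alpha above the overall alpha when an item is dropped: the item is weakening the scale
```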
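For readers working in Python, the corrected item-total correlation can be sketched by correlating each item with the sum of the remaining items. This is a plain-Python illustration with toy data and hypothetical item names, not a replacement for the `psych` output; the 0.30 cutoff is the one stated above.

```python
from math import sqrt

# Toy scores: one inner list per respondent (illustrative values only).
item_names = ["item1", "item2", "item3", "item4"]
scores = [
    [4, 5, 4, 3],
    [2, 2, 3, 3],
    [5, 4, 5, 2],
    [3, 3, 3, 4],
    [1, 2, 1, 3],
    [4, 4, 5, 3],
]


def pearson_r(x, y):
    """Pearson correlation between two equal-length numeric lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(ss_x * ss_y)


def corrected_item_total(rows):
    """Correlate each item with the sum of all OTHER items."""
    k = len(rows[0])
    result = []
    for j in range(k):
        item = [row[j] for row in rows]
        rest = [sum(row) - row[j] for row in rows]  # total excluding item j
        result.append(pearson_r(item, rest))
    return result


r_drop = dict(zip(item_names, corrected_item_total(scores)))
weak = [name for name, r in r_drop.items() if r < 0.30]

for name, r in r_drop.items():
    print(f"{name}: r.drop = {r:.2f}")
print("Consider removing:", weak)
```

Excluding the item from its own total is what makes the correlation "corrected"; otherwise each item would be partly correlated with itself.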
Validity Assessment
| Validity Type | Method | Criterion |
|---|---|---|
| Content validity | Expert panel rating (CVI) | I-CVI >= 0.78, S-CVI/Ave >= 0.90 |
| Construct validity | Exploratory Factor Analysis (EFA) | Eigenvalue > 1, loadings > 0.40 |
| Convergent validity | Correlation with related construct | r > 0.30 |
| Discriminant validity | Correlation with unrelated construct | \|r\| < 0.30 |
| Criterion validity | Correlation with external criterion | Significant correlation |
| Test-retest reliability | ICC or Pearson r over 2-4 weeks | ICC > 0.70 |
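The "eigenvalue > 1" criterion in the table (often called the Kaiser criterion) can be illustrated by taking the eigenvalues of the item correlation matrix. The sketch below uses simulated data with one underlying factor, so the sample size, noise level, and seed are arbitrary assumptions; it covers only the eigenvalue rule, and estimating loadings would require a dedicated factor-analysis library.

```python
import numpy as np

# Simulated responses: 5 items driven by one common factor plus noise.
# Purely synthetic data for illustration; all parameters are arbitrary.
rng = np.random.default_rng(0)
n_respondents = 200
factor = rng.normal(size=n_respondents)
items = np.column_stack(
    [factor + rng.normal(scale=0.6, size=n_respondents) for _ in range(5)]
)

# Correlation matrix of the items (columns are variables)
corr = np.corrcoef(items, rowvar=False)

# Eigenvalues of the symmetric correlation matrix, sorted largest first
eigenvalues = np.linalg.eigvalsh(corr)[::-1]

# Count the factors with eigenvalue > 1
n_factors = int((eigenvalues > 1).sum())

print("Eigenvalues:", np.round(eigenvalues, 2))
print("Factors retained (eigenvalue > 1):", n_factors)
```

The eigenvalues of a correlation matrix sum to the number of items, so an eigenvalue above 1 means that factor accounts for more variance than a single item would.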
Common Design Mistakes
| Mistake | Example | Fix |
|---|---|---|
| Double-barreled question | "This course is interesting and useful" | Split into two separate items |
| Leading question | "Don't you agree that X is important?" | "How important is X to you?" |
| Absolute terms | "Do you always check citations?" | "How often do you check citations?" |
| Missing option | No "Not Applicable" when needed | Add N/A option or filter logic |
| Inconsistent scale direction | Some items 1=good, others 1=bad | Standardize direction; clearly mark reversed items |
| Too many items | 100-item survey | Aim for 5-8 items per construct, 15-30 min total |
| No pilot test | Skip straight to full deployment | Always pilot with 30-50 respondents |
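Some of the wording mistakes in this table can be caught with a rough automated pass before expert review. The following sketch is a keyword heuristic only: the word lists, patterns, and flag names are assumptions chosen for illustration, and it will produce both false positives and misses. The 20-word limit comes from the "Concise" guideline earlier in this guide.

```python
import re

# Assumed word lists and patterns; extend or adjust for your own instrument.
ABSOLUTE_TERMS = {"always", "never", "all", "every", "none"}
LEADING_PATTERNS = [r"^don't you\b", r"^wouldn't you\b", r"^isn't it\b"]


def lint_item(text, max_words=20):
    """Return a list of heuristic warnings for one questionnaire item."""
    lowered = text.lower()
    words = re.findall(r"[a-z']+", lowered)
    flags = []
    if len(words) > max_words:
        flags.append("too long")
    if "and" in words or "or" in words:
        flags.append("possible double-barreled")
    if ABSOLUTE_TERMS & set(words):
        flags.append("absolute term")
    if any(re.search(pattern, lowered) for pattern in LEADING_PATTERNS):
        flags.append("possible leading question")
    return flags


draft_items = [
    "This course is interesting and useful",
    "Don't you agree that X is important?",
    "Do you always check citations?",
    "How often do you check citations?",
]

for item in draft_items:
    print(f"{item!r}: {lint_item(item) or 'no flags'}")
```

A flag is a prompt to reread the item, not a verdict; "and" is legitimate in many single-idea questions.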
Survey Platform Comparison
| Platform | Cost | Features | Best For |
|---|---|---|---|
| Qualtrics | Institutional | Advanced logic, panels, API | Large academic studies |
| SurveyMonkey | Freemium | Easy to use, basic analysis | Quick surveys |
| Google Forms | Free | Simple, integrates with Sheets | Classroom, pilot testing |
| LimeSurvey | Free/self-hosted | Open source, full control | Privacy-sensitive research |
| REDCap | Free (academic) | Clinical data, HIPAA compliant | Medical/clinical research |
| Prolific | Per-response | Participant recruitment | Online experiments |