Install

Source · Clone the upstream repo:

```
git clone https://github.com/openclaw/skills
```

Claude Code · Install into ~/.claude/skills/:

```
T=$(mktemp -d) && git clone --depth=1 https://github.com/openclaw/skills "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/1kalin/afrexai-interview-architect" ~/.claude/skills/clawdbot-skills-afrexai-interview-architect && rm -rf "$T"
```

Manifest: skills/1kalin/afrexai-interview-architect/SKILL.md
Interview Architect
Complete hiring interview system — from job scorecard design through structured question banks, live evaluation rubrics, panel coordination, and offer decisions. Eliminates gut-feel hiring with evidence-based frameworks that predict on-the-job performance.
Quick Start
Tell me what you need:
- "Design interviews for [role]" → Full interview plan (scorecard + questions + rubrics)
- "Create a scorecard for [role]" → A-Player definition with measurable outcomes
- "Generate questions for [skill/competency]" → Targeted question bank
- "Build a take-home assignment for [role]" → Technical assessment with rubric
- "Evaluate this candidate" → Structured debrief with scoring
- "Audit our interview process" → Bias check + effectiveness review
Phase 1: Job Scorecard (Define Before You Evaluate)
Rule: Never look at a resume before defining what "great" looks like.
Scorecard Template
```yaml
scorecard:
  role: "[Title]"
  level: "[Junior/Mid/Senior/Staff/Principal/Director/VP]"
  team: "[Team name]"
  hiring_manager: "[Name]"
  created: "YYYY-MM-DD"

  mission:
    statement: "[One sentence: why does this role exist?]"
    success_metric: "[How we'll know this hire was successful in 12 months]"

  outcomes:  # 3-5 specific, measurable results expected in first 12 months
    - outcome: "[e.g., Reduce deployment time from 45min to <10min]"
      measure: "[Metric: deployment duration, measured via CI/CD logs]"
      timeline: "Q1-Q2"
      priority: "critical"
    - outcome: "[e.g., Ship v2 API with 99.9% uptime]"
      measure: "[Uptime %, error rate, customer adoption]"
      timeline: "Q2-Q3"
      priority: "critical"
    - outcome: "[e.g., Mentor 2 junior engineers to mid-level competency]"
      measure: "[Promotion readiness assessment, PR quality metrics]"
      timeline: "Q3-Q4"
      priority: "important"

  competencies:
    technical:
      must_have:
        - name: "[e.g., System design]"
          level: "[Novice/Competent/Proficient/Expert]"
          evidence: "[What demonstrates this: e.g., designed systems handling 10K+ RPS]"
        - name: "[e.g., TypeScript/React]"
          level: "Proficient"
          evidence: "[Shipped production TS/React apps, not just tutorials]"
      nice_to_have:
        - name: "[e.g., Kubernetes]"
          level: "Competent"
    behavioral:
      must_have:
        - name: "Ownership"
          definition: "Takes responsibility for outcomes, not just tasks. Doesn't wait to be told."
          anti_pattern: "Says 'that's not my job' or 'I was told to do X'"
        - name: "Communication"
          definition: "Explains complex ideas simply. Writes clear docs. Raises issues early."
          anti_pattern: "Surprises stakeholders. Can't explain their own work."
        - name: "Growth mindset"
          definition: "Seeks feedback. Admits mistakes. Improves from failure."
          anti_pattern: "Defensive about criticism. Repeats same mistakes."
      nice_to_have:
        - name: "[e.g., Cross-functional leadership]"
    cultural:
      values_alignment:
        - "[Company value 1: what this looks like in practice]"
        - "[Company value 2: what this looks like in practice]"
      anti_signals:
        - "[Red flag behavior 1]"
        - "[Red flag behavior 2]"

  compensation:
    band: "[min - max]"
    equity: "[range if applicable]"
    flexibility: "[What's negotiable]"

  deal_breakers:  # Hard no's — instant disqualification
    - "[e.g., Cannot start within 4 weeks]"
    - "[e.g., No experience with production systems at scale]"
    - "[e.g., Requires >30% above band]"
```
Scorecard Quality Check
Before proceeding, verify:
- Mission statement is one sentence (not a paragraph)
- Each outcome has a specific number or metric (not "improve" or "help with")
- Competencies distinguish must-have from nice-to-have
- Anti-patterns defined for each behavioral competency
- Deal breakers are objective (not subjective feelings)
- Band is realistic for the market (check levels.fyi, Glassdoor)
Phase 2: Interview Structure Design
Interview Loop Template
```yaml
interview_loop:
  role: "[from scorecard]"
  total_duration: "[X hours across Y sessions]"

  stages:
    - stage: "Resume Screen"
      duration: "5-10 min"
      who: "Recruiter or hiring manager"
      evaluates: ["deal_breakers", "basic_qualification"]
      pass_rate_target: "30-40%"
    - stage: "Phone Screen"
      duration: "30 min"
      who: "Hiring manager"
      evaluates: ["communication", "motivation", "outcome_1_capability"]
      format: "Structured conversation"
      pass_rate_target: "50%"
    - stage: "Technical Assessment"
      duration: "60-90 min"
      who: "Senior engineer"
      evaluates: ["technical_competencies"]
      format: "Live coding OR take-home (see Phase 4)"
      pass_rate_target: "40-50%"
    - stage: "System Design"
      duration: "45-60 min"
      who: "Staff+ engineer"
      evaluates: ["system_design", "trade_off_thinking", "communication"]
      format: "Whiteboard/collaborative design"
      pass_rate_target: "50%"
      applies_to: "Senior+ only"
    - stage: "Behavioral Deep-Dive"
      duration: "45-60 min"
      who: "Hiring manager + cross-functional partner"
      evaluates: ["behavioral_competencies", "cultural_values"]
      format: "STAR-based structured interview"
      pass_rate_target: "60%"
    - stage: "Team Fit / Reverse Interview"
      duration: "30 min"
      who: "2-3 potential teammates"
      evaluates: ["collaboration_style", "candidate_questions"]
      format: "Informal but structured"
      pass_rate_target: "80%"
    - stage: "Hiring Manager Final"
      duration: "30 min"
      who: "Hiring manager"
      evaluates: ["remaining_concerns", "motivation", "offer_readiness"]
      format: "Conversation"

  timeline:
    screen_to_onsite: "< 5 business days"
    onsite_to_decision: "< 2 business days"
    decision_to_offer: "< 1 business day"
    total_process: "< 3 weeks"
```
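The pass-rate targets compound quickly, so it's worth sanity-checking what the loop implies for your top of funnel. A minimal sketch in Python (the rate values are illustrative midpoints of the targets above, and the function name is ours, not part of the template):

```python
# Estimate how many applicants the loop implies per accepted offer
# by compounding the stage pass rates (midpoints of the targets above).
stage_rates = {
    "resume_screen": 0.35,
    "phone_screen": 0.50,
    "technical_assessment": 0.45,
    "system_design": 0.50,   # Senior+ loops only
    "behavioral": 0.60,
    "team_fit": 0.80,
}

def applicants_per_hire(rates: dict[str, float], offer_accept: float = 0.8) -> float:
    p = offer_accept
    for r in rates.values():
        p *= r
    return 1 / p

print(round(applicants_per_hire(stage_rates)))  # ~66 applicants per hire
```

If your pipeline can't supply that volume, tighten sourcing or screening quality rather than lowering the bar at later stages.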
Level-Appropriate Loop Adjustments
| Level | Skip | Add | Emphasis |
|---|---|---|---|
| Junior (0-2 yr) | System design | Pair programming, learning ability | Potential > experience |
| Mid (2-5 yr) | — | — | Balanced: execution + growth |
| Senior (5-8 yr) | — | Architecture discussion | Impact, ownership, mentoring |
| Staff (8+ yr) | Basic coding | Design doc review, strategy | Influence, technical vision |
| Principal | Basic coding | Vision presentation, exec interview | Org-wide impact |
| Manager | Live coding | Skip-level, cross-functional | People outcomes, strategy |
| Director+ | All IC technical | Board/exec presentation | Business impact, org building |
Phase 3: Question Banks
Behavioral Questions (STAR Format)
For each question below:
- Ask the main question
- Then probe with: "Walk me through specifically what YOU did" (not the team)
- Then probe with: "What was the measurable result?"
- Watch for: vague answers, "we" without "I", unable to recall specifics
Ownership & Initiative
Q: "Tell me about a time you identified a problem no one asked you to fix, and you fixed it anyway." Probe: "How did you discover it? What did you do first? What was the outcome?" Green signal: Specific problem, proactive action, measurable impact Red flag: Can't recall an example, or problem was trivial Q: "Describe a project that failed or didn't meet expectations. What was your role?" Probe: "What would you do differently? What did you learn?" Green signal: Owns their part, specific lessons, changed behavior afterward Red flag: Blames others, no learning, defensive Q: "Tell me about the last time you disagreed with your manager's technical decision." Probe: "How did you raise it? What happened? Would you do it differently?" Green signal: Respectful pushback with data, compromise or acceptance Red flag: Never disagrees, or went around manager, or still bitter
Communication & Collaboration
Q: "Describe the most complex technical concept you had to explain to a non-technical audience." Probe: "How did you know they understood? What would you change?" Green signal: Adapts language, checks understanding, uses analogies Red flag: Talks down, uses jargon anyway, frustrated by the need Q: "Tell me about a cross-team project that had conflicting priorities." Probe: "How did you align the teams? What trade-offs were made?" Green signal: Proactive communication, documented agreements, escalated appropriately Red flag: Waited for someone else to resolve, or steamrolled Q: "Give me an example of written communication that had significant impact." Probe: "What was the context? Who was the audience? What resulted?" Green signal: Design doc, RFC, post-mortem that changed decisions Red flag: Can't think of one, or only Slack messages
Technical Excellence
Q: "What's the best piece of code or system you've built? Walk me through it." Probe: "What trade-offs did you make? What would you change now?" Green signal: Deep understanding, clear trade-off reasoning, honest about flaws Red flag: Can't go deep, no awareness of trade-offs Q: "Tell me about a production incident you were involved in resolving." Probe: "How did you diagnose it? What was root cause? What prevented recurrence?" Green signal: Systematic debugging, root cause fix (not band-aid), prevention measures Red flag: Only applied quick fix, blamed infrastructure, no follow-up Q: "Describe a time you had to make a technical decision with incomplete information." Probe: "What did you know? What didn't you know? How did you decide?" Green signal: Explicit about unknowns, gathered what they could, made reversible decision Red flag: Paralyzed, or overconfident without data
Leadership & Mentoring (Senior+)
Q: "Tell me about someone you helped grow significantly in their career." Probe: "What did you specifically do? How did you know it was working?" Green signal: Specific actions (pair programming, stretch assignments, feedback), measurable growth Red flag: "I told them what to do" or can't name anyone Q: "Describe a technical strategy or vision you set for your team." Probe: "How did you get buy-in? How did you measure progress?" Green signal: Clear rationale, stakeholder alignment, adapted based on feedback Red flag: Top-down mandate, or never set direction Q: "Tell me about a time you had to say no to a stakeholder or product request." Probe: "How did you explain it? What was the outcome?" Green signal: Data-driven reasoning, offered alternatives, maintained relationship Red flag: Just said no, or always says yes
Forensic Resume Questions (Pressure Tests)
For each resume highlight, design verification questions:
Pattern: "[Impressive claim on resume]" → "Walk me through [specific project]. What was the state when you joined?" → "What was YOUR specific contribution vs the team's?" → "What was the hardest technical problem YOU solved?" → "If I called your manager from that time, what would they say was your biggest weakness?" Pattern: "Led team of X" → "How many people reported to you directly?" → "Name someone you had to give tough feedback to. What happened?" → "Who was the weakest performer? What did you do about it?" Pattern: "Improved X by Y%" → "What was the baseline measurement? How did you measure it?" → "What was it before you started? After? How long did it take?" → "What else changed during that period that could explain the improvement?" Pattern: "Short tenure (< 1 year)" → "Walk me through your decision to leave [company]." → "What would your manager there say about your departure?" → "What did you learn from that experience?" Pattern: "Gap in employment" → Ask once, move on. Don't dwell. Valid reasons: health, family, travel, learning, job market. → Red flag only if: story keeps changing, or they're evasive about a very long gap
Future Simulation Questions (Performance Prediction)
Design scenario questions based on the actual role's outcomes:
Template: "In this role, one of your first challenges will be [outcome from scorecard]. The current situation is [honest context]. Walk me through how you'd approach this in your first [timeframe]." Example (Senior Backend): "Our API currently handles 2K RPS but we need to scale to 50K by Q3. The codebase is a 3-year-old Node.js monolith with PostgreSQL. Budget for infrastructure is $10K/mo. Team is 4 engineers including you. How would you approach this?" Probe sequence: 1. "What would you do in week 1?" (Information gathering) 2. "What data would you need?" (Analytical thinking) 3. "What are the biggest risks?" (Risk awareness) 4. "If [constraint changes], how does your approach change?" (Adaptability) 5. "How would you communicate progress to stakeholders?" (Communication) Scoring: 5 — Structured approach, asks clarifying questions, identifies trade-offs, realistic timeline 4 — Good approach with minor gaps 3 — Reasonable but generic, doesn't probe assumptions 2 — Jumps to solution without understanding problem 1 — No coherent approach, or unrealistic
Phase 4: Technical Assessments
Live Coding Assessment Design
```yaml
coding_assessment:
  duration: "60 min"
  structure:
    warm_up: "5 min — environment setup, introduce the problem"
    problem_1: "20 min — core implementation"
    problem_2: "25 min — extension or new problem"
    debrief: "10 min — trade-offs discussion"

  problem_design_rules:
    - Solvable in the time limit (test it yourself first — halve your time)
    - Multiple valid approaches (no single "right answer")
    - Extension points for stronger candidates
    - Relevant to actual work (not algorithm puzzles unless role requires it)
    - Candidate chooses their language
    - Provide starter code / boilerplate to reduce setup time

  evaluation_rubric:
    problem_solving:
      5: "Breaks down problem, considers edge cases upfront, efficient approach"
      3: "Gets to solution but misses edge cases or takes indirect path"
      1: "Struggles to break down problem, no clear approach"
    code_quality:
      5: "Clean, readable, well-named, handles errors, testable"
      3: "Works but messy, some error handling, reasonable naming"
      1: "Barely works, no error handling, unclear naming"
    communication:
      5: "Thinks aloud, explains trade-offs, asks clarifying questions"
      3: "Some explanation, responds to prompts"
      1: "Silent, defensive about suggestions, doesn't explain reasoning"
    testing_awareness:
      5: "Writes tests unprompted, considers edge cases, talks about test strategy"
      3: "Writes tests when prompted, covers happy path"
      1: "No testing consideration"
    speed_and_fluency:
      5: "Fast, clearly experienced, language/tooling fluent"
      3: "Reasonable pace, occasional lookups"
      1: "Very slow, struggles with syntax/tooling"

  do_not:
    - Ask trick questions or gotchas
    - Apply time pressure beyond what's reasonable
    - Penalize for looking things up
    - Judge IDE/editor choice
    - Ask questions that require proprietary knowledge
```
Take-Home Assessment Design
```yaml
take_home:
  time_limit: "3-4 hours (honor system, state clearly)"
  deadline: "5-7 days from send"

  problem_design:
    - Real-world scenario (not academic)
    - Clear requirements with defined scope
    - Extension section for candidates who want to show more
    - Starter repo with CI, linting, test framework pre-configured

  deliverables:
    required:
      - Working solution
      - "Tests (at minimum: happy path + 2 edge cases)"
      - README explaining approach, trade-offs, what you'd improve
    optional:
      - Architecture diagram
      - Performance analysis
      - Additional features from extension section

  evaluation_rubric:
    functionality: "30% — Does it work? Edge cases handled?"
    code_quality: "25% — Clean, readable, maintainable, well-structured"
    testing: "20% — Coverage, meaningful tests, edge cases"
    documentation: "15% — README quality, trade-off explanations"
    extras: "10% — Extension features, thoughtful additions"

  anti_gaming:
    - Check git history (single mega-commit = suspicious)
    - Ask about implementation details in follow-up interview
    - Vary the problem slightly across candidates
    - "Time the follow-up discussion: over-engineered solutions + can't explain = red flag"
```
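If you want a single number out of the rubric, the percentage weights above reduce to a weighted average. A hypothetical helper (the function and example scores are ours; only the weights come from the rubric):

```python
# Collapse the take-home rubric into one 1-5 score using the
# percentage weights defined above.
WEIGHTS = {
    "functionality": 0.30,
    "code_quality": 0.25,
    "testing": 0.20,
    "documentation": 0.15,
    "extras": 0.10,
}

def take_home_score(scores: dict[str, float]) -> float:
    """scores maps each rubric dimension to a 1-5 rating."""
    assert set(scores) == set(WEIGHTS), "score every dimension"
    return sum(WEIGHTS[k] * scores[k] for k in WEIGHTS)

# A strong solution with a thin README still lands just under 4:
print(take_home_score({"functionality": 4, "code_quality": 5,
                       "testing": 4, "documentation": 2, "extras": 4}))  # 3.95
```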
System Design Assessment (Senior+)
```yaml
system_design:
  duration: "45-60 min"
  structure:
    requirements: "10 min — clarify scope, constraints, scale"
    high_level: "15 min — components, data flow, API design"
    deep_dive: "15 min — pick 1-2 areas to go deep"
    trade_offs: "10 min — discuss alternatives, failure modes"
    extensions: "5 min — how would this evolve?"

  evaluation:
    requirements_gathering:
      5: "Asks about scale, users, latency requirements, budget before designing"
      3: "Some clarifying questions but misses key constraints"
      1: "Jumps straight to drawing boxes"
    high_level_design:
      5: "Clear components with well-defined boundaries, data flows make sense"
      3: "Reasonable architecture but some unclear responsibilities"
      1: "Vague boxes with arrows, can't explain data flow"
    depth:
      5: "Deep knowledge in chosen area, considers failure modes, cites real experience"
      3: "Good knowledge but stays surface level"
      1: "Can't go deep on any component"
    trade_off_awareness:
      5: "Explicitly names trade-offs, compares alternatives, knows when each fits"
      3: "Acknowledges trade-offs when prompted"
      1: "Presents one approach as the only option"
    scalability:
      5: "Considers growth path, bottleneck identification, realistic scaling strategy"
      3: "Basic scaling awareness"
      1: "No consideration of scale or unrealistic assumptions"
```
Phase 5: Evaluation & Decision
Per-Interviewer Scorecard
```yaml
interviewer_scorecard:
  candidate: "[name]"
  interviewer: "[name]"
  stage: "[which interview]"
  date: "YYYY-MM-DD"

  # Score BEFORE reading other interviewers' feedback
  overall: 1-5  # 1=Strong No, 2=Lean No, 3=Neutral, 4=Lean Yes, 5=Strong Yes

  competency_scores:
    - competency: "[from scorecard]"
      score: 1-5
      evidence: "[Specific quote or behavior observed]"
    - competency: "[from scorecard]"
      score: 1-5
      evidence: "[Specific quote or behavior observed]"

  green_signals:
    - "[Specific positive indicator with evidence]"

  red_flags:
    - "[Specific concern with evidence]"

  questions_for_next_interviewer:
    - "[What to probe further]"

  # IMPORTANT: Submit before debrief. Do not change after discussion.
```
Debrief Protocol
```
1. BEFORE debrief:
   - All interviewers submit scorecards independently
   - Hiring manager collects but does NOT share scores

2. DEBRIEF structure (30-45 min):
   a. Each interviewer states their overall vote FIRST (no explanation yet)
      → This prevents anchoring bias from persuasive speakers
   b. Lowest scorer goes first (explain concerns)
      → Prevents positive bias from drowning out concerns
   c. Highest scorer responds
   d. Open discussion — focus on EVIDENCE, not feelings
      → "They seemed smart" is not evidence
      → "They designed a cache invalidation strategy that handled..." IS evidence
   e. Address conflicting signals:
      → If strong yes + strong no on the same competency, that's the discussion
      → Resolve with: "What specific behavior did you observe?"
   f. Final vote (all interviewers):
      → Strong Hire / Hire / No Hire / Strong No Hire
      → Any "Strong No Hire" triggers discussion but NOT automatic rejection
      → Hiring manager makes the final call but must document reasoning

3. AFTER debrief:
   - Decision recorded with reasoning
   - Feedback compiled for candidate (regardless of outcome)
   - Action items assigned (offer prep or rejection with feedback)
```
Scoring Decision Matrix
```
Strong Hire (all 4-5):
→ Make offer within 24 hours
→ Expedite the process — strong candidates have multiple offers

Hire (mix of 3-5, no 1s):
→ Make offer within 48 hours
→ Address any 3-scores with a targeted onboarding plan

Borderline (mix of 2-4):
→ Additional data needed — one more focused interview on weak areas
→ Set a deadline: if still borderline after additional data → No Hire
→ "When in doubt, don't hire" — the cost of a bad hire > cost of continuing the search

No Hire (any 1, or multiple 2s):
→ Decline with specific, constructive feedback
→ Document clearly for future reference (candidate may reapply)

Strong No Hire (multiple 1s or a deal breaker):
→ Immediate decline
→ Review: did we miss this in screening? Fix the funnel.
```
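The matrix is mechanical enough to express as code, which is a useful check that the rules leave no gaps. A sketch under the assumption that each interviewer submits a 1-5 overall score (deal breakers are handled outside this function; the function name is ours):

```python
# Map per-interviewer overall scores (1-5) to the decision matrix above.
def hiring_decision(scores: list[int]) -> str:
    ones, twos = scores.count(1), scores.count(2)
    if ones >= 2:
        return "Strong No Hire"  # multiple 1s → immediate decline
    if ones == 1 or twos >= 2:
        return "No Hire"         # any 1, or multiple 2s
    if min(scores) >= 4:
        return "Strong Hire"     # all 4-5 → offer within 24 hours
    if min(scores) >= 3:
        return "Hire"            # mix of 3-5, no 1s → offer within 48 hours
    return "Borderline"          # a single 2 → one more focused interview

print(hiring_decision([4, 5, 4, 4]))  # Strong Hire
print(hiring_decision([3, 4, 2, 4]))  # Borderline
```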
Phase 6: Bias Mitigation
Pre-Interview Bias Checks
Before each interview, remind yourself:

```
□ I will evaluate against the SCORECARD, not my "gut feeling"
□ I will give the same weight to disconfirming evidence as confirming
□ I will not let one great/terrible answer color the entire evaluation
□ I will not compare this candidate to the last one — compare to the scorecard
□ I will note specific behaviors, not general impressions
□ I will not evaluate "culture fit" as "would I have a beer with them"
```
Common Biases in Hiring
| Bias | What It Looks Like | Mitigation |
|---|---|---|
| Halo effect | Great at coding → assume great at everything | Score each competency independently |
| Horn effect | Weak communication → assume weak technically | Same: score independently |
| Similarity bias | "Reminds me of me" → favorable rating | Evaluate against scorecard, not self |
| Anchoring | First impression sets the tone | Score after all questions, not during |
| Confirmation bias | Early positive → only notice positives | Actively look for counter-evidence |
| Contrast effect | Looks great after a weak candidate | Compare to scorecard, not other candidates |
| Recency bias | Remember last answer, forget first | Take notes during interview |
| Attribution error | Success = skill, failure = circumstances | Probe both: "What went wrong? What helped?" |
| Leniency bias | Avoid conflict, rate everyone 3-4 | Force yourself to use the full 1-5 scale |
| Urgency bias | "We need someone NOW" → lower bar | Never lower scorecard standards — extend timeline instead |
Structured Interview Rules
- Same questions for same role — every candidate gets the same core questions
- Score immediately after — before discussing with anyone
- Evidence-based only — every score needs a specific observation
- Diverse panel — at least one interviewer from a different team/background
- Blind resume screen — remove name, school, company names for initial screen (if possible)
- No leading questions — "You're probably great at X, right?" → "Tell me about your experience with X"
- Time-boxed — same duration for every candidate (don't cut short or extend based on vibes)
Phase 7: Candidate Experience
Communication Templates
After each stage — within 24 hours:
ADVANCING: "Hi [name], thank you for your time today. We enjoyed our conversation about [specific topic]. We'd like to move forward with [next stage]. [Interviewer name] will be speaking with you about [topic]. Available times: [options]. Any questions before then? — [recruiter name]" REJECTION (after phone screen): "Hi [name], thank you for taking the time to speak with us about [role]. After careful consideration, we've decided not to move forward at this stage. [One specific, constructive piece of feedback if appropriate]. We'll keep your information on file and may reach out for future opportunities that align more closely. Wishing you the best in your search. — [name]" REJECTION (after onsite): "Hi [name], thank you for investing [X hours] in our interview process. We were impressed by [specific positive], but ultimately decided to move forward with a candidate whose [specific competency] more closely matches our current needs. Feedback: [1-2 specific, actionable items]. We genuinely appreciated your time and would welcome a future conversation if circumstances change. — [hiring manager name]" OFFER (verbal, then written within 24h): "Hi [name], I'm excited to share that we'd like to offer you the [role] position. We were particularly impressed by [specific evidence from interviews]. Here's what we're proposing: [comp summary]. I'll send the formal offer letter within 24 hours. Do you have any initial questions? — [hiring manager]"
Candidate Experience Scorecard
After every hire (and quarterly for all candidates):
| Dimension | Target | How to Measure |
|---|---|---|
| Time to schedule | < 48h between stages | Track in ATS |
| Interviewer preparedness | 100% read the scorecard beforehand | Post-interview survey |
| Communication timeliness | < 24h response | Track in ATS |
| Feedback quality | Specific + actionable | Candidate survey |
| Overall experience | 4+/5 | Candidate survey (all, not just hires) |
| Offer acceptance rate | > 80% | Track in ATS |
Phase 8: Process Audit & Improvement
Quarterly Hiring Review
```yaml
quarterly_review:
  period: "Q[N] YYYY"

  funnel_metrics:
    applications: N
    screens_passed: N  # → Screen pass rate
    onsites: N         # → Onsite conversion rate
    offers: N          # → Offer rate
    accepts: N         # → Acceptance rate

  quality_metrics:
    ninety_day_retention: "X%"
    manager_satisfaction_90d: "X/5"
    time_to_productivity: "X weeks"
    regretted_attrition_1yr: "X%"

  process_metrics:
    time_to_fill: "X days (target: <30)"
    time_in_stage:
      screen: "X days"
      onsite: "X days"
      decision: "X days"
      offer: "X days"
    interviewer_calibration: "score variance across interviewers"

  actions:
    - "[Improvement 1 based on metrics]"
    - "[Improvement 2]"
```
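The per-stage rates called out in the comments fall out of simple division over adjacent funnel counts. A hypothetical computation (the counts are invented for illustration):

```python
# Derive the funnel conversion rates named in the template from raw counts.
funnel = {"applications": 400, "screens_passed": 140,
          "onsites": 60, "offers": 18, "accepts": 15}

stages = list(funnel)
for prev, cur in zip(stages, stages[1:]):
    print(f"{prev} -> {cur}: {funnel[cur] / funnel[prev]:.0%}")
# applications -> screens_passed: 35%
# screens_passed -> onsites: 43%
# onsites -> offers: 30%
# offers -> accepts: 83%
```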
Interview Question Effectiveness Tracking
For each question in your bank, track:
```yaml
question_effectiveness:
  question: "[question text]"
  times_asked: N
  signal_quality:
    strong_differentiator: N  # Times this question clearly separated strong/weak
    no_signal: N              # Times everyone answered similarly
    confusing: N              # Times candidates misunderstood

# If no_signal > 50% → Replace the question
# If confusing > 20% → Reword the question
# If strong_differentiator > 70% → Keep and promote
```
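The three threshold rules translate directly into a verdict function. A minimal sketch (field names mirror the template; the function itself is ours):

```python
# Apply the replace/reword/promote thresholds from the template above.
def question_verdict(times_asked: int, strong_differentiator: int,
                     no_signal: int, confusing: int) -> str:
    if times_asked == 0:
        return "no data yet"
    if no_signal / times_asked > 0.50:
        return "replace"           # everyone answers it the same way
    if confusing / times_asked > 0.20:
        return "reword"            # candidates keep misunderstanding it
    if strong_differentiator / times_asked > 0.70:
        return "keep and promote"  # clearly separates strong from weak
    return "keep, continue tracking"

print(question_verdict(times_asked=20, strong_differentiator=15,
                       no_signal=2, confusing=1))  # keep and promote
```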
Interviewer Calibration
Monthly: Compare interviewer scores across candidates
- Interviewer A averages 4.2, Interviewer B averages 2.8 → calibration needed
- Run calibration session: review same candidate, discuss scoring differences
- Goal: interviewers should be within 0.5 points on average for same candidates

Training for new interviewers:
1. Shadow 3 interviews (observe, don't participate)
2. Reverse shadow 2 interviews (conduct, observed by experienced interviewer)
3. Solo with debrief for 3 interviews
4. Full autonomy after calibration check
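The monthly comparison is easy to automate once scores live in your ATS. A sketch, assuming each interviewer's recent scores cover a comparable candidate pool (the names and the 0.5-point threshold come from the goal above; the function is ours):

```python
from itertools import combinations
from statistics import mean

# Flag interviewer pairs whose average scores drift more than
# 0.5 points apart, the calibration goal stated above.
def calibration_gaps(scores: dict[str, list[float]], threshold: float = 0.5):
    avgs = {name: mean(vals) for name, vals in scores.items()}
    return [(a, b, round(abs(avgs[a] - avgs[b]), 2))
            for a, b in combinations(avgs, 2)
            if abs(avgs[a] - avgs[b]) > threshold]

# Interviewer A averages 4.25, B averages 2.75 → calibration session needed
print(calibration_gaps({"A": [4, 4, 5, 4], "B": [3, 2, 3, 3]}))  # [('A', 'B', 1.5)]
```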
Edge Cases
Internal Candidates
- Use SAME scorecard as external (fairness)
- Different question strategy: focus on future role, not past (you already know their past)
- If not selected: manager delivers feedback personally, development plan, timeline for re-candidacy
- Never promise the internal candidate gets special treatment
Executive Hiring
- Add: reference checks (5+ structured, including back-channel)
- Add: board/exec team dinner (culture, not evaluation)
- Add: 90-day plan presentation as final stage
- Extended scorecard: strategic thinking, board management, talent magnetism
- Use executive search firm for sourcing, but own evaluation internally
High-Volume Hiring (10+ same role)
- Standardize EVERYTHING: same questions, same rubric, same order
- Use structured scoring sheets, not free-form notes
- Batch calibration sessions weekly
- Consider: group assessment centers for initial stages
- Track: quality variance across hiring managers (should be low)
Remote/Async Interviews
- Test tech setup before the interview (not during)
- Camera on (both sides) — non-verbal cues matter
- Record (with consent) for calibration purposes
- Take-home > live coding for timezone-challenged candidates
- Bias alert: don't penalize for background noise, accent, or non-native English fluency
Boomerang Employees
- Treat as new candidate (things change)
- Skip: basic company knowledge questions
- Focus: why they left, what changed, what they learned outside
- Check: has the team/role changed since they left? Do current team members want them back?
Counteroffers
If the candidate receives a counteroffer:
- Don't panic-increase. Your offer should already be fair.
- "We made our best offer based on the value of the role. We'd love to have you, but understand if you decide to stay."
- A commonly cited statistic: 80% of people who accept counteroffers leave within 18 months anyway.
- If they decide to stay: respect it and keep the door open.
Natural Language Commands
| Say | I Do |
|---|---|
| "Design interviews for [role]" | Full loop: scorecard + structure + questions + rubrics |
| "Create a scorecard for [role]" | A-Player definition with outcomes and competencies |
| "Generate behavioral questions for [competency]" | STAR questions with probes and scoring |
| "Build a take-home for [role]" | Assessment with rubric and anti-gaming measures |
| "Design a system design interview for [level]" | Structure + evaluation rubric |
| "Evaluate candidate [name]" | Structured debrief template with scoring |
| "Create a phone screen for [role]" | 30-min structured screen with pass/fail criteria |
| "Write rejection feedback for [candidate]" | Specific, constructive rejection message |
| "Audit our interview process" | Full process review with metrics and recommendations |
| "Calibrate interviewers" | Calibration session plan with scoring alignment |
| "Design interview for [role] at [company stage]" | Adjusted for startup/growth/enterprise context |
| "Generate reference check questions for [role]" | Structured reference interview guide |