gsd-skill-creator / retrospective-learning
Retrospective facilitation and organizational learning for project teams. Covers retrospective formats (Start/Stop/Continue, 4Ls, timeline, sailboat), blameless postmortems, lessons learned databases, knowledge management, organizational learning theory (Senge's Fifth Discipline, Argyris's double-loop learning), after-action reviews (military AAR heritage), GSD's session-to-session retrospective pattern, metrics-driven improvement, celebration of success, psychological safety for honest retrospectives, and cross-project pattern recognition.
git clone https://github.com/Tibsfox/gsd-skill-creator
T=$(mktemp -d) && git clone --depth=1 https://github.com/Tibsfox/gsd-skill-creator "$T" && mkdir -p ~/.claude/skills && cp -r "$T/examples/skills/project-management/retrospective-learning" ~/.claude/skills/tibsfox-gsd-skill-creator-retrospective-learning && rm -rf "$T"
examples/skills/project-management/retrospective-learning/SKILL.md
Retrospective Learning
A team that does not reflect does not improve. The retrospective is the mechanism by which experience becomes knowledge — by which "we did something" transforms into "we understand what we did, why it worked or didn't, and what we will do differently." Without retrospectives, teams repeat the same mistakes across projects, sprints, and careers. With retrospectives, each iteration is smarter than the last.
But retrospectives are fragile. They require honesty, and honesty requires safety. A team that fears blame will report only successes. A retrospective that surfaces only successes is not a retrospective — it is a performance. This skill covers both the mechanics (formats, facilitation, data) and the human conditions (psychological safety, blameless culture, celebration) that make retrospectives actually work.
Agent affinity: sinek (pedagogy/leadership), deming (quality/continuous improvement)
Concept IDs: bus-stakeholder-theory, bus-business-ethics, crit-problem-analysis, crit-evidence-evaluation, ps-metacognition
Retrospective Format Catalog
| # | Format | Best for | Duration | Team size |
|---|---|---|---|---|
| 1 | Start/Stop/Continue | General purpose, quick | 30-60 min | Any |
| 2 | 4Ls | Balanced reflection | 45-60 min | 4-10 |
| 3 | Timeline | Complex sprints with many events | 60-90 min | 5-12 |
| 4 | Sailboat | Visual, metaphor-driven | 45-60 min | 4-10 |
| 5 | Starfish | Fine-grained action | 45-60 min | 4-10 |
| 6 | Mad/Sad/Glad | Emotional check-in | 30-45 min | Any |
Rotation principle. Use different formats across sprints. The same format every time leads to mechanical responses. Novelty keeps thinking fresh.
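A minimal sketch of the rotation, assuming a simple sprint-number cycle (the mapping from sprint number to format is an assumption; any scheme that avoids repetition works):

```python
# Hypothetical sketch: rotate retrospective formats so the same one
# never runs twice in a row. Format names come from the catalog above.
FORMATS = [
    "Start/Stop/Continue",
    "4Ls",
    "Timeline",
    "Sailboat",
    "Starfish",
    "Mad/Sad/Glad",
]

def pick_format(sprint_number: int) -> str:
    """Deterministic rotation: sprint N gets the N-th format, wrapping."""
    return FORMATS[(sprint_number - 1) % len(FORMATS)]

print(pick_format(7))  # -> "Start/Stop/Continue" (wraps after six sprints)
```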
Format 1 — Start/Stop/Continue
The simplest and most widely used format. Three columns:
- Start: Things we should begin doing
- Stop: Things we should cease doing
- Continue: Things that are working and should persist
Facilitation Guide
- Silent brainstorming (5 min). Each person writes items on sticky notes (one idea per note). No discussion during writing.
- Share and cluster (10 min). Post notes on the board. Group related items. Read aloud for remote participants.
- Dot vote (3 min). Each person gets 3-5 dots. Vote on the items you think are most important.
- Discuss top items (20-30 min). Take the top 3-5 voted items. For each: What specifically happened? What should we do about it? Who will own it? By when?
- Capture actions (5 min). Write down committed actions with owners and deadlines.
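A minimal sketch of the tally-and-commit steps in code, useful when the retrospective runs through a shared digital tool; the data shapes are illustrative assumptions, not part of any GSD format:

```python
# Tally dot votes, take the top items, and record committed actions.
from collections import Counter

votes = Counter({
    "Stop deploying on Fridays": 7,
    "Stop accepting stories without acceptance criteria": 6,
    "Start pairing on complex database migrations": 5,
    "Continue weekly tech debt allocation": 4,
})

TOP_N = 3
actions = []
for item, count in votes.most_common(TOP_N):
    actions.append({
        "item": item,
        "votes": count,
        "owner": None,      # assigned during discussion
        "deadline": None,   # agreed during discussion
    })

for action in actions:
    print(f"{action['votes']} votes: {action['item']}")
```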
Worked Example — Sprint 7 Retrospective
Start:
- Start writing acceptance criteria before sprint planning (3 votes)
- Start pairing on complex database migrations (5 votes)
- Start tagging Slack messages by project context
Stop:
- Stop attending the all-hands product meeting (not relevant to our sprint work) (4 votes)
- Stop deploying on Fridays (2 incidents caused by Friday deploys) (7 votes)
- Stop accepting stories without acceptance criteria (6 votes)
Continue:
- Continue daily standup format (keep it to 15 min) (3 votes)
- Continue PR size limit of 400 lines (2 votes)
- Continue weekly tech debt allocation (4 votes)
Top actions:
- No Friday deploys — effective immediately. Owner: Scrum Master. (7 votes)
- Stories require acceptance criteria before entering sprint backlog. Product Owner to enforce. (6 votes)
- Pair programming for database migrations. Schedule pairing sessions on migration days. Owner: Dev Lead. (5 votes)
Format 2 — 4Ls (Liked, Learned, Lacked, Longed For)
- Liked: What did we enjoy? What went well?
- Learned: What new knowledge did we gain?
- Lacked: What was missing? What held us back?
- Longed For: What do we wish we had?
Worked Example
Liked:
- The spike on the caching layer — gave us confidence before committing to the approach
- Fast PR reviews this sprint (under 4 hours average)
Learned:
- Redis cluster mode requires different connection handling than standalone
- Our staging environment does not accurately reflect production memory limits
Lacked:
- Clear documentation on the payment API's retry behavior
- Access to production logs during the outage investigation (permissions issue)
Longed For:
- Automated performance tests in CI (currently manual)
- A dedicated design resource for the dashboard redesign
4Ls advantage. The "Learned" column captures knowledge that Start/Stop/Continue misses. The "Longed For" column captures aspirations without implying blame.
Format 3 — Timeline Retrospective
For sprints with many events, the timeline format reconstructs the sprint chronologically:
- Draw the timeline. X-axis is time (day by day through the sprint). Y-axis is mood (high/low).
- Mark events. Each person adds events to the timeline: decisions, deployments, blockers, breakthroughs, surprises.
- Plot the mood curve. The team collectively draws a line showing morale over the sprint.
- Discuss inflection points. Where did mood drop? Where did it rise? What caused the changes?
Worked Example — Sprint 9 Timeline
[Mood chart: team morale plotted day by day across the 10-day sprint, dipping on Days 3, 6, and 9 and recovering after each]
Events:
Day 1: Sprint planning went well (high)
Day 3: Found a critical bug in auth service (low)
Day 4: Root cause identified, fix deployed (recovery)
Day 5: Client demo, positive feedback (high)
Day 6: Surprise scope change from product (low)
Day 7-8: Team absorbed scope change, rallied (gradual recovery)
Day 9: Production incident — unrelated legacy system (low)
Day 10: Incident resolved, sprint goals met (recovery)
Discussion focuses on the dips: Day 3 (bug) — why wasn't it caught by tests? Day 6 (scope change) — what's the process for mid-sprint scope injection? Day 9 (incident) — why is legacy system health our problem?
Timeline advantage. Reveals patterns invisible in other formats: recurring Tuesday blockers, post-deploy dips, demo-day spikes. The visual triggers memories that a prompt-based format does not.
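For distributed teams, the mood curve can be rendered from data instead of drawn on a whiteboard. A hypothetical sketch, assuming a 1-5 mood scale (the scale and sample data are illustrative):

```python
# Render a day-by-day mood track as a text chart from (day, mood) pairs.
sprint_9 = [
    (1, 5), (2, 4), (3, 1), (4, 3), (5, 5),
    (6, 2), (7, 3), (8, 4), (9, 1), (10, 4),
]

def render(track: list[tuple[int, int]], levels: int = 5) -> str:
    rows = []
    for level in range(levels, 0, -1):
        # One row per mood level; a star marks days at or above it.
        cells = ["*" if mood >= level else " " for _, mood in track]
        rows.append(f"{level} |" + "  ".join(cells))
    axis = "  +" + "-" * (3 * len(track) - 2)
    days = "   " + "  ".join(str(d) for d, _ in track)
    return "\n".join(rows + [axis, days])

print(render(sprint_9))
```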
Format 4 — Sailboat
A visual format built on a sailing metaphor:
- Wind (sails): What propelled us forward? What gave us momentum?
- Anchor: What held us back? What slowed us down?
- Rocks: What risks or dangers do we see ahead?
- Island: What is our goal? Where are we heading?
Worked Example
Wind:
- Automated database migration script saved 2 days
- New team member ramped up faster than expected (good onboarding docs)
Anchor:
- Flaky CI pipeline — 3 false-negative test failures this sprint, causing re-runs and lost time
- Waiting for design approval added 2-day delay to 3 stories
Rocks (ahead):
- Black Friday traffic spike in 6 weeks — performance testing not yet started
- Key dependency (payment gateway) is migrating to v3 API; our integration uses v2
Island:
- Stable, performant checkout experience before Black Friday
Sailboat advantage. The "Rocks" section naturally transitions into risk identification for the next sprint. The "Island" keeps the team aligned on the goal. The metaphor makes the conversation less confrontational — "what's anchoring us" feels different from "what went wrong."
Format 5 — Starfish
An expansion of Start/Stop/Continue with finer granularity:
- Keep doing: High-value practices to maintain
- More of: Good things we should amplify
- Less of: Things that are not serving us well
- Stop doing: Things to eliminate entirely
- Start doing: New practices to introduce
When to use. When Start/Stop/Continue feels too coarse. The "More of" and "Less of" categories capture nuance — the practice is not bad enough to stop, but we should modulate it.
Format 6 — Mad/Sad/Glad
Emotional check-in format:
- Mad: What frustrated you?
- Sad: What disappointed you?
- Glad: What made you happy?
When to use. When the team is emotionally charged — after a difficult sprint, a production incident, or an organizational change. Mad/Sad/Glad gives explicit permission to express emotions, which is a prerequisite for productive discussion.
Facilitation note. Start with Glad. Opening on frustration sets a negative tone for the whole session; starting with Glad builds energy for the harder conversations that follow.
Blameless Postmortems
When something goes wrong — an outage, a missed deadline, a security incident — the postmortem must be blameless. Not "no one is responsible" but "we examine systems and processes, not individual guilt." (Cross-reference: rca department, blameless-postmortem skill for full methodology.)
The Blameless Postmortem Structure
- Timeline. What happened, when, in what order? Facts only — no interpretation yet.
- Impact. Who was affected? For how long? What was the business impact?
- Root cause analysis. Why did it happen? (5 Whys, fishbone, or fault tree.) Aim for systemic causes, not "Developer X made a mistake."
- Contributing factors. What made the situation worse? (Missing monitoring, delayed response, unclear runbooks.)
- What went well. What worked during the response? What prevented worse outcomes?
- Action items. Specific, owned, time-bound changes to prevent recurrence.
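A minimal sketch that turns the six-section structure into a reusable skeleton; the prompts paraphrase the list above, and the heading style is an assumption:

```python
# Emit a blameless postmortem skeleton. Section order matters:
# facts before interpretation.
SECTIONS = [
    ("Timeline", "What happened, when, in what order? Facts only."),
    ("Impact", "Who was affected, for how long, at what business cost?"),
    ("Root cause analysis", "Why did it happen? Aim for systemic causes."),
    ("Contributing factors", "What made it worse? Monitoring, runbooks, response?"),
    ("What went well", "What worked? What prevented worse outcomes?"),
    ("Action items", "Specific, owned, time-bound changes."),
]

def postmortem_skeleton(title: str) -> str:
    lines = [f"# Postmortem: {title}", ""]
    for heading, prompt in SECTIONS:
        lines += [f"## {heading}", f"<!-- {prompt} -->", ""]
    return "\n".join(lines)

print(postmortem_skeleton("2026-04-12 checkout outage"))
```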
Why Blameless Matters
Sidney Dekker's "Just Culture" research demonstrates that blame suppresses reporting. If the developer who caused an outage is punished, the next developer will hide the problem. Blame optimizes for punishment; blameless optimizes for learning.
The system produced the behavior. If a developer deployed broken code to production, ask: Why was it possible to deploy broken code? Where were the automated checks? Where was the review? The developer is the last link in a chain of systemic failures.
GSD connection. GSD's session handoff documents (HANDOFF-SESSION-*.md) serve as lightweight blameless records. They capture what happened, what worked, what didn't, and what the next session should know. They never assign blame — they provide context for continuous improvement.
Lessons Learned Databases
A retrospective that produces action items only for the current team serves one team. A lessons learned database serves the entire organization.
Structure
| Field | Purpose |
|---|---|
| Date | When the lesson was learned |
| Project/Sprint | Context |
| Category | Technical, process, communication, tooling |
| Lesson | What we learned (one sentence) |
| Context | What happened that taught us this |
| Recommendation | What to do differently in similar situations |
| Status | Open / implemented / superseded |
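One possible encoding of this schema, as a sketch; field names mirror the table, and the tags field anticipates the findability practices below:

```python
from dataclasses import dataclass, field
from datetime import date

@dataclass
class Lesson:
    learned_on: date
    project: str            # Project/Sprint context
    category: str           # technical | process | communication | tooling
    lesson: str             # what we learned, one sentence
    context: str            # what happened that taught us this
    recommendation: str     # what to do differently
    status: str = "open"    # open | implemented | superseded
    tags: list[str] = field(default_factory=list)

entry = Lesson(
    learned_on=date(2026, 3, 15),
    project="Sprint 7",
    category="process",
    lesson="Friday deploys correlate with weekend incidents",
    context="2 incidents traced to Friday-afternoon deploys",
    recommendation="Deploy windows: Mon-Thu AM",
    tags=["deployment", "incidents"],
)
```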
Worked Example — Lessons Learned Entries
| Date | Category | Lesson | Recommendation |
|---|---|---|---|
| 2026-03-15 | Process | Friday deploys correlate with weekend incidents | No deploys after 2 PM on Thursdays. Deploy windows: Mon-Thu AM. |
| 2026-03-22 | Technical | Redis cluster mode requires explicit connection pool configuration | Add Redis cluster configuration to onboarding checklist and infrastructure template |
| 2026-04-01 | Communication | Stakeholder not informed of scope change led to trust breakdown | All scope changes must be communicated to affected stakeholders within 24 hours via the RACI matrix |
| 2026-04-08 | Tooling | CI false negatives waste 2+ hours per sprint | Invest in CI reliability as infrastructure work, not "nice to have" |
Making Lessons Findable
A lessons learned database that nobody searches is a write-only data structure. Make lessons findable by:
- Tagging by category, technology, and project type
- Surfacing relevant lessons during planning phases ("Before you start a migration project, here are 12 lessons from past migrations")
- Reviewing lessons at project kickoff and milestone boundaries
- Retiring lessons that are obsolete or have been automated into the process
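A minimal sketch of the surfacing idea, assuming a simple tagged store (plain dicts keep it self-contained; the entries are illustrative):

```python
lessons = [
    {"lesson": "Friday deploys correlate with weekend incidents",
     "tags": ["deployment"], "status": "implemented"},
    {"lesson": "Estimate data cleaning separately in migrations",
     "tags": ["migration", "estimation"], "status": "open"},
]

def relevant_lessons(store, tag: str):
    """Return active lessons for a tag; superseded entries are retired."""
    return [l for l in store
            if tag in l["tags"] and l["status"] != "superseded"]

# At kickoff of a migration project:
for l in relevant_lessons(lessons, "migration"):
    print("-", l["lesson"])
```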
Organizational Learning Theory
Senge's Fifth Discipline
Peter Senge identified five disciplines for a "learning organization":
- Personal mastery. Individuals committed to their own growth.
- Mental models. Surfacing and challenging assumptions about how the world works.
- Shared vision. A compelling picture of the future that the whole team owns.
- Team learning. The team learns faster than any individual (dialogue over discussion).
- Systems thinking. Seeing the whole system, not just parts. Understanding feedback loops, delays, and unintended consequences.
The fifth discipline (systems thinking) integrates the other four. Without it, personal mastery is individual growth without organizational benefit. Without it, shared vision is aspiration without understanding.
Application to retrospectives. Retrospectives that surface symptoms ("deploys are slow") without understanding systems ("our CI pipeline is a bottleneck because it runs sequentially due to a shared test database") produce shallow improvements. Systems thinking in retrospectives asks: "What is the underlying structure that produces this behavior?"
Argyris's Double-Loop Learning
Single-loop learning: Detect an error, correct the action. Thermostat model: too cold -> turn on heat.
Double-loop learning: Detect an error, question the assumptions and goals that led to the action. Why is the thermostat set to this temperature? Should we heat this room at all?
| | Single-loop | Double-loop |
|---|---|---|
| Question | "Are we doing things right?" | "Are we doing the right things?" |
| Scope | Tactics, execution | Strategy, goals, assumptions |
| Retrospective example | "Our tests are too slow; let's parallelize them." | "Why are we writing so many integration tests instead of unit tests? Is our testing strategy wrong?" |
| Organizational example | "Sprint velocity dropped; let's add a developer." | "Why does our process require so many handoffs? Can we restructure teams to eliminate them?" |
Most retrospectives operate in single-loop mode. The facilitator's job is to push toward double-loop when patterns recur. If "deploys are risky" appears in 3 consecutive retrospectives, single-loop fixes (better rollback, more testing) have not worked. Double-loop asks: "Why is our deployment architecture fragile? What would a zero-risk deployment look like? What would we need to change about our system to make that possible?"
After-Action Reviews (AAR) — Military Heritage
The U.S. Army developed the After-Action Review as a structured reflection tool following every significant activity — not just failures, but successes.
AAR Structure
- What was supposed to happen? The plan, the objectives, the expected outcomes.
- What actually happened? Facts, timeline, deviations from plan.
- Why was there a difference? Root causes of deviations — both positive and negative.
- What can we do better next time? Specific, actionable improvements.
AAR Principles
- Conducted immediately. While memory is fresh. Not next week — today.
- Everyone participates. Rank does not matter during the AAR. A private's observation is as valid as a colonel's.
- Focus on events, not personalities. "The communication broke down" not "Lieutenant Smith failed."
- Honest and candid. The AAR is a professional conversation, not a performance evaluation.
AAR vs. Retrospective
AARs and Agile retrospectives share DNA. The key difference: AARs are tied to a specific event or mission; retrospectives cover a time period (sprint). For software teams, use AARs for specific incidents (production outages, failed releases) and retrospectives for regular cadence reflection.
GSD's Retrospective Pattern
GSD implements a distinctive retrospective pattern: session-to-session carry-forward.
Session Handoff as Retrospective
Each GSD session ends with a handoff document (HANDOFF-SESSION-*.md) that captures:
- What was accomplished. Deliverables, commits, artifacts.
- What was learned. Surprises, discoveries, corrections.
- What remains. Unfinished work, open questions, risks for the next session.
- What went well. Practices to continue.
- What to improve. Process adjustments for the next session.
This is not just documentation — it is a structured retrospective artifact. The next session begins by reading the handoff, inheriting both the work state and the accumulated wisdom.
Carry-Forward Accumulation
Over many sessions, handoff documents form a knowledge chain. Session 17's handoff references lessons from Session 16, which references Session 15. Patterns that recur across sessions become visible: "We keep underestimating integration work" or "Autonomous execution works best for well-defined tasks but struggles with ambiguous scope."
This carry-forward pattern is GSD's implementation of a lessons learned database — distributed across handoff documents rather than centralized in a single database, but serving the same purpose: making past experience available to future work.
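A hypothetical sketch of how recurring themes could be surfaced from a chain of handoff files; the filename pattern is from the text above, while the theme keywords are assumptions about what handoffs mention:

```python
# Count how many handoff documents mention each recurring theme.
# A theme appearing in most sessions signals a systemic issue.
from collections import Counter
from pathlib import Path

THEMES = ["estimate", "integration", "scope", "testing"]

def recurring_themes(directory: str) -> Counter:
    counts: Counter = Counter()
    for path in sorted(Path(directory).glob("HANDOFF-SESSION-*.md")):
        text = path.read_text().lower()
        for theme in THEMES:
            if theme in text:
                counts[theme] += 1
    return counts

# e.g. Counter({'estimate': 5, 'integration': 3, ...}) across 6 sessions
print(recurring_themes("."))
```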
STATE.md as Living Retrospective
STATE.md captures the current project state, including retrospective insights. It is updated at session boundaries and consulted at session start. This is Senge's "mental models" discipline made explicit — the team's shared understanding of where the project is, why, and what to do next.
Metrics-Driven Improvement
Retrospectives should be informed by data, not just feelings. Key metrics to bring to a retrospective:
| Metric | What it reveals | Source |
|---|---|---|
| Velocity trend | Is the team speeding up, slowing down, or stable? | Sprint tracking tool |
| Cycle time distribution | How predictable is our delivery? | Kanban board analytics |
| Defect escape rate | How many bugs reach production? | Bug tracker + monitoring |
| PR review turnaround | Is code review a bottleneck? | Git analytics |
| Test suite duration | Is the test suite slowing down CI? | CI/CD pipeline data |
| Deployment frequency | Are we deploying more or less often? | Deployment logs |
| Sprint goal achievement | Do we meet our sprint commitments? | Sprint review records |
The data is not the conclusion. Data starts the conversation: "Cycle time increased 40% this sprint. What happened?" The team provides context that the metrics cannot: "We were onboarding a new team member, which slowed reviews." That context determines whether the metric represents a problem or a temporary investment.
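A minimal sketch of turning one metric into a retrospective prompt; the 25% threshold and the sample data are illustrative assumptions:

```python
# Compare this sprint's median cycle time against the previous sprint
# and flag a large jump as a question for the retrospective.
from statistics import median

previous = [1.5, 2.0, 2.5, 3.0, 2.0]   # cycle times in days, last sprint
current = [2.5, 3.5, 3.0, 4.0, 3.5]    # cycle times in days, this sprint

change = (median(current) - median(previous)) / median(previous)
if change > 0.25:
    print(f"Cycle time up {change:.0%}. Bring to retrospective: what happened?")
```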
Celebration of Success
Retrospectives that focus exclusively on problems teach the team that reflection = criticism. Over time, people disengage.
Celebrate deliberately. Dedicate the first 10 minutes of every retrospective to wins:
- Features delivered that users appreciated
- Technical achievements (performance improvement, tech debt paid)
- Process improvements that worked
- Individual contributions worth recognizing
- Team collaboration moments
Why celebration matters. It reinforces positive behaviors. It builds psychological safety ("this team notices when things go well, not just when things go wrong"). It provides energy for the harder conversations that follow.
Psychological Safety for Honest Retrospectives
Amy Edmondson's research (cross-reference: stakeholder-communication skill) is directly applicable to retrospectives. A retrospective in an unsafe environment produces:
- Silence (people withhold observations)
- Blame deflection ("it wasn't my fault")
- Sugar-coating ("it was mostly fine")
- Performative agreement (everyone agrees to action items nobody intends to follow)
Building Safety in Retrospectives
- The facilitator goes first. Share your own mistake or struggle before asking others to share theirs.
- Anonymous input. Use digital tools that allow anonymous sticky notes. People write more honestly when their name is not attached.
- No managers in the room (sometimes). If the team is not speaking freely, try a retrospective without the manager present. Share the anonymized output afterward.
- Follow through on actions. Nothing destroys trust faster than repeatedly agreeing to changes and never implementing them. Track action items. Report on completion at the start of the next retrospective.
- Normalize failure. "What did we learn from this failure?" rather than "Why did this fail?" The first invites reflection; the second invites defensiveness.
Pattern Recognition Across Projects
The highest-value retrospective insight is the cross-project pattern: "This is the third project where we underestimated the migration effort" or "Every project with more than 2 external dependencies has had integration surprises."
How to Surface Cross-Project Patterns
- Tag lessons by theme. Use consistent categories across projects.
- Periodic meta-retrospectives. Quarterly, review lessons across all teams. What patterns emerge?
- Pattern library. Document recurring patterns: "When we see X, we tend to experience Y. Recommended response: Z."
- Onboarding. New team members read the pattern library during onboarding. They start with the organization's accumulated wisdom, not from zero.
Worked Example — Cross-Project Pattern
Pattern observed: Three consecutive projects underestimated the effort to migrate data from a legacy system.
Data:
- Project A: Estimated 2 weeks, actual 5 weeks (2.5x overrun)
- Project B: Estimated 3 weeks, actual 6 weeks (2x overrun)
- Project C: Estimated 4 weeks, actual 9 weeks (2.25x overrun)
Root cause (double-loop): Teams estimate migration based on the number of tables and records. They do not estimate the effort to clean dirty data, handle schema incompatibilities, verify data integrity post-migration, and coordinate the switchover. The estimation model is incomplete.
Pattern recommendation: For any data migration, multiply the initial estimate by 2.5x. Better: decompose migration into 5 phases (analysis, cleaning, transformation, migration, verification) and estimate each separately. This is reference class forecasting (cross-reference: estimation-planning skill) applied at the organizational level.
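The 2.5x figure can be grounded in the data above. A sketch of the reference class computation, taking the most conservative historical ratio rather than the mean:

```python
# Derive a migration-estimate multiplier from actual/estimate ratios.
history = [
    ("Project A", 2, 5),   # (name, estimated weeks, actual weeks)
    ("Project B", 3, 6),
    ("Project C", 4, 9),
]

ratios = [actual / estimate for _, estimate, actual in history]
mean_overrun = sum(ratios) / len(ratios)   # 2.25x
conservative = max(ratios)                 # 2.5x, the multiplier above

naive_estimate_weeks = 4
print(f"Adjusted estimate: {naive_estimate_weeks * conservative:.0f} weeks "
      f"(mean overrun {mean_overrun:.2f}x, max {conservative:.2f}x)")
```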
When to Use / When NOT to Use
When to use retrospectives
- After every sprint (mandatory in Scrum, recommended everywhere)
- After significant events (production incidents, successful launches, organizational changes)
- At project milestones and project completion
- When the team feels stuck, frustrated, or disconnected
When NOT to use retrospectives
- When the purpose is to assign blame (that is an interrogation, not a retrospective)
- When management has already decided the outcome ("we're doing a retrospective to explain why we're changing the process")
- When action items from the last 3 retrospectives have not been addressed (fix the follow-through problem first)
- When the team is in crisis and needs to act, not reflect (use a quick AAR after the crisis resolves)
The retrospective anti-pattern
The most common retrospective failure: "We had a retrospective. We wrote down action items. Nobody did them. We had another retrospective. We wrote down the same action items." This cycle teaches the team that retrospectives are performative, not productive. The fix: limit action items to 1-3, assign owners, set deadlines, and review completion at the start of the next retrospective. Fewer actions, actually done, beats many actions, never done.
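A minimal sketch of that fix as code; the action shape and the checks are illustrative assumptions:

```python
# Cap committed action items at 3, require an owner and a deadline,
# and review completion at the start of the next retrospective.
from datetime import date

MAX_ACTIONS = 3

def commit_actions(candidates: list[dict]) -> list[dict]:
    accepted = []
    for action in candidates[:MAX_ACTIONS]:
        assert action.get("owner"), "every action needs an owner"
        assert action.get("deadline"), "every action needs a deadline"
        accepted.append({**action, "done": False})
    return accepted

actions = commit_actions([
    {"item": "No Friday deploys", "owner": "Scrum Master",
     "deadline": date(2026, 4, 15)},
])

# Opening the next retrospective: review completion before new items.
for a in actions:
    print(("DONE " if a["done"] else "OPEN ") + a["item"])
```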
Cross-References
- sinek agent: Leadership and pedagogy. Creating environments where learning is safe and celebrated. "Start With Why" applied to retrospective purpose.
- deming agent: Continuous improvement philosophy. PDCA as the engine under retrospective action items. Quality through process reflection.
- brooks agent: Department chair. Organizational learning, communication structures, why teams repeat mistakes.
- lei agent: Lean retrospectives — value stream mapping as a retrospective technique, eliminating waste in processes.
- stakeholder-communication skill: Psychological safety (Edmondson), conflict resolution, meeting facilitation techniques applicable to retrospectives.
- quality-assurance skill: Kaizen, PDCA cycle, metrics that inform retrospective discussions.
- risk-management skill: Retrospectives as risk identification opportunities; pre-mortems as forward-looking retrospectives.
- rca department (blameless-postmortem skill): Full blameless postmortem methodology, root cause analysis techniques (5 Whys, fishbone), and blameless culture principles.
References
- Derby, E. & Larsen, D. (2006). Agile Retrospectives: Making Good Teams Great. Pragmatic Bookshelf.
- Senge, P. M. (1990/2006). The Fifth Discipline. Revised edition. Doubleday.
- Argyris, C. (1977). "Double Loop Learning in Organizations." Harvard Business Review, 55(5), 115-125.
- Edmondson, A. C. (2018). The Fearless Organization. Wiley.
- Dekker, S. (2016). Just Culture: Restoring Trust and Accountability in Your Organization. 3rd edition. CRC Press.
- U.S. Army. (1993). A Leader's Guide to After-Action Reviews. Training Circular 25-20.
- Kerth, N. L. (2001). Project Retrospectives: A Handbook for Team Reviews. Dorset House.
- Sinek, S. (2014). Leaders Eat Last. Portfolio/Penguin.
- Rother, M. (2009). Toyota Kata. McGraw-Hill.