Awesome-omni-skills on-call-handoff-patterns

On-Call Handoff Patterns workflow skill. Use this skill when the user needs Effective patterns for on-call shift transitions, ensuring continuity, context transfer, and reliable incident response across shifts and the operator should preserve the upstream workflow, copied support files, and provenance before merging or handing off.

install
source · Clone the upstream repo
git clone https://github.com/diegosouzapw/awesome-omni-skills
Claude Code · Install into ~/.claude/skills/
T=$(mktemp -d) && git clone --depth=1 https://github.com/diegosouzapw/awesome-omni-skills "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/on-call-handoff-patterns" ~/.claude/skills/diegosouzapw-awesome-omni-skills-on-call-handoff-patterns && rm -rf "$T"
manifest: skills/on-call-handoff-patterns/SKILL.md
source content

On-Call Handoff Patterns

Overview

This public intake copy packages

plugins/antigravity-awesome-skills-claude/skills/on-call-handoff-patterns
from
https://github.com/sickn33/antigravity-awesome-skills
into the native Omni Skills editorial shape without hiding its origin.

Use it when the operator needs the upstream workflow, support files, and repository context to stay intact while the public validator and private enhancer continue their normal downstream flow.

This intake keeps the copied upstream files intact and uses

metadata.json
plus
ORIGIN.md
as the provenance anchor for review.

On-Call Handoff Patterns Effective patterns for on-call shift transitions, ensuring continuity, context transfer, and reliable incident response across shifts.

Imported source sections that did not map cleanly to the public headings are still preserved below or in the support files. Notable imported sections: Core Concepts, Templates, 🔴 Active Incidents, 🟡 Ongoing Investigations, 🟢 Resolved This Shift, 📋 Recent Changes.

When to Use This Skill

Use this section as the trigger filter. It should make the activation boundary explicit before the operator loads files, runs commands, or opens a pull request.

  • The task is unrelated to on-call handoff patterns
  • You need a different domain or tool outside this scope
  • Transitioning on-call responsibilities
  • Writing shift handoff summaries
  • Documenting ongoing investigations
  • Establishing on-call rotation procedures

Operating Table

SituationStart hereWhy it matters
First-time use
metadata.json
Confirms repository, branch, commit, and imported path before touching the copied workflow
Provenance review
ORIGIN.md
Gives reviewers a plain-language audit trail for the imported source
Workflow execution
SKILL.md
Starts with the smallest copied file that materially changes execution
Supporting context
SKILL.md
Adds the next most relevant copied source file without loading the entire package
Handoff decision
## Related Skills
Helps the operator switch to a stronger native skill when the task drifts

Workflow

This workflow is intentionally editorial and operational at the same time. It keeps the imported source useful to the operator while still satisfying the public intake standards that feed the downstream enhancer flow.

  1. Clarify goals, constraints, and required inputs.
  2. Apply relevant best practices and validate outcomes.
  3. Provide actionable steps and verification.
  4. If detailed examples are required, open resources/implementation-playbook.md.
  5. Confirm the user goal, the scope of the imported workflow, and whether this skill is still the right router for the task.
  6. Read the overview and provenance files before loading any copied upstream support files.
  7. Load only the references, examples, prompts, or scripts that materially change the outcome for the current request.

Imported Workflow Notes

Imported: Instructions

  • Clarify goals, constraints, and required inputs.
  • Apply relevant best practices and validate outcomes.
  • Provide actionable steps and verification.
  • If detailed examples are required, open
    resources/implementation-playbook.md
    .

Imported: Core Concepts

1. Handoff Components

ComponentPurpose
Active IncidentsWhat's currently broken
Ongoing InvestigationsIssues being debugged
Recent ChangesDeployments, configs
Known IssuesWorkarounds in place
Upcoming EventsMaintenance, releases

2. Handoff Timing

Recommended: 30 min overlap between shifts

Outgoing:
├── 15 min: Write handoff document
└── 15 min: Sync call with incoming

Incoming:
├── 15 min: Review handoff document
├── 15 min: Sync call with outgoing
└── 5 min: Verify alerting setup

Examples

Example 1: Ask for the upstream workflow directly

Use @on-call-handoff-patterns to handle <task>. Start from the copied upstream workflow, load only the files that change the outcome, and keep provenance visible in the answer.

Explanation: This is the safest starting point when the operator needs the imported workflow, but not the entire repository.

Example 2: Ask for a provenance-grounded review

Review @on-call-handoff-patterns against metadata.json and ORIGIN.md, then explain which copied upstream files you would load first and why.

Explanation: Use this before review or troubleshooting when you need a precise, auditable explanation of origin and file selection.

Example 3: Narrow the copied support files before execution

Use @on-call-handoff-patterns for <task>. Load only the copied references, examples, or scripts that change the outcome, and name the files explicitly before proceeding.

Explanation: This keeps the skill aligned with progressive disclosure instead of loading the whole copied package by default.

Example 4: Build a reviewer packet

Review @on-call-handoff-patterns using the copied upstream files plus provenance, then summarize any gaps before merge.

Explanation: This is useful when the PR is waiting for human review and you want a repeatable audit packet.

Best Practices

Treat the generated public skill as a reviewable packaging layer around the upstream repository. The goal is to keep provenance explicit and load only the copied source material that materially improves execution.

  • Document everything - Future you will thank you
  • Escalate early - Better safe than sorry
  • Take breaks - Alert fatigue is real
  • Keep handoffs synchronous - Async loses context
  • Test your setup - Before incidents, not during
  • Don't skip handoffs - Context loss causes incidents
  • Don't hero - Escalate when needed

Imported Operating Notes

Imported: Escalation Guidelines

When to Escalate


#### Imported: Best Practices

### Do's
- **Document everything** - Future you will thank you
- **Escalate early** - Better safe than sorry
- **Take breaks** - Alert fatigue is real
- **Keep handoffs synchronous** - Async loses context
- **Test your setup** - Before incidents, not during

### Don'ts
- **Don't skip handoffs** - Context loss causes incidents
- **Don't hero** - Escalate when needed
- **Don't ignore alerts** - Even if they seem minor
- **Don't work sick** - Swap shifts instead
- **Don't disappear** - Stay reachable during shift

## Troubleshooting

### Problem: The operator skipped the imported context and answered too generically

**Symptoms:** The result ignores the upstream workflow in `plugins/antigravity-awesome-skills-claude/skills/on-call-handoff-patterns`, fails to mention provenance, or does not use any copied source files at all.
**Solution:** Re-open `metadata.json`, `ORIGIN.md`, and the most relevant copied upstream files. Load only the files that materially change the answer, then restate the provenance before continuing.

### Problem: The imported workflow feels incomplete during review

**Symptoms:** Reviewers can see the generated `SKILL.md`, but they cannot quickly tell which references, examples, or scripts matter for the current task.
**Solution:** Point at the exact copied references, examples, scripts, or assets that justify the path you took. If the gap is still real, record it in the PR instead of hiding it.

### Problem: The task drifted into a different specialization

**Symptoms:** The imported skill starts in the right place, but the work turns into debugging, architecture, design, security, or release orchestration that a native skill handles better.
**Solution:** Use the related skills section to hand off deliberately. Keep the imported provenance visible so the next skill inherits the right context instead of starting blind.



## Related Skills

- `@00-andruia-consultant-v2` - Use when the work is better handled by that native specialization after this imported skill establishes context.
- `@10-andruia-skill-smith-v2` - Use when the work is better handled by that native specialization after this imported skill establishes context.
- `@20-andruia-niche-intelligence-v2` - Use when the work is better handled by that native specialization after this imported skill establishes context.
- `@2d-games` - Use when the work is better handled by that native specialization after this imported skill establishes context.

## Additional Resources

Use this support matrix and the linked files below as the operator packet for this imported skill. They should reflect real copied source material, not generic scaffolding.

| Resource family | What it gives the reviewer | Example path |
| --- | --- | --- |
| `references` | copied reference notes, guides, or background material from upstream | `references/n/a` |
| `examples` | worked examples or reusable prompts copied from upstream | `examples/n/a` |
| `scripts` | upstream helper scripts that change execution or validation | `scripts/n/a` |
| `agents` | routing or delegation notes that are genuinely part of the imported package | `agents/n/a` |
| `assets` | supporting assets or schemas copied from the source package | `assets/n/a` |



### Imported Reference Notes

#### Imported: 🔧 Quick Reference

### Common Commands
```bash
# Check service health
kubectl get pods -A | grep -v Running

# Recent deployments
kubectl get events --sort-by='.lastTimestamp' | tail -20

# Database connections
psql -c "SELECT count(*) FROM pg_stat_activity;"

# Clear cache (emergency only)
redis-cli FLUSHDB

Important Links


Imported: Resources


Incoming on-call (@bob) - Please confirm you have:

  • Joined #inc-20240122-payment
  • Access to dashboards
  • Understand current state
  • Know escalation path

#### Imported: Resources

- [Google SRE - Being On-Call](https://sre.google/sre-book/being-on-call/)
- [PagerDuty On-Call Guide](https://www.pagerduty.com/resources/learn/on-call-management/)
- [Increment On-Call Issue](https://increment.com/on-call/)

#### Imported: Templates

### Template 1: Shift Handoff Document

```markdown
# On-Call Handoff: Platform Team

**Outgoing**: @alice (2024-01-15 to 2024-01-22)
**Incoming**: @bob (2024-01-22 to 2024-01-29)
**Handoff Time**: 2024-01-22 09:00 UTC

---

#### Imported: 🔴 Active Incidents

### None currently active
No active incidents at handoff time.

---

#### Imported: 🟡 Ongoing Investigations

### 1. Intermittent API Timeouts (ENG-1234)
**Status**: Investigating
**Started**: 2024-01-20
**Impact**: ~0.1% of requests timing out

**Context**:
- Timeouts correlate with database backup window (02:00-03:00 UTC)
- Suspect backup process causing lock contention
- Added extra logging in PR #567 (deployed 01/21)

**Next Steps**:
- [ ] Review new logs after tonight's backup
- [ ] Consider moving backup window if confirmed

**Resources**:
- Dashboard: [API Latency](https://grafana/d/api-latency)
- Thread: #platform-eng (01/20, 14:32)

---

### 2. Memory Growth in Auth Service (ENG-1235)
**Status**: Monitoring
**Started**: 2024-01-18
**Impact**: None yet (proactive)

**Context**:
- Memory usage growing ~5% per day
- No memory leak found in profiling
- Suspect connection pool not releasing properly

**Next Steps**:
- [ ] Review heap dump from 01/21
- [ ] Consider restart if usage > 80%

**Resources**:
- Dashboard: [Auth Service Memory](https://grafana/d/auth-memory)
- Analysis doc: [Memory Investigation](https://docs/eng-1235)

---

#### Imported: 🟢 Resolved This Shift

### Payment Service Outage (2024-01-19)
- **Duration**: 23 minutes
- **Root Cause**: Database connection exhaustion
- **Resolution**: Rolled back v2.3.4, increased pool size
- **Postmortem**: [POSTMORTEM-89](https://docs/postmortem-89)
- **Follow-up tickets**: ENG-1230, ENG-1231

---

#### Imported: 📋 Recent Changes

### Deployments
| Service | Version | Time | Notes |
|---------|---------|------|-------|
| api-gateway | v3.2.1 | 01/21 14:00 | Bug fix for header parsing |
| user-service | v2.8.0 | 01/20 10:00 | New profile features |
| auth-service | v4.1.2 | 01/19 16:00 | Security patch |

### Configuration Changes
- 01/21: Increased API rate limit from 1000 to 1500 RPS
- 01/20: Updated database connection pool max from 50 to 75

### Infrastructure
- 01/20: Added 2 nodes to Kubernetes cluster
- 01/19: Upgraded Redis from 6.2 to 7.0

---

#### Imported: ⚠️ Known Issues & Workarounds

### 1. Slow Dashboard Loading
**Issue**: Grafana dashboards slow on Monday mornings
**Workaround**: Wait 5 min after 08:00 UTC for cache warm-up
**Ticket**: OPS-456 (P3)

### 2. Flaky Integration Test
**Issue**: `test_payment_flow` fails intermittently in CI
**Workaround**: Re-run failed job (usually passes on retry)
**Ticket**: ENG-1200 (P2)

---

#### Imported: 📅 Upcoming Events

| Date | Event | Impact | Contact |
|------|-------|--------|---------|
| 01/23 02:00 | Database maintenance | 5 min read-only | @dba-team |
| 01/24 14:00 | Major release v5.0 | Monitor closely | @release-team |
| 01/25 | Marketing campaign | 2x traffic expected | @platform |

---

#### Imported: 📞 Escalation Reminders

| Issue Type | First Escalation | Second Escalation |
|------------|------------------|-------------------|
| Payment issues | @payments-oncall | @payments-manager |
| Auth issues | @auth-oncall | @security-team |
| Database issues | @dba-team | @infra-manager |
| Unknown/severe | @engineering-manager | @vp-engineering |

---

#### Imported: Handoff Checklist

### Outgoing Engineer
- [x] Document active incidents
- [x] Document ongoing investigations
- [x] List recent changes
- [x] Note known issues
- [x] Add upcoming events
- [x] Sync with incoming engineer

### Incoming Engineer
- [ ] Read this document
- [ ] Join sync call
- [ ] Verify PagerDuty is routing to you
- [ ] Verify Slack notifications working
- [ ] Check VPN/access working
- [ ] Review critical dashboards

Template 2: Quick Handoff (Async)

# Quick Handoff: @alice → @bob

#### Imported: TL;DR

- No active incidents
- 1 investigation ongoing (API timeouts, see ENG-1234)
- Major release tomorrow (01/24) - be ready for issues

#### Imported: Watch List

1. API latency around 02:00-03:00 UTC (backup window)
2. Auth service memory (restart if > 80%)

#### Imported: Recent

- Deployed api-gateway v3.2.1 yesterday (stable)
- Increased rate limits to 1500 RPS

#### Imported: Coming Up

- 01/23 02:00 - DB maintenance (5 min read-only)
- 01/24 14:00 - v5.0 release

#### Imported: Questions?

I'll be available on Slack until 17:00 today.

Template 3: Incident Handoff (Mid-Incident)

# INCIDENT HANDOFF: Payment Service Degradation

**Incident Start**: 2024-01-22 08:15 UTC
**Current Status**: Mitigating
**Severity**: SEV2

---

#### Imported: Current State

- Error rate: 15% (down from 40%)
- Mitigation in progress: scaling up pods
- ETA to resolution: ~30 min

#### Imported: What We Know

1. Root cause: Memory pressure on payment-service pods
2. Triggered by: Unusual traffic spike (3x normal)
3. Contributing: Inefficient query in checkout flow

#### Imported: What We've Done

- Scaled payment-service from 5 → 15 pods
- Enabled rate limiting on checkout endpoint
- Disabled non-critical features

#### Imported: What Needs to Happen

1. Monitor error rate - should reach <1% in ~15 min
2. If not improving, escalate to @payments-manager
3. Once stable, begin root cause investigation

#### Imported: Key People

- Incident Commander: @alice (handing off)
- Comms Lead: @charlie
- Technical Lead: @bob (incoming)

#### Imported: Communication

- Status page: Updated at 08:45
- Customer support: Notified
- Exec team: Aware

#### Imported: Handoff Sync Meeting

### Agenda (15 minutes)

```markdown

#### Imported: Handoff Sync: @alice → @bob

1. **Active Issues** (5 min)
   - Walk through any ongoing incidents
   - Discuss investigation status
   - Transfer context and theories

2. **Recent Changes** (3 min)
   - Deployments to watch
   - Config changes
   - Known regressions

3. **Upcoming Events** (3 min)
   - Maintenance windows
   - Expected traffic changes
   - Releases planned

4. **Questions** (4 min)
   - Clarify anything unclear
   - Confirm access and alerting
   - Exchange contact info

Imported: On-Call Best Practices

Before Your Shift


#### Imported: Pre-Shift Checklist

### Access Verification
- [ ] VPN working
- [ ] kubectl access to all clusters
- [ ] Database read access
- [ ] Log aggregator access (Splunk/Datadog)
- [ ] PagerDuty app installed and logged in

### Alerting Setup
- [ ] PagerDuty schedule shows you as primary
- [ ] Phone notifications enabled
- [ ] Slack notifications for incident channels
- [ ] Test alert received and acknowledged

### Knowledge Refresh
- [ ] Review recent incidents (past 2 weeks)
- [ ] Check service changelog
- [ ] Skim critical runbooks
- [ ] Know escalation contacts

### Environment Ready
- [ ] Laptop charged and accessible
- [ ] Phone charged
- [ ] Quiet space available for calls
- [ ] Secondary contact identified (if traveling)

During Your Shift


#### Imported: Daily On-Call Routine

### Morning (start of day)
- [ ] Check overnight alerts
- [ ] Review dashboards for anomalies
- [ ] Check for any P0/P1 tickets created
- [ ] Skim incident channels for context

### Throughout Day
- [ ] Respond to alerts within SLA
- [ ] Document investigation progress
- [ ] Update team on significant issues
- [ ] Triage incoming pages

### End of Day
- [ ] Hand off any active issues
- [ ] Update investigation docs
- [ ] Note anything for next shift

After Your Shift


#### Imported: Post-Shift Checklist

- [ ] Complete handoff document
- [ ] Sync with incoming on-call
- [ ] Verify PagerDuty routing changed
- [ ] Close/update investigation tickets
- [ ] File postmortems for any incidents
- [ ] Take time off if shift was stressful

Imported: Limitations

  • Use this skill only when the task clearly matches the scope described above.
  • Do not treat the output as a substitute for environment-specific validation, testing, or expert review.
  • Stop and ask for clarification if required inputs, permissions, safety boundaries, or success criteria are missing.