Squire infrastructure-documenter
```bash
# Clone the full Squire repository
git clone https://github.com/eddiebelaval/squire

# Or install only this skill into ~/.claude/skills
T=$(mktemp -d) && \
  git clone --depth=1 https://github.com/eddiebelaval/squire "$T" && \
  mkdir -p ~/.claude/skills && \
  cp -r "$T/skills/infrastructure-documenter" ~/.claude/skills/eddiebelaval-squire-infrastructure-documenter && \
  rm -rf "$T"
```
skills/infrastructure-documenter/SKILL.md
```yaml
---
name: infrastructure-documenter
description: Expert guide for documenting infrastructure including architecture diagrams, runbooks, system documentation, and operational procedures. Use when creating technical documentation for systems and deployments.
slug: infrastructure-documenter
category: operations
complexity: complex
version: "1.0.0"
author: "id8Labs"
triggers:
  - "infrastructure-documenter"
  - "infrastructure documenter"
tags:
  - development
  - tool-factory-retrofitted
---
```
Infrastructure Documenter Skill
Core Workflows
Workflow 1: Primary Action
- Analyze the input and context
- Validate prerequisites are met
- Execute the core operation
- Verify the output meets expectations
- Report results
Overview
This skill helps you create clear, maintainable infrastructure documentation. It covers architecture diagrams, runbooks, system documentation, operational procedures, and documentation-as-code practices.
Documentation Philosophy
Principles
- Living documentation: Keep it in sync with reality
- Audience-aware: Different docs for different readers
- Actionable: Every doc should help someone do something
- Version-controlled: Documentation changes tracked with code
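The "living" and "version-controlled" principles can be backed by a CI gate that warns when code changes land without any documentation change. This is a minimal sketch. It assumes the `src/` and `docs/` layout used later in this skill, and the function name and prefixes are hypothetical, so adapt them to your repository. It is a heuristic: passing it does not prove the docs are current.

```python
from typing import List, Optional

# Hypothetical layout: application code under src/, docs under docs/.
CODE_PREFIXES = ("src/",)
DOC_PREFIXES = ("docs/",)


def docs_drift_warning(changed_files: List[str]) -> Optional[str]:
    """Warn when code files changed but no documentation file did."""
    code_changes = [f for f in changed_files if f.startswith(CODE_PREFIXES)]
    doc_changes = [f for f in changed_files if f.startswith(DOC_PREFIXES)]
    if code_changes and not doc_changes:
        return (
            f"{len(code_changes)} code file(s) changed with no docs/ update; "
            "confirm the documentation still matches reality."
        )
    return None


# Example: a pull request that only touches application code.
print(docs_drift_warning(["src/app/page.tsx", "package.json"]))
```

In CI, the list of changed files would typically come from `git diff --name-only main...HEAD`.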
Document Types
| Type | Audience | Purpose |
|---|---|---|
| Architecture | Engineers | Understand system design |
| Runbooks | Ops/SRE | Handle incidents |
| API Docs | Developers | Integrate with system |
| Onboarding | New hires | Get up to speed |
| Decision Records | Future you | Understand why |
Architecture Documentation
System Architecture Overview
````markdown
# System Architecture

## Overview

[Project Name] is a [type] application that [purpose].

## High-Level Architecture

```
┌─────────────────────────────────────────────────────────────┐
│                            Users                            │
└─────────────────────────────────────────────────────────────┘
                               │
                               ▼
┌─────────────────────────────────────────────────────────────┐
│                         Vercel Edge                         │
│         ┌─────────────────┐     ┌─────────────────┐         │
│         │   Next.js App   │     │ Edge Functions  │         │
│         └─────────────────┘     └─────────────────┘         │
└─────────────────────────────────────────────────────────────┘
                               │
         ┌─────────────────────┼─────────────────────┐
         ▼                     ▼                     ▼
┌─────────────────┐   ┌─────────────────┐   ┌─────────────────┐
│ Supabase        │   │ Redis           │   │ Stripe          │
│ - PostgreSQL    │   │ - Session       │   │ - Payments      │
│ - Auth          │   │ - Cache         │   │ - Webhooks      │
│ - Realtime      │   │                 │   │                 │
│ - Storage       │   │                 │   │                 │
└─────────────────┘   └─────────────────┘   └─────────────────┘
```

## Components

### Frontend (Next.js App)
- **Location**: Vercel Edge Network
- **Framework**: Next.js 14 (App Router)
- **Styling**: Tailwind CSS + shadcn/ui
- **State**: Zustand + React Query

### Backend Services

| Service | Provider | Purpose |
|---------|----------|---------|
| Database | Supabase | PostgreSQL with RLS |
| Auth | Supabase Auth | User authentication |
| Storage | Supabase Storage | File uploads |
| Cache | Upstash Redis | Session & API cache |
| Payments | Stripe | Subscriptions |
| Email | Resend | Transactional emails |

### Data Flow
1. User request → Vercel Edge
2. SSR/API Route processes request
3. Database queries via Supabase client
4. Response cached at edge (when applicable)
5. Response returned to user

## Security

### Authentication Flow
1. User signs in via Supabase Auth
2. JWT token issued and stored in cookie
3. Server validates token on each request
4. RLS policies enforce data access

### Data Protection
- All data encrypted at rest (AES-256)
- TLS 1.3 for data in transit
- Secrets stored in Vercel environment
- PII fields encrypted in database
````
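Tables like Backend Services drift quickly when they are edited by hand. If the service list lives in code, a small generator can rebuild the Markdown table so the doc changes alongside the config. The sketch below makes that assumption: the `Service` type and `SERVICES` list are hypothetical and simply mirror the table above.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Service:
    name: str
    provider: str
    purpose: str


# Hypothetical source of truth; mirrors the Backend Services table.
SERVICES = [
    Service("Database", "Supabase", "PostgreSQL with RLS"),
    Service("Auth", "Supabase Auth", "User authentication"),
    Service("Storage", "Supabase Storage", "File uploads"),
    Service("Cache", "Upstash Redis", "Session & API cache"),
    Service("Payments", "Stripe", "Subscriptions"),
    Service("Email", "Resend", "Transactional emails"),
]


def render_services_table(services: List[Service]) -> str:
    """Render services as a Markdown table with aligned columns."""
    header = ("Service", "Provider", "Purpose")
    # Escape pipes so cell text cannot break the table layout.
    rows = [tuple(v.replace("|", "\\|") for v in (s.name, s.provider, s.purpose))
            for s in services]
    widths = [max(len(r[i]) for r in [header, *rows]) for i in range(len(header))]

    def fmt(cells) -> str:
        return "| " + " | ".join(c.ljust(w) for c, w in zip(cells, widths)) + " |"

    separator = "|" + "|".join("-" * (w + 2) for w in widths) + "|"
    return "\n".join([fmt(header), separator, *(fmt(r) for r in rows)])


print(render_services_table(SERVICES))
```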
Mermaid Diagrams
````markdown
## Request Flow

```mermaid
sequenceDiagram
    participant U as User
    participant V as Vercel
    participant N as Next.js
    participant S as Supabase
    participant R as Redis

    U->>V: HTTPS Request
    V->>N: Route to App
    N->>R: Check Cache

    alt Cache Hit
        R-->>N: Cached Data
        N-->>U: Return Cached Response
    else Cache Miss
        R-->>N: Miss
        N->>S: Query Database
        S-->>N: Data
        N->>R: Store in Cache
        N-->>U: Return Response
    end
```

## Database Schema

```mermaid
erDiagram
    users ||--o{ projects : owns
    users {
        uuid id PK
        text email
        text name
        timestamp created_at
    }
    projects ||--o{ tasks : contains
    projects {
        uuid id PK
        uuid user_id FK
        text name
        text status
    }
    tasks {
        uuid id PK
        uuid project_id FK
        text title
        boolean completed
    }
```
````
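The ER diagram can also be generated instead of hand-drawn. This sketch assumes a hypothetical in-memory description of tables and one-to-many relationships (`TABLES`, `RELATIONSHIPS`). In a real project, you might populate it from your migrations or `information_schema`. The function emits the same Mermaid `erDiagram` syntax used above.

```python
from typing import Dict, List, Tuple

# Hypothetical schema description: table -> [(type, column, key marker)].
TABLES: Dict[str, List[Tuple[str, str, str]]] = {
    "users": [("uuid", "id", "PK"), ("text", "email", ""),
              ("text", "name", ""), ("timestamp", "created_at", "")],
    "projects": [("uuid", "id", "PK"), ("uuid", "user_id", "FK"),
                 ("text", "name", ""), ("text", "status", "")],
    "tasks": [("uuid", "id", "PK"), ("uuid", "project_id", "FK"),
              ("text", "title", ""), ("boolean", "completed", "")],
}
# (parent, child, label): each parent row has zero or more child rows.
RELATIONSHIPS = [("users", "projects", "owns"), ("projects", "tasks", "contains")]


def to_mermaid_er(tables, relationships) -> str:
    """Emit a Mermaid erDiagram for the given tables and one-to-many links."""
    lines = ["erDiagram"]
    for parent, child, label in relationships:
        lines.append(f"    {parent} ||--o{{ {child} : {label}")
    for table, columns in tables.items():
        lines.append(f"    {table} {{")
        for col_type, name, key in columns:
            # rstrip drops the trailing space when there is no PK/FK marker
            lines.append(f"        {col_type} {name} {key}".rstrip())
        lines.append("    }")
    return "\n".join(lines)


print(to_mermaid_er(TABLES, RELATIONSHIPS))
```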
Runbooks
Runbook Template
````markdown
# Runbook: [Service Name] - [Issue Type]

## Overview
Brief description of the issue and when this runbook applies.

## Severity
- **P1 (Critical)**: Complete outage
- **P2 (High)**: Degraded service
- **P3 (Medium)**: Minor impact
- **P4 (Low)**: No user impact

## Detection
How this issue is typically detected:
- [ ] Alert from [monitoring system]
- [ ] User report
- [ ] Automated check failure

## Impact Assessment
- **Users affected**: All / Segment / None
- **Data at risk**: Yes / No
- **Revenue impact**: High / Medium / Low / None

## Prerequisites
- [ ] Access to [system/dashboard]
- [ ] Credentials for [service]
- [ ] Contact info for [team/person]

## Resolution Steps

### Step 1: Verify the Issue
```bash
# Check service status (-I fetches response headers only)
curl -I https://api.example.com/health

# Check runtime logs for the affected deployment
vercel logs [deployment-url]
```

### Step 2: Identify Root Cause
Common causes:
- Database connection pool exhausted
- Memory limit reached
- External service down
- Bad deployment

### Step 3: Apply Fix

**If Database Issue:**
```sql
-- Check connection count
SELECT count(*) FROM pg_stat_activity;

-- Terminate connections that have been idle for more than an hour
SELECT pg_terminate_backend(pid)
FROM pg_stat_activity
WHERE state = 'idle'
  AND state_change < now() - interval '1 hour';
```

**If Bad Deployment:**
```bash
# Roll back to the previous production deployment
vercel rollback
```

### Step 4: Verify Fix
```bash
# Check service health
curl https://api.example.com/health

# Then monitor error rates for 15 minutes
```

## Escalation
If unable to resolve within 30 minutes:
- Page on-call engineer: [contact]
- Notify stakeholders in #incidents
- Update status page

## Post-Incident
- Create incident report
- Schedule post-mortem (P1/P2 only)
- Update this runbook if needed

## Related Links
````
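To keep runbooks consistent with the template, a lint step can flag missing sections. This minimal sketch assumes each required section is a level-2 (`## `) heading named exactly as in the runbook template above. `REQUIRED_SECTIONS` is a hypothetical choice you would adjust; here "Related Links" is treated as optional.

```python
import re
from typing import List

# Level-2 sections from the runbook template; "Related Links" is left optional.
REQUIRED_SECTIONS = [
    "Overview", "Severity", "Detection", "Impact Assessment",
    "Prerequisites", "Resolution Steps", "Escalation", "Post-Incident",
]

# Matches "## Heading" but not "### Step 1: ..." (the third # is not a space).
H2 = re.compile(r"^##[ \t]+(.+?)[ \t]*$", re.MULTILINE)


def missing_runbook_sections(markdown: str) -> List[str]:
    """Return required sections that do not appear as level-2 headings."""
    found = set(H2.findall(markdown))
    return [s for s in REQUIRED_SECTIONS if s not in found]


example = "# Runbook: API - Outage\n## Overview\n## Severity\n### Step 1: Verify\n"
print(missing_runbook_sections(example))
```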
Database Runbooks
````markdown
# Runbook: Database Performance Issues

## Symptoms
- Slow API responses (>1s)
- Timeout errors in logs
- High database CPU in dashboard

## Quick Checks

### 1. Check Active Connections
```sql
SELECT state, count(*), max(now() - query_start) AS max_duration
FROM pg_stat_activity
GROUP BY state;
```

### 2. Find Long-Running Queries
```sql
SELECT pid, now() - query_start AS duration, query
FROM pg_stat_activity
WHERE state = 'active'
  AND now() - query_start > interval '30 seconds'
ORDER BY duration DESC;
```

### 3. Check Table Sizes
```sql
SELECT schemaname, tablename,
       pg_size_pretty(pg_total_relation_size(schemaname || '.' || tablename)) AS size
FROM pg_tables
WHERE schemaname = 'public'
ORDER BY pg_total_relation_size(schemaname || '.' || tablename) DESC
LIMIT 10;
```

### 4. Check Missing Indexes
```sql
-- idx_scan is NULL for tables with no indexes, so treat it as 0
SELECT relname, seq_scan, coalesce(idx_scan, 0) AS idx_scan,
       seq_scan - coalesce(idx_scan, 0) AS difference
FROM pg_stat_user_tables
WHERE seq_scan > coalesce(idx_scan, 0)
ORDER BY difference DESC;
```

## Resolution

### Kill Problematic Queries
```sql
SELECT pg_terminate_backend(pid)
FROM pg_stat_activity
WHERE pid = [PID_FROM_ABOVE];
```

### Add Missing Index
```sql
-- CONCURRENTLY avoids blocking writes but cannot run inside a transaction block
CREATE INDEX CONCURRENTLY idx_table_column
ON table_name (column_name);
```
````
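In `pg_stat_user_tables`, `idx_scan` is NULL for tables that have no indexes at all. Any comparison against it must treat NULL as zero, or those tables drop out of the results. If you export the view's rows for offline review, a sketch like the following ranks index candidates with that rule applied. `TableStats`, `index_candidates`, and the `min_seq_scans` noise floor are hypothetical, not PostgreSQL defaults, and the example rows are made-up numbers.

```python
from typing import List, NamedTuple, Optional, Tuple


class TableStats(NamedTuple):
    relname: str
    seq_scan: int
    idx_scan: Optional[int]  # NULL (None) when the table has no indexes


def index_candidates(stats: List[TableStats], min_seq_scans: int = 1000) -> List[Tuple[str, int]]:
    """Rank tables whose sequential scans outnumber index scans.

    min_seq_scans is an arbitrary noise floor for small or idle tables.
    """
    ranked = []
    for row in stats:
        idx = row.idx_scan or 0  # treat NULL as zero instead of dropping the row
        if row.seq_scan >= min_seq_scans and row.seq_scan > idx:
            ranked.append((row.relname, row.seq_scan - idx))
    return sorted(ranked, key=lambda item: item[1], reverse=True)


# Made-up sample rows for illustration.
rows = [
    TableStats("tasks", 5000, 10),
    TableStats("audit_log", 3000, None),
    TableStats("users", 50, 0),
    TableStats("projects", 2000, 9000),
]
print(index_candidates(rows))
```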
Decision Records (ADRs)
ADR Template
```markdown
# ADR-001: Choose Supabase for Database

## Status
Accepted

## Context
We need a database solution for [Project Name] that supports:
- PostgreSQL compatibility
- Real-time subscriptions
- Built-in authentication
- Easy local development
- Generous free tier

## Decision
We will use Supabase as our primary database and auth provider.

## Alternatives Considered

### PlanetScale
**Pros:**
- Excellent scaling
- Branching for schema changes
- MySQL compatible

**Cons:**
- No built-in auth
- No real-time subscriptions
- Additional services needed

### Firebase
**Pros:**
- Real-time built-in
- Mature platform
- Good mobile SDKs

**Cons:**
- NoSQL (not ideal for our use case)
- Vendor lock-in concerns
- Complex security rules

## Consequences

### Positive
- Single provider for DB + Auth + Storage
- Great developer experience
- Row Level Security for data protection
- Local development with the Supabase CLI

### Negative
- Supabase-specific features (Auth, Realtime, Storage) tie us to the provider
- Supabase is still maturing (some rough edges)
- Self-hosting is possible but adds operational work, so in practice we depend on the managed offering

### Risks
- Supabase scaling limitations at high traffic
- Migration cost if we need to move

## References
- [Supabase Documentation](https://supabase.com/docs)
- [Comparison: Supabase vs Firebase](https://...)
```
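ADR numbers should stay sequential and never be reused. A hypothetical helper like this one picks the next filename in the `NNN-slug.md` convention that `architecture/decisions/` uses in the documentation tree (for example `001-database.md`). The function name and slug rules are assumptions.

```python
import re
from typing import List

# Matches existing ADR files such as "001-database.md".
ADR_FILE = re.compile(r"^(\d{3})-[a-z0-9-]+\.md$")


def next_adr_filename(existing: List[str], title: str) -> str:
    """Return the next sequential ADR filename for a decision title."""
    numbers = [int(m.group(1)) for name in existing if (m := ADR_FILE.match(name))]
    slug = re.sub(r"[^a-z0-9]+", "-", title.lower()).strip("-")
    return f"{max(numbers, default=0) + 1:03d}-{slug}.md"


print(next_adr_filename(["001-database.md", "002-hosting.md", "README.md"], "Choose a CDN"))
```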
API Documentation
Endpoint Documentation
````markdown
# API Reference

## Base URL

```
Production: https://api.example.com/v1
Staging:    https://staging-api.example.com/v1
```

## Authentication

All API requests require authentication via Bearer token.

```bash
curl -H "Authorization: Bearer YOUR_TOKEN" \
  https://api.example.com/v1/users/me
```

## Endpoints

### Users

#### Get Current User

```
GET /users/me
```

**Response:**

```json
{
  "id": "usr_123",
  "email": "user@example.com",
  "name": "John Doe",
  "created_at": "2024-01-01T00:00:00Z"
}
```

#### Update User

```
PATCH /users/me
```

**Request Body:**

| Field | Type | Required | Description |
|-------|------|----------|-------------|
| name | string | No | Display name |
| avatar_url | string | No | Profile image URL |

**Example:**

```bash
curl -X PATCH \
  -H "Authorization: Bearer YOUR_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"name": "Jane Doe"}' \
  https://api.example.com/v1/users/me
```

## Error Responses

| Status | Code | Description |
|--------|------|-------------|
| 400 | BAD_REQUEST | Invalid request body |
| 401 | UNAUTHORIZED | Missing or invalid token |
| 403 | FORBIDDEN | Insufficient permissions |
| 404 | NOT_FOUND | Resource not found |
| 429 | RATE_LIMITED | Too many requests |
| 500 | INTERNAL_ERROR | Server error |

**Error Response Format:**

```json
{
  "error": {
    "code": "NOT_FOUND",
    "message": "User not found"
  }
}
```
````
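A documented error format is only useful if the API actually returns it. One option is a contract check in integration tests that compares a response against the Error Responses table and the documented `{"error": {"code", "message"}}` envelope. `matches_error_contract` is a hypothetical helper. Rejecting undocumented statuses is a design choice: the check fails until either the docs or the API is fixed.

```python
import json

# Status -> code mapping copied from the Error Responses table.
DOCUMENTED_ERRORS = {
    400: "BAD_REQUEST",
    401: "UNAUTHORIZED",
    403: "FORBIDDEN",
    404: "NOT_FOUND",
    429: "RATE_LIMITED",
    500: "INTERNAL_ERROR",
}


def matches_error_contract(status: int, body: str) -> bool:
    """True if an error response uses the documented envelope and code."""
    expected = DOCUMENTED_ERRORS.get(status)
    if expected is None:
        return False  # undocumented status: either document it or fix the API
    try:
        payload = json.loads(body)
    except json.JSONDecodeError:
        return False
    error = payload.get("error") if isinstance(payload, dict) else None
    return (
        isinstance(error, dict)
        and error.get("code") == expected
        and isinstance(error.get("message"), str)
    )


print(matches_error_contract(404, '{"error": {"code": "NOT_FOUND", "message": "User not found"}}'))
```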
Environment Documentation
Environment Matrix
````markdown
# Environments

## Overview

| Environment | URL | Purpose | Deploy |
|-------------|-----|---------|--------|
| Production | https://myapp.com | Live users | Manual (main) |
| Staging | https://staging.myapp.com | Pre-release testing | Auto (main) |
| Preview | https://pr-*.vercel.app | PR review | Auto (PR) |
| Development | http://localhost:3000 | Local dev | Manual |

## Configuration

### Production
```env
NODE_ENV=production
DATABASE_URL=[Supabase Production]
NEXT_PUBLIC_APP_URL=https://myapp.com
```

### Staging
```env
NODE_ENV=production
DATABASE_URL=[Supabase Staging Branch]
NEXT_PUBLIC_APP_URL=https://staging.myapp.com
```

### Development
```env
NODE_ENV=development
DATABASE_URL=[Local Supabase]
NEXT_PUBLIC_APP_URL=http://localhost:3000
```

## Access

### Production
- Vercel: Admin only
- Database: Read-only for devs, write for admin
- Logs: All engineers

### Staging
- Vercel: All engineers
- Database: All engineers
- Logs: All engineers

## Secrets Rotation

| Secret | Rotation | Last Rotated |
|--------|----------|--------------|
| Database password | 90 days | 2024-01-15 |
| API keys | 90 days | 2024-01-15 |
| JWT secret | Never | Initial setup |
````
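The Secrets Rotation table can be checked automatically. This sketch encodes the table's rows as tuples, using None for "Never" and for the non-date "Initial setup". The `overdue_secrets` name is hypothetical. A secret is flagged once more than its rotation period has passed since it was last rotated, or if it has a schedule but no recorded rotation.

```python
from datetime import date, timedelta
from typing import List, Optional, Tuple

# (secret, rotation period in days or None for "Never", last rotated or None)
Row = Tuple[str, Optional[int], Optional[date]]

ROTATION: List[Row] = [
    ("Database password", 90, date(2024, 1, 15)),
    ("API keys", 90, date(2024, 1, 15)),
    ("JWT secret", None, None),  # "Never" / "Initial setup" in the table
]


def overdue_secrets(rows: List[Row], today: date) -> List[str]:
    """Names of secrets whose rotation period has fully elapsed."""
    overdue = []
    for name, period_days, last_rotated in rows:
        if period_days is None:
            continue  # no schedule to enforce
        if last_rotated is None or today - last_rotated > timedelta(days=period_days):
            overdue.append(name)  # never rotated, or past due
    return overdue


print(overdue_secrets(ROTATION, date(2024, 5, 1)))
```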
Documentation-as-Code
Documentation Structure
```
docs/
├── README.md              # Documentation index
├── architecture/
│   ├── overview.md        # System architecture
│   ├── data-flow.md       # Data flow diagrams
│   └── decisions/         # ADRs
│       ├── 001-database.md
│       └── 002-hosting.md
├── runbooks/
│   ├── README.md          # Runbook index
│   ├── database.md        # Database issues
│   ├── deployment.md      # Deployment issues
│   └── outage.md          # Service outage
├── api/
│   └── reference.md       # API documentation
└── onboarding/
    ├── setup.md           # Local setup
    └── contributing.md    # How to contribute
```
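A structure like this is easy to enforce in CI. The sketch below assumes the index and entry-point files from the tree above are mandatory (the `REQUIRED_DOCS` list is a hypothetical policy) and reports which ones are missing under a docs root.

```python
from pathlib import Path
from typing import Iterable, List, Set

# Hypothetical policy: these files from the tree above must always exist.
REQUIRED_DOCS = [
    "README.md",
    "architecture/overview.md",
    "runbooks/README.md",
    "api/reference.md",
    "onboarding/setup.md",
]


def existing_docs(docs_root: Path) -> Set[str]:
    """Relative POSIX paths of all Markdown files under docs_root."""
    return {p.relative_to(docs_root).as_posix() for p in docs_root.rglob("*.md")}


def missing_docs(existing: Set[str], required: Iterable[str] = REQUIRED_DOCS) -> List[str]:
    """Required paths that are absent from the set of existing docs."""
    return [rel for rel in required if rel not in existing]


print(missing_docs({"README.md", "runbooks/README.md", "runbooks/database.md"}))
```

From the repository root, `missing_docs(existing_docs(Path("docs")))` returns the gaps. A non-empty result can fail the build.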
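Auto-Generated Documentation
```yaml
# .github/workflows/docs.yml
name: Generate Docs

on:
  push:
    branches: [main]
    paths:
      - 'src/**'
      - 'docs/**'
      - 'openapi.yaml'

# The Pages deploy pushes to the gh-pages branch, so the token needs write access
permissions:
  contents: write

jobs:
  generate-docs:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - uses: actions/setup-node@v4
        with:
          node-version: 20

      # Install project dependencies (TypeDoc needs the project's TypeScript setup)
      - run: npm ci

      - name: Generate API docs from OpenAPI
        run: |
          npx @redocly/cli build-docs openapi.yaml \
            --output docs/api/index.html

      - name: Generate TypeDoc
        run: npx typedoc --out docs/api/typescript

      - name: Deploy to GitHub Pages
        uses: peaceiris/actions-gh-pages@v3
        with:
          github_token: ${{ secrets.GITHUB_TOKEN }}
          publish_dir: ./docs
```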
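Generated docs are often paired with a link check, so that reorganizing files does not silently break cross-references. This minimal sketch covers relative links only. It assumes the CI step has already read each Markdown file into a dict keyed by repository-relative path. External URLs and in-page anchors are skipped, and reference-style links or links with quoted titles are not handled.

```python
import posixpath
import re
from typing import Dict, List, Tuple

# Inline links: [text](target). Image links ![alt](src) also match, so their
# paths are checked too. Links with a quoted title are not handled.
LINK = re.compile(r"\[[^\]]*\]\(([^)\s]+)\)")
SKIP_PREFIXES = ("http://", "https://", "mailto:", "#")


def broken_relative_links(files: Dict[str, str]) -> List[Tuple[str, str]]:
    """(source file, link target) pairs whose target file is not in `files`."""
    broken = []
    for path, text in files.items():
        for target in LINK.findall(text):
            if target.startswith(SKIP_PREFIXES):
                continue  # external URLs and in-page anchors are out of scope
            target_file = target.split("#", 1)[0]  # drop "#section" fragments
            resolved = posixpath.normpath(
                posixpath.join(posixpath.dirname(path), target_file)
            )
            if resolved not in files:
                broken.append((path, target))
    return broken


files = {
    "docs/README.md": "[Runbooks](runbooks/README.md) [API](api/reference.md) [Site](https://example.com)",
    "docs/runbooks/README.md": "[DB](database.md#quick-checks) [Up](../README.md) [Gone](outage.md)",
    "docs/runbooks/database.md": "# Runbook: Database Performance Issues",
}
print(broken_relative_links(files))
```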
Documentation Checklist
Architecture Docs
- System overview diagram
- Component descriptions
- Data flow documentation
- Security architecture
- Technology decisions (ADRs)
Operational Docs
- Runbooks for common issues
- Deployment procedures
- Monitoring and alerting
- Incident response plan
- On-call procedures
Developer Docs
- Local setup guide
- API reference
- Contributing guidelines
- Code conventions
- Testing guide
Maintenance
- Documentation review schedule
- Ownership assigned
- Change process defined
- Versioning strategy
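The Maintenance items (review schedule, ownership) can be audited if each doc carries a little metadata. This sketch assumes a hypothetical front matter convention with `owner` and `last_reviewed` fields and a 90-day review window. None of the tools mentioned in this skill require these fields.

```python
import re
from datetime import date, timedelta
from typing import List

# Hypothetical front matter convention at the top of each doc:
#   ---
#   owner: platform-team
#   last_reviewed: 2024-01-15
#   ---
FRONT_MATTER = re.compile(r"\A---\n(.*?)\n---", re.DOTALL)
FIELD = re.compile(r"^(owner|last_reviewed):[ \t]*(\S+)[ \t]*$", re.MULTILINE)


def audit_doc(text: str, today: date, max_age_days: int = 90) -> List[str]:
    """Return maintenance problems for one document (empty if none)."""
    match = FRONT_MATTER.match(text)
    fields = dict(FIELD.findall(match.group(1))) if match else {}
    problems = []
    if "owner" not in fields:
        problems.append("no owner assigned")
    reviewed = fields.get("last_reviewed")
    if reviewed is None:
        problems.append("no review date")
    elif today - date.fromisoformat(reviewed) > timedelta(days=max_age_days):
        problems.append(f"last reviewed {reviewed}, over {max_age_days} days ago")
    return problems


doc = "---\nowner: platform-team\nlast_reviewed: 2024-01-15\n---\n# Runbook"
print(audit_doc(doc, date(2024, 6, 1)))
```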
When to Use This Skill
Invoke this skill when:
- Creating architecture documentation
- Writing runbooks for operations
- Documenting decision rationale (ADRs)
- Setting up documentation structure
- Creating onboarding materials
- Building automated documentation
- Planning incident response procedures