Pm-claude-skills ai-product-canvas

Structure AI and ML product decisions with the rigour of any product decision. Use when building AI-powered features, evaluating LLM integrations, designing AI products, or assessing AI readiness. Produces a complete AI product canvas covering problem definition, model approach, data requirements, evaluation framework, UX design, responsible AI checklist, and launch monitoring plan.

install
source · Clone the upstream repo
git clone https://github.com/mohitagw15856/pm-claude-skills
Claude Code · Install into ~/.claude/skills/
T=$(mktemp -d) && git clone --depth=1 https://github.com/mohitagw15856/pm-claude-skills "$T" && mkdir -p ~/.claude/skills && cp -r "$T/plugins/pm-advanced/skills/ai-product-canvas" ~/.claude/skills/mohitagw15856-pm-claude-skills-ai-product-canvas && rm -rf "$T"
manifest: plugins/pm-advanced/skills/ai-product-canvas/SKILL.md
source content

AI Product Canvas Skill

Define AI products with the same rigour as any product decision — but with additional layers for data, model, evaluation, and responsible AI. This canvas prevents the most common AI product failure: building a technically impressive feature that doesn't solve a real problem.

AI Product Anti-Patterns to Check First

Before building, flag if any of these apply:

  • ❌ "We should add AI to [existing feature]" — with no user problem defined
  • ❌ Accuracy target undefined before build begins
  • ❌ No plan for what happens when the model is wrong
  • ❌ User-facing AI output with no human review or fallback
  • ❌ Training data not audited for bias or quality
  • ❌ No evaluation metric — "we'll know it when we see it"

AI Product Canvas Output Format

AI Product Canvas — [Feature Name] — [Date]

PM Owner: [Name] ML/AI Lead: [Name] Status: Discovery / Design / Build / Evaluation / Live


1. Problem Definition

User problem being solved:

[What specific situation is the user in? What job are they trying to get done?]

Why AI?

[What makes this problem require AI vs a deterministic solution? If the answer is "because we can," stop here.]

Success for the user looks like:

[What outcome does the user experience when the AI feature is working well?]


2. AI Approach

Task type:

  • Classification
  • Generation (text, image, code)
  • Summarisation / extraction
  • Recommendation
  • Search / retrieval
  • Prediction / forecasting
  • Conversation / agent

Model approach:

  • LLM API (GPT-4, Claude, Gemini, etc.) — specify: [Model name + version]
  • Fine-tuned model on own data
  • Custom model trained from scratch
  • RAG (retrieval-augmented generation)
  • Embedding + vector search

Rationale for chosen approach: [Why this, not alternatives]


3. Data Requirements

Data TypeSourceVolumeQuality StatusBias Risk
[Training data][Where it comes from][Volume][Audit status]H/M/L
[Evaluation data][Where it comes from][Volume][Audit status]H/M/L

Data gaps: [What's missing and plan to get it] Privacy considerations: [Any PII in training or inference data] Data ownership: [Do we own this data? Can we use it for training?]


4. Evaluation Framework

Primary metric: [The number that defines success — accuracy, F1, BLEU, user rating, task completion rate] Minimum acceptable threshold: [Below X, the feature does not ship] Human evaluation plan: [How will humans review model outputs? Sampling rate? Review panel?]

Evaluation TypeMethodCadenceOwner
Offline (pre-launch)[Test set, benchmark]Pre-launchML Lead
Online (post-launch)[A/B test, user feedback]WeeklyPM + ML
Adversarial[Red-team, edge cases]Pre-launchSafety reviewer

5. User Experience Design

How is AI output presented?

  • Direct output shown to user (high trust required)
  • AI-assisted with user confirmation
  • Suggestion user can accept/reject
  • Background action with audit log

Confidence and uncertainty handling:

  • What happens when confidence is low? [Show alternative, ask for clarification, fallback to manual]
  • How is uncertainty communicated to the user? [UI pattern]

Fallback plan:

  • If the model fails or returns an error: [Specific fallback behaviour]
  • If accuracy degrades below threshold: [Kill switch or graceful degradation plan]

6. Responsible AI Checklist

  • Bias audit completed on training data
  • Demographic fairness evaluated (does performance differ by user group?)
  • Hallucination / confabulation risk assessed and mitigated
  • User can see and correct AI output
  • Opt-out mechanism exists (can user disable the AI feature?)
  • Output provenance visible when relevant (does user know AI generated this?)
  • PII not used in ways user didn't consent to
  • Regulatory review completed (GDPR, AI Act, sector-specific)
  • Model cards / documentation completed

7. Launch & Monitoring Plan

Rollout: [% of users, with staged expansion criteria] Monitoring metrics:

  • Model performance: [Metric + alert threshold]
  • User engagement with AI output: [Acceptance rate, override rate, feedback score]
  • Error rate: [% of failed inferences]
  • Latency: [P95 target]

Model refresh cadence: [How often is the model retrained or updated?] Drift detection: [How will you know when model performance degrades in production?]


Guidelines

  • Never skip the "Why AI?" section — it's the most important question in AI product development
  • The fallback UX is not optional — what happens when AI fails defines your product's trustworthiness
  • Responsible AI checklist must be completed before launch, not after
  • Include latency in success metrics — a 5-second AI response is often worse than no AI at all
  • Recommend starting with a human-in-the-loop design and automating only when accuracy is proven

Required Inputs

Ask the user for these if not provided:

  • Feature or product description (what the AI is intended to do)
  • User problem (what problem the AI is solving for users)
  • Available data (what training/inference data exists)
  • ML/AI lead (who owns the technical implementation)

Quality Checks

  • "Why AI?" is answered clearly (not "because we can")
  • Minimum acceptable accuracy threshold is defined before build begins
  • Fallback UX is specified for model failures or low-confidence outputs
  • Responsible AI checklist is completed (not deferred to post-launch)
  • Monitoring plan includes both model performance and user engagement metrics