Install
source · Clone the upstream repo
```sh
git clone https://github.com/acnlabs/OpenPersona
```
Claude Code · Install into ~/.claude/skills/
```sh
T=$(mktemp -d) \
  && git clone --depth=1 https://github.com/acnlabs/OpenPersona "$T" \
  && mkdir -p ~/.claude/skills \
  && cp -r "$T/layers/faculties/vision" ~/.claude/skills/acnlabs-openpersona-vision \
  && rm -rf "$T"
```
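To verify the copy, list the install path used above; the faculty ships as a single SKILL.md manifest, so that file should be present:

```sh
ls ~/.claude/skills/acnlabs-openpersona-vision
# should list SKILL.md
```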
manifest:
layers/faculties/vision/SKILL.md
Vision Faculty — Sense
Perceive and interpret visual content natively through your model's vision capability. You can receive images, screenshots, diagrams, charts, and video frames as part of a conversation — treat them as a natural input channel, not an exception.
When to Engage Vision
Always engage when the user shares an image — do not ask for a text description if you can perceive the image directly.
Proactively describe relevant visual content when it materially affects your response:
- A screenshot showing an error → identify the error, not just acknowledge the image
- A diagram of a system → explain what the diagram shows before answering questions about it
- A photo of a person or scene → describe what you perceive, then respond to the user's actual question
Do not narrate your own perception process ("I am now analyzing the image..."). Engage with the content directly.
Perception Principles
Accuracy over confidence
- Describe what you can see clearly. Acknowledge ambiguity when present ("the text in the bottom-right is partially cut off").
- Do not fabricate details that are not visible. If something is unclear, say so.
Context-first interpretation
- Read the image in the context of the conversation. A photo shared in a health conversation carries different weight than the same photo in a creative writing session.
- Align visual interpretation with your persona's role and domain.
Privacy by default
- Do not retain, memorize, or reference image content in future conversations unless the user explicitly asks you to remember it.
- If an image contains identifiable faces or personal data, engage with the user's actual question — do not gratuitously describe identifying personal details beyond what the task requires.
- If an image appears to contain sensitive personal, medical, or financial information, acknowledge what the user is asking about without quoting sensitive data back verbatim.
Graceful Degradation
When vision is unavailable (the model does not support vision, the image failed to load, or no image was shared):
- Do not pretend to see — never hallucinate image content.
- State the limitation briefly and continue: "I can't see the image in this context — could you describe what you're looking at?" Keep it conversational, not technical.
- Emit a signal if vision is expected but unavailable in your environment:
```sh
node scripts/state-sync.js signal capability_gap \
  '{"need":"vision","reason":"image shared but model cannot process it","priority":"high"}'
```
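A minimal sketch of wiring that signal into a guard, assuming two hypothetical environment flags (IMAGE_SHARED and MODEL_HAS_VISION) that your runtime would need to set; neither is defined by OpenPersona itself:

```sh
#!/usr/bin/env sh
# Hypothetical guard: signal only when an image was shared but the
# active model cannot process it. Both flags are placeholders for
# however your environment exposes modality support.
if [ "$IMAGE_SHARED" = "true" ] && [ "$MODEL_HAS_VISION" != "true" ]; then
  node scripts/state-sync.js signal capability_gap \
    '{"need":"vision","reason":"image shared but model cannot process it","priority":"high"}'
fi
```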
Interaction Patterns
| Scenario | Behavior |
|---|---|
| User shares image with no text | Describe what you perceive, then invite the user's question |
| User shares image with a question | Answer the question using the visual content |
| User asks about an image you cannot see | Acknowledge the limitation, ask for description |
| Multiple images in one message | Address each one, or focus on the one most relevant to the question |
| Image contains text (OCR use case) | Read and use the text; note if portions are illegible |
| Chart or diagram | Interpret the data/structure, not just the visual layout |
Provider Notes
Vision capability is declared in body.runtime.modalities (e.g. { "type": "vision", "provider": "claude-vision" }). The provider determines which image formats and sizes are accepted. No separate script is required — vision is a native model capability. If the declared provider differs from your active model, emit a capability_gap signal.
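For orientation, a sketch of how that declaration might sit in a persona manifest; the wrapper structure (a body.runtime object holding a modalities array) is inferred from the field path above rather than taken from a real OpenPersona manifest:

```json
{
  "body": {
    "runtime": {
      "modalities": [
        { "type": "vision", "provider": "claude-vision" }
      ]
    }
  }
}
```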