# Screenpipe API (`screenpipe-api` skill)

Query the user's screen recordings, audio, UI elements, and usage analytics via the local Screenpipe REST API at localhost:3030. Use when the user asks about their screen activity, meetings, apps, productivity, media export, retranscription, or connected services.

Install (copies `crates/screenpipe-core/assets/skills/screenpipe-api` from https://github.com/screenpipe/screenpipe into `~/.claude/skills`):

```bash
T=$(mktemp -d) && git clone --depth=1 https://github.com/screenpipe/screenpipe "$T" && mkdir -p ~/.claude/skills && cp -r "$T/crates/screenpipe-core/assets/skills/screenpipe-api" ~/.claude/skills/screenpipe-screenpipe-screenpipe-api && rm -rf "$T"
```
Local REST API at `http://localhost:3030`. Full reference (60+ endpoints): https://docs.screenpi.pe/llms-full.txt
Authentication
ALL requests require authentication. Add the auth header to every curl call:
curl -H "Authorization: Bearer $SCREENPIPE_LOCAL_API_KEY" "http://localhost:3030/..."
The `$SCREENPIPE_LOCAL_API_KEY` env var is already set in your environment. Without it you get 403. The only exception is `/health` (no auth needed).
Context Window Protection
API responses can be large. Always write curl output to a file first (`curl ... -o /tmp/sp_result.json`), check size (`wc -c /tmp/sp_result.json`), and if over 5KB read only the first 50-100 lines. Extract what you need with jq. NEVER dump full large responses into context.
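The save-then-inspect rule above can be wrapped in a small helper. A minimal sketch, using the 5KB/50-line thresholds from the rule; the `sp_get` name is invented:

```bash
# sp_get: fetch an endpoint, save to a file, and only print a bounded amount.
sp_get() {
  out="/tmp/sp_result.json"
  curl -s -H "Authorization: Bearer $SCREENPIPE_LOCAL_API_KEY" \
       -o "$out" "http://localhost:3030$1"
  if [ "$(wc -c < "$out")" -gt 5120 ]; then
    # over 5KB: show only the head, then drill in with jq
    head -n 50 "$out"
  else
    cat "$out"
  fi
}
# usage: sp_get "/search?q=invoice&limit=5&start_time=1h%20ago"
```

From there, pull specific fields with jq instead of re-printing the file.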
1. Search — GET /search
```bash
curl -H "Authorization: Bearer $SCREENPIPE_LOCAL_API_KEY" "http://localhost:3030/search?q=QUERY&content_type=all&limit=10&start_time=1h%20ago"
```
Parameters
| Parameter | Type | Required | Description |
|---|---|---|---|
| `q` | string | No | Keywords. Do NOT use for audio searches — transcriptions are noisy, `q` filters too aggressively. |
| `content_type` | string | No | `all` (default), `ocr`, `audio`, `ui`, `memory`. Screen text is primarily captured via the OS accessibility tree; OCR is a fallback for apps without accessibility support (games, remote desktops). Use `all` unless you need a specific modality. |
| `limit` | integer | No | Max results, 1-20. Default: 10 |
| `offset` | integer | No | Pagination. Default: 0 |
| `start_time` | ISO 8601 or relative | Yes | Accepts ISO 8601 timestamps or relative forms like `1h ago`, `1d ago` |
| `end_time` | ISO 8601 or relative | No | Defaults to now. Accepts the same formats as `start_time` |
| `app_name` | string | No | e.g. "Google Chrome", "Slack", "zoom.us" |
| `window_name` | string | No | Window title substring |
| `speaker_name` | string | No | Filter audio by speaker (case-insensitive partial) |
| `focused` | boolean | No | Only focused windows |
| `max_length` | integer | No | Truncate each result's text (middle-truncation) |
Progressive Disclosure
Don't jump to heavy `/search` calls. Escalate:
| Step | Endpoint | When |
|---|---|---|
| 0 | `/memories` | Always query first/in parallel — highest signal, lowest cost |
| 1 | `/activity-summary` | Broad questions ("what was I doing?", "which apps?") |
| 2 | `/search` | Need specific content |
| 3 | `/elements` or `/frames/{id}/context` | UI structure, buttons, links |
| 4 | `/frames/{frame_id}` (PNG) | Visual context needed |
Decision tree:
- "What was I doing?" → Step 1 only
- "Summarize my meeting" → Step 2 with `content_type=audio`, NO q param. Add `content_type=all` for screen context.
- "How long on X?" → Step 1 (`/activity-summary` has `active_minutes`)
- "Which apps today?" → Step 1 (do NOT use frame counts or SQL)
- "What button did I click?" → Step 3 (`/elements` with `role=AXButton`)
- "Show me what I saw" → Step 2 (find frame_id) → Step 4
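A sketch of the meeting branch (Step 2, `content_type=audio`, no `q`). The `meeting_audio` helper name is invented; the endpoint and parameters are the documented ones:

```bash
# meeting_audio: pull recent transcriptions to summarize a meeting.
meeting_audio() {
  # $1: URL-encoded start_time, default "1h ago"
  curl -s -H "Authorization: Bearer $SCREENPIPE_LOCAL_API_KEY" \
       -o /tmp/sp_meeting.json \
       "http://localhost:3030/search?content_type=audio&start_time=${1:-1h%20ago}&limit=10"
  # one "speaker: transcription" line per result
  jq -r '.data[] | "\(.content.speaker.name // "Unknown"): \(.content.transcription)"' /tmp/sp_meeting.json
}
# usage: meeting_audio "2h%20ago"
```

Summarize from these lines; re-run with `content_type=all` if screen context is needed.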
Critical Rules
- ALWAYS include `start_time` — queries without time bounds WILL timeout
- Start with 1-2 hour ranges — expand only if no results
- Use `app_name` when user mentions a specific app
- Keep `limit` low (5-10) initially
- "recent" = 30 min. "today" = since midnight. "yesterday" = yesterday's range
- If timeout, narrow the time range
Response Format
```json
{
  "data": [
    {"type": "OCR", "content": {"frame_id": 12345, "text": "...", "timestamp": "...", "app_name": "Chrome", "window_name": "..."}},
    {"type": "Audio", "content": {"chunk_id": 678, "transcription": "...", "timestamp": "...", "speaker": {"name": "John"}}},
    {"type": "UI", "content": {"id": 999, "text": "Clicked 'Submit'", "timestamp": "...", "app_name": "Safari"}}
  ],
  "pagination": {"limit": 10, "offset": 0, "total": 42}
}
```
Note: The `"OCR"` type label is used for all screen text results, including text captured via the accessibility tree. Most screen text comes from accessibility, not OCR.
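For example, jq can reduce a saved `/search` response to just the fields you need. The values below are a made-up stand-in for real curl output, following the response shape above:

```bash
# stand-in for: curl ... "http://localhost:3030/search?..." -o /tmp/sp_result.json
cat > /tmp/sp_result.json <<'EOF'
{"data": [
  {"type": "OCR", "content": {"frame_id": 12345, "text": "quarterly report", "timestamp": "...", "app_name": "Chrome", "window_name": "Docs"}},
  {"type": "Audio", "content": {"chunk_id": 678, "transcription": "let's ship it", "timestamp": "...", "speaker": {"name": "John"}}}
], "pagination": {"limit": 10, "offset": 0, "total": 2}}
EOF

# frame ids + app names for screen hits (feed frame_id to /frames/{id})
jq -r '.data[] | select(.type == "OCR") | "\(.content.frame_id) \(.content.app_name)"' /tmp/sp_result.json

# speaker + transcription for audio hits
jq -r '.data[] | select(.type == "Audio") | "\(.content.speaker.name): \(.content.transcription)"' /tmp/sp_result.json
```

This keeps the large payload on disk and only the extracted lines in context.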
2. Activity Summary — GET /activity-summary
```bash
curl -H "Authorization: Bearer $SCREENPIPE_LOCAL_API_KEY" "http://localhost:3030/activity-summary?start_time=1h%20ago&end_time=now"
```
Returns a rich overview with:
- apps: usage with `active_minutes`, first/last seen
- windows: every distinct window/tab with title, `browser_url`, and time spent — this is the most valuable field, it tells you exactly what the user was working on
- key_texts: one representative text snippet per window context (user input fields prioritized over static page text)
- audio_summary.top_transcriptions: actual transcription text with speaker and timestamp (not just counts)

This is usually enough to answer "what was I doing?" without further searches. Only drill into `/search` if you need verbatim quotes or specific content.
3. Elements — GET /elements
Lightweight FTS search across UI elements (~100-500 bytes each vs 5-20KB from `/search`).

```bash
curl -H "Authorization: Bearer $SCREENPIPE_LOCAL_API_KEY" "http://localhost:3030/elements?q=Submit&start_time=1h%20ago&limit=10"
```
Parameters: `q`, `frame_id`, `source` (accessibility|ocr), `role`, `start_time`, `end_time`, `app_name`, `limit`, `offset`.
Frame Context — GET /frames/{id}/context
Returns accessibility text, parsed nodes, and extracted URLs for a frame.

```bash
curl -H "Authorization: Bearer $SCREENPIPE_LOCAL_API_KEY" "http://localhost:3030/frames/6789/context"
```
Common Roles (platform-specific)
Roles are not normalized across platforms. Use the correct format for the user's OS:
| Concept | macOS | Windows | Linux |
|---|---|---|---|
| Button | | | |
| Static text | | | |
| Link | | | |
| Text field | | | |
| Text area | | | |
| Menu item | | | |
| Checkbox | | | |
| Group | | | |
| Web area | | | |
| Heading | | | |
| Tab | | | |
| List item | | | |
OCR-only roles (fallback when accessibility unavailable): `line`, `word`, `block`, `paragraph`, `page`
4. Frames (Screenshots) — GET /frames/{frame_id}
```bash
curl -H "Authorization: Bearer $SCREENPIPE_LOCAL_API_KEY" -o /tmp/frame.png "http://localhost:3030/frames/12345"
```

Returns raw PNG. Never fetch more than 2-3 frames per query (~1000-2000 tokens each).
5. Media Export — POST /frames/export
```bash
curl -X POST http://localhost:3030/frames/export \
  -H "Authorization: Bearer $SCREENPIPE_LOCAL_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"start_time": "5m ago", "end_time": "now", "fps": 1.0}'
```
Fields: `start_time`, `end_time` (or `frame_ids` array), `fps` (default 1.0). Max 10,000 frames.
FPS guidelines: 5min→1.0, 30min→0.5, 1h→0.2, 2h+→0.1
Returns `{"file_path": "...", "frame_count": N, "duration_secs": N}`. Show the path as an inline code block for playback.
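The FPS guideline can be encoded as a tiny helper (the `export_fps` name is invented; the thresholds are the ones listed above):

```bash
# export_fps: map a clip length in minutes to the recommended export fps.
export_fps() {
  if   [ "$1" -le 5 ];  then echo 1.0
  elif [ "$1" -le 30 ]; then echo 0.5
  elif [ "$1" -le 60 ]; then echo 0.2
  else                       echo 0.1
  fi
}
# export_fps 30  → 0.5
```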
Audio & ffmpeg
Audio files come from search results (`file_path`). Common operations:

```bash
ffmpeg -y -i /path/to/audio.mp4 -q:a 2 ~/.screenpipe/exports/output.mp3       # convert
ffmpeg -y -i input.mp4 -ss 00:01:00 -to 00:05:00 -q:a 2 clip.mp3              # trim
ffmpeg -y -i input.mp4 -filter:v "setpts=0.5*PTS" -an fast.mp4                # speed 2x
ffmpeg -y -i input.mp4 -t 10 -vf "fps=10,scale=640:-1" output.gif             # GIF
```

Always use `-y` and save to `~/.screenpipe/exports/`.
6. Retranscribe — POST /audio/retranscribe
```bash
curl -X POST http://localhost:3030/audio/retranscribe \
  -H "Authorization: Bearer $SCREENPIPE_LOCAL_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"start": "1h ago", "end": "now"}'
```

Optional: `engine` (whisper-large-v3-turbo|whisper-large-v3|deepgram|qwen3-asr), `vocabulary` (array of `{"word": "...", "replacement": "..."}` for bias/replacement), `prompt` (topic context for Whisper).

Keep ranges short (1h max). Show old vs new transcription.
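A sketch of a retranscription request exercising the optional fields; the vocabulary entry and prompt text are illustrative values, not part of the API:

```bash
# build the request body; fields per the Optional list above
cat > /tmp/retranscribe.json <<'EOF'
{
  "start": "1h ago",
  "end": "now",
  "engine": "whisper-large-v3-turbo",
  "vocabulary": [{"word": "screen pipe", "replacement": "screenpipe"}],
  "prompt": "engineering discussion about the screenpipe recorder"
}
EOF
# then send it:
# curl -X POST http://localhost:3030/audio/retranscribe \
#   -H "Authorization: Bearer $SCREENPIPE_LOCAL_API_KEY" \
#   -H "Content-Type: application/json" -d @/tmp/retranscribe.json
```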
7. Raw SQL — POST /raw_sql
```bash
curl -X POST http://localhost:3030/raw_sql \
  -H "Authorization: Bearer $SCREENPIPE_LOCAL_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"query": "SELECT ... LIMIT 100"}'
```

Rules: Every SELECT needs LIMIT. Always filter by time. Read-only. Use `datetime('now', '-24 hours')` for time math.

WARNING: Do NOT use frame counts for time estimates — frames are event-driven, not fixed-interval. Use `/activity-summary` for screen time.
Schema
| Table | Key Columns | Time Column |
|---|---|---|
| , , , | |
| , , | join via |
| , , , | join via |
| , , , | |
| | |
| , | — |
| , , , | |
| , , , | |
| , , , | |
| , , , | |
Example Queries
```sql
-- Most used apps (last 24h)
SELECT app_name, COUNT(*) as frames
FROM frames
WHERE timestamp > datetime('now', '-24 hours') AND app_name IS NOT NULL
GROUP BY app_name ORDER BY frames DESC LIMIT 20;

-- Most visited domains
SELECT
  CASE WHEN INSTR(SUBSTR(browser_url, INSTR(browser_url, '://') + 3), '/') > 0
       THEN SUBSTR(SUBSTR(browser_url, INSTR(browser_url, '://') + 3), 1,
                   INSTR(SUBSTR(browser_url, INSTR(browser_url, '://') + 3), '/') - 1)
       ELSE SUBSTR(browser_url, INSTR(browser_url, '://') + 3)
  END as domain,
  COUNT(*) as visits
FROM frames
WHERE timestamp > datetime('now', '-24 hours') AND browser_url IS NOT NULL
GROUP BY domain ORDER BY visits DESC LIMIT 20;

-- Speaker stats
SELECT COALESCE(NULLIF(s.name, ''), 'Unknown') as speaker, COUNT(*) as segments
FROM audio_transcriptions at
LEFT JOIN speakers s ON at.speaker_id = s.id
WHERE at.timestamp > datetime('now', '-24 hours')
GROUP BY at.speaker_id ORDER BY segments DESC LIMIT 20;

-- Context switches per hour
SELECT strftime('%H:00', timestamp) as hour, COUNT(*) as switches
FROM ui_events
WHERE event_type = 'app_switch' AND timestamp > datetime('now', '-24 hours')
GROUP BY hour ORDER BY hour LIMIT 24;
```
Common patterns: `GROUP BY date(timestamp)` (daily), `GROUP BY strftime('%H:00', timestamp)` (hourly), `HAVING frames > 5` (filter noise).
8. Connections — GET /connections
```bash
# List all integrations (Telegram, Slack, Discord, Email, Todoist, Teams)
curl -H "Authorization: Bearer $SCREENPIPE_LOCAL_API_KEY" http://localhost:3030/connections

# Get credentials for a connected service
curl -H "Authorization: Bearer $SCREENPIPE_LOCAL_API_KEY" http://localhost:3030/connections/telegram
```
Returns credentials to use with service APIs directly:
- Telegram: `bot_token` + `chat_id` → `POST https://api.telegram.org/bot{token}/sendMessage`
- Slack: `webhook_url` → `POST {webhook_url}` with `{"text": "..."}`
- Discord: `webhook_url` → `POST {webhook_url}` with `{"content": "..."}`
- Todoist: `api_token` → `POST https://api.todoist.com/api/v1/tasks` with Bearer auth
- Teams: `webhook_url` → `POST {webhook_url}` with `{"text": "..."}`
- Email: `smtp_host`, `smtp_port`, `smtp_user`, `smtp_pass`, `from_address`
If not connected, tell user to set up in Settings > Connections.
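A sketch of the Telegram flow: the credentials file below is a made-up stand-in for the `/connections/telegram` response, and treating `bot_token`/`chat_id` as top-level fields is an assumption:

```bash
# stand-in for: curl -H "Authorization: Bearer $SCREENPIPE_LOCAL_API_KEY" \
#   http://localhost:3030/connections/telegram -o /tmp/sp_tg.json
printf '{"bot_token": "123:abc", "chat_id": "456"}' > /tmp/sp_tg.json

TOKEN=$(jq -r '.bot_token' /tmp/sp_tg.json)
CHAT=$(jq -r '.chat_id' /tmp/sp_tg.json)

# then call the Telegram Bot API directly:
# curl -s "https://api.telegram.org/bot${TOKEN}/sendMessage" \
#   -d chat_id="$CHAT" --data-urlencode text="meeting summary ready"
echo "would send to chat $CHAT"
```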
9. Meetings — GET /meetings
```bash
curl -H "Authorization: Bearer $SCREENPIPE_LOCAL_API_KEY" "http://localhost:3030/meetings?start_time=1d%20ago&end_time=now&limit=10&offset=0"
curl -H "Authorization: Bearer $SCREENPIPE_LOCAL_API_KEY" "http://localhost:3030/meetings/42"
```
Returns detected meetings (from calendar, app detection, window titles, UI elements, multi-speaker audio).
| Field | Type | Description |
|---|---|---|
| `id` | integer | Meeting ID |
| `meeting_start` | ISO 8601 | Start time |
| `meeting_end` | ISO 8601? | End time (null if ongoing) |
| `app_name` | string | App (zoom, teams, meet, etc.) |
| `title` | string? | Meeting title |
| `participants` | string? | Attendees |
| `detection_source` | string | How detected (calendar, app detection, window title, etc.) |
Also available via raw SQL:

```sql
SELECT * FROM meetings WHERE meeting_start > datetime('now', '-24 hours') LIMIT 20;
```
10. Speakers — Management & Reassignment
```bash
# Search speakers by name
curl -H "Authorization: Bearer $SCREENPIPE_LOCAL_API_KEY" "http://localhost:3030/speakers/search?name=John"

# Get unnamed speakers (for labeling)
curl -H "Authorization: Bearer $SCREENPIPE_LOCAL_API_KEY" "http://localhost:3030/speakers/unnamed?limit=20&offset=0"

# Get speakers similar to a given speaker (by voice embedding)
curl -H "Authorization: Bearer $SCREENPIPE_LOCAL_API_KEY" "http://localhost:3030/speakers/similar?speaker_id=29&limit=5"

# Update speaker name/metadata
curl -X POST http://localhost:3030/speakers/update \
  -H "Authorization: Bearer $SCREENPIPE_LOCAL_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"id": 29, "name": "Jordan"}'

# Reassign speaker for an audio chunk (propagates to similar chunks by default)
curl -X POST http://localhost:3030/speakers/reassign \
  -H "Authorization: Bearer $SCREENPIPE_LOCAL_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"audio_chunk_id": 456, "new_speaker_name": "Jordan", "propagate_similar": true}'
# Returns: new_speaker_id, transcriptions_updated, old_assignments (for undo)

# Undo a speaker reassignment
curl -X POST http://localhost:3030/speakers/undo-reassign \
  -H "Authorization: Bearer $SCREENPIPE_LOCAL_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"old_assignments": [{"transcription_id": 1, "old_speaker_id": 29}]}'

# Merge two speakers (keeps one, merges the other into it)
curl -X POST http://localhost:3030/speakers/merge \
  -H "Authorization: Bearer $SCREENPIPE_LOCAL_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"speaker_to_keep_id": 5, "speaker_to_merge_id": 29}'

# Mark speaker as hallucination (false detection)
curl -X POST http://localhost:3030/speakers/hallucination \
  -H "Authorization: Bearer $SCREENPIPE_LOCAL_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"speaker_id": 29}'

# Delete a speaker (also removes associated audio chunk files)
curl -X POST http://localhost:3030/speakers/delete \
  -H "Authorization: Bearer $SCREENPIPE_LOCAL_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"id": 29}'
```
Speaker Reassignment Workflow
When the user says "that was actually Jordan, not Karishma":
1. Search audio results to find the `chunk_id` for the misidentified audio
2. Call `POST /speakers/reassign` with `audio_chunk_id` and `new_speaker_name`
3. With `propagate_similar: true` (default), it also fixes similar-sounding chunks
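The workflow can be wrapped as a one-call sketch (the `reassign_speaker` name is invented; the endpoint and fields match the reassign example above):

```bash
# reassign_speaker: fix a misattributed audio chunk.
reassign_speaker() {
  # $1 = audio_chunk_id found via search, $2 = correct speaker name
  curl -s -X POST http://localhost:3030/speakers/reassign \
    -H "Authorization: Bearer $SCREENPIPE_LOCAL_API_KEY" \
    -H "Content-Type: application/json" \
    -d "{\"audio_chunk_id\": $1, \"new_speaker_name\": \"$2\", \"propagate_similar\": true}"
}
# usage: reassign_speaker 456 "Jordan"
```

Keep the returned `old_assignments` so the change can be undone via `/speakers/undo-reassign`.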
11. Memories — High-Signal Persistent Knowledge
Memories are the highest-signal data source in screenpipe. They contain curated facts, user preferences, decisions, and project context — distilled from hours of screen/audio data. Always check memories when answering questions or building context.
When to Query Memories
Query memories FIRST (before or alongside `/search`) when:
- The user asks about preferences, decisions, or past context
- You need background on a project, person, or workflow
- You're generating a summary, recommendation, or action plan
- You're unsure about user preferences or past decisions
- Any task where historical context would improve the output
Rule: If you're calling `/search`, also call `/memories` in parallel. Memories provide the "why" behind the raw screen data. Search gives you what happened; memories tell you what matters.
API
```bash
# Search memories (FTS) — do this often!
curl -H "Authorization: Bearer $SCREENPIPE_LOCAL_API_KEY" "http://localhost:3030/memories?q=preference&limit=20"

# List recent memories (high importance first)
curl -H "Authorization: Bearer $SCREENPIPE_LOCAL_API_KEY" "http://localhost:3030/memories?min_importance=0.5&limit=20"

# Filter by source or tags
curl -H "Authorization: Bearer $SCREENPIPE_LOCAL_API_KEY" "http://localhost:3030/memories?source=user&tags=project&limit=20"

# Create a memory
curl -X POST http://localhost:3030/memories \
  -H "Authorization: Bearer $SCREENPIPE_LOCAL_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"content": "User prefers dark mode", "source": "user", "tags": ["preference", "ui"], "importance": 0.7}'

# Update a memory
curl -X PUT http://localhost:3030/memories/1 \
  -H "Authorization: Bearer $SCREENPIPE_LOCAL_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"content": "User prefers dark mode in all apps", "importance": 0.8}'

# Delete a memory
curl -X DELETE -H "Authorization: Bearer $SCREENPIPE_LOCAL_API_KEY" http://localhost:3030/memories/1
```
Parameters for `GET /memories`: `q` (FTS search), `source`, `tags`, `min_importance`, `start_time`, `end_time`, `limit`, `offset`.

Memories also appear in `/search?content_type=memory`.
Creating Memories
When you learn something important about the user (preferences, decisions, project context), store it as a memory. Use `importance` (0.0-1.0) to rank signal. Only store genuinely useful long-lived facts, not transient observations.
12. Notifications — POST http://localhost:11435/notify
Send a notification to the screenpipe desktop UI. This uses the Tauri sidecar server (port 11435), not the main API (port 3030).

The notification body supports markdown: `**bold**`, `` `inline code` ``, and `[link text](url)`. Links can be web URLs, file paths, or screenpipe deeplinks.

```bash
# Simple notification
curl -X POST http://localhost:11435/notify \
  -H "Content-Type: application/json" \
  -d '{"title": "3 new voice memos", "body": "found recordings from today"}'

# Markdown body with links
curl -X POST http://localhost:11435/notify \
  -H "Content-Type: application/json" \
  -d '{"title": "Meeting summary", "body": "**Q3 Planning** notes saved\n\nopen [meeting notes](~/Documents/notes/q3.md) or view [recording](screenpipe://timeline)"}'

# Link to a local file (absolute path or ~ path)
curl -X POST http://localhost:11435/notify \
  -H "Content-Type: application/json" \
  -d '{"title": "Export complete", "body": "saved to [report.csv](~/Downloads/report.csv)"}'

# With action buttons
curl -X POST http://localhost:11435/notify \
  -H "Content-Type: application/json" \
  -d '{"title": "Meeting summary", "body": "**Q3 Planning**\n- Budget approved", "actions": [{"id": "view", "label": "view", "type": "deeplink", "url": "screenpipe://timeline"}, {"id": "skip", "label": "skip", "type": "dismiss"}]}'

# Custom auto-dismiss (5 seconds)
curl -X POST http://localhost:11435/notify \
  -H "Content-Type: application/json" \
  -d '{"title": "Saved", "body": "Note saved", "timeout": 5000}'
```
| Field | Type | Required | Description |
|---|---|---|---|
| `title` | string | Yes | Notification title |
| `body` | string | Yes | Markdown body (bold, inline code, links) |
| `category` | string | No | Category (default "pipe") |
| `timeout` | integer | No | Auto-dismiss in ms (default 20000) |
|  | integer | No | Alias for timeout |
| `actions` | array | No | Action buttons |
Supported link types in body markdown:
- Web URLs: `[docs](https://docs.screenpi.pe)` — opens in browser
- File paths: `[notes](~/notes/file.md)` or `[log](/var/log/app.log)` — opens in default app
- Deeplinks: `[timeline](screenpipe://timeline)` — navigates within screenpipe

Returns `{"success": true, "message": "Notification sent successfully"}`.
13. Other Endpoints
```bash
curl http://localhost:3030/health                                                            # Health check (no auth)
curl -H "Authorization: Bearer $SCREENPIPE_LOCAL_API_KEY" http://localhost:3030/audio/list   # Audio devices
curl -H "Authorization: Bearer $SCREENPIPE_LOCAL_API_KEY" http://localhost:3030/vision/list  # Monitors
```
Deep Links
Reference specific moments with clickable links:
```text
[10:30 AM — Chrome](screenpipe://frame/12345)             # screen text results (use frame_id)
[meeting at 3pm](screenpipe://timeline?timestamp=ISO8601) # audio results (use timestamp)
```
Only use IDs/timestamps from actual search results. Never fabricate.
Showing Videos
Show `file_path` from search results as inline code for playable video:

`/Users/name/.screenpipe/data/monitor_1_2024-01-15_10-30-00.mp4`