Sherlock-ai-plugin genimg-gemini-web

Image generation skill using Gemini Web. Generates images from text prompts via Google Gemini and also supports text generation. Use it as the image generation backend for other skills such as cover-image, xhs-images, and article-illustrator.

install

source · Clone the upstream repo

```
git clone https://github.com/proyecto26/sherlock-ai-plugin
```

Claude Code · Install into ~/.claude/skills/

```
T=$(mktemp -d) && git clone --depth=1 https://github.com/proyecto26/sherlock-ai-plugin "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/genimg-gemini-web" ~/.claude/skills/proyecto26-sherlock-ai-plugin-genimg-gemini-web && rm -rf "$T"
```

manifest: skills/genimg-gemini-web/SKILL.md


Gemini Web Client

Supports:

Text generation
Image generation (download + save)
Reference image upload (attach images for vision tasks)
Multi-turn conversations within the same executor instance (`keepSession`)
Experimental video generation (`generateVideo`); Gemini may return an async placeholder, in which case downloading the video may require the Gemini web UI

Quick start

```
npx -y bun scripts/main.ts "Hello, Gemini"
npx -y bun scripts/main.ts --prompt "Explain quantum computing"
npx -y bun scripts/main.ts --prompt "A cute cat" --image cat.png
npx -y bun scripts/main.ts --promptfiles system.md content.md --image out.png

# Multi-turn conversation (agent generates unique sessionId)
npx -y bun scripts/main.ts "Remember this: 42" --sessionId my-unique-id-123
npx -y bun scripts/main.ts "What number?" --sessionId my-unique-id-123
```
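The multi-turn examples expect the calling agent to supply its own unique `--sessionId`. Below is a minimal TypeScript sketch of one way to generate one, assuming Bun or Node 18+ where `randomUUID` is exported by `node:crypto`. The helper name `newSessionId` and the `task-` prefix are hypothetical and not part of the skill.

```typescript
import { randomUUID } from "node:crypto";

// Hypothetical helper (not part of the skill): builds a session ID that is
// unique per task. The prefix is only a readability convention.
function newSessionId(prefix = "task"): string {
  return `${prefix}-${randomUUID()}`;
}

// Reuse the same ID for every turn of one conversation.
const sessionId = newSessionId();
for (const prompt of ["Remember this: 42", "What number?"]) {
  // Prints the CLI command for each turn; nothing is executed here.
  console.log(`npx -y bun scripts/main.ts ${JSON.stringify(prompt)} --sessionId ${sessionId}`);
}
```

A UUID keeps the ID free of spaces and slashes. That matters because session IDs also appear in session file names (`sessions/<id>.json`).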

Executor options (programmatic)

This skill is typically consumed via `createGeminiWebExecutor(geminiOptions)` (see `scripts/executor.ts`).

Key options in `GeminiWebOptions`:

`referenceImages?: string | string[]`: Upload local images as references (vision input).
`keepSession?: boolean`: Reuse Gemini `chatMetadata` to continue the same conversation across calls (required if you want reference images to persist across multiple messages).
`generateVideo?: string`: Generate a video and (best-effort) download it to the given path. Gemini may return `video_gen_chip` (async); in that case you must open the Gemini web UI to download the result.
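`referenceImages` accepts either a single path or an array of paths. The sketch below shows one way a caller might normalize that union before building options. `normalizeReferenceImages` and `ReferenceImagesOption` are hypothetical names, and the executor's own handling in `scripts/executor.ts` may differ.

```typescript
// Mirrors the documented option type `referenceImages?: string | string[]`.
type ReferenceImagesOption = string | string[] | undefined;

// Hypothetical helper: turns the union into a plain array (copying arrays)
// so callers can iterate over reference images without branching on type.
function normalizeReferenceImages(value: ReferenceImagesOption): string[] {
  if (value === undefined) return [];
  return typeof value === "string" ? [value] : [...value];
}

// A single path and an array of paths both become arrays.
console.log(normalizeReferenceImages("cat.png"));
console.log(normalizeReferenceImages(["sketch.png", "palette.png"]));
```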

Notes:

`generateVideo` cannot be combined with `generateImage` / `editImage`.
When `keepSession=true` and `referenceImages` is set, reference images are uploaded once per executor instance.
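A caller can catch the `generateVideo` conflict before creating an executor. Below is a minimal pre-flight check. `OptionsSketch` is a hypothetical local stand-in for `GeminiWebOptions`, not the real type. The types of `generateImage` and `editImage` are not documented here, so the sketch only checks whether they are set.

```typescript
// Hypothetical stand-in for the documented option fields; the real
// GeminiWebOptions lives in scripts/executor.ts.
interface OptionsSketch {
  referenceImages?: string | string[];
  keepSession?: boolean;
  generateVideo?: string;
  generateImage?: unknown; // type not documented here
  editImage?: unknown;     // type not documented here
}

// Returns a list of problems; an empty list means no documented conflict.
function checkOptions(opts: OptionsSketch): string[] {
  const problems: string[] = [];
  const wantsVideo = opts.generateVideo !== undefined;
  const wantsImage = opts.generateImage !== undefined || opts.editImage !== undefined;
  if (wantsVideo && wantsImage) {
    problems.push("generateVideo cannot be combined with generateImage/editImage");
  }
  return problems;
}

console.log(checkOptions({ generateVideo: "clip.mp4", generateImage: "still.png" }));
```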

Commands

Text generation

```
# Simple prompt (positional)
npx -y bun scripts/main.ts "Your prompt here"

# Explicit prompt flag
npx -y bun scripts/main.ts --prompt "Your prompt here"
npx -y bun scripts/main.ts -p "Your prompt here"

# With model selection
npx -y bun scripts/main.ts -p "Hello" -m gemini-2.5-pro

# Pipe from stdin
echo "Summarize this" | npx -y bun scripts/main.ts
```

Image generation

```
# Generate image with default path (./generated.png)
npx -y bun scripts/main.ts --prompt "A sunset over mountains" --image

# Generate image with custom path
npx -y bun scripts/main.ts --prompt "A cute robot" --image robot.png

# Shorthand
npx -y bun scripts/main.ts "A dragon" --image=dragon.png
```
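`--image` appears in three forms: bare, followed by a path, and with `=`. Below is a hypothetical parser for those forms. It is not the parser in `scripts/main.ts`. It assumes that a following token starting with `-` is another flag rather than a path. Under this rule, a positional prompt placed right after a bare `--image` would be read as the output path, so the sketch is safest when the prompt comes before `--image`, as in the examples.

```typescript
// Documented default when --image has no path.
const DEFAULT_IMAGE_PATH = "generated.png";

// Hypothetical parser for:
//   --image            -> default path
//   --image robot.png  -> explicit path
//   --image=dragon.png -> explicit path ("=" form)
// Returns null when --image is absent (text generation only).
function parseImageFlag(argv: string[]): string | null {
  for (let i = 0; i < argv.length; i++) {
    const arg = argv[i];
    if (arg.startsWith("--image=")) {
      const value = arg.slice("--image=".length);
      return value || DEFAULT_IMAGE_PATH; // "--image=" with nothing after it
    }
    if (arg === "--image") {
      const next = argv[i + 1];
      // Assumption: a token starting with "-" is another flag, not a path.
      return next !== undefined && !next.startsWith("-") ? next : DEFAULT_IMAGE_PATH;
    }
  }
  return null;
}

console.log(parseImageFlag(["--prompt", "A cute robot", "--image", "robot.png"]));
```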

Output formats

```
# Plain text (default)
npx -y bun scripts/main.ts "Hello"

# JSON output
npx -y bun scripts/main.ts "Hello" --json
```

Options

Option	Description
`--prompt <text>`, `-p`	Prompt text
`--promptfiles <files...>`	Read prompt from files (concatenated in order)
`--model <id>`, `-m`	Model: gemini-3-pro (default), gemini-2.5-pro, gemini-2.5-flash
`--image [path]`	Generate image, save to path (default: generated.png)
`--sessionId <id>`	Session ID for multi-turn conversation (agent generates unique ID)
`--list-sessions`	List saved sessions (max 100, sorted by update time)
`--json`	Output as JSON
`--login`	Refresh cookies only, then exit
`--cookie-path <path>`	Custom cookie file path
`--profile-dir <path>`	Chrome profile directory
`--help`, `-h`	Show help
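An agent that drives the CLI from code can build the argument list from the documented flags. Passing the list to a process spawner avoids shell quoting. `buildCliArgs` and `CliRequest` are hypothetical names. The sketch emits only flags listed in the table above. It places a bare `--image` last, as in the default-path example, because how `scripts/main.ts` treats a bare `--image` followed by other flags is not documented here.

```typescript
// Models listed in this document; gemini-3-pro is the documented default.
type GeminiModel = "gemini-3-pro" | "gemini-2.5-pro" | "gemini-2.5-flash";

// Hypothetical request shape for this sketch.
interface CliRequest {
  prompt?: string;
  promptFiles?: string[];
  model?: GeminiModel;
  sessionId?: string;
  json?: boolean;
  image?: string | true; // true = bare "--image" (saves generated.png)
}

// Builds argv for `npx -y bun scripts/main.ts ...` without running it.
function buildCliArgs(req: CliRequest): string[] {
  const args = ["-y", "bun", "scripts/main.ts"];
  if (req.prompt !== undefined) args.push("--prompt", req.prompt);
  if (req.promptFiles?.length) args.push("--promptfiles", ...req.promptFiles);
  if (req.model) args.push("--model", req.model);
  if (req.sessionId) args.push("--sessionId", req.sessionId);
  if (req.json) args.push("--json");
  // --image goes last so a bare flag is never followed by another token.
  if (req.image === true) args.push("--image");
  else if (typeof req.image === "string") args.push("--image", req.image);
  return args;
}

console.log(buildCliArgs({ prompt: "A cute robot", image: "robot.png" }));
```

One way to run the result is to pass the array to `spawn("npx", args)` from `node:child_process`. Each prompt then arrives as a single argument, whatever characters it contains.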

CLI note: `scripts/main.ts` supports text generation, image generation, and multi-turn conversations via `--sessionId`. Reference images and video generation are exposed via the executor API.

Models

`gemini-3-pro` - Default, latest model
`gemini-2.5-pro` - Previous-generation Pro model
`gemini-2.5-flash` - Fast, lightweight model

Authentication

The first run opens Chrome so you can authenticate with Google. Cookies are cached for subsequent runs.

```
# Force cookie refresh
npx -y bun scripts/main.ts --login
```

Environment variables

Variable	Description
`GEMINI_WEB_DATA_DIR`	Data directory
`GEMINI_WEB_COOKIE_PATH`	Cookie file path
`GEMINI_WEB_CHROME_PROFILE_DIR`	Chrome profile directory
`GEMINI_WEB_CHROME_PATH`	Chrome executable path
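Below is a sketch of how a wrapper might collect these variables together with the related CLI flags. The precedence is an assumption, not confirmed by this document: an explicit `--cookie-path` or `--profile-dir` wins over the matching environment variable. `resolvePaths` is a hypothetical name. `undefined` means "let the skill use its built-in default".

```typescript
interface GeminiWebPaths {
  dataDir?: string;
  cookiePath?: string;
  chromeProfileDir?: string;
  chromePath?: string;
}

// Hypothetical resolver. Assumption: CLI flags override environment variables;
// empty or whitespace-only values are treated as unset.
function resolvePaths(
  env: Record<string, string | undefined>,
  flags: { cookiePath?: string; profileDir?: string } = {},
): GeminiWebPaths {
  const nonEmpty = (v?: string): string | undefined => (v && v.trim() !== "" ? v : undefined);
  return {
    dataDir: nonEmpty(env.GEMINI_WEB_DATA_DIR),
    cookiePath: nonEmpty(flags.cookiePath) ?? nonEmpty(env.GEMINI_WEB_COOKIE_PATH),
    chromeProfileDir: nonEmpty(flags.profileDir) ?? nonEmpty(env.GEMINI_WEB_CHROME_PROFILE_DIR),
    chromePath: nonEmpty(env.GEMINI_WEB_CHROME_PATH),
  };
}

// Usage: resolvePaths(process.env, { cookiePath: "/tmp/gemini-cookies.json" })
```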

Examples

Generate text response

```
npx -y bun scripts/main.ts "What is the capital of France?"
```

Generate image

```
npx -y bun scripts/main.ts "A photorealistic image of a golden retriever puppy" --image puppy.png
```

Get JSON output for parsing

```
npx -y bun scripts/main.ts "Hello" --json | jq '.text'
```
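The same extraction can be done in code when the CLI's stdout is captured. This sketch assumes stdout holds a single JSON object with a string `text` field, which the `jq '.text'` example relies on. Other fields are not documented here. `parseGeminiJson` is a hypothetical name.

```typescript
// Assumed output shape: only "text" is relied on; other fields are untyped.
interface GeminiJsonOutput {
  text: string;
  [key: string]: unknown;
}

// Parses captured stdout from `... --json` and checks for a string "text" field.
function parseGeminiJson(stdout: string): GeminiJsonOutput {
  const parsed: unknown = JSON.parse(stdout);
  if (
    typeof parsed !== "object" ||
    parsed === null ||
    typeof (parsed as { text?: unknown }).text !== "string"
  ) {
    throw new Error("expected a JSON object with a string 'text' field");
  }
  return parsed as GeminiJsonOutput;
}

console.log(parseGeminiJson('{"text":"Hello! How can I help?"}').text);
```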

Generate image from prompt files

```
# Concatenate system.md + content.md as prompt
npx -y bun scripts/main.ts --promptfiles system.md content.md --image output.png
```
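`--promptfiles` reads the files and concatenates them in the order given. Below is a minimal sketch of that behavior. The `"\n\n"` separator is an assumption, since `scripts/main.ts` may join the files differently. `readPromptFiles` is a hypothetical name. The file reader is injectable so the logic can be exercised without touching disk.

```typescript
import { readFileSync } from "node:fs";

// Hypothetical sketch of "--promptfiles a.md b.md": read each file in order
// and join the contents. The separator is an assumption.
function readPromptFiles(
  paths: string[],
  read: (path: string) => string = (p) => readFileSync(p, "utf8"),
  separator = "\n\n",
): string {
  return paths.map((p) => read(p)).join(separator);
}

// Usage (reads from disk): readPromptFiles(["system.md", "content.md"])
```

Order matters. Putting the system-style file first, as in the example command, keeps its instructions ahead of the content.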

Multi-turn conversation

```
# Start a session with unique ID (agent generates this)
npx -y bun scripts/main.ts "You are a helpful math tutor." --sessionId task-abc123

# Continue the conversation (remembers context)
npx -y bun scripts/main.ts "What is 2+2?" --sessionId task-abc123
npx -y bun scripts/main.ts "Now multiply that by 10" --sessionId task-abc123

# List recent sessions (max 100, sorted by update time)
npx -y bun scripts/main.ts --list-sessions
```
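`--list-sessions` caps the list at 100 entries sorted by update time. Below is a hypothetical equivalent over already-loaded session records. Two points are assumptions: that the order is newest first, and that timestamps are either ISO strings or epoch milliseconds. `listRecentSessions` and `SessionSummary` are illustrative names.

```typescript
// Minimal fields needed for listing; the session files carry more.
interface SessionSummary {
  id: string;
  updatedAt: string | number; // assumption: ISO string or epoch ms
}

// Hypothetical equivalent of --list-sessions: newest first, capped at `max`.
function listRecentSessions(sessions: SessionSummary[], max = 100): SessionSummary[] {
  const time = (s: SessionSummary) => new Date(s.updatedAt).getTime();
  return [...sessions].sort((a, b) => time(b) - time(a)).slice(0, max);
}

console.log(
  listRecentSessions([
    { id: "task-old", updatedAt: "2024-01-01T00:00:00Z" },
    { id: "task-new", updatedAt: "2024-06-01T00:00:00Z" },
  ]).map((s) => s.id),
);
```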

On macOS, session files are stored in `~/Library/Application Support/genimg-skills/gemini-web/sessions/<id>.json` and contain:

`id`: Session ID
`metadata`: Gemini chat metadata for continuation
`messages`: Array of `{role, content, timestamp, error?}`
`createdAt`, `updatedAt`: Timestamps
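The field list translates into a TypeScript shape that a caller could use to inspect saved sessions. The field names come from this document. Their value types are assumptions: timestamps may be strings or numbers, `role` values are unspecified, and `metadata` is treated as opaque. The path helper hard-codes the macOS location (`~/Library/Application Support`) given above. It ignores `GEMINI_WEB_DATA_DIR`, which may relocate this directory (not confirmed here). All names are hypothetical.

```typescript
import { homedir } from "node:os";
import { posix } from "node:path";

interface SessionMessage {
  role: string;               // assumption: free-form role label
  content: string;
  timestamp: string | number; // assumption: ISO string or epoch ms
  error?: string;
}

interface SessionFile {
  id: string;
  metadata: unknown; // opaque Gemini chat metadata used for continuation
  messages: SessionMessage[];
  createdAt: string | number;
  updatedAt: string | number;
}

// macOS session location from this document; other setups may differ.
function sessionFilePath(id: string, home: string = homedir()): string {
  return posix.join(home, "Library", "Application Support", "genimg-skills", "gemini-web", "sessions", `${id}.json`);
}

// Light structural check before trusting a parsed session file.
function isSessionFile(value: unknown): value is SessionFile {
  if (typeof value !== "object" || value === null) return false;
  const v = value as Record<string, unknown>;
  return (
    typeof v.id === "string" &&
    "metadata" in v &&
    Array.isArray(v.messages) &&
    v.createdAt !== undefined &&
    v.updatedAt !== undefined
  );
}

console.log(sessionFilePath("task-abc123"));
```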