Claude-skill-registry gem
Multimodal AI processing using Google Gemini. Use for analyzing PDFs, images, videos, YouTube links, and other large documents. Ideal when you need to extract information from files that require vision or multimodal understanding.
Install
source · Clone the upstream repo
git clone https://github.com/majiayu000/claude-skill-registry
Claude Code · Install into ~/.claude/skills/
T=$(mktemp -d) && git clone --depth=1 https://github.com/majiayu000/claude-skill-registry "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/data/gem" ~/.claude/skills/majiayu000-claude-skill-registry-gem && rm -rf "$T"
Manifest: skills/data/gem/SKILL.md
Gemini Multimodal Tool
Use the ai-gem CLI tool for multimodal AI processing via Google's Gemini API.
Usage
# Text queries
ai-gem "Write a haiku about Python programming"

# Analyze documents
ai-gem "Summarize this document" document.pdf

# Analyze images
ai-gem "What's in this image?" photo.jpg

# Process YouTube videos
ai-gem "Create a 5-point summary" "https://youtu.be/VIDEO_ID"

# Compare multiple files
ai-gem "Compare these files" file1.pdf file2.png

# Web search
ai-gem "Current AI news" --search
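The single-file document form shown above can be wrapped in a loop to run the same prompt over many files. This is a minimal sketch, not part of ai-gem. The function name summarize_pdfs, the AI_GEM override variable and the .summary.txt output naming are all hypothetical. The sketch also assumes that ai-gem prints its response on stdout.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of ai-gem): run one prompt against every PDF
# in a directory and save each answer as <name>.summary.txt beside its PDF.
# Assumption: ai-gem prints its response on stdout.
# AI_GEM (hypothetical) overrides the command to run; the default is ai-gem.
summarize_pdfs() {
  if [ $# -lt 1 ]; then
    echo "usage: summarize_pdfs DIR [PROMPT]" >&2
    return 2
  fi
  local dir="$1"
  local prompt="${2:-Summarize this document}"
  local gem="${AI_GEM:-ai-gem}"
  local pdf
  for pdf in "$dir"/*.pdf; do
    # With no matching files the glob stays literal, so skip missing paths.
    [ -e "$pdf" ] || continue
    # Stop at the first failing call and propagate the failure.
    "$gem" "$prompt" "$pdf" > "${pdf%.pdf}.summary.txt" || return 1
  done
}
```

For example, `summarize_pdfs ./reports "List the key findings"` would write reports/q1.summary.txt for reports/q1.pdf. Calling ai-gem once per file keeps each answer tied to a single source. For side-by-side analysis, the multi-file form (ai-gem "Compare these files" file1.pdf file2.png) is the better fit.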
Requirements
- The GEMINI_API_KEY environment variable must be set.
- The hamel package must be installed: pip install hamel
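Both requirements can be checked before the tool is invoked. This is a minimal sketch under two stated assumptions. The function name gem_preflight is hypothetical. The sketch also assumes that installing the hamel package puts an ai-gem executable on PATH.

```bash
#!/usr/bin/env bash
# Hypothetical pre-flight check for the ai-gem requirements.
# Returns 0 when GEMINI_API_KEY is non-empty and an ai-gem command is on PATH;
# otherwise it reports each missing piece on stderr and returns 1.
# Assumption: `pip install hamel` is what provides the ai-gem command.
gem_preflight() {
  local status=0
  if [ -z "${GEMINI_API_KEY:-}" ]; then
    echo "GEMINI_API_KEY is not set" >&2
    status=1
  fi
  if ! command -v ai-gem >/dev/null 2>&1; then
    echo "ai-gem not found on PATH; try: pip install hamel" >&2
    status=1
  fi
  return "$status"
}
```

A script can then guard its calls, as in `gem_preflight && ai-gem "Summarize this document" document.pdf`. A missing key or package is then reported before any request is attempted.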
Supported Input Types
- PDFs
- Images (PNG, JPEG, GIF, WebP)
- Videos (MP4, etc.)
- YouTube URLs
- Plain text files
- Multiple files for comparison
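When scripting around ai-gem, it can help to reject unsupported arguments before sending a request. The helper below is hypothetical and only approximates the list above, using URL prefixes and file extensions. The exact extension sets are assumptions, not the tool's own detection logic; this includes mov and webm standing in for the "etc." after MP4.

```bash
#!/usr/bin/env bash
# Hypothetical helper: classify one ai-gem argument as pdf, image, video,
# youtube or text. Prints "unknown" and returns 1 for anything else.
# The extension sets are illustrative guesses, not ai-gem's real rules.
gem_input_kind() {
  local arg="$1" ext
  case "$arg" in
    https://youtu.be/*|https://www.youtube.com/*|https://youtube.com/*)
      echo youtube; return 0 ;;
  esac
  # Lower-case the extension so photo.JPG and photo.jpg match the same rule
  # (tr is used instead of ${ext,,} to stay compatible with bash 3).
  ext=$(printf '%s' "${arg##*.}" | tr '[:upper:]' '[:lower:]')
  case "$ext" in
    pdf)                   echo pdf ;;
    png|jpg|jpeg|gif|webp) echo image ;;
    mp4|mov|webm)          echo video ;;   # mov/webm are assumptions
    txt|md)                echo text ;;
    *)                     echo unknown; return 1 ;;
  esac
}
```

A wrapper could loop over its arguments with `for f in "$@"; do gem_input_kind "$f" >/dev/null || echo "unsupported input: $f" >&2; done` before passing them on. Comparing multiple files needs no separate check, because each file is classified on its own.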