# claude-skill-registry · embedding-models

Embedding model configurations and cost calculators.

## Install

Source · Clone the upstream repo:

```bash
git clone https://github.com/majiayu000/claude-skill-registry
```

Claude Code · Install into `~/.claude/skills/`:

```bash
T=$(mktemp -d) && git clone --depth=1 https://github.com/majiayu000/claude-skill-registry "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/data/embedding-models" ~/.claude/skills/majiayu000-claude-skill-registry-embedding-models && rm -rf "$T"
```

Manifest: `skills/data/embedding-models/SKILL.md`
# Embedding Models Skill
Embedding model selection, configuration, and cost optimization for RAG pipelines.
## Use When
- Selecting embedding models for vector search
- Configuring OpenAI, Cohere, or HuggingFace embeddings
- Calculating embedding generation costs
- Optimizing embedding performance vs cost tradeoffs
- Setting up local vs cloud embedding models
- Implementing embedding caching strategies
- User mentions: "embeddings", "vector models", "embedding costs", "semantic search models"
## Model Selection Guide

### Commercial Models
OpenAI Embeddings:
- `text-embedding-3-small`: 1536 dims, $0.02/1M tokens, balanced performance
- `text-embedding-3-large`: 3072 dims, $0.13/1M tokens, highest quality
- `text-embedding-ada-002`: 1536 dims, $0.10/1M tokens, legacy model
Cohere Embeddings:
- `embed-english-v3.0`: 1024 dims, multilingual support
- `embed-english-light-v3.0`: 384 dims, faster/cheaper
- `embed-multilingual-v3.0`: 1024 dims, 100+ languages
### Open Source Models (HuggingFace)
Sentence Transformers:
- `all-MiniLM-L6-v2`: 384 dims, 80MB, fast and efficient
- `all-mpnet-base-v2`: 768 dims, 420MB, high quality
- `multi-qa-mpnet-base-dot-v1`: 768 dims, optimized for Q&A
- `paraphrase-multilingual-mpnet-base-v2`: 768 dims, 50+ languages
Specialized Models:
- `BAAI/bge-small-en-v1.5`: 384 dims, SOTA small model
- `BAAI/bge-base-en-v1.5`: 768 dims, excellent retrieval
- `BAAI/bge-large-en-v1.5`: 1024 dims, top performance
- `intfloat/e5-base-v2`: 768 dims, strong general purpose
## Cost Calculator
Use the cost calculator script to estimate embedding costs:
```bash
# Calculate costs for different models and volumes
python scripts/calculate-embedding-costs.py \
  --documents 100000 \
  --avg-tokens 500 \
  --model text-embedding-3-small

# Compare multiple models
python scripts/calculate-embedding-costs.py \
  --documents 100000 \
  --avg-tokens 500 \
  --compare
```
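The underlying arithmetic is simple: cost = documents * average tokens per document / 1M * price per 1M tokens. A minimal sketch, using the OpenAI prices from the model list above (`embedding_cost` is illustrative, not the repo script's API):

```python
# Per-1M-token prices (USD) for the OpenAI models listed above.
PRICE_PER_1M = {
    "text-embedding-3-small": 0.02,
    "text-embedding-3-large": 0.13,
    "text-embedding-ada-002": 0.10,
}

def embedding_cost(documents: int, avg_tokens: int, model: str) -> float:
    """Estimated one-off cost (USD) to embed a corpus with a given model."""
    total_tokens = documents * avg_tokens
    return total_tokens / 1_000_000 * PRICE_PER_1M[model]

# 100,000 documents at ~500 tokens each = 50M tokens,
# so text-embedding-3-small costs 50 * $0.02 = $1.00.
print(f"${embedding_cost(100_000, 500, 'text-embedding-3-small'):.2f}")
```

The same corpus on `text-embedding-3-large` costs $6.50, which is why the small model is the default suggestion for high-volume pipelines.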
## Setup Scripts

### OpenAI Embeddings

```bash
bash scripts/setup-openai-embeddings.sh
```

Configures the OpenAI embedding client with API key management and retry logic.
### HuggingFace Embeddings

```bash
bash scripts/setup-huggingface-embeddings.sh
```

Downloads and configures sentence-transformers models locally.
### Cohere Embeddings

```bash
bash scripts/setup-cohere-embeddings.sh
```

Sets up the Cohere embedding client with API credentials.
## Configuration Templates
### OpenAI Configuration

```python
# templates/openai-embedding-config.py
from openai import OpenAI

client = OpenAI(api_key="your-key")
response = client.embeddings.create(
    model="text-embedding-3-small",
    input=["Your text here"],
)
# create() returns a response object; the vectors live in response.data
embeddings = [item.embedding for item in response.data]
```
### HuggingFace Configuration

```python
# templates/huggingface-embedding-config.py
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")
embeddings = model.encode(["Your text here"])
```
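Embeddings like these are typically compared with cosine similarity at query time. A dependency-free sketch of that scoring step (the vectors here are hand-written stand-ins for `model.encode()` output):

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Parallel vectors score 1.0, orthogonal vectors score 0.0:
print(cosine_similarity([1.0, 0.0], [2.0, 0.0]))  # 1.0
print(cosine_similarity([1.0, 0.0], [0.0, 3.0]))  # 0.0
```

In production this loop is replaced by a vector database or FAISS index, but the ranking criterion is the same.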
### Custom Model Template

```python
# templates/custom-embedding-model.py
# Wrapper for any embedding model with a consistent interface
```
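One possible shape for such a wrapper, sketched with illustrative names (not the actual contents of `templates/custom-embedding-model.py`): every backend adapts to a single `embed(texts)` interface, so the rest of the pipeline never cares which provider is behind it.

```python
from abc import ABC, abstractmethod

class EmbeddingModel(ABC):
    """Uniform interface: a list of texts in, one vector per text out."""

    @abstractmethod
    def embed(self, texts: list[str]) -> list[list[float]]:
        """Return one embedding vector per input text."""

class FakeModel(EmbeddingModel):
    """Stand-in backend for testing; real subclasses would call OpenAI,
    Cohere, or sentence-transformers under the hood."""

    def embed(self, texts: list[str]) -> list[list[float]]:
        return [[float(len(t)), 0.0] for t in texts]

vectors = FakeModel().embed(["hello", "hi"])
```

Swapping providers then means swapping one subclass, with no changes to callers.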
## Optimization Strategies
Cost Optimization:
- Use smaller models for high-volume applications
- Implement embedding caching (see examples/embedding-cache.py)
- Batch embedding generation (see examples/batch-embedding-generation.py)
- Consider local models for sensitive data
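A minimal sketch of the caching idea: an in-memory dict keyed by a hash of model name and text, so repeated inputs never trigger a second expensive call. The repo's `examples/embedding-cache.py` may differ (e.g. by persisting to disk); the names below are illustrative.

```python
import hashlib

class EmbeddingCache:
    def __init__(self, embed_fn):
        self._embed_fn = embed_fn  # the expensive call (API or local model)
        self._cache: dict[str, list[float]] = {}
        self.misses = 0

    def _key(self, text: str, model: str) -> str:
        # Hash model + text together so the same text embedded by two
        # different models gets two distinct cache entries.
        return hashlib.sha256(f"{model}\x00{text}".encode()).hexdigest()

    def get(self, text: str, model: str) -> list[float]:
        key = self._key(text, model)
        if key not in self._cache:
            self.misses += 1
            self._cache[key] = self._embed_fn(text, model)
        return self._cache[key]
```

For re-indexing jobs where most documents are unchanged, a cache like this can eliminate the bulk of embedding spend.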
Performance Optimization:
- Use GPU acceleration for local models
- Batch processing for throughput
- Dimension reduction for storage/speed
- Model distillation for faster inference
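The batching point can be sketched as a thin helper: grouping texts into fixed-size batches amortizes per-request overhead. The batch size of 96 here is an arbitrary placeholder; tune it to your API's limits or your GPU memory.

```python
def batched(texts: list[str], batch_size: int = 96):
    """Yield successive fixed-size slices of the input list."""
    for i in range(0, len(texts), batch_size):
        yield texts[i:i + batch_size]

def embed_all(texts, embed_batch, batch_size=96):
    """embed_batch takes a list of texts and returns a list of vectors;
    this drives it batch by batch and concatenates the results."""
    vectors = []
    for batch in batched(texts, batch_size):
        vectors.extend(embed_batch(batch))
    return vectors
```

Both the OpenAI API and `SentenceTransformer.encode` accept lists of texts, so `embed_batch` can wrap either directly.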
## Model Comparison Matrix
| Model | Dimensions | Size | Speed | Quality | Cost |
|---|---|---|---|---|---|
| text-embedding-3-small | 1536 | API | Fast | Good | $0.02/1M |
| text-embedding-3-large | 3072 | API | Medium | Excellent | $0.13/1M |
| all-MiniLM-L6-v2 | 384 | 80MB | Very Fast | Good | Free |
| all-mpnet-base-v2 | 768 | 420MB | Fast | Excellent | Free |
| bge-base-en-v1.5 | 768 | 420MB | Fast | Excellent | Free |
| embed-english-v3.0 | 1024 | API | Fast | Excellent | $0.10/1M |
## Examples

Batch Embedding Generation:

```python
# examples/batch-embedding-generation.py
# Process large document collections efficiently
```

Embedding Cache:

```python
# examples/embedding-cache.py
# Cache embeddings to avoid redundant API calls
```
## Decision Framework
Use OpenAI when:
- Need highest quality embeddings
- Low to medium volume (<10M tokens/month)
- Prefer managed service over self-hosting
- Working with latest models
Use Cohere when:
- Need multilingual support
- Require production SLA
- Want embedding customization
- Need both embedding and reranking
Use HuggingFace/Local when:
- High volume (>10M tokens/month)
- Data privacy requirements
- Have GPU infrastructure
- Cost optimization priority
- Offline/air-gapped environments
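The rules of thumb above can be compressed into a toy helper. The thresholds follow the bullets; a real decision would also weigh quality benchmarks, latency, and SLA requirements, so treat this as illustration only.

```python
def recommend_provider(monthly_tokens: int,
                       needs_privacy: bool = False,
                       needs_multilingual: bool = False) -> str:
    """Crude encoding of the decision framework above."""
    # Privacy requirements or high volume (>10M tokens/month) favor local models.
    if needs_privacy or monthly_tokens > 10_000_000:
        return "huggingface-local"
    # Multilingual needs favor Cohere's embed-multilingual models.
    if needs_multilingual:
        return "cohere"
    # Otherwise default to OpenAI's managed service.
    return "openai"
```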
## References
- Sentence Transformers: https://www.sbert.net/
- OpenAI Embeddings: https://platform.openai.com/docs/guides/embeddings
- Cohere Embeddings: https://docs.cohere.com/docs/embeddings
- MTEB Leaderboard: https://huggingface.co/spaces/mteb/leaderboard