install
source · Clone the upstream repo
git clone https://github.com/kreuzberg-dev/kreuzberg
Claude Code · Install into ~/.claude/skills/
T=$(mktemp -d) && git clone --depth=1 https://github.com/kreuzberg-dev/kreuzberg "$T" && mkdir -p ~/.claude/skills && cp -r "$T/.ai-rulez/skills/mime-detection-routing" ~/.claude/skills/kreuzberg-dev-kreuzberg-mime-detection-routing && rm -rf "$T"
manifest: .ai-rulez/skills/mime-detection-routing/SKILL.md
priority: high
MIME Detection & Routing
Detection Flow
Extension → EXT_TO_MIME map → validate → Registry lookup → Extractor
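As a rough illustration of the flow above, the sketch below chains the stages with toy stand-ins. Every name in it (RouteError, ext_to_mime, registry_lookup, validate, route) is hypothetical and not taken from the Kreuzberg codebase; the two-entry map and one-entry registry exist only to show where each stage can fail.

```rust
use std::path::Path;

/// One error per stage of the flow (hypothetical names).
#[derive(Debug, PartialEq)]
enum RouteError {
    NoExtension,
    UnknownExtension(String),
    UnsupportedFormat(String),
}

/// Toy stand-in for EXT_TO_MIME: lowercases the extension, then maps it.
fn ext_to_mime(ext: &str) -> Option<&'static str> {
    match ext.to_ascii_lowercase().as_str() {
        "pdf" => Some("application/pdf"),
        "png" => Some("image/png"),
        _ => None,
    }
}

/// Toy stand-in for the registry: extractor name for a MIME type, if any.
fn registry_lookup(mime: &str) -> Option<&'static str> {
    match mime {
        "application/pdf" => Some("pdf-extractor"),
        _ => None,
    }
}

/// Validation step: a MIME type passes only if the registry can serve it.
fn validate(mime: &str) -> Result<(), RouteError> {
    if registry_lookup(mime).is_some() {
        Ok(())
    } else {
        Err(RouteError::UnsupportedFormat(mime.to_string()))
    }
}

/// Extension -> map -> validate -> registry lookup -> extractor.
fn route(path: &str) -> Result<&'static str, RouteError> {
    let ext = Path::new(path)
        .extension()
        .and_then(|e| e.to_str())
        .ok_or(RouteError::NoExtension)?;
    let mime = ext_to_mime(ext)
        .ok_or_else(|| RouteError::UnknownExtension(ext.to_string()))?;
    validate(mime)?;
    registry_lookup(mime).ok_or_else(|| RouteError::UnsupportedFormat(mime.to_string()))
}
```

In this toy setup, route("report.PDF") resolves to the PDF stand-in, while route("photo.png") is mapped to a MIME type but rejected at validation because nothing is registered for it.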
Key Functions
| Function | Location | Purpose |
|---|---|---|
| | | Extension + optional content inspection |
| | | Magic number detection (infer crate) |
| | | Check if any extractor supports it |
Extension Mapping
118+ extensions mapped in EXT_TO_MIME (core/mime.rs). Case-insensitive.
Key mappings:
- .pdf → application/pdf
- .docx → application/vnd.openxmlformats-officedocument.wordprocessingml.document
- .xlsx → application/vnd.openxmlformats-officedocument.spreadsheetml.sheet
- .png → image/png, .jpg → image/jpeg
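A minimal sketch of a case-insensitive extension lookup, assuming the map is keyed by lowercase extensions without the leading dot. The helper names build_ext_to_mime and lookup_extension are illustrative only; the actual layout of EXT_TO_MIME in core/mime.rs is not shown here and may differ.

```rust
use std::collections::HashMap;

/// Builds a small illustrative subset of an extension-to-MIME map.
/// Keys are lowercase and have no leading dot (an assumption of this sketch).
fn build_ext_to_mime() -> HashMap<&'static str, &'static str> {
    let mut m = HashMap::new();
    m.insert("pdf", "application/pdf");
    m.insert(
        "docx",
        "application/vnd.openxmlformats-officedocument.wordprocessingml.document",
    );
    m.insert(
        "xlsx",
        "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet",
    );
    m.insert("png", "image/png");
    m.insert("jpg", "image/jpeg");
    m
}

/// Normalizes the extension (strips a leading dot, lowercases) before lookup,
/// so ".PDF", "Pdf", and "pdf" all resolve to the same entry.
fn lookup_extension(
    map: &HashMap<&'static str, &'static str>,
    ext: &str,
) -> Option<&'static str> {
    let normalized = ext.trim_start_matches('.').to_ascii_lowercase();
    map.get(normalized.as_str()).copied()
}
```

Normalizing at lookup time keeps the map itself small: only one spelling of each extension needs to be stored.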
Registry Selection

    // In core/extractor/bytes.rs
    fn select_extractor_for_mime(mime_type: &str) -> Result<Arc<dyn DocumentExtractor>> {
        let registry = get_document_extractor_registry();
        let registry_guard = registry.read()?;
        registry_guard
            .get_for_mime_type(mime_type)
            .ok_or_else(|| KreuzbergError::UnsupportedFormat(mime_type.into()))
    }

Selects the highest-priority extractor registered for that MIME type, or returns KreuzbergError::UnsupportedFormat if none is registered.
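How "highest priority" might be resolved can be sketched as follows. This is not the Kreuzberg registry: the Extractor trait, the i32 priority field, SimpleExtractor, and Registry are all hypothetical, and the real registry may store and order extractors differently.

```rust
use std::sync::Arc;

/// Hypothetical, reduced extractor interface for this sketch.
trait Extractor {
    fn name(&self) -> &str;
    fn priority(&self) -> i32;
    fn supported_mime_types(&self) -> &[&str];
}

struct SimpleExtractor {
    name: &'static str,
    priority: i32,
    mime_types: &'static [&'static str],
}

impl Extractor for SimpleExtractor {
    fn name(&self) -> &str {
        self.name
    }
    fn priority(&self) -> i32 {
        self.priority
    }
    fn supported_mime_types(&self) -> &[&str] {
        self.mime_types
    }
}

/// Toy registry: a flat list scanned on every lookup.
struct Registry {
    extractors: Vec<Arc<dyn Extractor>>,
}

impl Registry {
    fn register(&mut self, extractor: Arc<dyn Extractor>) {
        self.extractors.push(extractor);
    }

    /// Among extractors that list the MIME type, returns the one with the
    /// largest priority value, or None if no extractor supports it.
    fn get_for_mime_type(&self, mime_type: &str) -> Option<Arc<dyn Extractor>> {
        self.extractors
            .iter()
            .filter(|e| e.supported_mime_types().iter().any(|m| *m == mime_type))
            .max_by_key(|e| e.priority())
            .cloned()
    }
}
```

Returning an Option here mirrors the ok_or_else call in select_extractor_for_mime, which turns a missing extractor into an error at the call site.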
Adding New MIME Types
- Add the extension mapping in core/mime.rs: m.insert("ext", "application/x-new");
- Implement DocumentExtractor with supported_mime_types() returning the new MIME type
- Register in register_default_extractors()
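The three steps above can be sketched end to end with reduced stand-ins. The trait below is a hypothetical cut-down version that keeps only supported_mime_types(); the real DocumentExtractor trait and the real signature of register_default_extractors() are not shown in this document and will differ. NewFormatExtractor and the Vec-based registry are likewise assumptions.

```rust
use std::collections::HashMap;

/// Hypothetical, reduced trait: only the method named in the steps above.
trait DocumentExtractor {
    fn supported_mime_types(&self) -> &[&str];
}

/// Hypothetical extractor for the new format.
struct NewFormatExtractor;

// Step 2: implement the trait and return the new MIME type.
impl DocumentExtractor for NewFormatExtractor {
    fn supported_mime_types(&self) -> &[&str] {
        &["application/x-new"]
    }
}

// Step 1: add the extension mapping (stand-in for the map in core/mime.rs).
fn ext_to_mime() -> HashMap<&'static str, &'static str> {
    let mut m = HashMap::new();
    m.insert("ext", "application/x-new");
    m
}

// Step 3: register the extractor (stand-in with an assumed signature).
fn register_default_extractors(registry: &mut Vec<Box<dyn DocumentExtractor>>) {
    registry.push(Box::new(NewFormatExtractor));
}
```

The point of the sketch is that the string in the extension map and the string returned by supported_mime_types() must match exactly; otherwise detection succeeds but registry lookup fails.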
Wildcard Support
Extractors can register for MIME type families:
"image/*" matches image/png, image/jpeg, etc.
Critical Rules
- Always validate_mime_type() before extraction
- Extension mapping is case-insensitive
- Content inspection (infer crate) is the fallback for extension-less files
- Registry validation is the final authority on supported types
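A minimal sketch of how these rules could fit together, including the family wildcards described under Wildcard Support. The magic-number check is a hand-rolled stand-in for the infer crate, and sniff_mime, mime_for_extension, detect, pattern_matches, and validate are hypothetical names with assumed signatures, not the project's API. Comparing MIME strings case-insensitively is also an assumption of this sketch.

```rust
/// Hand-rolled magic-number check (stand-in for the infer crate).
fn sniff_mime(bytes: &[u8]) -> Option<&'static str> {
    const PNG: &[u8] = &[0x89, b'P', b'N', b'G', 0x0D, 0x0A, 0x1A, 0x0A];
    const JPEG: &[u8] = &[0xFF, 0xD8, 0xFF];
    if bytes.starts_with(b"%PDF-") {
        Some("application/pdf")
    } else if bytes.starts_with(PNG) {
        Some("image/png")
    } else if bytes.starts_with(JPEG) {
        Some("image/jpeg")
    } else {
        None
    }
}

/// Case-insensitive extension lookup (two entries only, for illustration).
fn mime_for_extension(ext: &str) -> Option<&'static str> {
    match ext.to_ascii_lowercase().as_str() {
        "pdf" => Some("application/pdf"),
        "png" => Some("image/png"),
        _ => None,
    }
}

/// Uses the extension when there is one; inspects content only when the
/// file has no extension.
fn detect(extension: Option<&str>, bytes: &[u8]) -> Option<&'static str> {
    match extension {
        Some(ext) => mime_for_extension(ext),
        None => sniff_mime(bytes),
    }
}

/// A registered pattern is either an exact MIME type or a family like "image/*".
fn pattern_matches(pattern: &str, mime: &str) -> bool {
    match pattern.strip_suffix("/*") {
        Some(family) => mime
            .split_once('/')
            .map_or(false, |(top, _)| top.eq_ignore_ascii_case(family)),
        None => pattern.eq_ignore_ascii_case(mime),
    }
}

/// Final check before extraction: some registered pattern must match.
fn validate(registered: &[&str], mime: &str) -> Result<(), String> {
    if registered.iter().any(|p| pattern_matches(p, mime)) {
        Ok(())
    } else {
        Err(format!("unsupported format: {mime}"))
    }
}
```

With registered patterns ["application/pdf", "image/*"], both image/png and image/jpeg pass validation through the family pattern, while text/html is rejected even if detection produced it.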