Kreuzberg plugin-architecture-patterns

install
source · Clone the upstream repo
git clone https://github.com/kreuzberg-dev/kreuzberg
Claude Code · Install into ~/.claude/skills/
T=$(mktemp -d) && git clone --depth=1 https://github.com/kreuzberg-dev/kreuzberg "$T" && mkdir -p ~/.claude/skills && cp -r "$T/.ai-rulez/skills/plugin-architecture-patterns" ~/.claude/skills/kreuzberg-dev-kreuzberg-plugin-architecture-patterns && rm -rf "$T"
manifest: .ai-rulez/skills/plugin-architecture-patterns/SKILL.md

priority: critical

Plugin Architecture & Registration

Plugin Types

| Type | Trait | Location |
| --- | --- | --- |
| Document Extractor | DocumentExtractor: Plugin | plugins/extractor/trait.rs |
| OCR Backend | OcrBackend: Plugin | plugins/ocr/trait.rs |
| Post Processor | PostProcessor: Plugin | plugins/processor/trait.rs |
| Validator | Validator: Plugin | plugins/validator/trait.rs |
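
Every trait in the table is written as `X: Plugin`, meaning `Plugin` is a supertrait. The following is a minimal sketch of that relationship, not the crate's actual definitions: the `Plugin` trait is cut down to two methods, and the `validate` signature, `NonEmptyValidator`, and `describe` are hypothetical names introduced for illustration.

```rust
// Simplified stand-in for the crate's Plugin trait; the real definition
// may carry additional methods.
pub trait Plugin: Send + Sync {
    fn name(&self) -> &str;
    fn version(&self) -> String;
}

// `Validator: Plugin` means every validator must also implement `Plugin`.
// The `validate` signature is an assumption made for this sketch.
pub trait Validator: Plugin {
    fn validate(&self, content: &str) -> Result<(), String>;
}

// Hypothetical example plugin.
pub struct NonEmptyValidator;

impl Plugin for NonEmptyValidator {
    fn name(&self) -> &str { "non-empty-validator" }
    fn version(&self) -> String { "0.1.0".to_string() }
}

impl Validator for NonEmptyValidator {
    fn validate(&self, content: &str) -> Result<(), String> {
        if content.trim().is_empty() {
            Err(format!("{}: content is empty", self.name()))
        } else {
            Ok(())
        }
    }
}

// Code holding only a `dyn Validator` can still call the `Plugin` methods.
pub fn describe(v: &dyn Validator) -> String {
    format!("{} v{}", v.name(), v.version())
}
```

Because of the supertrait bound, a registry that stores trait objects for one plugin type can read names and versions without knowing the concrete type.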

DocumentExtractor Implementation

use crate::plugins::{DocumentExtractor, Plugin};
use async_trait::async_trait;

pub struct MyExtractor;

impl Plugin for MyExtractor {
    fn name(&self) -> &str { "my-extractor" }
    fn version(&self) -> String { env!("CARGO_PKG_VERSION").to_string() }
}

#[async_trait]
impl DocumentExtractor for MyExtractor {
    async fn extract_bytes(&self, content: &[u8], mime_type: &str, config: &ExtractionConfig)
        -> Result<ExtractionResult> { /* ... */ }

    fn supported_mime_types(&self) -> &[&str] { &["application/x-custom"] }
    fn priority(&self) -> i32 { 50 }

    // WASM support (optional)
    fn as_sync_extractor(&self) -> Option<&dyn SyncExtractor> { None }
}

Priority System

| Range | Use |
| --- | --- |
| 0-25 | Fallback/low-quality |
| 26-49 | Alternative extractors |
| 50 | Default (built-in) |
| 51-75 | Premium/enhanced |
| 76-100 | Specialized/high-priority |

The registry selects the highest-priority extractor registered for each MIME type. To override a built-in extractor, register yours with a priority above 50.
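
The selection rule can be sketched as a map from MIME type to candidates, with a lookup that takes the largest priority value. This is an illustration under assumptions, not the crate's registry: `Extractor`, `Registry`, `BuiltIn`, and `Enhanced` are hypothetical names, and the default priority of 50 is modelled here as a default trait method.

```rust
use std::collections::HashMap;
use std::sync::Arc;

// Hypothetical, trimmed-down extractor trait: only what selection needs.
pub trait Extractor: Send + Sync {
    fn name(&self) -> &str;
    fn supported_mime_types(&self) -> &[&str];
    // Assumption: built-ins sit at the default priority of 50.
    fn priority(&self) -> i32 { 50 }
}

#[derive(Default)]
pub struct Registry {
    // MIME type -> every extractor that claims it.
    by_mime: HashMap<String, Vec<Arc<dyn Extractor>>>,
}

impl Registry {
    pub fn register(&mut self, extractor: Arc<dyn Extractor>) {
        for mime in extractor.supported_mime_types() {
            self.by_mime
                .entry(mime.to_string())
                .or_default()
                .push(Arc::clone(&extractor));
        }
    }

    // Return the candidate with the largest priority for this MIME type.
    pub fn get(&self, mime: &str) -> Option<Arc<dyn Extractor>> {
        self.by_mime
            .get(mime)?
            .iter()
            .max_by_key(|e| e.priority())
            .cloned()
    }
}

pub struct BuiltIn;
impl Extractor for BuiltIn {
    fn name(&self) -> &str { "built-in" }
    fn supported_mime_types(&self) -> &[&str] { &["application/x-custom"] }
}

pub struct Enhanced;
impl Extractor for Enhanced {
    fn name(&self) -> &str { "enhanced" }
    fn supported_mime_types(&self) -> &[&str] { &["application/x-custom"] }
    fn priority(&self) -> i32 { 60 } // above 50, so it overrides BuiltIn
}
```

With both extractors registered, a lookup for application/x-custom returns `Enhanced`, since 60 is greater than 50. How the real registry breaks ties between equal priorities is not stated in the source.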

Registration

// In extractors/mod.rs → register_default_extractors()
let registry = get_document_extractor_registry();
// Take the write lock; a poisoned lock is surfaced as a KreuzbergError.
let mut registry = registry.write()
    .map_err(|e| KreuzbergError::Other(format!("Registry lock poisoned: {}", e)))?;
// MyExtractor is a unit struct, so it is constructed by name.
registry.register(Arc::new(MyExtractor))?;
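
The snippet above relies on a process-wide registry behind an `RwLock`. A self-contained sketch of that pattern follows, assuming a registry created lazily with `std::sync::OnceLock`. `RegistryError`, `get_registry`, `register_defaults`, and the duplicate-name check are all hypothetical; the crate's own types and behaviour may differ.

```rust
use std::sync::{Arc, OnceLock, RwLock};

// Hypothetical stand-in for the crate's error type.
#[derive(Debug)]
pub enum RegistryError {
    Other(String),
}

pub trait Plugin: Send + Sync {
    fn name(&self) -> &str;
}

#[derive(Default)]
pub struct Registry {
    plugins: Vec<Arc<dyn Plugin>>,
}

impl Registry {
    // Assumption for this sketch: registering the same name twice is an error.
    pub fn register(&mut self, plugin: Arc<dyn Plugin>) -> Result<(), RegistryError> {
        if self.plugins.iter().any(|p| p.name() == plugin.name()) {
            return Err(RegistryError::Other(format!(
                "'{}' is already registered",
                plugin.name()
            )));
        }
        self.plugins.push(plugin);
        Ok(())
    }

    pub fn names(&self) -> Vec<String> {
        self.plugins.iter().map(|p| p.name().to_string()).collect()
    }
}

// One registry per process, created on first access.
pub fn get_registry() -> &'static RwLock<Registry> {
    static REGISTRY: OnceLock<RwLock<Registry>> = OnceLock::new();
    REGISTRY.get_or_init(|| RwLock::new(Registry::default()))
}

pub struct MyExtractor;
impl Plugin for MyExtractor {
    fn name(&self) -> &str { "my-extractor" }
}

pub fn register_defaults() -> Result<(), RegistryError> {
    // A poisoned lock becomes an ordinary error instead of a panic.
    let mut registry = get_registry()
        .write()
        .map_err(|e| RegistryError::Other(format!("Registry lock poisoned: {}", e)))?;
    registry.register(Arc::new(MyExtractor))
}
```

Mapping the poison error into the crate's error type keeps registration failures on the normal `Result` path, so callers can propagate them with `?`.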

Feature-Gated Registration

#[cfg(feature = "office")]
{
    registry.register(Arc::new(DocxExtractor::new()))?;
    registry.register(Arc::new(PptxExtractor::new()))?;
}
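
To see what the `#[cfg(feature = "office")]` block does at build time, here is a small sketch that collects the names that would be registered for the current feature set. The function and the extractor names are hypothetical; only the `office` feature name comes from the snippet above.

```rust
// Sketch: list the extractors compiled into this build.
#[allow(unused_mut)] // `names` is never mutated when the feature is off
pub fn default_extractor_names() -> Vec<&'static str> {
    // Hypothetical always-on extractor.
    let mut names = vec!["plain-text-extractor"];

    // This block exists only when building with `--features office`;
    // otherwise the compiler removes it before type checking.
    #[cfg(feature = "office")]
    {
        names.push("docx-extractor");
        names.push("pptx-extractor");
    }

    names
}
```

The feature itself has to be declared in the `[features]` table of Cargo.toml. Without it, the gated extractors and their dependencies are left out of the binary entirely.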

PostProcessor Pattern

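// The impl contains an async fn, so it carries #[async_trait] like the
// DocumentExtractor impl above.
#[async_trait]
impl PostProcessor for MyProcessor {
    // Mutate the extraction result in place.
    async fn process(&self, result: &mut ExtractionResult, config: &ExtractionConfig)
        -> Result<()> {
        result.content = process_content(&result.content);
        Ok(())
    }
    // Controls when this processor runs relative to the others.
    fn stage(&self) -> ProcessorStage { ProcessorStage::Middle }
}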

Stages: Early, Middle, Late. Failures are isolated: a processor that fails does not block the others.
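
Stage ordering and failure isolation can be sketched with a synchronous pipeline. This is a simplified illustration, not the crate's implementation: the trait here is synchronous and takes no config, and `run_pipeline`, `Trim`, `AlwaysFails`, and `Uppercase` are hypothetical.

```rust
// Declaration order gives Early < Middle < Late under the derived Ord.
#[derive(Clone, Copy, Debug, PartialEq, Eq, PartialOrd, Ord)]
pub enum ProcessorStage { Early, Middle, Late }

pub struct ExtractionResult { pub content: String }

// Hypothetical synchronous stand-in for the async PostProcessor trait.
pub trait PostProcessor {
    fn name(&self) -> &str;
    fn stage(&self) -> ProcessorStage;
    fn process(&self, result: &mut ExtractionResult) -> Result<(), String>;
}

// Run processors in stage order. A failure is recorded and the loop
// continues, so one bad processor does not block the rest.
pub fn run_pipeline(
    processors: &mut [Box<dyn PostProcessor>],
    result: &mut ExtractionResult,
) -> Vec<String> {
    // Stable sort: registration order is kept within a stage.
    processors.sort_by_key(|p| p.stage());
    let mut failures = Vec::new();
    for p in processors.iter() {
        if let Err(e) = p.process(result) {
            failures.push(format!("{}: {}", p.name(), e));
        }
    }
    failures
}

pub struct Trim;
impl PostProcessor for Trim {
    fn name(&self) -> &str { "trim" }
    fn stage(&self) -> ProcessorStage { ProcessorStage::Early }
    fn process(&self, result: &mut ExtractionResult) -> Result<(), String> {
        result.content = result.content.trim().to_string();
        Ok(())
    }
}

pub struct AlwaysFails;
impl PostProcessor for AlwaysFails {
    fn name(&self) -> &str { "always-fails" }
    fn stage(&self) -> ProcessorStage { ProcessorStage::Middle }
    fn process(&self, _result: &mut ExtractionResult) -> Result<(), String> {
        Err("boom".to_string())
    }
}

pub struct Uppercase;
impl PostProcessor for Uppercase {
    fn name(&self) -> &str { "uppercase" }
    fn stage(&self) -> ProcessorStage { ProcessorStage::Late }
    fn process(&self, result: &mut ExtractionResult) -> Result<(), String> {
        result.content = result.content.to_uppercase();
        Ok(())
    }
}
```

Even when the processors are supplied in the order Uppercase, AlwaysFails, Trim, the pipeline runs Trim first and Uppercase last, and the Middle-stage failure is reported without stopping the Late stage.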

Critical Rules

  1. All plugins MUST be Send + Sync.
  2. Feature-gate optional formats with #[cfg(feature = "...")].
  3. Use #[async_trait] for DocumentExtractor.
  4. Initialization goes through ensure_initialized() (lazy, called before the first extraction).
  5. Plugin names are kebab-case (e.g., "pdf-extractor").
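
Three of these rules lend themselves to small checks. The sketch below is illustrative only: `assert_send_sync` and `is_kebab_case` are hypothetical helpers, and this `ensure_initialized` merely demonstrates lazy one-time setup with `OnceLock`; the crate's function of the same name may work differently.

```rust
use std::sync::OnceLock;

// Rule 1: a generic bound turns "must be Send + Sync" into a compile-time
// check. The call fails to compile for any type that violates the bound.
pub fn assert_send_sync<T: Send + Sync>() {}

// Rule 5: non-empty segments of lowercase ASCII letters or digits,
// separated by single hyphens.
pub fn is_kebab_case(name: &str) -> bool {
    name.split('-').all(|segment| {
        !segment.is_empty()
            && segment
                .chars()
                .all(|c| c.is_ascii_lowercase() || c.is_ascii_digit())
    })
}

// Rule 4: lazy initialization. The closure runs at most once; every later
// call returns the same stored value.
pub fn ensure_initialized() -> &'static Vec<&'static str> {
    static PLUGINS: OnceLock<Vec<&'static str>> = OnceLock::new();
    PLUGINS.get_or_init(|| vec!["pdf-extractor"])
}

// Hypothetical plugin type used to exercise the Send + Sync check.
pub struct PdfExtractor;
```

A call such as `assert_send_sync::<PdfExtractor>()` costs nothing at run time, and a name check like `is_kebab_case` could run at registration time to reject names such as "PdfExtractor".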