Kreuzberg wasm-constraints

install
source · Clone the upstream repo
git clone https://github.com/kreuzberg-dev/kreuzberg
Claude Code · Install into ~/.claude/skills/
T=$(mktemp -d) && git clone --depth=1 https://github.com/kreuzberg-dev/kreuzberg "$T" && mkdir -p ~/.claude/skills && cp -r "$T/.ai-rulez/skills/wasm-constraints" ~/.claude/skills/kreuzberg-dev-kreuzberg-wasm-constraints && rm -rf "$T"
manifest: .ai-rulez/skills/wasm-constraints/SKILL.md
source content

priority: high

WASM Build Constraints

Overview

The WASM target lives in crates/kreuzberg-wasm/. It uses wasm-bindgen with sync-only internal APIs.

Feature Flags

[features]
wasm-target = ["pdf", "html", "xml", "email", "language-detection", "chunking", "quality", "office"]
wasm-threads = ["dep:wasm-bindgen-rayon"]  # Optional

Critical Constraints

1. No Tokio Runtime

All operations must be synchronous internally. Use #[cfg(not(feature = "tokio-runtime"))] paths.
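
A minimal sketch of such a gated path, assuming a crate feature named tokio-runtime as in the attribute above. The function names are hypothetical and not taken from the kreuzberg source. Both variants delegate to the same synchronous core, so the WASM build never needs a runtime.

```rust
// Hypothetical helper names; only the cfg attribute itself comes from the text above.

/// Synchronous core: no runtime, no .await, so it also builds for wasm32.
pub fn extract_text_sync(content: &[u8]) -> String {
    String::from_utf8_lossy(content).trim().to_string()
}

/// Compiled only when the `tokio-runtime` feature is enabled (native builds).
/// A real build would hand the work to its runtime here; that part is left
/// out so the sketch has no dependencies.
#[cfg(feature = "tokio-runtime")]
pub fn extract_text(content: &[u8]) -> (String, &'static str) {
    (extract_text_sync(content), "tokio-runtime")
}

/// Compiled when the feature is off, which is the WASM case.
#[cfg(not(feature = "tokio-runtime"))]
pub fn extract_text(content: &[u8]) -> (String, &'static str) {
    (extract_text_sync(content), "sync")
}
```

The second tuple element only exists to make the selected path visible; with the feature off, callers get the "sync" variant.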

2. SyncExtractor Required

Every WASM-compatible extractor MUST implement SyncExtractor:

impl SyncExtractor for MyExtractor {
    fn extract_sync(&self, content: &[u8], mime_type: &str, config: &ExtractionConfig)
        -> Result<ExtractionResult> { /* sync implementation */ }
}

impl DocumentExtractor for MyExtractor {
    fn as_sync_extractor(&self) -> Option<&dyn SyncExtractor> {
        Some(self)  // MUST return Some for WASM
    }
}
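
Why Some matters can be sketched with stand-in types. Everything below is a simplified assumption for illustration: the real kreuzberg traits, config, result, and error types are richer, and the default None return and the extract_for_wasm helper are hypothetical. A WASM caller can only reach extract_sync through as_sync_extractor, so an extractor that returns None is unusable there.

```rust
// Stand-in types for illustration only; not the real kreuzberg definitions.
pub struct ExtractionConfig;
pub struct ExtractionResult {
    pub content: String,
}
pub type Result<T> = std::result::Result<T, String>;

pub trait SyncExtractor {
    fn extract_sync(
        &self,
        content: &[u8],
        mime_type: &str,
        config: &ExtractionConfig,
    ) -> Result<ExtractionResult>;
}

pub trait DocumentExtractor {
    // Assumed default: an extractor without a sync path reports None.
    fn as_sync_extractor(&self) -> Option<&dyn SyncExtractor> {
        None
    }
}

/// Hypothetical extractor that supports the sync path.
pub struct PlainText;

impl SyncExtractor for PlainText {
    fn extract_sync(
        &self,
        content: &[u8],
        _mime_type: &str,
        _config: &ExtractionConfig,
    ) -> Result<ExtractionResult> {
        Ok(ExtractionResult {
            content: String::from_utf8_lossy(content).into_owned(),
        })
    }
}

impl DocumentExtractor for PlainText {
    fn as_sync_extractor(&self) -> Option<&dyn SyncExtractor> {
        Some(self)
    }
}

/// Hypothetical extractor with no sync path; it keeps the default None.
pub struct AsyncOnly;
impl DocumentExtractor for AsyncOnly {}

/// Hypothetical WASM-side dispatch: use the sync path or fail.
pub fn extract_for_wasm(
    extractor: &dyn DocumentExtractor,
    content: &[u8],
    mime_type: &str,
    config: &ExtractionConfig,
) -> Result<ExtractionResult> {
    match extractor.as_sync_extractor() {
        Some(sync) => sync.extract_sync(content, mime_type, config),
        None => Err(format!("no sync extractor available for {}", mime_type)),
    }
}
```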

3. HTML Size Limit

const MAX_HTML_SIZE: usize = 2 * 1024 * 1024;  // 2MB - stack constraint
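
One way to enforce the limit is a guard that runs before any parsing starts. This is a sketch, not the crate's actual check: the function name and error type are hypothetical, and treating an input of exactly 2MB as accepted is an assumption.

```rust
const MAX_HTML_SIZE: usize = 2 * 1024 * 1024; // 2MB, same value as above

/// Hypothetical guard: reject oversized HTML before parsing begins.
/// Assumes the limit is inclusive (exactly MAX_HTML_SIZE bytes is accepted).
pub fn check_html_size(html: &[u8]) -> Result<(), String> {
    if html.len() > MAX_HTML_SIZE {
        return Err(format!(
            "HTML input is {} bytes; the WASM limit is {} bytes",
            html.len(),
            MAX_HTML_SIZE
        ));
    }
    Ok(())
}
```

Failing early with a clear error is preferable to exhausting the stack partway through parsing.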

4. PDFium Initialization (from JS)

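import init, { initialize_pdfium_render } from './kreuzberg_wasm.js';
// pdfiumModule is assumed to be the PDFium module factory, loaded separately (import not shown).
const wasm = await init();             // load the kreuzberg WASM module
const pdfium = await pdfiumModule();   // instantiate PDFium
initialize_pdfium_render(pdfium, wasm, false);  // REQUIRED for PDF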

Build Config

[lib]
crate-type = ["cdylib", "rlib"]

[profile.release.package.kreuzberg-wasm]
opt-level = "z"       # Size optimization
codegen-units = 1

API Pattern

#[wasm_bindgen]
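// mime_type is taken as a parameter in this sketch; the real export may obtain it differently.
pub async fn extract_from_bytes(content: Vec<u8>, mime_type: String, config: JsValue) -> Result<JsValue, JsValue> {
    // Deserialize the JS config object into the Rust config type
    let config: ExtractionConfig = serde_wasm_bindgen::from_value(config)?;
    // Extraction itself is synchronous; only the exported wrapper is async
    let result = extract_bytes_sync(&content, &mime_type, &config)?;
    // Serialize the result back into a JS value
    Ok(serde_wasm_bindgen::to_value(&result)?)
}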

Functions can be async for JS compatibility, but internal extraction is sync.

Critical Rules

  1. No tokio — all operations synchronous
  2. Implement SyncExtractor for all WASM-compatible extractors
  3. HTML limited to 2MB due to stack constraints
  4. PDFium requires manual JS initialization
  5. Size optimization via opt-level = "z"
  6. Feature gate with #[cfg(target_arch = "wasm32")]
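
Rule 6 can be sketched as two modules with the same name, of which the compiler keeps exactly one per target. The module, constant, and function names are hypothetical; only the cfg attribute comes from the rule.

```rust
// Hypothetical names; only the cfg attribute is taken from rule 6.

/// Kept only when compiling for wasm32.
#[cfg(target_arch = "wasm32")]
mod platform {
    pub const NAME: &str = "wasm32";
}

/// Kept on every other target.
#[cfg(not(target_arch = "wasm32"))]
mod platform {
    pub const NAME: &str = "native";
}

/// Callers use one name; the gate decides which module backs it.
pub fn platform_name() -> &'static str {
    platform::NAME
}
```

The attribute removes the other module at compile time, so code that does not build for wasm32 can sit in the native module without breaking the WASM build.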