Claude-skill-registry add-embedding-support
Add Qdrant embedding support to v3 WordPress components for RAG chatbot. Implements component-level content chunking for searchable, structured embeddings. Use when adding embedding to new or existing v3 components.
git clone https://github.com/majiayu000/claude-skill-registry
T=$(mktemp -d) && git clone --depth=1 https://github.com/majiayu000/claude-skill-registry "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/data/add-embedding-support" ~/.claude/skills/majiayu000-claude-skill-registry-add-embedding-support && rm -rf "$T"
skills/data/add-embedding-support/SKILL.md
Add Embedding Support Skill
You are helping add Qdrant embedding support to WordPress v3 components. This enables component content to be indexed and searched via a RAG-based chatbot powered by Claude's API.
System Overview
The embedding system:
- Chunks at component level: Each component becomes one or more embedding chunks
- Avoids sub-component loading: Write extraction code directly in the component class
- Supports sections: Complex components add multiple sections (sub-chunks) per instance
- Tracks metadata: Links, dates, and custom metadata stored separately
- Respects skip markers: Components can opt out via ComponentEmbeddingSkipAwareInterface
How It Works
- CLI command wp vendi embedding:generate runs
- Sets global constant VENDI_RENDER_CONTEXT to RenderingContextEnum::EMBEDDING
- Loads each component via vendi_load_component_v3()
- Template detects context and returns component instance (no HTML rendering)
- Component's getEmbedding() method extracts structured data
- ComponentEmbedding DTO formats data into JSON chunks for Qdrant
Output Format
Each component produces a JSON object like this:
{
  "content": "Heading: Ask a Researcher\nBody: Are you a CRNA with research questions?\nLinks: Contact us",
  "metadata": {
    "type": "page",
    "url": "https://example.com/page/",
    "created": "2022-11-29T21:01:08+00:00",
    "updated": "2024-03-07T09:07:06+00:00",
    "links": [
      { "text": "Contact us", "url": "https://example.com/contact/" }
    ],
    "component_type": "content_callout_full_width"
  },
  "id": "660-3"
}
Component Type Classification
Embeddable Components
- Implements ComponentEmbeddingAwareInterface
- Provides getEmbedding() method
- Content is indexed for chatbot
Skippable Components
- Implements ComponentEmbeddingSkipAwareInterface (marker interface)
- No getEmbedding() method needed
- Ignored during embedding generation
- Use for: ads, navigation, forms, decorative elements
Simple Components
- Single chunk with heading and/or body
- No repeater fields
- Auto-extraction via interfaces
Complex Components
- Multiple sections from repeater/flexible content
- Each item becomes a separate section (sub-chunk)
- May include links/CTAs tracked in metadata
Implementation Patterns
Pattern 1: Simple Component (Single Chunk)
When to use: Component has just heading and/or body copy, no repeater fields
Choosing the Right Interfaces
IMPORTANT: Inspect the actual template file to determine which interfaces to implement:
- PrimaryHeadingInterface - Use when template displays a component-level heading (outside loops)
  - Example: <h2><?php esc_html_e(get_sub_field('headline')); ?></h2> at the top level
  - NOT for headings inside repeater loops
- PrimaryCopyInterface - Use when template displays component-level body/intro copy (outside loops)
  - Example: <?php echo wp_kses_post(get_sub_field('intro_copy')); ?> before any repeaters
  - NOT for copy inside repeater loops
The interfaces should map to what actually exists in the template structure.
Required Interfaces
use Vendi\Theme\ComponentInterfaces\ComponentEmbeddingAwareInterface;
use Vendi\Theme\ComponentInterfaces\PrimaryHeadingInterface; // If template has top-level heading
use Vendi\Theme\ComponentInterfaces\PrimaryCopyInterface;    // If template has top-level copy
use Vendi\Theme\DTO\Embedding\ComponentEmbedding;
use Vendi\Theme\DTO\Embedding\ComponentEmbeddingInterface;
Class Implementation
class simple_component extends BaseComponent implements
    ComponentEmbeddingAwareInterface,
    PrimaryHeadingInterface, // Only if template has top-level heading
    PrimaryCopyInterface     // Only if template has top-level copy
{
    public function getEmbedding(): ?ComponentEmbeddingInterface
    {
        return ComponentEmbedding::fromComponent($this);
    }

    public function getPrimaryHeadingText(): ?string
    {
        // Return the field that corresponds to the top-level heading in template
        return get_sub_field('headline');
    }

    public function getPrimaryCopy(): ?string
    {
        // Return the field that corresponds to the top-level copy in template
        return get_sub_field('copy');
    }
}
Output
Heading: [from getPrimaryHeadingText() if interface implemented]
Body: [from getPrimaryCopy() if interface implemented]
Key Points:
- Inspect template first to determine which interfaces are needed
- fromComponent() auto-extracts heading and body via interfaces
- Single chunk per component instance
- No manual section creation needed
- Don't guess at structure - base decision on actual template code
Pattern 2: Skippable Component (No Embedding)
When to use: Ads, navigation, forms, decorative/visual-only elements
Required Interface
use Vendi\Theme\ComponentInterfaces\ComponentEmbeddingSkipAwareInterface;
Class Implementation
class ad_component extends VendiComponent implements ComponentEmbeddingSkipAwareInterface
{
    // No getEmbedding() method needed
    // Component completely ignored during embedding generation
}
Key Points:
- Empty marker interface
- No embedding logic required
- Still add template boilerplate (see Template Requirements)
Pattern 3: Complex Component with Sections
When to use: Component has repeater or flexible content fields where each item should be a separate section
Required Interfaces
use Vendi\Theme\ComponentInterfaces\ComponentEmbeddingAwareInterface;
use Vendi\Theme\ComponentInterfaces\PrimaryHeadingInterface;
use Vendi\Theme\ComponentInterfaces\PrimaryCopyInterface;
use Vendi\Theme\DTO\Embedding\ComponentEmbedding;
use Vendi\Theme\DTO\Embedding\ComponentEmbeddingInterface;
Class Implementation
public function getEmbedding(): ?ComponentEmbeddingInterface
{
    // Start with base embedding (auto-extracts heading/body from interfaces)
    $ret = ComponentEmbedding::fromComponent($this);

    // Loop through repeater field
    while (have_rows('items')) {
        the_row();
        $layout = get_row_layout();

        // CRITICAL: Filter to relevant layouts only
        if (!in_array($layout, ['content_item', 'text_block'], true)) {
            continue;
        }

        $heading = get_sub_field('heading');
        $copy = get_sub_field('copy');

        // CRITICAL: Always clean HTML from user content
        $cleanCopy = ComponentEmbedding::stripAllHtmlFromText($copy);

        // Add section with optional custom label
        $ret->addSection(
            $heading . PHP_EOL . $cleanCopy,
            'Section' // Optional: 'FAQ Item', 'Testimonial', etc.
        );
    }

    return $ret;
}
Output
Heading: [component main heading]
Body: [component intro copy]
Section 1: Item 1 Heading
[item 1 copy]
Section 2: Item 2 Heading
[item 2 copy]
Key Points:
- Filter layouts to process only relevant types
- Use stripAllHtmlFromText() for all HTML content
- Each addSection() creates a separate sub-chunk
- Sections are auto-numbered (Section 1, Section 2, etc.)
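The auto-numbering described in the key points can be sketched as follows. This is a minimal illustration, not the theme's ComponentEmbedding class: SketchEmbedding and getContent() are hypothetical names, and the single counter shared across all labels is an assumption inferred from the sample outputs ("Section 1", "Person 1").

```php
<?php
// Hypothetical stand-in for ComponentEmbedding, showing only section numbering.
final class SketchEmbedding
{
    /** @var string[] Rendered sections in insertion order. */
    private array $sections = [];

    public function addSection(string $text, string $sectionLabel = 'Section'): void
    {
        // Assumption: one running counter is shared by all labels.
        $number = count($this->sections) + 1;
        $this->sections[] = sprintf('%s %d: %s', $sectionLabel, $number, $text);
    }

    public function getContent(): string
    {
        return implode("\n", $this->sections);
    }
}

$sketch = new SketchEmbedding();
$sketch->addSection('What is a CRNA?');
$sketch->addSection('How do I apply?', 'FAQ Item');
```

Because the label and number are prepended by addSection(), callers pass only the section body and, optionally, a descriptive label.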
Pattern 4: Component with Links/CTAs
When to use: Component has call-to-action buttons or links that should be tracked in metadata
Class Implementation
public function getEmbedding(): ?ComponentEmbeddingInterface
{
    $ret = ComponentEmbedding::fromComponent($this);

    while (have_rows('cards')) {
        the_row();

        $heading = get_sub_field('heading');
        $copy = get_sub_field('copy');
        $link = get_sub_field('cta');

        // Build structured content with labels
        $contentParts = [];
        if ($heading) {
            $contentParts[] = 'Heading: ' . $heading;
        }
        if ($copy) {
            $contentParts[] = 'Body: ' . $copy;
        }
        if ($link && is_array($link)) {
            $contentParts[] = 'Link: ' . $link['title'];
        }

        // Only add section if there's content
        if ($content = implode(PHP_EOL, array_filter($contentParts))) {
            $ret->addSection($content);
        }

        // CRITICAL: Track link separately in metadata
        if ($link && is_array($link)) {
            $ret->addLink(
                linkText: $link['title'] ?? '',
                linkUrl: $link['url'] ?? ''
            );
        }
    }

    return $ret;
}
Output
{
  "content": "Heading: Component Title\nBody: Intro text\nLinks: Card 1 CTA, Card 2 CTA\nSection 1:\nHeading: Card 1\nBody: Card 1 copy\nLink: Card 1 CTA",
  "metadata": {
    "links": [
      { "text": "Card 1 CTA", "url": "/page1/" },
      { "text": "Card 2 CTA", "url": "/page2/" }
    ],
    "component_type": "card_navigation"
  }
}
Key Points:
- Links appear in both content text and metadata
- Metadata links enable advanced RAG features
- Use structured content with labels (Heading:, Body:, Link:)
- Filter empty content before adding sections
Pattern 5: Component with HTML Content Containing Links
When to use: Component has HTML content (bios, articles, descriptions) with embedded <a> tags that should be tracked
Class Implementation
public function getEmbedding(): ?ComponentEmbeddingInterface
{
    $ret = ComponentEmbedding::fromComponent($this);

    while (have_rows('items')) {
        the_row();

        $name = get_sub_field('name');
        $bio = get_sub_field('bio'); // Contains HTML with links

        // CRITICAL: Extract links BEFORE stripping HTML
        // Use name as prefix for context
        ComponentEmbedding::extractAndAddLinksFromHtml($ret, $bio, $name);

        // Now strip HTML for text content
        $cleanBio = ComponentEmbedding::stripAllHtmlFromText($bio);

        $ret->addSection(
            'Name: ' . $name . PHP_EOL . 'Bio: ' . $cleanBio,
            'Person'
        );
    }

    return $ret;
}
Output
If bio contains:
<p>Follow me on <a href="https://twitter.com/jdoe">Twitter</a></p>
{
  "content": "Person 1: Name: John Doe\nBio: Follow me on Twitter",
  "metadata": {
    "links": [
      { "text": "John Doe Twitter", "url": "https://twitter.com/jdoe" }
    ]
  }
}
Key Points:
- Call extractAndAddLinksFromHtml() BEFORE stripAllHtmlFromText()
- Use contextual prefix (name, title, etc.) to avoid duplicate generic link text
- Links preserved in metadata even after HTML is stripped from content
Pattern 6: Component with Related Posts
When to use: Component displays content from related WP_Post objects (testimonials, people, etc.)
Class Implementation
public function getEmbedding(): ?ComponentEmbeddingInterface
{
    $ret = ComponentEmbedding::fromComponent($this);

    foreach ($this->getRelatedPosts() as $post) {
        // CRITICAL: Validate post object before accessing fields
        if (!$post instanceof WP_Post) {
            continue;
        }

        $name = get_field('name', $post->ID);
        $bio = get_field('bio', $post->ID);

        // Clean HTML and add with custom section label
        $ret->addSection(
            $name . PHP_EOL . ComponentEmbedding::stripAllHtmlFromText($bio),
            'Person' // Custom label: 'Testimonial', 'Team Member', etc.
        );
    }

    return $ret;
}
Key Points:
- Always check instanceof WP_Post before accessing post fields
- Access fields with post ID: get_field('field_name', $post->ID)
- Use descriptive section labels
Template File Requirements
CRITICAL: Every embeddable component template must include this boilerplate at the top.
Required Boilerplate
<?php

use Vendi\Theme\Component\{component_name};
use Vendi\Theme\ComponentUtility;
use Vendi\Theme\Enums\RenderingContextEnum;

/** @var {component_name} $component */
$component = ComponentUtility::get_new_component_instance({component_name}::class);

// CRITICAL: Early return for embedding context
if (defined('VENDI_RENDER_CONTEXT') && VENDI_RENDER_CONTEXT === RenderingContextEnum::EMBEDDING->value) {
    return $component;
}

if (!$component->renderComponentWrapperStart()) {
    return;
}

?>

<!-- HTML template here -->

<?php
$component->renderComponentWrapperEnd();
Why This Matters
Without the embedding context check:
- Template will render HTML instead of returning component instance
- getEmbedding() method will never be called
- Component will be skipped in embedding output
This boilerplate is required even for skippable components (for consistency).
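The effect of the early return can be illustrated with a small standalone sketch. Everything here is hypothetical: SketchRenderingContext stands in for RenderingContextEnum (the backing values 'web' and 'embedding' are assumptions), and sketch_template() stands in for a template file that would normally read the VENDI_RENDER_CONTEXT constant rather than take a parameter. It requires PHP 8.1 or later for enums.

```php
<?php
// Hypothetical stand-in for RenderingContextEnum; the backing values are assumed.
enum SketchRenderingContext: string
{
    case WEB = 'web';
    case EMBEDDING = 'embedding';
}

// Hypothetical component with a trivial getEmbedding().
final class SketchComponent
{
    public function getEmbedding(): string
    {
        return 'Heading: Example';
    }
}

// Stand-in for a template file: hands back the component in the embedding
// context, otherwise renders HTML and returns nothing useful.
function sketch_template(string $context): ?SketchComponent
{
    $component = new SketchComponent();

    // Early return: no HTML is produced, the caller receives the instance.
    if ($context === SketchRenderingContext::EMBEDDING->value) {
        return $component;
    }

    echo '<section>Example</section>';
    return null;
}
```

Without the early return, the caller never receives an instance, so there is nothing to call getEmbedding() on.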
Key Methods & Utilities
ComponentEmbedding Static Factory
fromComponent($this)
Purpose: Create base embedding with auto-extraction
Auto-extracts:
- Component type (class short name)
- Post ID and URL
- Creation and modification dates
- Primary heading (if PrimaryHeadingInterface implemented - based on template inspection)
- Primary body copy (if PrimaryCopyInterface implemented - based on template inspection)
Usage: Always first line of getEmbedding()
public function getEmbedding(): ?ComponentEmbeddingInterface
{
    $ret = ComponentEmbedding::fromComponent($this);
    // ... add sections, links, etc.
    return $ret;
}
Note: The heading and body auto-extraction only works if you've implemented the corresponding interfaces based on what actually exists in the template (see Pattern 1 for details).
Content Building Methods
addSection(string $text, string $sectionLabel = 'Section')
Adds a labeled section to the embedding. Sections are auto-numbered (Section 1, Section 2, etc.).
Best Practice: Use descriptive labels
// Good: Descriptive
$ret->addSection($content, 'Testimonial');
$ret->addSection($content, 'FAQ Item');
$ret->addSection($content, 'Team Member');

// Acceptable: Default auto-numbering
$ret->addSection($content); // "Section 1", "Section 2", etc.
addLink(string $linkText, string $linkUrl)
Adds a link to metadata. Links are stored separately from content text for advanced RAG features.
if ($link && is_array($link)) {
    $ret->addLink(
        linkText: $link['title'] ?? '',
        linkUrl: $link['url'] ?? ''
    );
}
extractAndAddLinksFromHtml(ComponentEmbedding $embedding, ?string $html, string $linkPrefix = '')
Purpose: Extracts all <a> tags from HTML content and adds them to the embedding's link metadata.
When to use: When content contains HTML with embedded links that should be tracked separately (e.g., biographical text with social media links, articles with reference links).
Parameters:
- $embedding - The ComponentEmbedding instance to add links to
- $html - HTML content to parse for links
- $linkPrefix - Optional prefix to add context to link text (e.g., person name)
Features:
- Uses DOMDocument for reliable HTML parsing
- Extracts both href and link text
- Filters out links missing href or text
- Adds contextual prefix when provided (useful for avoiding duplicate generic link text)
Usage:
// Basic usage - extract links from HTML
ComponentEmbedding::extractAndAddLinksFromHtml($ret, $htmlContent);

// With prefix for context (recommended when looping through items)
foreach ($persons as $person) {
    $name = $person->name;
    $bio = $person->bio; // Contains <a href="...">Twitter</a>, <a href="...">LinkedIn</a>

    // Prefix links with person name: "John Doe Twitter", "John Doe LinkedIn"
    ComponentEmbedding::extractAndAddLinksFromHtml($ret, $bio, $name);

    // Clean HTML after extracting links
    $cleanBio = ComponentEmbedding::stripAllHtmlFromText($bio);
    $ret->addSection("Name: $name\nBio: $cleanBio", 'Person');
}
Why use linkPrefix: Without prefix, 20 people with Twitter links produces 20 identical "Twitter" entries. With prefix, you get "Chris Haas Twitter", "Jane Smith Twitter", etc., providing essential context.
Important: Call extractAndAddLinksFromHtml() BEFORE stripAllHtmlFromText() to preserve the links before HTML is removed.
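For readers who want to see how DOMDocument-based link extraction with a prefix might work, here is a minimal sketch. sketch_extract_links() is a hypothetical function, not the theme's method: it returns the links as an array instead of adding them to a ComponentEmbedding, and the real implementation may handle encodings and edge cases differently.

```php
<?php
/**
 * Hypothetical stand-in for extractAndAddLinksFromHtml().
 *
 * @return array<int, array{text: string, url: string}>
 */
function sketch_extract_links(?string $html, string $linkPrefix = ''): array
{
    if ($html === null || trim($html) === '') {
        return [];
    }

    $doc = new DOMDocument();
    // Collect libxml warnings about imperfect markup instead of emitting them.
    $previous = libxml_use_internal_errors(true);
    // The XML prolog is a common way to hint UTF-8 input to loadHTML().
    $doc->loadHTML('<?xml encoding="UTF-8">' . $html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $links = [];
    foreach ($doc->getElementsByTagName('a') as $anchor) {
        $url = trim($anchor->getAttribute('href'));
        $text = trim($anchor->textContent);

        // Filter out links missing href or text.
        if ($url === '' || $text === '') {
            continue;
        }

        $links[] = [
            'text' => trim($linkPrefix . ' ' . $text),
            'url' => $url,
        ];
    }

    return $links;
}

$links = sketch_extract_links(
    '<p>Follow me on <a href="https://twitter.com/jdoe">Twitter</a></p>',
    'John Doe'
);
```

The extraction must run on the original HTML: once the tags are stripped, the href values are gone and only the link text remains.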
HTML Cleaning Utility
stripAllHtmlFromText(?string $text, bool $preserveLists = false)
CRITICAL: Always use this for user-entered HTML content
Features:
- Removes <script>, <style>, <form> tags and HTML comments
- Strips all remaining HTML tags
- Decodes HTML entities (&amp; → &)
- Collapses whitespace
- Optional: Preserves list structure with proper formatting
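The cleaning steps listed above could be approximated with PHP built-ins as shown below. sketch_strip_all_html() is a hypothetical function, not the theme's stripAllHtmlFromText(): it omits the $preserveLists option, and it assumes that script, style, and form elements are removed together with their contents.

```php
<?php
// Hypothetical approximation of the cleaning steps; the real method may differ.
function sketch_strip_all_html(?string $text): string
{
    if ($text === null || $text === '') {
        return '';
    }

    // 1. Drop <script>, <style>, and <form> elements including their contents (assumed).
    $text = preg_replace('#<(script|style|form)\b[^>]*>.*?</\1>#is', ' ', $text) ?? '';

    // 2. Drop HTML comments.
    $text = preg_replace('/<!--.*?-->/s', ' ', $text) ?? '';

    // 3. Put a space before common block-level tags so adjacent words do not fuse,
    //    then strip every remaining tag.
    $text = preg_replace('#<(?=/?(?:p|div|br|li|h[1-6])\b)#i', ' <', $text) ?? '';
    $text = strip_tags($text);

    // 4. Decode HTML entities such as &amp;.
    $text = html_entity_decode($text, ENT_QUOTES | ENT_HTML5, 'UTF-8');

    // 5. Collapse runs of whitespace into single spaces.
    return trim(preg_replace('/\s+/u', ' ', $text) ?? '');
}

$clean = sketch_strip_all_html(
    '<p>Tom &amp; Jerry</p><script>alert(1)</script><!-- x --><div>Hi   there</div>'
);
```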
Usage:
// DO THIS:
$cleanCopy = ComponentEmbedding::stripAllHtmlFromText($copy);
$ret->addSection($cleanCopy);

// NOT THIS:
$ret->addSection($copy); // May contain <div>, <p>, <br> tags
Best Practices
1. Avoid Loading Sub-Components
VERY IMPORTANT: Write extraction code directly in getEmbedding(). Do NOT load sub-components.
Strongly Preferred:
public function getEmbedding(): ?ComponentEmbeddingInterface
{
    $ret = ComponentEmbedding::fromComponent($this);

    // Write code directly - NO sub-component loading
    while (have_rows('items')) {
        the_row();
        $ret->addSection(get_sub_field('copy'));
    }

    return $ret;
}
Avoid:
// DON'T load sub-components during embedding
vendi_load_component_v3(['parent', 'child']);
Why: The system hasn't found a good pattern for sub-component loading in embeddings yet. Keep it simple and direct.
2. Always Clean HTML from User Content
// CORRECT:
$cleanCopy = ComponentEmbedding::stripAllHtmlFromText($copy);
$ret->addSection($cleanCopy);

// WRONG:
$ret->addSection($copy); // HTML tags leak into embedding
3. Use Structured Content with Labels
Makes content more parseable by the RAG system:
$contentParts = [];
if ($heading) {
    $contentParts[] = 'Heading: ' . $heading;
}
if ($subheading) {
    $contentParts[] = 'Subheading: ' . $subheading;
}
if ($copy) {
    $contentParts[] = 'Body: ' . ComponentEmbedding::stripAllHtmlFromText($copy);
}
if ($link) {
    $contentParts[] = 'Link: ' . $link['title'];
}
$ret->addSection(implode(PHP_EOL, $contentParts));
Implementation Checklist
Step 1: Inspect Template File
- Read the component's template file (.php) to understand its structure
- Identify if there's a top-level heading (outside any loops) → Consider PrimaryHeadingInterface
- Identify if there's top-level body/intro copy (outside any loops) → Consider PrimaryCopyInterface
- Note any repeater fields that should become sections
- Note any links/CTAs that should be tracked in metadata
Step 2: Class File Changes
- Add use statements at top of file:
  use Vendi\Theme\ComponentInterfaces\ComponentEmbeddingAwareInterface;
  use Vendi\Theme\DTO\Embedding\ComponentEmbedding;
  use Vendi\Theme\DTO\Embedding\ComponentEmbeddingInterface;
- Only if template has top-level heading: Add interface use statement:
  use Vendi\Theme\ComponentInterfaces\PrimaryHeadingInterface;
- Only if template has top-level copy: Add interface use statement:
  use Vendi\Theme\ComponentInterfaces\PrimaryCopyInterface;
- Implement ComponentEmbeddingAwareInterface in class declaration
- Only if template has top-level heading: Implement PrimaryHeadingInterface
- Only if template has top-level copy: Implement PrimaryCopyInterface
- Add getEmbedding(): ?ComponentEmbeddingInterface method
- If using PrimaryHeadingInterface: Add getPrimaryHeadingText(): ?string returning the appropriate field
- If using PrimaryCopyInterface: Add getPrimaryCopy(): ?string returning the appropriate field
Step 3: Template File Changes
- Add use statement at top:
  use Vendi\Theme\Enums\RenderingContextEnum;
- Add embedding context check after component instantiation:
  if (defined('VENDI_RENDER_CONTEXT') && VENDI_RENDER_CONTEXT === RenderingContextEnum::EMBEDDING->value) {
      return $component;
  }
Step 4: getEmbedding() Implementation
- Start with $ret = ComponentEmbedding::fromComponent($this);
- Loop through any repeater/flexible content fields
- Filter layouts to relevant types only (in_array() check)
- Use stripAllHtmlFromText() for all HTML content
- Add sections with addSection() for each logical chunk
- Add links with addLink() if component has CTAs
- Validate WP_Post objects with instanceof before accessing fields
- Filter empty content before adding sections
- Write code directly (do NOT load sub-components)
- Return $ret
For Skippable Components Only
- Add use statement: use Vendi\Theme\ComponentInterfaces\ComponentEmbeddingSkipAwareInterface;
- Implement ComponentEmbeddingSkipAwareInterface in class declaration
- Do NOT implement ComponentEmbeddingAwareInterface
- Still add template boilerplate (for consistency)
- No getEmbedding() method needed
Testing
After implementation, test with the CLI command:
wp vendi embedding:generate
This command:
- Iterates through all published posts/pages
- Sets VENDI_RENDER_CONTEXT to EMBEDDING
- Loads each component
- Calls getEmbedding() on embeddable components
- Outputs structured JSON for Qdrant
Verify Output
Check the JSON output for:
- ✅ Component appears in embedding data
- ✅ Heading and body extracted correctly
- ✅ Sections appear as separate chunks (Section 1, Section 2, etc.)
- ✅ Links tracked in metadata
- ✅ HTML stripped from content (no <div>, <p>, <br> tags)
- ✅ Content is readable and well-structured
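Some of these checks can be automated. The sketch below is one possible approach, assuming each chunk is available as a JSON string shaped like the samples in this document; sketch_find_chunk_problems() is a hypothetical helper and not part of the theme or the CLI command.

```php
<?php
/**
 * Hypothetical helper: returns a list of problems found in one JSON chunk.
 *
 * @return string[]
 */
function sketch_find_chunk_problems(string $json): array
{
    $chunk = json_decode($json, true);
    if (!is_array($chunk)) {
        return ['not valid JSON'];
    }

    $problems = [];

    // Content must be present and free of leftover HTML tags.
    $content = $chunk['content'] ?? '';
    if (!is_string($content) || trim($content) === '') {
        $problems[] = 'empty content';
    } elseif (preg_match('#</?[a-z][a-z0-9]*\b[^>]*>#i', $content)) {
        $problems[] = 'HTML tag found in content';
    }

    // Every tracked link needs both text and a URL.
    foreach ($chunk['metadata']['links'] ?? [] as $link) {
        if (($link['text'] ?? '') === '' || ($link['url'] ?? '') === '') {
            $problems[] = 'link missing text or url';
        }
    }

    return $problems;
}
```

An empty array means the chunk passed these mechanical checks; readability and correct heading/body extraction still need a manual look.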
Sample Output Format
{
  "content": "Heading: Research Topics\nSection 1: AANA's Current Priorities\nWhat are healthcare executives' perceptions...",
  "metadata": {
    "type": "page",
    "url": "https://example.com/page/",
    "created": "2022-11-29T21:01:08+00:00",
    "updated": "2024-03-07T09:07:06+00:00",
    "component_type": "accordion"
  },
  "id": "660-2"
}
Common Pitfalls
- Forgetting to clean HTML: Always use stripAllHtmlFromText() on user content
- Loading sub-components: Write extraction code directly in getEmbedding()
- Missing template boilerplate: Component will render HTML instead of being embedded
- Not filtering layouts: Process only relevant flexible content layouts
- Not validating WP_Post: Check instanceof WP_Post before accessing post fields
- Adding empty sections: Filter content before calling addSection()
- Forgetting to return component: Template must return $component; in embedding context
- Extracting links after stripping HTML: Call extractAndAddLinksFromHtml() BEFORE stripAllHtmlFromText()
- Missing link context: Use linkPrefix parameter when looping through items to avoid duplicate generic link text
Reference Examples
Examine these components for real-world patterns:
- basic_copy_block - Simple: Single chunk with heading/body
- ad_row - Skippable: Marked with skip interface
- accordion - Complex: Multiple accordion items as sections
- card_navigation - Complex: Cards with CTAs tracked as links
- testimonial - Related Posts: WP_Post objects as sections with custom label
- people_image_grid - Complex: Loops through people, extracts links from bio HTML with name prefix, creates person sections
All located in:
vendi-theme-parts/components/[component_name]/[component_name].class.php
Your Role
Guide the user through implementing embedding support for a v3 component:
- Read the template file: Inspect the actual .php template to understand structure
- Identify top-level content: Determine if component has top-level heading and/or copy (outside loops)
- Determine pattern: Is it simple, complex, skippable? Does it have repeaters? Links?
- Choose interfaces: Based on template inspection, decide which interfaces to implement
- Present implementation plan: Describe changes needed with specific field names from template
- Implement changes: Update class and template files
- Test: Run wp vendi embedding:generate and verify output
Remember:
- Always start by reading the template file - don't guess at structure
- Implement PrimaryHeadingInterface only if template has top-level heading (outside loops)
- Implement PrimaryCopyInterface only if template has top-level copy (outside loops)
- User handles git add and git commit - you should NOT run these
- Write embedding extraction code directly (avoid sub-component loading)
- Always clean HTML from user content
- Use structured content with labels for better RAG performance