DDC_Skills_for_AI_Agents_in_Construction semantic-search-cwicr
Semantic search in DDC CWICR construction database using vector embeddings. Find similar work items and resources for cost estimation.
install
source · Clone the upstream repo
git clone https://github.com/datadrivenconstruction/DDC_Skills_for_AI_Agents_in_Construction
Claude Code · Install into ~/.claude/skills/
T=$(mktemp -d) && git clone --depth=1 https://github.com/datadrivenconstruction/DDC_Skills_for_AI_Agents_in_Construction "$T" && mkdir -p ~/.claude/skills && cp -r "$T/1_DDC_Toolkit/CWICR-Database/semantic-search-cwicr" ~/.claude/skills/datadrivenconstruction-ddc-skills-for-ai-agents-in-construction-semantic-search- && rm -rf "$T"
manifest:
1_DDC_Toolkit/CWICR-Database/semantic-search-cwicr/SKILL.mdsource content
Semantic Search in DDC CWICR Database
Business Case
Problem Statement
Construction cost estimation requires finding relevant work items from large databases. Traditional keyword search fails when:
- Users describe work in natural language
- Terminology varies across regions and languages
- Similar work items have different naming conventions
Solution
DDC CWICR database provides pre-computed embeddings (OpenAI text-embedding-3-large, 3072 dimensions) enabling semantic similarity search across 55,719 work items in 9 languages.
Business Value
- 90% faster work item lookup compared to manual search
- Multi-language support: Arabic, Chinese, German, English, Spanish, French, Hindi, Portuguese, Russian
- Higher accuracy by finding semantically similar items, not just keyword matches
Technical Implementation
Prerequisites
pip install qdrant-client openai pandas
Database Setup
# Download Qdrant snapshot wget https://github.com/datadrivenconstruction/OpenConstructionEstimate-DDC-CWICR/releases/download/v0.1.0/qdrant_snapshot_en.tar.gz # Start Qdrant with Docker docker run -p 6333:6333 -v $(pwd)/qdrant_storage:/qdrant/storage qdrant/qdrant
Python Implementation
import pandas as pd from qdrant_client import QdrantClient from qdrant_client.models import Distance, VectorParams import openai class CWICRSemanticSearch: def __init__(self, qdrant_host: str = "localhost", port: int = 6333): self.client = QdrantClient(host=qdrant_host, port=port) self.collection_name = "ddc_cwicr_en" self.embedding_model = "text-embedding-3-large" self.embedding_dim = 3072 def get_embedding(self, text: str) -> list: """Generate embedding for search query.""" response = openai.embeddings.create( model=self.embedding_model, input=text ) return response.data[0].embedding def search_work_items(self, query: str, limit: int = 10, min_score: float = 0.7) -> pd.DataFrame: """Search for similar work items.""" query_vector = self.get_embedding(query) results = self.client.search( collection_name=self.collection_name, query_vector=query_vector, limit=limit, score_threshold=min_score ) items = [] for result in results: item = result.payload item['similarity_score'] = result.score items.append(item) return pd.DataFrame(items) def search_by_category(self, query: str, category: str, limit: int = 10) -> pd.DataFrame: """Search within specific category.""" query_vector = self.get_embedding(query) results = self.client.search( collection_name=self.collection_name, query_vector=query_vector, query_filter={ "must": [{"key": "category", "match": {"value": category}}] }, limit=limit ) return pd.DataFrame([{**r.payload, 'score': r.score} for r in results]) def estimate_cost(self, work_items: pd.DataFrame, quantities: dict) -> dict: """Calculate cost from matched work items.""" total_cost = 0 breakdown = [] for _, item in work_items.iterrows(): if item['work_item_code'] in quantities: qty = quantities[item['work_item_code']] cost = qty * item.get('unit_price', 0) total_cost += cost breakdown.append({ 'item': item['description'], 'quantity': qty, 'unit_price': item.get('unit_price', 0), 'total': cost }) return { 'total_cost': total_cost, 'breakdown': breakdown, 'currency': 'Regional default' }
Usage Examples
Basic Search
search = CWICRSemanticSearch() # Natural language query results = search.search_work_items("brick masonry wall construction") print(results[['description', 'unit', 'unit_price', 'similarity_score']])
Cost Estimation
# Find work items for foundation work foundation_items = search.search_work_items( "reinforced concrete foundation excavation and pouring", limit=20 ) # Estimate with quantities quantities = { 'CONC-001': 150, # cubic meters 'EXCV-002': 200, # cubic meters } estimate = search.estimate_cost(foundation_items, quantities) print(f"Estimated Cost: ${estimate['total_cost']:,.2f}")
Database Schema
| Field | Type | Description |
|---|---|---|
| work_item_code | string | Unique identifier |
| description | string | Work item description |
| unit | string | Measurement unit |
| labor_norm | float | Labor hours per unit |
| material_cost | float | Material cost per unit |
| equipment_cost | float | Equipment cost per unit |
| unit_price | float | Total price per unit |
| category | string | Work category |
| embedding | vector[3072] | Pre-computed embedding |
Best Practices
- Use specific queries - "reinforced concrete slab 200mm" beats "concrete"
- Filter by category - Narrow results to relevant work types
- Check similarity scores - Scores below 0.7 may need manual verification
- Combine with QTO - Use BIM quantities for automated estimation
Resources
- GitHub: OpenConstructionEstimate-DDC-CWICR
- Releases: v0.1.0 Database Downloads
- Qdrant Docs: https://qdrant.tech/documentation/