Awesome-omni-skill project-knowledge
CEI architecture, modules, data flows, conventions, tech stack decisions
install
source · Clone the upstream repo
git clone https://github.com/diegosouzapw/awesome-omni-skill
Claude Code · Install into ~/.claude/skills/
T=$(mktemp -d) && git clone --depth=1 https://github.com/diegosouzapw/awesome-omni-skill "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/devops/project-knowledge" ~/.claude/skills/diegosouzapw-awesome-omni-skill-project-knowledge && rm -rf "$T"
manifest: skills/devops/project-knowledge/SKILL.md
Project Knowledge — CEI-001
Project Overview
Name: CEI-001 — Guide Interactif Pré-Projet ERP
Purpose: Evaluate ERP implementation readiness for small manufacturing enterprises
Users: SME manufacturers, CEI consultants, admins
Timeline: 50-hour fixed-price engagement (forfait)
Access: Free for users; the admin area requires authentication
Architecture Decisions
| Decision | Rationale |
|---|---|
| Chat + Evaluation hybrid | Chat for exploration, Evaluation for structured assessment |
| OpenAI GPT-4 | Quality > cost for strategic consulting |
| Weaviate RAG | Open source, semantic search, admin-friendly |
| PostgreSQL | Relational, JSON support, proven reliability |
| FastAPI | Async native, auto-docs, type safety |
| React + TypeScript | Type safety, ecosystem maturity |
| JWT auth | Stateless, simple for admin-only protection |
| Docker Compose | Easy deployment, local development |
Core Modules (8)
- Vision & Objectives — Why ERP? Strategic alignment
- Organizational Prep — Stakeholders, roles, change management
- Data & Processes — Inventory, quality, documentation
- Technical Infrastructure — Current setup, connectivity needs
- Resources & Budget — Costs, availability, timeline
- Pitfalls to Avoid — Common failures, risks
- Implementation Process — Phases, deliverables, success criteria
- Post-Implementation — Training, support, optimization
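Questions and document chunks are tagged by module (see the Weaviate `module` property in the rag-weaviate skill below). The slugs here are hypothetical; the actual identifiers are not documented in this file:

```python
# Hypothetical module slugs — the real identifiers aren't shown in this file
EVALUATION_MODULES = [
    "vision",               # Vision & Objectives
    "organization",         # Organizational Prep
    "data",                 # Data & Processes
    "infrastructure",       # Technical Infrastructure
    "resources",            # Resources & Budget
    "pitfalls",             # Pitfalls to Avoid
    "implementation",       # Implementation Process
    "post_implementation",  # Post-Implementation
]
```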
Data Flows
Chat Flow
User input → Frontend → POST /api/chat/message → Save message (PostgreSQL) → Query Weaviate (semantic search) → Build RAG context → Call OpenAI API (with context) → Stream response back → Save assistant message → Frontend displays with sources
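As a rough sketch, the whole flow can hang off a single FastAPI route. `get_db`, `save_message`, `load_history`, `rag_service`, and `openai_client` are hypothetical names, not taken from the repo:

```python
from fastapi import APIRouter, Depends
from pydantic import BaseModel

router = APIRouter(prefix="/api/chat")

class ChatRequest(BaseModel):
    conversation_id: str
    content: str

@router.post("/message")
async def post_message(req: ChatRequest, db=Depends(get_db)):  # get_db: hypothetical session dependency
    # 1. Persist the user message (PostgreSQL)
    await save_message(db, req.conversation_id, role="user", content=req.content)
    # 2. Semantic search + RAG context + OpenAI call (see rag-weaviate skill)
    answer, sources = await rag_service.generate_rag_response(
        user_query=req.content,
        chat_history=await load_history(db, req.conversation_id),
        openai_client=openai_client,
    )
    # 3. Persist the assistant message together with its sources
    await save_message(db, req.conversation_id, role="assistant",
                       content=answer, sources=sources)
    return {"answer": answer, "sources": sources}
```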
Evaluation Flow
User starts evaluation → Load questions (8 modules) → User answers module by module → Answers saved to PostgreSQL → On completion: Scoring engine calculates scores → Generate recommendations → Create report → Return PDF
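The scoring engine itself is not documented here; a minimal sketch of the per-module pass, assuming answers carry a 0–4 score and results are reported as percentages:

```python
from collections import defaultdict
from typing import Dict, List

def score_evaluation(answers: List[dict]) -> Dict[str, float]:
    """answers: [{"module": "vision", "score": 3}, ...] -> percentage per module."""
    per_module: Dict[str, List[int]] = defaultdict(list)
    for a in answers:
        per_module[a["module"]].append(a["score"])
    # Module score = mean answer score mapped onto 0-100 (max answer = 4)
    scores = {m: round(100 * sum(s) / (4 * len(s)), 1) for m, s in per_module.items()}
    scores["overall"] = round(sum(scores.values()) / len(scores), 1)
    return scores
```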
Admin Document Flow
Admin uploads document → Upload to server → Save metadata (PostgreSQL) → Pipeline starts: Anonymize (OpenAI) → Whitelabel (OpenAI) → Normalize (OpenAI) → Enrich with summary (OpenAI) → Generate Q&A (OpenAI) → Chunk for RAG → Index into Weaviate → Publish
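The pipeline steps are sequential LLM passes; a sketch of how they could be chained. The prompt wording and helper names are assumptions — only the step order comes from the flow above:

```python
from openai import AsyncOpenAI

# Step order from the flow above; the prompts are placeholders
PIPELINE_STEPS = [
    ("anonymize", "Remove all personal and client-identifying data."),
    ("whitelabel", "Replace brand and vendor names with neutral terms."),
    ("normalize", "Rewrite into a consistent structure and terminology."),
    ("enrich", "Prepend a short summary of the document."),
]

async def run_pipeline(client: AsyncOpenAI, text: str) -> str:
    """Run the sequential LLM passes; each step rewrites the full text."""
    for _name, instruction in PIPELINE_STEPS:
        resp = await client.chat.completions.create(
            model="gpt-4-turbo-preview",
            messages=[
                {"role": "system", "content": instruction},
                {"role": "user", "content": text},
            ],
        )
        text = resp.choices[0].message.content
    # Next stages: generate Q&A pairs, chunk, index into Weaviate, publish
    return text
```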
Key Entities
- User: Email-based, role-based access
- Conversation: Chat history, multi-turn context
- Message: User/assistant messages with sources
- Evaluation: User's assessment session
- Answer: User's response to each question
- Question: Pre-defined evaluation questions (8 modules)
- Document: Knowledge base documents
- DocumentChunk: Indexed document sections (Weaviate)
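A minimal sketch of two of these entities as SQLAlchemy 2.0 typed mappings, following the naming conventions below. This is illustrative only; the real schema is not reproduced in this file:

```python
import uuid
from datetime import datetime
from sqlalchemy import JSON, ForeignKey
from sqlalchemy.orm import DeclarativeBase, Mapped, mapped_column

class Base(DeclarativeBase):
    pass

class Conversation(Base):
    __tablename__ = "conversations"  # lowercase_plural
    id: Mapped[uuid.UUID] = mapped_column(primary_key=True, default=uuid.uuid4)
    user_id: Mapped[uuid.UUID] = mapped_column(ForeignKey("users.id"))
    created_at: Mapped[datetime] = mapped_column(default=datetime.utcnow)

class Message(Base):
    __tablename__ = "messages"
    id: Mapped[uuid.UUID] = mapped_column(primary_key=True, default=uuid.uuid4)
    conversation_id: Mapped[uuid.UUID] = mapped_column(ForeignKey("conversations.id"))
    role: Mapped[str]    # "user" | "assistant"
    content: Mapped[str]
    sources: Mapped[dict | None] = mapped_column(JSON)  # RAG citations, nullable
```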
Naming Conventions
- Routes: `/api/[resource]/[action]`
- Tables: `lowercase_plural`
- Columns: `snake_case`
- Models: `PascalCase`
- Functions: `snake_case` (Python), `camelCase` (TypeScript)
- Components: `PascalCase.tsx`
- Hooks: `useXxx`
Configuration
```python
# Core
DEBUG = False
ENVIRONMENT = "production"

# Database
DATABASE_URL = "postgresql+asyncpg://user:pass@localhost:5432/cei"

# Weaviate
WEAVIATE_HOST = "weaviate:8080"
WEAVIATE_SCHEME = "http"

# OpenAI
OPENAI_API_KEY = "sk-..."
OPENAI_MODEL = "gpt-4-turbo-preview"
OPENAI_EMBEDDING_MODEL = "text-embedding-3-small"

# Auth
JWT_SECRET = "your-secret-key-32-chars-min"
JWT_EXPIRE_HOURS = 24

# Frontend
VITE_API_URL = "https://api.yourdomain.com"
```
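One way to load this configuration from environment variables is pydantic-settings; this is an assumption, as the repo's actual config loader is not shown here (`VITE_API_URL` stays in the frontend's own `.env`):

```python
# Assumed loader using pydantic-settings; env var names map case-insensitively
from pydantic_settings import BaseSettings

class Settings(BaseSettings):
    debug: bool = False
    environment: str = "production"
    database_url: str = "postgresql+asyncpg://user:pass@localhost:5432/cei"
    weaviate_host: str = "weaviate:8080"
    weaviate_scheme: str = "http"
    openai_api_key: str = ""
    openai_model: str = "gpt-4-turbo-preview"
    openai_embedding_model: str = "text-embedding-3-small"
    jwt_secret: str = ""
    jwt_expire_hours: int = 24

settings = Settings()  # reads DATABASE_URL, OPENAI_API_KEY, ... from the environment
```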
Tech Stack Summary
| Layer | Technology | Why |
|---|---|---|
| Frontend | React 18 + TS | Type safety, ecosystem |
| Styling | TailwindCSS 3 | Rapid, consistent UI |
| Build | Vite 5 | Fast HMR, modern |
| Backend | FastAPI 0.109 | Async, auto-docs |
| Database | PostgreSQL 16 | Relational, JSON |
| ORM | SQLAlchemy 2.0 | Async support, mature |
| Vector DB | Weaviate 1.24 | Open source, semantic |
| LLM | OpenAI API | Quality responses |
| Auth | JWT + bcrypt | Standard, simple |
| Container | Docker Compose | Multi-service |
name: rag-weaviate
description: Document indexing, semantic search, RAG pipelines, chunking, Weaviate integration
RAG & Weaviate — CEI-001
Weaviate Schema
```python
# app/services/rag_service.py
import weaviate
import weaviate.classes as wvc


class RAGService:
    def __init__(self, weaviate_url: str = "weaviate:8080"):
        # weaviate-client v4: the old Client("http://...") constructor is v3-only
        host, port = weaviate_url.split(":")
        self.client = weaviate.connect_to_local(host=host, port=int(port))
        self._ensure_schema()

    def _ensure_schema(self):
        """Create the Weaviate collection if it does not exist yet."""
        if self.client.collections.exists("Document"):
            return
        # Document collection for indexed knowledge base chunks
        self.client.collections.create(
            name="Document",
            description="CEI knowledge base documents",
            vectorizer_config=wvc.config.Configure.Vectorizer.text2vec_openai(),
            properties=[
                wvc.config.Property(
                    name="title",
                    data_type=wvc.config.DataType.TEXT,
                    description="Document title",
                ),
                wvc.config.Property(
                    name="content",
                    data_type=wvc.config.DataType.TEXT,
                    description="Document chunk content",
                ),
                wvc.config.Property(
                    name="section",
                    data_type=wvc.config.DataType.TEXT,
                    description="Section title",
                ),
                wvc.config.Property(
                    name="module",
                    data_type=wvc.config.DataType.TEXT,
                    description="Evaluation module (vision, org, data, etc.)",
                ),
                wvc.config.Property(
                    name="document_id",
                    data_type=wvc.config.DataType.UUID,
                    description="PostgreSQL document ID",
                ),
                wvc.config.Property(
                    name="chunk_index",
                    data_type=wvc.config.DataType.INT,
                    description="Chunk position in document",
                ),
            ],
        )
```
Indexing Pipeline
```python
# app/services/rag_service.py — RAGService (continued)
from typing import List
import weaviate.classes as wvc

class RAGService:
    # ... (__init__ and _ensure_schema above)

    async def index_document(self, doc_id: str, chunks: List[str]):
        """Index document chunks into Weaviate."""
        collection = self.client.collections.get("Document")
        objects = [
            wvc.data.DataObject(
                properties={
                    "title": f"Document {doc_id}",
                    "content": chunk,
                    "section": "unknown",
                    "module": "general",
                    "document_id": doc_id,
                    "chunk_index": idx,
                }
            )
            for idx, chunk in enumerate(chunks)
        ]
        # Batch import (v4: insert_many, not insert_multiple)
        result = collection.data.insert_many(objects)
        return result.uuids

    async def search(self, query: str, limit: int = 3):
        """Semantic search in Weaviate."""
        collection = self.client.collections.get("Document")
        results = collection.query.near_text(
            query=query,
            limit=limit,
            filters=wvc.query.Filter.by_property("module").not_equal("archived"),
            return_metadata=wvc.query.MetadataQuery(certainty=True),
        ).objects
        return [
            {
                "title": obj.properties["title"],
                "content": obj.properties["content"],
                "section": obj.properties["section"],
                "module": obj.properties["module"],
                # near_text exposes certainty (0..1); used here as the "score"
                "score": obj.metadata.certainty,
            }
            for obj in results
        ]

    async def reindex_document(self, doc_id: str, chunks: List[str]):
        """Remove old chunks, then index the new ones."""
        collection = self.client.collections.get("Document")
        # Delete old chunks
        collection.data.delete_many(
            where=wvc.query.Filter.by_property("document_id").equal(doc_id)
        )
        return await self.index_document(doc_id, chunks)
```
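A hypothetical round trip through these methods, assuming a running Weaviate with the `text2vec-openai` module configured and an async context:

```python
import uuid

rag = RAGService("weaviate:8080")
doc_id = str(uuid.uuid4())

# Index two toy chunks, then query them back semantically
await rag.index_document(doc_id, chunks=["ERP data cleanup basics", "Stakeholder mapping"])
hits = await rag.search("What data should be cleaned before an ERP project?")
for h in hits:
    print(h["module"], h["title"], round(h["score"], 2))
```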
Chunking Strategy
```python
from typing import List

def chunk_text(
    content: str,
    chunk_size: int = 800,
    chunk_overlap: int = 100,
) -> List[str]:
    """Paragraph-based chunking with overlap between consecutive chunks.

    Sizes are measured in characters as a cheap proxy for tokens.
    """
    chunks: List[str] = []
    current_chunk = ""
    for para in content.split("\n\n"):
        if len(current_chunk) + len(para) < chunk_size:
            current_chunk += para + "\n\n"
        else:
            if current_chunk:
                chunks.append(current_chunk.strip())
            # Carry the tail of the previous chunk into the next one so
            # context is preserved across chunk boundaries
            overlap = current_chunk[-chunk_overlap:] if current_chunk else ""
            current_chunk = overlap + para + "\n\n"
    if current_chunk:
        chunks.append(current_chunk.strip())
    return chunks
```
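A quick sanity check of the chunker on synthetic text:

```python
text = "\n\n".join(f"Paragraph {i}: " + "word " * 120 for i in range(6))
chunks = chunk_text(text, chunk_size=800, chunk_overlap=100)
print(len(chunks), [len(c) for c in chunks])
# Adjacent chunks share ~100 trailing/leading characters, so content cut
# at a boundary still appears intact in one of the two chunks.
```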
RAG Response Generation
```python
# app/services/rag_service.py — RAGService (continued)
from typing import Dict, List, Tuple
from openai import AsyncOpenAI

class RAGService:
    # ... (search method above)

    async def generate_rag_response(
        self,
        user_query: str,
        chat_history: List[Dict],
        openai_client: AsyncOpenAI,
    ) -> Tuple[str, List[Dict]]:
        """Generate a response grounded in RAG context."""
        # 1. Search the knowledge base
        context_docs = await self.search(user_query, limit=3)

        # 2. Build the context block
        context_text = "\n\n".join(
            f"Source: {doc['title']}\n{doc['content']}" for doc in context_docs
        )

        # 3. Build the prompt (shown in English; the production prompt is French)
        system_prompt = f"""You are an ERP expert for manufacturing SMEs.

Knowledge context:
{context_text}

Answer using this context. Cite the sources when relevant.
Be concise and practical."""

        # 4. Call OpenAI (chat.completions — `messages.create` is Anthropic's API)
        response = await openai_client.chat.completions.create(
            model="gpt-4-turbo-preview",
            max_tokens=1024,
            messages=[{"role": "system", "content": system_prompt}, *chat_history],
        )
        return response.choices[0].message.content, context_docs
```
Embedding Configuration
```python
# app/config.py
OPENAI_EMBEDDING_MODEL = "text-embedding-3-small"
OPENAI_EMBEDDING_DIMENSION = 1536

# Cost optimization: use smaller embeddings
# text-embedding-3-small: 1536 dimensions, cheap
# text-embedding-3-large: 3072 dimensions, more precise
```
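With `text2vec-openai`, Weaviate computes embeddings server-side; if one is ever needed client-side (e.g. for offline experiments), a minimal sketch with the OpenAI SDK:

```python
from openai import AsyncOpenAI

async def embed(client: AsyncOpenAI, text: str) -> list[float]:
    # Same model Weaviate is configured with, so vectors stay comparable
    resp = await client.embeddings.create(
        model="text-embedding-3-small",
        input=text,
    )
    return resp.data[0].embedding  # 1536 floats
```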
Similarity Threshold
```python
class RAGService:
    # ... (search method above)

    # Search with a confidence threshold
    async def search_with_confidence(self, query: str, min_score: float = 0.5):
        """Only return results at or above the confidence threshold."""
        results = await self.search(query, limit=5)
        return [r for r in results if r["score"] >= min_score]
```
Conventions
- Chunk size: 800 tokens (good for context windows)
- Chunk overlap: 100 tokens (preserve context)
- Min similarity: 0.5 certainty (filters out weak matches)
- Update frequency: on document publish
- Archive old versions (don't delete)