Skillshub mvp

Goal: Build an LLM-based RAG App

install

source · Clone the upstream repo

git clone https://github.com/ComeOnOliver/skillshub

Claude Code · Install into ~/.claude/skills/

T=$(mktemp -d) && git clone --depth=1 https://github.com/ComeOnOliver/skillshub "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/Harmeet10000/skills/mvp" ~/.claude/skills/comeonoliver-skillshub-mvp && rm -rf "$T"

manifest: skills/Harmeet10000/skills/mvp/SKILL.md

source content

Goal: Build an LLM-based RAG App Here is the MVP Implementation Plan.

MVP Scope A "second brain" application that leverages Retrieval Augmented Generation (RAG) to integrate external knowledge and data with LLM capabilities. This MVP will allow users to upload their files as a knowledge base and perform natural language queries against that data using LM Studio.

Technical Stack Frontend: Streamlit Backend: FastAPI LLM Integration: LM Studio

Core Features

File Upload Users can upload documents (PDFs, text files, etc.) into the system. Extract key information from the uploaded files and store them in a vector database or simple on-disk index.
Basic Search Natural language search, powered by LM Studio. In-context learning approach with few-shot learning to enhance responses. RAG flow: Retrieve relevant segments from the knowledge base. Combine retrieved segments with user query. Generate a context-enriched answer.

Data Structure Document Schema class Document: id: str # Unique identifier title: str # File title or name content: str # Extracted text metadata: dict # Any additional metadata (e.g., upload date, tags)

Index/Vector Embeddings Depending on the chosen approach (vector DB vs. local embeddings): Vector Database: Each Document (or chunk of a document) will have an embedding vector. Local Index: Store embeddings in a local file or simple database.

Implementation Plan

Backend Setup with FastAPI File Upload Endpoint

Accept file uploads (PDF, TXT, etc.). Extract and preprocess text (OCR if necessary). Split text into chunks for indexing. Generate embeddings via LM Studio or a compatible embedding model. Store the embeddings and metadata in a vector store local index. Search Endpoint

Accept natural language queries. Use embeddings to find the most relevant chunks from the knowledge base. Construct context from these chunks. Send context + query to LM Studio for final RAG-based answer. Data Storage

For MVP, consider a simple JSON or SQLite-based storage for metadata and a local or cloud-based vector index. Add basic error handling (invalid file format, missing data, etc.). CORS Setup

Configure CORS in FastAPI to allow calls from the Streamlit frontend.

Frontend Development (Streamlit) File Upload Interface

Simple file upload widget. On submission, call the FastAPI file upload endpoint. Provide feedback on success/failure. Search Interface

Text input for queries. On submission, call the FastAPI search endpoint. Display the AI-generated response and (optionally) the top retrieved chunks. UX & Basic Layout

Show a list of previously uploaded files (titles, metadata, etc.). Simple and intuitive design with minimal clutter. Provide error messages or success confirmations. Error Handling & User Feedback

Show relevant messages for file upload errors, incomplete data, etc. Display a loading indicator while the system is querying or processing.

API Endpoints POST /api/files/upload

Description: Upload and store documents. Payload: File(s) + optional metadata. Response: Success message with document ID(s). GET /api/files

Description: List all uploaded documents. Response: A JSON array of documents with basic metadata. GET /api/files/{id}

Description: Fetch the details of a specific document by ID. Response: Document details (title, content if needed, metadata). POST /api/search

Description: Search the knowledge base with a natural language query. Payload: { "query": "Your question here" } Response: RAG-based answer and possibly the top chunks or references.

Future Enhancements (Beyond MVP) Authentication & Authorization for secure file sharing. Advanced File Parsing (OCR for scanned PDFs, docx parsing). Versioning & Revision History for uploaded documents. UI Improvements such as advanced filtering, summarization, or highlight features.