# Story 4.6: RAG-Powered Video Chat with ChromaDB

## Story Overview

**Story ID**: 4.6
**Epic**: 4 - Advanced Intelligence & Developer Platform
**Title**: RAG-Powered Video Chat with ChromaDB
**Status**: 📋 READY FOR IMPLEMENTATION
**Priority**: Medium

**Goal**: Implement a RAG (Retrieval Augmented Generation) chatbot interface using ChromaDB for semantic search, enabling users to hold interactive Q&A conversations with video content backed by precise timestamp source references.

**Value Proposition**: Transform passive video consumption into interactive content exploration, allowing users to ask specific questions about video content and receive precise answers with exact timestamp references for verification.

**Dependencies**:
- ✅ Story 4.4 (Custom AI Models) for AI service infrastructure
- ✅ Existing ChromaDB integration patterns from `/tests/framework-comparison/test_langgraph_chromadb.py`
- ✅ Transcript extraction system

**Estimated Effort**: 20 hours

## Technical Requirements

### Core Features

#### 1. ChromaDB Vector Database
- **Semantic Transcript Chunking**: Split transcripts into meaningful chunks with overlap
- **Embedding Storage**: Generate and store embeddings for all transcript segments
- **Metadata Preservation**: Maintain timestamp, video ID, and section information
- **Vector Search**: Semantic similarity search across transcript content (see the storage and search sketch after this feature list)
- **Collection Management**: Organize embeddings by video, user, or topic

#### 2. RAG Implementation
- **Context Retrieval**: Fetch relevant transcript chunks based on user questions
- **Retrieval Patterns**: Reuse the existing test patterns from `/tests/framework-comparison/`
- **Context Window**: Optimize context size for AI model limits
- **Relevance Scoring**: Rank retrieved chunks by semantic relevance
- **Source Attribution**: Maintain a clear connection between chunks and timestamps

#### 3. Chat Interface
- **Real-time Q&A**: Interactive chat interface for video-specific questions
- **Timestamp References**: Every response includes source timestamps like `[00:05:23]`
- **DeepSeek Integration**: AI responses using DeepSeek models (no Anthropic, per user requirements)
- **Context Awareness**: Maintain conversation context and handle follow-up questions
- **Visual Design**: Clean chat interface integrated with the video summary page

#### 4. Enhanced Features
- **Follow-up Suggestions**: AI-generated follow-up questions based on content
- **Conversation History**: Persistent chat sessions linked to video summaries
- **Export Conversations**: Save Q&A sessions as part of video documentation
- **Multi-Video Chat**: Ask questions across multiple videos in a playlist
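The ChromaDB features above map onto a small amount of storage and retrieval plumbing. Below is a minimal sketch, assuming ChromaDB's 0.4+ Python client, a per-video collection naming scheme, and the local sentence-transformers model named later in this story; the helper names (`add_video_chunks`, `search_video`) and the chunk dict shape are illustrative, not an existing project API.

```python
import chromadb
from chromadb.utils import embedding_functions

# Local embedding function (no external API); model matches the
# sentence-transformers/all-MiniLM-L6-v2 choice referenced in this story.
EMBEDDER = embedding_functions.SentenceTransformerEmbeddingFunction(
    model_name="all-MiniLM-L6-v2"
)
CLIENT = chromadb.PersistentClient(path="./data/chromadb_rag")


def _collection(video_id: str):
    # One collection per video keeps re-indexing and cleanup simple.
    return CLIENT.get_or_create_collection(
        name=f"video_{video_id}", embedding_function=EMBEDDER
    )


def add_video_chunks(video_id: str, chunks: list[dict]) -> None:
    """Store transcript chunks with timestamp metadata for later retrieval."""
    _collection(video_id).add(
        ids=[f"{video_id}_{c['chunk_index']}" for c in chunks],
        documents=[c["text"] for c in chunks],
        metadatas=[
            {
                "video_id": video_id,
                "chunk_index": c["chunk_index"],
                "start_timestamp": c["start"],  # seconds
                "end_timestamp": c["end"],
            }
            for c in chunks
        ],
    )


def search_video(video_id: str, question: str, n_results: int = 5) -> list[dict]:
    """Semantic search; returns chunk text plus metadata for source attribution."""
    result = _collection(video_id).query(query_texts=[question], n_results=n_results)
    return [
        {"text": doc, "metadata": meta, "distance": dist}
        for doc, meta, dist in zip(
            result["documents"][0], result["metadatas"][0], result["distances"][0]
        )
    ]
```

Keeping timestamps in chunk metadata (rather than in a separate lookup) is what lets every retrieval hit carry its own source reference without a second query.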
### Technical Architecture

#### RAG System Components

The `TranscriptChunker` collaborator is sketched after the schema extensions below.

```python
from typing import List


class RAGService:
    def __init__(self):
        self.vector_db = ChromaVectorDB()
        self.embeddings = HuggingFaceEmbeddings()  # Local embeddings
        self.ai_service = DeepSeekService()
        self.chunk_processor = TranscriptChunker()

    async def process_video_for_rag(self, video_id: str, transcript: str) -> bool:
        # Chunk transcript into semantic segments
        # Generate embeddings for each chunk
        # Store in ChromaDB with metadata
        # Return success status
        ...

    async def ask_question(
        self, video_id: str, question: str, chat_history: List[ChatMessage]
    ) -> ChatResponse:
        # Retrieve relevant chunks using semantic search
        # Build context from retrieved chunks
        # Generate response with DeepSeek
        # Format response with timestamp references
        ...
```

#### Database Schema Extensions

```sql
-- Chat sessions for persistent conversations
CREATE TABLE chat_sessions (
    id UUID PRIMARY KEY,
    user_id UUID REFERENCES users(id),
    video_id VARCHAR(20),
    summary_id UUID REFERENCES summaries(id),
    session_name VARCHAR(200),
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    total_messages INTEGER DEFAULT 0,
    is_active BOOLEAN DEFAULT TRUE
);

-- Individual chat messages
CREATE TABLE chat_messages (
    id UUID PRIMARY KEY,
    session_id UUID REFERENCES chat_sessions(id),
    message_type VARCHAR(20), -- 'user', 'assistant', 'system'
    content TEXT,
    sources JSONB, -- Array of {chunk_id, timestamp, relevance_score}
    processing_time_seconds FLOAT,
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);

-- Vector embeddings for RAG (ChromaDB metadata reference)
CREATE TABLE video_chunks (
    id UUID PRIMARY KEY,
    video_id VARCHAR(20),
    chunk_index INTEGER,
    chunk_text TEXT,
    start_timestamp INTEGER, -- seconds
    end_timestamp INTEGER,
    word_count INTEGER,
    embedding_id VARCHAR(100), -- ChromaDB document ID
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);

-- RAG performance tracking
CREATE TABLE rag_analytics (
    id UUID PRIMARY KEY,
    video_id VARCHAR(20),
    question TEXT,
    retrieval_count INTEGER,
    relevance_scores JSONB,
    response_quality_score FLOAT,
    user_feedback INTEGER, -- 1-5 rating
    processing_time_seconds FLOAT,
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);
```
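To make the `TranscriptChunker` collaborator concrete, here is a minimal sketch. It assumes the transcript arrives as ordered, timestamped segments (as produced by the existing transcript extraction system) and follows the 200-400 word / ~50-word overlap strategy from the Implementation Notes; the dataclass names and segment shape are illustrative.

```python
from dataclasses import dataclass


@dataclass
class TranscriptSegment:
    text: str
    start: int  # seconds from the start of the video


@dataclass
class Chunk:
    chunk_index: int
    text: str
    start: int
    end: int
    word_count: int


def chunk_transcript(
    segments: list[TranscriptSegment],
    target_words: int = 300,
    overlap_words: int = 50,
) -> list[Chunk]:
    """Group timestamped segments into overlapping, roughly target-sized chunks.

    Chunk boundaries always fall on segment boundaries, so every chunk keeps an
    accurate timestamp range; the tail of each chunk is carried forward as
    overlap to preserve context across boundaries.
    """
    chunks: list[Chunk] = []
    buffer: list[TranscriptSegment] = []
    words = 0

    for seg in segments:
        buffer.append(seg)
        words += len(seg.text.split())
        if words >= target_words:
            chunks.append(_finalize(len(chunks), buffer))
            # Keep roughly `overlap_words` from the end of this chunk as the
            # start of the next one.
            carried: list[TranscriptSegment] = []
            carried_words = 0
            for prev in reversed(buffer):
                carried.insert(0, prev)
                carried_words += len(prev.text.split())
                if carried_words >= overlap_words:
                    break
            buffer, words = carried, carried_words

    # Flush trailing segments that never filled a full chunk.
    if buffer and (not chunks or buffer[-1].start > chunks[-1].end):
        chunks.append(_finalize(len(chunks), buffer))
    return chunks


def _finalize(index: int, segs: list[TranscriptSegment]) -> Chunk:
    text = " ".join(s.text for s in segs)
    # Without per-segment durations, the last segment's start stands in for the
    # chunk end; swap in real end times if the transcript source provides them.
    return Chunk(index, text, segs[0].start, segs[-1].start, len(text.split()))
```

The output chunks carry exactly the fields (`chunk_index`, `start`, `end`, `word_count`) that the `video_chunks` table and the ChromaDB metadata expect, so indexing is a straight mapping.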
## Implementation Tasks

### Task 4.6.1: ChromaDB Vector Database Setup (6 hours)

#### Subtasks:
1. **ChromaDB Configuration** (2 hours)
   - Set up ChromaDB client with persistent storage
   - Configure collections for video transcripts
   - Implement collection naming and organization strategy
   - Add cleanup and maintenance procedures
   - Test database initialization and connection
2. **Transcript Chunking Service** (2 hours)
   - Create intelligent transcript segmentation algorithm
   - Implement overlapping chunks for context preservation
   - Extract meaningful chunk boundaries (sentence/paragraph breaks)
   - Preserve timestamp information in chunks
   - Handle various transcript formats and quality levels
3. **Embedding Generation and Storage** (2 hours)
   - Integrate HuggingFace embeddings (sentence-transformers/all-MiniLM-L6-v2)
   - Generate embeddings for transcript chunks
   - Store embeddings with metadata in ChromaDB
   - Implement batch processing for large transcripts
   - Add progress tracking for embedding generation

### Task 4.6.2: RAG Retrieval System (8 hours)

#### Subtasks:
1. **Semantic Search Implementation** (3 hours)
   - Implement similarity search across video chunks
   - Add relevance scoring and ranking algorithms
   - Configure search parameters (number of results, similarity threshold)
   - Handle edge cases (no relevant chunks, low similarity scores)
   - Test search quality with various question types
2. **Context Building Service** (2 hours)
   - Aggregate retrieved chunks into coherent context
   - Implement context window management for AI models
   - Preserve chunk ordering and timestamp information
   - Add context summarization for long retrievals
   - Handle overlapping chunks and deduplication
3. **Source Attribution System** (2 hours)
   - Link retrieved chunks to specific timestamps
   - Generate clickable timestamp references `[00:05:23]` (see the attribution sketch after this task list)
   - Create YouTube deep links for timestamp navigation
   - Implement source verification and quality checks
   - Add confidence scoring for source attribution
4. **RAG Response Generation** (1 hour)
   - Integrate DeepSeek AI service for response generation
   - Create RAG-specific prompts with context and question
   - Format responses with proper source citations
   - Handle cases where no relevant context is found
   - Add response quality validation

### Task 4.6.3: Chat Interface Implementation (4 hours)

#### Subtasks:
1. **Chat Frontend Component** (2 hours)
   - Create interactive chat interface with message history
   - Implement typing indicators and loading states
   - Add timestamp link rendering and click handling
   - Design responsive chat layout for video summary pages
   - Add keyboard shortcuts and accessibility features
2. **Chat Session Management** (1 hour)
   - Implement persistent chat sessions linked to videos
   - Add session creation, saving, and loading
   - Create chat session list and management interface
   - Handle session state and conversation context
   - Add session export and sharing functionality
3. **Follow-up Question System** (1 hour)
   - Generate AI-powered follow-up question suggestions
   - Base suggestions on video content and conversation context
   - Display suggested questions as clickable options
   - Track suggestion effectiveness and user engagement
   - Add customizable suggestion preferences

### Task 4.6.4: API Integration and Enhancement (2 hours)

#### Subtasks:
1. **RAG API Endpoints** (1 hour)
   - `POST /api/rag/chat/{video_id}` - Ask question about specific video
   - `GET /api/rag/sessions/{user_id}` - Get user's chat sessions
   - `POST /api/rag/sessions/{session_id}/export` - Export conversation
   - `GET /api/rag/suggestions/{video_id}` - Get follow-up suggestions
   - Add comprehensive error handling and validation
2. **Performance Optimization** (1 hour)
   - Implement caching for frequent questions and responses
   - Add batch processing for multiple questions
   - Optimize ChromaDB queries and connection management
   - Add response streaming for long AI responses
   - Monitor and optimize response times and resource usage
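For the Source Attribution System above, the timestamp formatting and deep-link pieces are small and dependency-free. A sketch, assuming retrieval hits in the shape returned by the ChromaDB helper earlier in this story and cosine distances; the conversion of distance to `relevance_score` is an illustrative choice, not a fixed requirement.

```python
def format_timestamp(seconds: int) -> str:
    """Render seconds as the [HH:MM:SS] reference used in chat responses."""
    hours, remainder = divmod(seconds, 3600)
    minutes, secs = divmod(remainder, 60)
    return f"[{hours:02d}:{minutes:02d}:{secs:02d}]"


def youtube_deep_link(video_id: str, seconds: int) -> str:
    """Deep link that opens the source video at the cited moment."""
    return f"https://www.youtube.com/watch?v={video_id}&t={int(seconds)}s"


def build_source_reference(video_id: str, hit: dict) -> dict:
    """Map one retrieval hit onto the SourceReference fields defined under Data Models."""
    start = hit["metadata"]["start_timestamp"]
    return {
        "chunk_id": f"{video_id}_{hit['metadata']['chunk_index']}",
        "timestamp": start,
        "timestamp_formatted": format_timestamp(start),
        "youtube_link": youtube_deep_link(video_id, start),
        "chunk_text": hit["text"],
        # Assumes cosine distance in [0, 2]; clamp so the score stays in [0, 1].
        "relevance_score": max(0.0, min(1.0, 1.0 - hit["distance"] / 2.0)),
    }
```

For example, `format_timestamp(323)` yields `[00:05:23]`, matching the reference format used throughout this story.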
## Data Models

### RAG Chat Models

```python
from pydantic import BaseModel
from typing import List, Dict, Optional, Any
from datetime import datetime
from enum import Enum


class MessageType(str, Enum):
    USER = "user"
    ASSISTANT = "assistant"
    SYSTEM = "system"


class SourceReference(BaseModel):
    chunk_id: str
    timestamp: int  # seconds
    timestamp_formatted: str  # [HH:MM:SS]
    youtube_link: str
    chunk_text: str
    relevance_score: float


class ChatMessage(BaseModel):
    id: str
    message_type: MessageType
    content: str
    sources: List[SourceReference]
    processing_time_seconds: float
    created_at: datetime


class ChatSession(BaseModel):
    id: str
    user_id: str
    video_id: str
    summary_id: str
    session_name: str
    messages: List[ChatMessage]
    total_messages: int
    is_active: bool
    created_at: datetime
    updated_at: datetime


class ChatRequest(BaseModel):
    video_id: str
    question: str
    session_id: Optional[str] = None
    include_context: bool = True
    max_sources: int = 5


class ChatResponse(BaseModel):
    session_id: str
    message: ChatMessage
    follow_up_suggestions: List[str]
    context_retrieved: bool
    total_chunks_searched: int


class RAGAnalytics(BaseModel):
    question: str
    retrieval_count: int
    relevance_scores: List[float]
    response_quality_score: float
    processing_time_seconds: float
    user_feedback: Optional[int] = None
```

(`ChatRequest` and `ChatResponse` are exercised in the endpoint sketch that follows the Testing Strategy section.)

## Testing Strategy

### Unit Tests
- **ChromaDB Integration**: Connection, storage, and retrieval operations
- **Transcript Chunking**: Segmentation quality and metadata preservation
- **Embedding Generation**: Vector quality and consistency
- **Semantic Search**: Relevance and ranking accuracy
- **Source Attribution**: Timestamp accuracy and link generation

### Integration Tests
- **RAG Pipeline**: End-to-end question answering workflow
- **Chat API**: All chat and session management endpoints
- **Frontend Integration**: Chat interface functionality and state management
- **Database Operations**: Session and message persistence

### Quality Assurance Tests
- **Answer Relevance**: Semantic accuracy of responses to questions
- **Source Attribution**: Timestamp precision and link functionality
- **Response Quality**: Coherence and helpfulness of AI responses
- **Performance**: Response time and resource usage under load
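Before the formal specification below, here is a hedged sketch of how the chat endpoint could wire the models above to `RAGService`. It assumes a FastAPI-style router (the framework is not mandated by this story), that `RAGService.ask_question` returns a `ChatResponse`, and a hypothetical `get_rag_service` dependency provider; the module paths are illustrative.

```python
from fastapi import APIRouter, Depends, HTTPException

# Module paths below are illustrative; adjust to the project's layout.
from app.models.rag_chat import ChatRequest, ChatResponse
from app.services.rag import RAGService, get_rag_service

router = APIRouter(prefix="/api/rag", tags=["rag"])


@router.post("/chat/{video_id}", response_model=ChatResponse)
async def chat_with_video(
    video_id: str,
    request: ChatRequest,
    rag: RAGService = Depends(get_rag_service),
) -> ChatResponse:
    """Answer a question about a single video, with timestamped sources."""
    if request.video_id != video_id:
        raise HTTPException(status_code=400, detail="video_id mismatch")
    return await rag.ask_question(
        video_id=video_id,
        question=request.question,
        chat_history=[],  # load from request.session_id once sessions persist
    )
```

Session listing, export, and suggestion endpoints would follow the same pattern against the routes listed in the specification below.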
## API Specification

### RAG Chat Endpoints

```yaml
/api/rag/chat/{video_id}:
  post:
    summary: Ask question about video content using RAG
    parameters:
      - name: video_id
        in: path
        required: true
        schema:
          type: string
    requestBody:
      required: true
      content:
        application/json:
          schema:
            $ref: '#/components/schemas/ChatRequest'
    responses:
      200:
        content:
          application/json:
            schema:
              $ref: '#/components/schemas/ChatResponse'

/api/rag/sessions/{user_id}:
  get:
    summary: Get user's chat sessions
    parameters:
      - name: user_id
        in: path
        required: true
        schema:
          type: string
      - name: active_only
        in: query
        schema:
          type: boolean
          default: true
    responses:
      200:
        content:
          application/json:
            schema:
              type: array
              items:
                $ref: '#/components/schemas/ChatSession'

/api/rag/embeddings/{video_id}/generate:
  post:
    summary: Generate embeddings for video transcript
    parameters:
      - name: video_id
        in: path
        required: true
        schema:
          type: string
    responses:
      202:
        content:
          application/json:
            schema:
              type: object
              properties:
                job_id:
                  type: string
                status:
                  type: string
                estimated_completion:
                  type: string
                  format: date-time
```

## Success Criteria

### Functional Requirements ✅
- [ ] ChromaDB stores transcript embeddings with timestamp metadata
- [ ] Semantic search retrieves relevant content chunks for user questions
- [ ] Chat interface provides real-time Q&A with timestamp source references
- [ ] DeepSeek AI generates contextual responses using retrieved chunks
- [ ] Follow-up question suggestions are based on video content
- [ ] Persistent chat sessions are linked to specific videos

### Quality Requirements ✅
- [ ] Answer relevance >85% for factual questions about video content
- [ ] Timestamp references accurate within a 10-second tolerance
- [ ] Source attribution clearly links responses to specific video segments
- [ ] Response quality maintains conversation context across messages
- [ ] Follow-up suggestions are relevant and engaging
- [ ] Chat interface provides a smooth user experience with loading states

### Performance Requirements ✅
- [ ] Question answering response time under 8 seconds
- [ ] ChromaDB search completes in under 2 seconds
- [ ] Embedding generation processes a 1-hour video in under 5 minutes
- [ ] Chat interface supports concurrent conversations without degradation
- [ ] Memory usage remains stable during long conversation sessions

## Implementation Notes

### ChromaDB Integration
- Use existing patterns from `/tests/framework-comparison/test_langgraph_chromadb.py`
- Implement HuggingFace embeddings for local processing (no API dependencies)
- Configure persistent storage in the `./data/chromadb_rag/` directory
- Use a collection per video, or organize by user/topic as needed

### Transcript Chunking Strategy
- Create semantic chunks of 200-400 words with 50-word overlap
- Preserve sentence boundaries and paragraph structure
- Maintain timestamp ranges for each chunk
- Include video context (title, channel) in chunk metadata

### RAG Response Pattern
- Retrieve the 3-5 most relevant chunks for context
- Include source timestamps in the response format: "According to the video at [00:05:23], ..."
- Provide YouTube deep links for timestamp navigation
- Handle cases where no relevant content is found gracefully

### DeepSeek Integration
- Use the DeepSeek API for response generation (per user requirement: no Anthropic)
- Configure appropriate model parameters for conversational responses
- Implement cost tracking and usage monitoring
- Add response quality scoring and feedback collection

## Risk Mitigation

### High Risk: Answer Quality and Relevance
- **Risk**: RAG responses may be generic or miss important context
- **Mitigation**: Quality scoring, user feedback collection, continuous prompt optimization

### Medium Risk: Timestamp Accuracy
- **Risk**: Source timestamps may not accurately reflect quoted content
- **Mitigation**: Chunk boundary validation, timestamp verification, user correction system

### Medium Risk: Performance with Large Videos
- **Risk**: Long videos may cause slow embedding generation and search
- **Mitigation**: Batch processing, progress tracking, optimized chunking strategies

---

**Story Owner**: Development Team
**Architecture Reference**: BMad Method Epic-Story Structure
**Implementation Status**: Ready for Development
**Last Updated**: 2025-08-27