youtube-summarizer/docs/stories/4.6.rag-powered-video-chat.md


# Story 4.6: RAG-Powered Video Chat with ChromaDB
## Story Overview
**Story ID**: 4.6
**Epic**: 4 - Advanced Intelligence & Developer Platform
**Title**: RAG-Powered Video Chat with ChromaDB
**Status**: 📋 READY FOR IMPLEMENTATION
**Priority**: Medium
**Goal**: Implement a RAG (Retrieval Augmented Generation) chatbot interface using ChromaDB for semantic search, enabling users to have interactive Q&A conversations with video content using precise timestamp source references.
**Value Proposition**: Transform passive video consumption into interactive content exploration, allowing users to ask specific questions about video content and receive precise answers with exact timestamp references for verification.
**Dependencies**:
- ✅ Story 4.4 (Custom AI Models) for AI service infrastructure
- ✅ Existing ChromaDB integration patterns from `/tests/framework-comparison/test_langgraph_chromadb.py`
- ✅ Transcript extraction system
**Estimated Effort**: 20 hours
## Technical Requirements
### Core Features
#### 1. ChromaDB Vector Database
- **Semantic Transcript Chunking**: Split transcripts into meaningful chunks with overlap
- **Embedding Storage**: Generate and store embeddings for all transcript segments
- **Metadata Preservation**: Maintain timestamp, video ID, and section information
- **Vector Search**: Semantic similarity search across transcript content
- **Collection Management**: Organize embeddings by video, user, or topic
#### 2. RAG Implementation
- **Context Retrieval**: Fetch relevant transcript chunks based on user questions
- **Retrieval Patterns**: Reuse the existing ChromaDB retrieval patterns from `/tests/framework-comparison/`
- **Context Window**: Optimize context size for AI model limits
- **Relevance Scoring**: Rank retrieved chunks by semantic relevance
- **Source Attribution**: Maintain clear connection between chunks and timestamps
#### 3. Chat Interface
- **Real-time Q&A**: Interactive chat interface for video-specific questions
- **Timestamp References**: Every response includes source timestamps like `[00:05:23]`
- **DeepSeek Integration**: AI responses using DeepSeek models (no Anthropic per user requirements)
- **Context Awareness**: Maintain conversation context and follow-up questions
- **Visual Design**: Clean chat interface integrated with video summary page
#### 4. Enhanced Features
- **Follow-up Suggestions**: AI-generated follow-up questions based on content
- **Conversation History**: Persistent chat sessions linked to video summaries
- **Export Conversations**: Save Q&A sessions as part of video documentation
- **Multi-Video Chat**: Ask questions across multiple videos in a playlist
### Technical Architecture
#### RAG System Components
```python
from typing import List


class RAGService:
    def __init__(self):
        self.vector_db = ChromaVectorDB()
        self.embeddings = HuggingFaceEmbeddings()  # local embeddings
        self.ai_service = DeepSeekService()
        self.chunk_processor = TranscriptChunker()

    async def process_video_for_rag(self, video_id: str, transcript: str) -> bool:
        # Chunk transcript into semantic segments
        # Generate embeddings for each chunk
        # Store in ChromaDB with metadata
        # Return success status
        ...

    async def ask_question(
        self, video_id: str, question: str, chat_history: List["ChatMessage"]
    ) -> "ChatResponse":
        # Retrieve relevant chunks using semantic search
        # Build context from retrieved chunks
        # Generate response with DeepSeek
        # Format response with timestamp references
        ...
```
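The "build context from retrieved chunks" step in `ask_question` can be sketched as a small pure function. The hit shape (`chunk_text` plus `start_timestamp`) and the 4,000-character cap are illustrative assumptions, not fixed interfaces:

```python
from typing import Dict, List


def build_context(hits: List[Dict], max_chars: int = 4000) -> str:
    """Order retrieved chunks by timestamp, drop duplicates, and trim
    the result to the model's context window."""
    seen, parts = set(), []
    for hit in sorted(hits, key=lambda h: h["start_timestamp"]):
        if hit["chunk_text"] in seen:  # overlapping chunks can repeat text
            continue
        seen.add(hit["chunk_text"])
        parts.append(f"[{hit['start_timestamp']}s] {hit['chunk_text']}")
    # Crude truncation; production code might summarize long retrievals instead.
    return "\n\n".join(parts)[:max_chars]
```

Sorting before joining keeps quoted material in playback order, which makes the timestamp citations in the final answer easier to verify.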
#### Database Schema Extensions
```sql
-- Chat sessions for persistent conversations
CREATE TABLE chat_sessions (
    id UUID PRIMARY KEY,
    user_id UUID REFERENCES users(id),
    video_id VARCHAR(20),
    summary_id UUID REFERENCES summaries(id),
    session_name VARCHAR(200),
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    total_messages INTEGER DEFAULT 0,
    is_active BOOLEAN DEFAULT TRUE
);

-- Individual chat messages
CREATE TABLE chat_messages (
    id UUID PRIMARY KEY,
    session_id UUID REFERENCES chat_sessions(id),
    message_type VARCHAR(20), -- 'user', 'assistant', 'system'
    content TEXT,
    sources JSONB, -- Array of {chunk_id, timestamp, relevance_score}
    processing_time_seconds FLOAT,
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);

-- Vector embeddings for RAG (ChromaDB metadata reference)
CREATE TABLE video_chunks (
    id UUID PRIMARY KEY,
    video_id VARCHAR(20),
    chunk_index INTEGER,
    chunk_text TEXT,
    start_timestamp INTEGER, -- seconds
    end_timestamp INTEGER,
    word_count INTEGER,
    embedding_id VARCHAR(100), -- ChromaDB document ID
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);

-- RAG performance tracking
CREATE TABLE rag_analytics (
    id UUID PRIMARY KEY,
    video_id VARCHAR(20),
    question TEXT,
    retrieval_count INTEGER,
    relevance_scores JSONB,
    response_quality_score FLOAT,
    user_feedback INTEGER, -- 1-5 rating
    processing_time_seconds FLOAT,
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);
```
## Implementation Tasks
### Task 4.6.1: ChromaDB Vector Database Setup (6 hours)
#### Subtasks:
1. **ChromaDB Configuration** (2 hours)
- Set up ChromaDB client with persistent storage
- Configure collections for video transcripts
- Implement collection naming and organization strategy
- Add cleanup and maintenance procedures
- Test database initialization and connection
2. **Transcript Chunking Service** (2 hours)
- Create intelligent transcript segmentation algorithm
- Implement overlapping chunks for context preservation
- Extract meaningful chunk boundaries (sentence/paragraph breaks)
- Preserve timestamp information in chunks
- Handle various transcript formats and quality levels
3. **Embedding Generation and Storage** (2 hours)
- Integrate HuggingFace embeddings (sentence-transformers/all-MiniLM-L6-v2)
- Generate embeddings for transcript chunks
- Store embeddings with metadata in ChromaDB
- Implement batch processing for large transcripts
- Add progress tracking for embedding generation
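The chunking subtask above can be sketched as a word-window pass over timed transcript segments. The `(start_seconds, text)` segment shape is an assumption about the transcript format; the 400-word window and 50-word overlap defaults follow the chunking strategy in the implementation notes:

```python
from typing import Dict, List, Tuple


def chunk_transcript(
    segments: List[Tuple[int, str]],
    max_words: int = 400,
    overlap_words: int = 50,
) -> List[Dict]:
    """Group timed segments into overlapping chunks, keeping timestamp ranges."""
    assert overlap_words < max_words, "overlap must be smaller than the window"
    # Flatten into (start_sec, word) pairs so the overlap is word-accurate.
    words: List[Tuple[int, str]] = []
    for start, text in segments:
        words.extend((start, w) for w in text.split())

    chunks, i = [], 0
    while i < len(words):
        window = words[i : i + max_words]
        chunks.append({
            "chunk_index": len(chunks),
            "chunk_text": " ".join(w for _, w in window),
            "start_timestamp": window[0][0],
            "end_timestamp": window[-1][0],
            "word_count": len(window),
        })
        if i + max_words >= len(words):
            break
        i += max_words - overlap_words  # step back to create the overlap
    return chunks
```

A production chunker would additionally snap window edges to sentence boundaries, per the subtask; the word-window keeps the sketch short.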
### Task 4.6.2: RAG Retrieval System (8 hours)
#### Subtasks:
1. **Semantic Search Implementation** (3 hours)
- Implement similarity search across video chunks
- Add relevance scoring and ranking algorithms
- Configure search parameters (number of results, similarity threshold)
- Handle edge cases (no relevant chunks, low similarity scores)
- Test search quality with various question types
2. **Context Building Service** (2 hours)
- Aggregate retrieved chunks into coherent context
- Implement context window management for AI models
- Preserve chunk ordering and timestamp information
- Add context summarization for long retrievals
- Handle overlapping chunks and deduplication
3. **Source Attribution System** (2 hours)
- Link retrieved chunks to specific timestamps
- Generate clickable timestamp references `[00:05:23]`
- Create YouTube deep links for timestamp navigation
- Implement source verification and quality checks
- Add confidence scoring for source attribution
4. **RAG Response Generation** (1 hour)
- Integrate DeepSeek AI service for response generation
- Create RAG-specific prompts with context and question
- Format responses with proper source citations
- Handle cases where no relevant context is found
- Add response quality validation
### Task 4.6.3: Chat Interface Implementation (4 hours)
#### Subtasks:
1. **Chat Frontend Component** (2 hours)
- Create interactive chat interface with message history
- Implement typing indicators and loading states
- Add timestamp link rendering and click handling
- Design responsive chat layout for video summary pages
- Add keyboard shortcuts and accessibility features
2. **Chat Session Management** (1 hour)
- Implement persistent chat sessions linked to videos
- Add session creation, saving, and loading
- Create chat session list and management interface
- Handle session state and conversation context
- Add session export and sharing functionality
3. **Follow-up Question System** (1 hour)
- Generate AI-powered follow-up question suggestions
- Base suggestions on video content and conversation context
- Display suggested questions as clickable options
- Track suggestion effectiveness and user engagement
- Add customizable suggestion preferences
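One way to implement the follow-up suggestion subtask is a dedicated prompt plus a tolerant parser for the model's reply. The prompt wording and the three-question default are assumptions, not a fixed spec:

```python
from typing import List


def build_suggestion_prompt(video_title: str, last_answer: str, count: int = 3) -> str:
    """Ask the model for short follow-up questions grounded in the video."""
    return (
        f"You are helping a viewer explore the video '{video_title}'.\n"
        f"Based on this answer:\n---\n{last_answer}\n---\n"
        f"Suggest {count} short follow-up questions the viewer might ask next, "
        "one per line, with no numbering."
    )


def parse_suggestions(raw: str, count: int = 3) -> List[str]:
    """Keep at most `count` non-empty lines from the model's reply."""
    return [line.strip() for line in raw.splitlines() if line.strip()][:count]
```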
### Task 4.6.4: API Integration and Enhancement (2 hours)
#### Subtasks:
1. **RAG API Endpoints** (1 hour)
- `POST /api/rag/chat/{video_id}` - Ask question about specific video
- `GET /api/rag/sessions/{user_id}` - Get user's chat sessions
- `POST /api/rag/sessions/{session_id}/export` - Export conversation
- `GET /api/rag/suggestions/{video_id}` - Get follow-up suggestions
- Add comprehensive error handling and validation
2. **Performance Optimization** (1 hour)
- Implement caching for frequent questions and responses
- Add batch processing for multiple questions
- Optimize ChromaDB queries and connection management
- Add response streaming for long AI responses
- Monitor and optimize response times and resource usage
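The question/response caching mentioned above could start as an in-process LRU keyed by video ID and normalized question text. This is a sketch; a shared cache such as Redis would replace it once multiple workers are involved:

```python
from collections import OrderedDict
from typing import Optional, Tuple


class ResponseCache:
    """Tiny LRU cache keyed by (video_id, normalized question)."""

    def __init__(self, max_entries: int = 256):
        self._store: "OrderedDict[Tuple[str, str], str]" = OrderedDict()
        self.max_entries = max_entries

    @staticmethod
    def _key(video_id: str, question: str) -> Tuple[str, str]:
        # Collapse whitespace and case so trivially rephrased questions hit.
        return (video_id, " ".join(question.lower().split()))

    def get(self, video_id: str, question: str) -> Optional[str]:
        key = self._key(video_id, question)
        if key not in self._store:
            return None
        self._store.move_to_end(key)  # mark as recently used
        return self._store[key]

    def put(self, video_id: str, question: str, response: str) -> None:
        key = self._key(video_id, question)
        self._store[key] = response
        self._store.move_to_end(key)
        if len(self._store) > self.max_entries:
            self._store.popitem(last=False)  # evict least recently used
```

Note the normalization only catches surface-level duplicates; semantically similar questions would need an embedding-based cache key.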
## Data Models
### RAG Chat Models
```python
from pydantic import BaseModel
from typing import List, Optional
from datetime import datetime
from enum import Enum


class MessageType(str, Enum):
    USER = "user"
    ASSISTANT = "assistant"
    SYSTEM = "system"


class SourceReference(BaseModel):
    chunk_id: str
    timestamp: int  # seconds
    timestamp_formatted: str  # [HH:MM:SS]
    youtube_link: str
    chunk_text: str
    relevance_score: float


class ChatMessage(BaseModel):
    id: str
    message_type: MessageType
    content: str
    sources: List[SourceReference]
    processing_time_seconds: float
    created_at: datetime


class ChatSession(BaseModel):
    id: str
    user_id: str
    video_id: str
    summary_id: str
    session_name: str
    messages: List[ChatMessage]
    total_messages: int
    is_active: bool
    created_at: datetime
    updated_at: datetime


class ChatRequest(BaseModel):
    video_id: str
    question: str
    session_id: Optional[str] = None
    include_context: bool = True
    max_sources: int = 5


class ChatResponse(BaseModel):
    session_id: str
    message: ChatMessage
    follow_up_suggestions: List[str]
    context_retrieved: bool
    total_chunks_searched: int


class RAGAnalytics(BaseModel):
    question: str
    retrieval_count: int
    relevance_scores: List[float]
    response_quality_score: float
    processing_time_seconds: float
    user_feedback: Optional[int] = None
```
## Testing Strategy
### Unit Tests
- **ChromaDB Integration**: Connection, storage, and retrieval operations
- **Transcript Chunking**: Segmentation quality and metadata preservation
- **Embedding Generation**: Vector quality and consistency
- **Semantic Search**: Relevance and ranking accuracy
- **Source Attribution**: Timestamp accuracy and link generation
### Integration Tests
- **RAG Pipeline**: End-to-end question answering workflow
- **Chat API**: All chat and session management endpoints
- **Frontend Integration**: Chat interface functionality and state management
- **Database Operations**: Session and message persistence
### Quality Assurance Tests
- **Answer Relevance**: Semantic accuracy of responses to questions
- **Source Attribution**: Timestamp precision and link functionality
- **Response Quality**: Coherence and helpfulness of AI responses
- **Performance**: Response time and resource usage under load
## API Specification
### RAG Chat Endpoints
```yaml
/api/rag/chat/{video_id}:
  post:
    summary: Ask question about video content using RAG
    parameters:
      - name: video_id
        in: path
        required: true
        schema:
          type: string
    requestBody:
      required: true
      content:
        application/json:
          schema:
            $ref: '#/components/schemas/ChatRequest'
    responses:
      '200':
        description: RAG answer with timestamp source references
        content:
          application/json:
            schema:
              $ref: '#/components/schemas/ChatResponse'

/api/rag/sessions/{user_id}:
  get:
    summary: Get user's chat sessions
    parameters:
      - name: user_id
        in: path
        required: true
        schema:
          type: string
      - name: active_only
        in: query
        schema:
          type: boolean
          default: true
    responses:
      '200':
        description: List of the user's chat sessions
        content:
          application/json:
            schema:
              type: array
              items:
                $ref: '#/components/schemas/ChatSession'

/api/rag/embeddings/{video_id}/generate:
  post:
    summary: Generate embeddings for video transcript
    parameters:
      - name: video_id
        in: path
        required: true
        schema:
          type: string
    responses:
      '202':
        description: Embedding job accepted for background processing
        content:
          application/json:
            schema:
              type: object
              properties:
                job_id:
                  type: string
                status:
                  type: string
                estimated_completion:
                  type: string
                  format: date-time
```
## Success Criteria
### Functional Requirements ✅
- [ ] ChromaDB stores transcript embeddings with timestamp metadata
- [ ] Semantic search retrieves relevant content chunks for user questions
- [ ] Chat interface provides real-time Q&A with timestamp source references
- [ ] DeepSeek AI generates contextual responses using retrieved chunks
- [ ] Follow-up question suggestions are generated from video content
- [ ] Persistent chat sessions linked to specific videos
### Quality Requirements ✅
- [ ] Answer relevance >85% for factual questions about video content
- [ ] Timestamp references accurate within 10-second tolerance
- [ ] Source attribution clearly links responses to specific video segments
- [ ] Response quality maintains conversation context across messages
- [ ] Follow-up suggestions are relevant and engaging
- [ ] Chat interface provides smooth user experience with loading states
### Performance Requirements ✅
- [ ] Question answering response time under 8 seconds
- [ ] ChromaDB search completes in under 2 seconds
- [ ] Embedding generation processes 1-hour video in under 5 minutes
- [ ] Chat interface supports concurrent conversations without degradation
- [ ] Memory usage remains stable during long conversation sessions
## Implementation Notes
### ChromaDB Integration
- Use existing patterns from `/tests/framework-comparison/test_langgraph_chromadb.py`
- Implement HuggingFace embeddings for local processing (no API dependencies)
- Configure persistent storage in `./data/chromadb_rag/` directory
- Use collection per video or organized by user/topic as needed
### Transcript Chunking Strategy
- Create semantic chunks of 200-400 words with 50-word overlap
- Preserve sentence boundaries and paragraph structure
- Maintain timestamp ranges for each chunk
- Include video context (title, channel) in chunk metadata
### RAG Response Pattern
- Retrieve 3-5 most relevant chunks for context
- Include source timestamps in response format: "According to the video at [00:05:23], ..."
- Provide YouTube deep links for timestamp navigation
- Handle cases where no relevant content is found gracefully
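The timestamp citation and deep-link formats described above reduce to two small helpers (`&t=<seconds>s` is YouTube's standard start-time query parameter):

```python
def format_timestamp(seconds: int) -> str:
    """Render seconds as the [HH:MM:SS] citation used in chat responses."""
    h, rem = divmod(seconds, 3600)
    m, s = divmod(rem, 60)
    return f"[{h:02d}:{m:02d}:{s:02d}]"


def youtube_deep_link(video_id: str, seconds: int) -> str:
    """Deep link that opens the video at the cited moment."""
    return f"https://www.youtube.com/watch?v={video_id}&t={seconds}s"
```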
### DeepSeek Integration
- Use DeepSeek API for response generation (per user requirement: no Anthropic)
- Configure appropriate model parameters for conversational responses
- Implement cost tracking and usage monitoring
- Add response quality scoring and feedback collection
## Risk Mitigation
### High Risk: Answer Quality and Relevance
- **Risk**: RAG responses may be generic or miss important context
- **Mitigation**: Quality scoring, user feedback collection, continuous prompt optimization
### Medium Risk: Timestamp Accuracy
- **Risk**: Source timestamps may not accurately reflect quoted content
- **Mitigation**: Chunk boundary validation, timestamp verification, user correction system
### Medium Risk: Performance with Large Videos
- **Risk**: Long videos may cause slow embedding generation and search
- **Mitigation**: Batch processing, progress tracking, optimized chunking strategies
---
**Story Owner**: Development Team
**Architecture Reference**: BMad Method Epic-Story Structure
**Implementation Status**: Ready for Development
**Last Updated**: 2025-08-27