Story 4.6: RAG-Powered Video Chat with ChromaDB

Story Overview

Story ID: 4.6
Epic: 4 - Advanced Intelligence & Developer Platform
Title: RAG-Powered Video Chat with ChromaDB
Status: 📋 READY FOR IMPLEMENTATION
Priority: Medium

Goal: Implement a RAG (Retrieval-Augmented Generation) chatbot interface using ChromaDB for semantic search, enabling users to have interactive Q&A conversations with video content, with precise timestamp source references.

Value Proposition: Transform passive video consumption into interactive content exploration, allowing users to ask specific questions about video content and receive precise answers with exact timestamp references for verification.

Dependencies:

  • Story 4.4 (Custom AI Models) for AI service infrastructure
  • Existing ChromaDB integration patterns from /tests/framework-comparison/test_langgraph_chromadb.py
  • Transcript extraction system

Estimated Effort: 20 hours

Technical Requirements

Core Features

1. ChromaDB Vector Database

  • Semantic Transcript Chunking: Split transcripts into meaningful chunks with overlap
  • Embedding Storage: Generate and store embeddings for all transcript segments
  • Metadata Preservation: Maintain timestamp, video ID, and section information
  • Vector Search: Semantic similarity search across transcript content
  • Collection Management: Organize embeddings by video, user, or topic

2. RAG Implementation

  • Context Retrieval: Fetch relevant transcript chunks based on user questions
  • Retrieval Patterns: Reuse existing ChromaDB retrieval patterns from /tests/framework-comparison/
  • Context Window: Optimize context size for AI model limits
  • Relevance Scoring: Rank retrieved chunks by semantic relevance
  • Source Attribution: Maintain clear connection between chunks and timestamps

3. Chat Interface

  • Real-time Q&A: Interactive chat interface for video-specific questions
  • Timestamp References: Every response includes source timestamps like [00:05:23]
  • DeepSeek Integration: AI responses using DeepSeek models (no Anthropic per user requirements)
  • Context Awareness: Maintain conversation context and follow-up questions
  • Visual Design: Clean chat interface integrated with video summary page

4. Enhanced Features

  • Follow-up Suggestions: AI-generated follow-up questions based on content
  • Conversation History: Persistent chat sessions linked to video summaries
  • Export Conversations: Save Q&A sessions as part of video documentation
  • Multi-Video Chat: Ask questions across multiple videos in a playlist

Technical Architecture

RAG System Components

class RAGService:
    def __init__(self):
        self.vector_db = ChromaVectorDB()
        self.embeddings = HuggingFaceEmbeddings()  # Local embeddings
        self.ai_service = DeepSeekService()
        self.chunk_processor = TranscriptChunker()

    async def process_video_for_rag(self, video_id: str, transcript: str) -> bool:
        # Chunk transcript into semantic segments
        # Generate embeddings for each chunk
        # Store in ChromaDB with metadata
        # Return success status
        ...

    async def ask_question(self, video_id: str, question: str, chat_history: List[ChatMessage]) -> ChatResponse:
        # Retrieve relevant chunks using semantic search
        # Build context from retrieved chunks
        # Generate response with DeepSeek
        # Format response with timestamp references
        ...
Database Schema Extensions

-- Chat sessions for persistent conversations
CREATE TABLE chat_sessions (
    id UUID PRIMARY KEY,
    user_id UUID REFERENCES users(id),
    video_id VARCHAR(20),
    summary_id UUID REFERENCES summaries(id),
    session_name VARCHAR(200),
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    total_messages INTEGER DEFAULT 0,
    is_active BOOLEAN DEFAULT TRUE
);

-- Individual chat messages
CREATE TABLE chat_messages (
    id UUID PRIMARY KEY,
    session_id UUID REFERENCES chat_sessions(id),
    message_type VARCHAR(20), -- 'user', 'assistant', 'system'
    content TEXT,
    sources JSONB, -- Array of {chunk_id, timestamp, relevance_score}
    processing_time_seconds FLOAT,
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);

-- Transcript chunk metadata for RAG (embeddings themselves live in ChromaDB)
CREATE TABLE video_chunks (
    id UUID PRIMARY KEY,
    video_id VARCHAR(20),
    chunk_index INTEGER,
    chunk_text TEXT,
    start_timestamp INTEGER, -- seconds
    end_timestamp INTEGER,
    word_count INTEGER,
    embedding_id VARCHAR(100), -- ChromaDB document ID
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);

-- RAG performance tracking
CREATE TABLE rag_analytics (
    id UUID PRIMARY KEY,
    video_id VARCHAR(20),
    question TEXT,
    retrieval_count INTEGER,
    relevance_scores JSONB,
    response_quality_score FLOAT,
    user_feedback INTEGER, -- 1-5 rating
    processing_time_seconds FLOAT,
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);

Implementation Tasks

Task 4.6.1: ChromaDB Vector Database Setup (6 hours)

Subtasks:

  1. ChromaDB Configuration (2 hours)

    • Set up ChromaDB client with persistent storage
    • Configure collections for video transcripts
    • Implement collection naming and organization strategy
    • Add cleanup and maintenance procedures
    • Test database initialization and connection
  2. Transcript Chunking Service (2 hours)

    • Create intelligent transcript segmentation algorithm
    • Implement overlapping chunks for context preservation
    • Extract meaningful chunk boundaries (sentence/paragraph breaks)
    • Preserve timestamp information in chunks
    • Handle various transcript formats and quality levels
  3. Embedding Generation and Storage (2 hours)

    • Integrate HuggingFace embeddings (sentence-transformers/all-MiniLM-L6-v2)
    • Generate embeddings for transcript chunks
    • Store embeddings with metadata in ChromaDB
    • Implement batch processing for large transcripts
    • Add progress tracking for embedding generation
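
The sketch below illustrates the ChromaDB setup and chunk storage covered by this task, assuming chromadb's PersistentClient, the built-in SentenceTransformer embedding function, and one collection per video; the collection naming and metadata fields are illustrative, not final.

import chromadb
from chromadb.utils import embedding_functions

# Persistent client stored under ./data/chromadb_rag/ (see Implementation Notes)
client = chromadb.PersistentClient(path="./data/chromadb_rag")

# Local HuggingFace embeddings, no external API calls
embedding_fn = embedding_functions.SentenceTransformerEmbeddingFunction(
    model_name="sentence-transformers/all-MiniLM-L6-v2"
)

def store_chunks(video_id: str, chunks: list[dict]) -> None:
    """Store transcript chunks with timestamp metadata in a per-video collection."""
    collection = client.get_or_create_collection(
        name=f"video_{video_id}",
        embedding_function=embedding_fn,
    )
    collection.add(
        ids=[f"{video_id}_{c['chunk_index']}" for c in chunks],
        documents=[c["chunk_text"] for c in chunks],
        metadatas=[
            {
                "video_id": video_id,
                "chunk_index": c["chunk_index"],
                "start_timestamp": c["start_timestamp"],
                "end_timestamp": c["end_timestamp"],
            }
            for c in chunks
        ],
    )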

Task 4.6.2: RAG Retrieval System (8 hours)

Subtasks:

  1. Semantic Search Implementation (3 hours)

    • Implement similarity search across video chunks
    • Add relevance scoring and ranking algorithms
    • Configure search parameters (number of results, similarity threshold)
    • Handle edge cases (no relevant chunks, low similarity scores)
    • Test search quality with various question types
  2. Context Building Service (2 hours)

    • Aggregate retrieved chunks into coherent context
    • Implement context window management for AI models
    • Preserve chunk ordering and timestamp information
    • Add context summarization for long retrievals
    • Handle overlapping chunks and deduplication
  3. Source Attribution System (2 hours)

    • Link retrieved chunks to specific timestamps
    • Generate clickable timestamp references [00:05:23]
    • Create YouTube deep links for timestamp navigation
    • Implement source verification and quality checks
    • Add confidence scoring for source attribution
  4. RAG Response Generation (1 hour)

    • Integrate DeepSeek AI service for response generation
    • Create RAG-specific prompts with context and question
    • Format responses with proper source citations
    • Handle cases where no relevant context is found
    • Add response quality validation
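
A minimal sketch of retrieval with source attribution for this task, reusing the client and embedding function assumed in the Task 4.6.1 sketch; the distance-to-relevance conversion and deep-link format are simplifications to refine during implementation.

import chromadb
from chromadb.utils import embedding_functions

# Same persistent client and local embedding function as in the Task 4.6.1 sketch
client = chromadb.PersistentClient(path="./data/chromadb_rag")
embedding_fn = embedding_functions.SentenceTransformerEmbeddingFunction(
    model_name="sentence-transformers/all-MiniLM-L6-v2"
)

def format_timestamp(seconds: int) -> str:
    """Render seconds as [HH:MM:SS] for source references."""
    h, rem = divmod(seconds, 3600)
    m, s = divmod(rem, 60)
    return f"[{h:02d}:{m:02d}:{s:02d}]"

def retrieve_sources(video_id: str, question: str, max_sources: int = 5) -> list[dict]:
    """Semantic search over a video's chunks, returning text, timestamps, and scores."""
    collection = client.get_or_create_collection(
        name=f"video_{video_id}", embedding_function=embedding_fn
    )
    results = collection.query(query_texts=[question], n_results=max_sources)
    sources = []
    for doc, meta, distance in zip(
        results["documents"][0], results["metadatas"][0], results["distances"][0]
    ):
        sources.append({
            "chunk_text": doc,
            "chunk_index": meta["chunk_index"],
            "timestamp": meta["start_timestamp"],
            "timestamp_formatted": format_timestamp(meta["start_timestamp"]),
            "youtube_link": f"https://youtu.be/{video_id}?t={meta['start_timestamp']}",
            "relevance_score": 1.0 - distance,  # rough similarity derived from distance
        })
    return sources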

Task 4.6.3: Chat Interface Implementation (4 hours)

Subtasks:

  1. Chat Frontend Component (2 hours)

    • Create interactive chat interface with message history
    • Implement typing indicators and loading states
    • Add timestamp link rendering and click handling
    • Design responsive chat layout for video summary pages
    • Add keyboard shortcuts and accessibility features
  2. Chat Session Management (1 hour)

    • Implement persistent chat sessions linked to videos
    • Add session creation, saving, and loading
    • Create chat session list and management interface
    • Handle session state and conversation context
    • Add session export and sharing functionality
  3. Follow-up Question System (1 hour)

    • Generate AI-powered follow-up question suggestions
    • Base suggestions on video content and conversation context
    • Display suggested questions as clickable options
    • Track suggestion effectiveness and user engagement
    • Add customizable suggestion preferences
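
One possible shape for the follow-up suggestion generator, assuming DeepSeek's OpenAI-compatible chat completions endpoint; the prompt wording, model name, and response parsing are placeholders.

from openai import OpenAI

# DeepSeek exposes an OpenAI-compatible chat completions endpoint (assumed configuration)
deepseek = OpenAI(api_key="YOUR_DEEPSEEK_API_KEY", base_url="https://api.deepseek.com")

def suggest_follow_ups(context: str, last_answer: str, count: int = 3) -> list[str]:
    """Generate short follow-up questions grounded in the retrieved video context."""
    prompt = (
        f"Video excerpt:\n{context}\n\n"
        f"Answer just given to the user:\n{last_answer}\n\n"
        f"Suggest {count} short follow-up questions the user might ask next, one per line."
    )
    response = deepseek.chat.completions.create(
        model="deepseek-chat",
        messages=[{"role": "user", "content": prompt}],
        temperature=0.7,
    )
    lines = response.choices[0].message.content.splitlines()
    return [line.strip("-• ").strip() for line in lines if line.strip()][:count]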

Task 4.6.4: API Integration and Enhancement (2 hours)

Subtasks:

  1. RAG API Endpoints (1 hour)

    • POST /api/rag/chat/{video_id} - Ask question about specific video
    • GET /api/rag/sessions/{user_id} - Get user's chat sessions
    • POST /api/rag/sessions/{session_id}/export - Export conversation
    • GET /api/rag/suggestions/{video_id} - Get follow-up suggestions
    • Add comprehensive error handling and validation
  2. Performance Optimization (1 hour)

    • Implement caching for frequent questions and responses
    • Add batch processing for multiple questions
    • Optimize ChromaDB queries and connection management
    • Add response streaming for long AI responses
    • Monitor and optimize response times and resource usage
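
A hedged sketch of the chat endpoint wiring, assuming a FastAPI backend plus the RAGService and Pydantic models defined in this story; the import paths and error handling are illustrative only.

from fastapi import APIRouter, HTTPException

# ChatRequest/ChatResponse are the Pydantic models below; module paths are hypothetical
from app.models.rag import ChatRequest, ChatResponse
from app.services.rag import RAGService

router = APIRouter(prefix="/api/rag")
rag_service = RAGService()

@router.post("/chat/{video_id}", response_model=ChatResponse)
async def chat_with_video(video_id: str, request: ChatRequest) -> ChatResponse:
    """Answer a question about a specific video using retrieved transcript chunks."""
    if request.video_id != video_id:
        raise HTTPException(status_code=400, detail="video_id in path and body must match")
    try:
        return await rag_service.ask_question(
            video_id=video_id,
            question=request.question,
            chat_history=[],  # the full implementation loads history via request.session_id
        )
    except Exception as exc:  # narrow the exception types in the real implementation
        raise HTTPException(status_code=500, detail=str(exc)) from exc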

Data Models

RAG Chat Models

from pydantic import BaseModel
from typing import List, Dict, Optional, Any
from datetime import datetime
from enum import Enum

class MessageType(str, Enum):
    USER = "user"
    ASSISTANT = "assistant"
    SYSTEM = "system"

class SourceReference(BaseModel):
    chunk_id: str
    timestamp: int  # seconds
    timestamp_formatted: str  # [HH:MM:SS]
    youtube_link: str
    chunk_text: str
    relevance_score: float

class ChatMessage(BaseModel):
    id: str
    message_type: MessageType
    content: str
    sources: List[SourceReference]
    processing_time_seconds: float
    created_at: datetime

class ChatSession(BaseModel):
    id: str
    user_id: str
    video_id: str
    summary_id: str
    session_name: str
    messages: List[ChatMessage]
    total_messages: int
    is_active: bool
    created_at: datetime
    updated_at: datetime

class ChatRequest(BaseModel):
    video_id: str
    question: str
    session_id: Optional[str] = None
    include_context: bool = True
    max_sources: int = 5

class ChatResponse(BaseModel):
    session_id: str
    message: ChatMessage
    follow_up_suggestions: List[str]
    context_retrieved: bool
    total_chunks_searched: int

class RAGAnalytics(BaseModel):
    question: str
    retrieval_count: int
    relevance_scores: List[float]
    response_quality_score: float
    processing_time_seconds: float
    user_feedback: Optional[int] = None
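
As an illustration of how these models connect to retrieval, the helper below builds a SourceReference from a retrieved chunk dict as shaped in the Task 4.6.2 sketch; the deep-link format (YouTube's `t` query parameter) is an assumption to confirm during implementation.

def build_source_reference(video_id: str, chunk: dict) -> SourceReference:
    """Convert a retrieved chunk dict into a SourceReference with a clickable deep link."""
    start = chunk["timestamp"]
    return SourceReference(
        chunk_id=f"{video_id}_{chunk['chunk_index']}",
        timestamp=start,
        timestamp_formatted=chunk["timestamp_formatted"],
        youtube_link=f"https://www.youtube.com/watch?v={video_id}&t={start}s",
        chunk_text=chunk["chunk_text"],
        relevance_score=chunk["relevance_score"],
    )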

Testing Strategy

Unit Tests

  • ChromaDB Integration: Connection, storage, and retrieval operations
  • Transcript Chunking: Segmentation quality and metadata preservation
  • Embedding Generation: Vector quality and consistency
  • Semantic Search: Relevance and ranking accuracy
  • Source Attribution: Timestamp accuracy and link generation

Integration Tests

  • RAG Pipeline: End-to-end question answering workflow
  • Chat API: All chat and session management endpoints
  • Frontend Integration: Chat interface functionality and state management
  • Database Operations: Session and message persistence

Quality Assurance Tests

  • Answer Relevance: Semantic accuracy of responses to questions
  • Source Attribution: Timestamp precision and link functionality
  • Response Quality: Coherence and helpfulness of AI responses
  • Performance: Response time and resource usage under load

API Specification

RAG Chat Endpoints

/api/rag/chat/{video_id}:
  post:
    summary: Ask question about video content using RAG
    parameters:
      - name: video_id
        in: path
        required: true
        schema:
          type: string
    requestBody:
      required: true
      content:
        application/json:
          schema:
            $ref: '#/components/schemas/ChatRequest'
    responses:
      200:
        content:
          application/json:
            schema:
              $ref: '#/components/schemas/ChatResponse'

/api/rag/sessions/{user_id}:
  get:
    summary: Get user's chat sessions
    parameters:
      - name: user_id
        in: path
        required: true
        schema:
          type: string
      - name: active_only
        in: query
        schema:
          type: boolean
          default: true
    responses:
      200:
        content:
          application/json:
            schema:
              type: array
              items:
                $ref: '#/components/schemas/ChatSession'

/api/rag/embeddings/{video_id}/generate:
  post:
    summary: Generate embeddings for video transcript
    parameters:
      - name: video_id
        in: path
        required: true
        schema:
          type: string
    responses:
      202:
        content:
          application/json:
            schema:
              type: object
              properties:
                job_id:
                  type: string
                status:
                  type: string
                estimated_completion:
                  type: string
                  format: date-time

Success Criteria

Functional Requirements

  • ChromaDB stores transcript embeddings with timestamp metadata
  • Semantic search retrieves relevant content chunks for user questions
  • Chat interface provides real-time Q&A with timestamp source references
  • DeepSeek AI generates contextual responses using retrieved chunks
  • Follow-up question suggestions based on video content
  • Persistent chat sessions linked to specific videos

Quality Requirements

  • Answer relevance >85% for factual questions about video content
  • Timestamp references accurate within 10-second tolerance
  • Source attribution clearly links responses to specific video segments
  • Response quality maintains conversation context across messages
  • Follow-up suggestions are relevant and engaging
  • Chat interface provides smooth user experience with loading states

Performance Requirements

  • Question answering response time under 8 seconds
  • ChromaDB search completes in under 2 seconds
  • Embedding generation processes 1-hour video in under 5 minutes
  • Chat interface supports concurrent conversations without degradation
  • Memory usage remains stable during long conversation sessions

Implementation Notes

ChromaDB Integration

  • Use existing patterns from /tests/framework-comparison/test_langgraph_chromadb.py
  • Implement HuggingFace embeddings for local processing (no API dependencies)
  • Configure persistent storage in ./data/chromadb_rag/ directory
  • Use collection per video or organized by user/topic as needed

Transcript Chunking Strategy

  • Create semantic chunks of 200-400 words with 50-word overlap
  • Preserve sentence boundaries and paragraph structure
  • Maintain timestamp ranges for each chunk
  • Include video context (title, channel) in chunk metadata
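
A minimal sketch of this chunking strategy, assuming the transcript arrives as timed segments (dicts with `text` and `start` keys); the 300-word target and 50-word overlap follow the guideline above.

def make_chunk(index: int, segments: list[dict], word_count: int) -> dict:
    """Assemble a chunk record covering the timestamp range of its segments."""
    return {
        "chunk_index": index,
        "chunk_text": " ".join(s["text"] for s in segments),
        "start_timestamp": int(segments[0]["start"]),
        "end_timestamp": int(segments[-1]["start"]),
        "word_count": word_count,
    }

def chunk_transcript(segments: list[dict], target_words: int = 300, overlap_words: int = 50) -> list[dict]:
    """Build overlapping chunks from timed transcript segments."""
    chunks, current, current_words, new_words = [], [], 0, 0
    for seg in segments:
        words = len(seg["text"].split())
        current.append(seg)
        current_words += words
        new_words += words
        if current_words >= target_words:
            chunks.append(make_chunk(len(chunks), current, current_words))
            # Keep roughly `overlap_words` of trailing text as overlap for the next chunk
            tail, tail_words = [], 0
            for s in reversed(current):
                tail.insert(0, s)
                tail_words += len(s["text"].split())
                if tail_words >= overlap_words:
                    break
            current, current_words, new_words = tail, tail_words, 0
    if new_words:  # flush any remaining segments not yet emitted
        chunks.append(make_chunk(len(chunks), current, current_words))
    return chunks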

RAG Response Pattern

  • Retrieve 3-5 most relevant chunks for context
  • Include source timestamps in response format: "According to the video at [00:05:23], ..."
  • Provide YouTube deep links for timestamp navigation
  • Handle cases where no relevant content is found gracefully
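
One possible prompt shape for this response pattern; the wording is illustrative and expected to be tuned against real transcripts.

RAG_PROMPT_TEMPLATE = """You are answering questions about a YouTube video using only the excerpts below.
Each excerpt is prefixed with its timestamp.

{context_blocks}

Question: {question}

Answer using only the excerpts. Cite timestamps inline, for example: "According to the video at [00:05:23], ...".
If the excerpts do not contain the answer, say so rather than guessing."""

def build_rag_prompt(question: str, sources: list[dict]) -> str:
    """Combine retrieved chunks and the user question into a single prompt."""
    context_blocks = "\n\n".join(
        f"{s['timestamp_formatted']} {s['chunk_text']}" for s in sources
    )
    return RAG_PROMPT_TEMPLATE.format(context_blocks=context_blocks, question=question)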

DeepSeek Integration

  • Use DeepSeek API for response generation (per user requirement: no Anthropic)
  • Configure appropriate model parameters for conversational responses
  • Implement cost tracking and usage monitoring
  • Add response quality scoring and feedback collection
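
A sketch of the response generation call with basic usage tracking, assuming DeepSeek's OpenAI-compatible client and standard chat-completions usage fields; the model name and parameters are placeholders.

from openai import OpenAI

deepseek = OpenAI(api_key="YOUR_DEEPSEEK_API_KEY", base_url="https://api.deepseek.com")

def generate_rag_answer(prompt: str) -> tuple[str, dict]:
    """Generate a conversational answer and return it with basic token-usage metrics."""
    response = deepseek.chat.completions.create(
        model="deepseek-chat",
        messages=[
            {"role": "system", "content": "You answer questions about video transcripts and always cite timestamps."},
            {"role": "user", "content": prompt},
        ],
        temperature=0.3,  # keep answers close to the retrieved context
        max_tokens=1024,
    )
    usage = {
        "prompt_tokens": response.usage.prompt_tokens,
        "completion_tokens": response.usage.completion_tokens,
    }
    return response.choices[0].message.content, usage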

Risk Mitigation

High Risk: Answer Quality and Relevance

  • Risk: RAG responses may be generic or miss important context
  • Mitigation: Quality scoring, user feedback collection, continuous prompt optimization

Medium Risk: Timestamp Accuracy

  • Risk: Source timestamps may not accurately reflect quoted content
  • Mitigation: Chunk boundary validation, timestamp verification, user correction system

Medium Risk: Performance with Large Videos

  • Risk: Long videos may cause slow embedding generation and search
  • Mitigation: Batch processing, progress tracking, optimized chunking strategies

Story Owner: Development Team
Architecture Reference: BMad Method Epic-Story Structure
Implementation Status: Ready for Development
Last Updated: 2025-08-27