Story 4.6: RAG-Powered Video Chat with ChromaDB
Story Overview
Story ID: 4.6
Epic: 4 - Advanced Intelligence & Developer Platform
Title: RAG-Powered Video Chat with ChromaDB
Status: 📋 READY FOR IMPLEMENTATION
Priority: Medium
Goal: Implement a RAG (Retrieval Augmented Generation) chatbot interface using ChromaDB for semantic search, enabling users to have interactive Q&A conversations with video content using precise timestamp source references.
Value Proposition: Transform passive video consumption into interactive content exploration, allowing users to ask specific questions about video content and receive precise answers with exact timestamp references for verification.
Dependencies:
- ✅ Story 4.4 (Custom AI Models) for AI service infrastructure
- ✅ Existing ChromaDB integration patterns from /tests/framework-comparison/test_langgraph_chromadb.py
- ✅ Transcript extraction system
Estimated Effort: 20 hours
Technical Requirements
Core Features
1. ChromaDB Vector Database
- Semantic Transcript Chunking: Split transcripts into meaningful chunks with overlap
- Embedding Storage: Generate and store embeddings for all transcript segments
- Metadata Preservation: Maintain timestamp, video ID, and section information
- Vector Search: Semantic similarity search across transcript content
- Collection Management: Organize embeddings by video, user, or topic
2. RAG Implementation
- Context Retrieval: Fetch relevant transcript chunks based on user questions
- Retrieved Chunks: Use existing test patterns from /tests/framework-comparison/
- Context Window: Optimize context size for AI model limits
- Relevance Scoring: Rank retrieved chunks by semantic relevance
- Source Attribution: Maintain clear connection between chunks and timestamps
3. Chat Interface
- Real-time Q&A: Interactive chat interface for video-specific questions
- Timestamp References: Every response includes source timestamps like [00:05:23]
- DeepSeek Integration: AI responses use DeepSeek models (no Anthropic, per user requirements)
- Context Awareness: Maintain conversation context and follow-up questions
- Visual Design: Clean chat interface integrated with video summary page
4. Enhanced Features
- Follow-up Suggestions: AI-generated follow-up questions based on content
- Conversation History: Persistent chat sessions linked to video summaries
- Export Conversations: Save Q&A sessions as part of video documentation
- Multi-Video Chat: Ask questions across multiple videos in a playlist
Technical Architecture
RAG System Components
class RAGService:
    def __init__(self):
        self.vector_db = ChromaVectorDB()
        self.embeddings = HuggingFaceEmbeddings()  # Local embeddings
        self.ai_service = DeepSeekService()
        self.chunk_processor = TranscriptChunker()

    async def process_video_for_rag(self, video_id: str, transcript: str) -> bool:
        # Chunk transcript into semantic segments
        # Generate embeddings for each chunk
        # Store in ChromaDB with metadata
        # Return success status
        ...

    async def ask_question(self, video_id: str, question: str, chat_history: List[ChatMessage]) -> ChatResponse:
        # Retrieve relevant chunks using semantic search
        # Build context from retrieved chunks
        # Generate response with DeepSeek
        # Format response with timestamp references
        ...
Database Schema Extensions
-- Chat sessions for persistent conversations
CREATE TABLE chat_sessions (
id UUID PRIMARY KEY,
user_id UUID REFERENCES users(id),
video_id VARCHAR(20),
summary_id UUID REFERENCES summaries(id),
session_name VARCHAR(200),
created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
total_messages INTEGER DEFAULT 0,
is_active BOOLEAN DEFAULT TRUE
);
-- Individual chat messages
CREATE TABLE chat_messages (
id UUID PRIMARY KEY,
session_id UUID REFERENCES chat_sessions(id),
message_type VARCHAR(20), -- 'user', 'assistant', 'system'
content TEXT,
sources JSONB, -- Array of {chunk_id, timestamp, relevance_score}
processing_time_seconds FLOAT,
created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);
-- Vector embeddings for RAG (ChromaDB metadata reference)
CREATE TABLE video_chunks (
id UUID PRIMARY KEY,
video_id VARCHAR(20),
chunk_index INTEGER,
chunk_text TEXT,
start_timestamp INTEGER, -- seconds
end_timestamp INTEGER,
word_count INTEGER,
embedding_id VARCHAR(100), -- ChromaDB document ID
created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);
-- RAG performance tracking
CREATE TABLE rag_analytics (
id UUID PRIMARY KEY,
video_id VARCHAR(20),
question TEXT,
retrieval_count INTEGER,
relevance_scores JSONB,
response_quality_score FLOAT,
user_feedback INTEGER, -- 1-5 rating
processing_time_seconds FLOAT,
created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);
Implementation Tasks
Task 4.6.1: ChromaDB Vector Database Setup (6 hours)
Subtasks:
- ChromaDB Configuration (2 hours)
- Set up ChromaDB client with persistent storage
- Configure collections for video transcripts
- Implement collection naming and organization strategy
- Add cleanup and maintenance procedures
- Test database initialization and connection
- Transcript Chunking Service (2 hours)
- Create intelligent transcript segmentation algorithm
- Implement overlapping chunks for context preservation
- Extract meaningful chunk boundaries (sentence/paragraph breaks)
- Preserve timestamp information in chunks
- Handle various transcript formats and quality levels
- Embedding Generation and Storage (2 hours)
- Integrate HuggingFace embeddings (sentence-transformers/all-MiniLM-L6-v2)
- Generate embeddings for transcript chunks
- Store embeddings with metadata in ChromaDB
- Implement batch processing for large transcripts
- Add progress tracking for embedding generation
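The batch-processing subtask above can be sketched with plain Python. This is a minimal illustration, not the final service: `embed_fn` stands in for whatever embedding callable is wired up (e.g. a sentence-transformers model), and the names `batch_chunks` and `embed_with_progress` are hypothetical.

```python
from typing import Callable, Iterator, List

def batch_chunks(chunks: List[str], batch_size: int = 32) -> Iterator[List[str]]:
    """Yield fixed-size batches so embedding calls stay memory-bounded."""
    for start in range(0, len(chunks), batch_size):
        yield chunks[start:start + batch_size]

def embed_with_progress(
    chunks: List[str],
    embed_fn: Callable[[List[str]], List[List[float]]],
    batch_size: int = 32,
) -> List[List[float]]:
    """Embed all chunks batch by batch, reporting progress per batch."""
    done, vectors = 0, []
    for batch in batch_chunks(chunks, batch_size):
        vectors.extend(embed_fn(batch))
        done += len(batch)
        print(f"embedded {done}/{len(chunks)} chunks")
    return vectors
```

Reporting progress per batch rather than per chunk keeps logging overhead negligible even for hour-long transcripts.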
Task 4.6.2: RAG Retrieval System (8 hours)
Subtasks:
- Semantic Search Implementation (3 hours)
- Implement similarity search across video chunks
- Add relevance scoring and ranking algorithms
- Configure search parameters (number of results, similarity threshold)
- Handle edge cases (no relevant chunks, low similarity scores)
- Test search quality with various question types
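ChromaDB performs the similarity search itself in production; the ranking-and-threshold logic above can still be sketched in isolation with cosine similarity over plain Python lists. The function name `rank_chunks` and the default threshold are illustrative assumptions.

```python
import math
from typing import List, Tuple

def cosine_similarity(a: List[float], b: List[float]) -> float:
    """Cosine similarity between two embedding vectors; 0.0 for zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def rank_chunks(
    query_vec: List[float],
    chunk_vecs: List[List[float]],
    n_results: int = 5,
    min_score: float = 0.3,
) -> List[Tuple[int, float]]:
    """Return (chunk_index, score) pairs above the threshold, best first.

    An empty result is the 'no relevant chunks' edge case the caller
    must handle gracefully.
    """
    scored = [(i, cosine_similarity(query_vec, v)) for i, v in enumerate(chunk_vecs)]
    scored = [(i, s) for i, s in scored if s >= min_score]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:n_results]
```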
- Context Building Service (2 hours)
- Aggregate retrieved chunks into coherent context
- Implement context window management for AI models
- Preserve chunk ordering and timestamp information
- Add context summarization for long retrievals
- Handle overlapping chunks and deduplication
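A minimal sketch of the context-building step, assuming retrieved chunks arrive as dicts with `chunk_id`, `start_timestamp`, and `text` keys (the exact shape is an assumption; adjust to the real retrieval payload). It deduplicates, restores transcript order, and enforces a word budget as a crude stand-in for the model's context window.

```python
from typing import Dict, List

def build_context(chunks: List[Dict], max_words: int = 1500) -> str:
    """Assemble retrieved chunks into one prompt context string.

    Duplicate chunk_ids are dropped, chunks are restored to transcript
    order, and assembly stops once the word budget is exhausted.
    """
    seen: set = set()
    deduped = []
    for c in chunks:
        if c["chunk_id"] not in seen:
            seen.add(c["chunk_id"])
            deduped.append(c)
    deduped.sort(key=lambda c: c["start_timestamp"])

    parts, used = [], 0
    for c in deduped:
        words = len(c["text"].split())
        if used + words > max_words:
            break
        parts.append(f"[{c['start_timestamp']}s] {c['text']}")
        used += words
    return "\n\n".join(parts)
```

A real implementation would count model tokens rather than words and summarize overflow instead of truncating, as the subtasks above note.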
- Source Attribution System (2 hours)
- Link retrieved chunks to specific timestamps
- Generate clickable timestamp references like [00:05:23]
- Create YouTube deep links for timestamp navigation
- Implement source verification and quality checks
- Add confidence scoring for source attribution
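The timestamp formatting and deep-link pieces of this subtask are small enough to show directly. The `[HH:MM:SS]` format and YouTube's `&t=<seconds>s` query parameter are as described elsewhere in this story; the function names are illustrative.

```python
def format_timestamp(seconds: int) -> str:
    """Render seconds as the [HH:MM:SS] reference format used in responses."""
    hours, rem = divmod(seconds, 3600)
    minutes, secs = divmod(rem, 60)
    return f"[{hours:02d}:{minutes:02d}:{secs:02d}]"

def youtube_deep_link(video_id: str, seconds: int) -> str:
    """Deep link that opens the video at the cited moment."""
    return f"https://www.youtube.com/watch?v={video_id}&t={seconds}s"
```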
- RAG Response Generation (1 hour)
- Integrate DeepSeek AI service for response generation
- Create RAG-specific prompts with context and question
- Format responses with proper source citations
- Handle cases where no relevant context is found
- Add response quality validation
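One way to sketch the RAG-specific prompt, including the graceful no-context path, is shown below. The wording and the `build_rag_prompt` name are assumptions; the actual prompt would be tuned against DeepSeek's output quality.

```python
def build_rag_prompt(question: str, context: str) -> str:
    """Assemble the prompt sent to the chat model.

    Instructs the model to cite [HH:MM:SS] timestamps and to admit
    when the retrieved excerpts do not contain the answer.
    """
    if not context.strip():
        return (
            "The transcript search returned no relevant passages. "
            f"Politely tell the user you cannot answer this from the video: {question}"
        )
    return (
        "Answer the question using only the transcript excerpts below. "
        "Cite the [HH:MM:SS] timestamp of every excerpt you rely on. "
        "If the excerpts do not contain the answer, say so.\n\n"
        f"Transcript excerpts:\n{context}\n\n"
        f"Question: {question}"
    )
```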
Task 4.6.3: Chat Interface Implementation (4 hours)
Subtasks:
- Chat Frontend Component (2 hours)
- Create interactive chat interface with message history
- Implement typing indicators and loading states
- Add timestamp link rendering and click handling
- Design responsive chat layout for video summary pages
- Add keyboard shortcuts and accessibility features
- Chat Session Management (1 hour)
- Implement persistent chat sessions linked to videos
- Add session creation, saving, and loading
- Create chat session list and management interface
- Handle session state and conversation context
- Add session export and sharing functionality
- Follow-up Question System (1 hour)
- Generate AI-powered follow-up question suggestions
- Base suggestions on video content and conversation context
- Display suggested questions as clickable options
- Track suggestion effectiveness and user engagement
- Add customizable suggestion preferences
Task 4.6.4: API Integration and Enhancement (2 hours)
Subtasks:
- RAG API Endpoints (1 hour)
- POST /api/rag/chat/{video_id} - Ask question about specific video
- GET /api/rag/sessions/{user_id} - Get user's chat sessions
- POST /api/rag/sessions/{session_id}/export - Export conversation
- GET /api/rag/suggestions/{video_id} - Get follow-up suggestions
- Add comprehensive error handling and validation
- Performance Optimization (1 hour)
- Implement caching for frequent questions and responses
- Add batch processing for multiple questions
- Optimize ChromaDB queries and connection management
- Add response streaming for long AI responses
- Monitor and optimize response times and resource usage
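The question/response cache mentioned above could look like the sketch below: an in-memory store keyed by video ID plus a normalized form of the question, so trivial casing and whitespace differences still hit the cache. The class name, eviction policy, and normalization rule are all assumptions; production code would likely use a TTL and a shared store such as Redis.

```python
import hashlib
from typing import Dict, Optional

class QuestionCache:
    """In-memory cache keyed by video ID and normalized question text."""

    def __init__(self, max_entries: int = 1000):
        self._store: Dict[str, str] = {}
        self._max = max_entries

    @staticmethod
    def _key(video_id: str, question: str) -> str:
        # Lowercase and collapse whitespace so near-identical questions match.
        normalized = " ".join(question.lower().split())
        return hashlib.sha256(f"{video_id}:{normalized}".encode()).hexdigest()

    def get(self, video_id: str, question: str) -> Optional[str]:
        return self._store.get(self._key(video_id, question))

    def put(self, video_id: str, question: str, answer: str) -> None:
        if len(self._store) >= self._max:
            # Evict the oldest entry (dicts preserve insertion order).
            self._store.pop(next(iter(self._store)))
        self._store[self._key(video_id, question)] = answer
```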
Data Models
RAG Chat Models
from pydantic import BaseModel
from typing import List, Dict, Optional, Any
from datetime import datetime
from enum import Enum

class MessageType(str, Enum):
    USER = "user"
    ASSISTANT = "assistant"
    SYSTEM = "system"

class SourceReference(BaseModel):
    chunk_id: str
    timestamp: int  # seconds
    timestamp_formatted: str  # [HH:MM:SS]
    youtube_link: str
    chunk_text: str
    relevance_score: float

class ChatMessage(BaseModel):
    id: str
    message_type: MessageType
    content: str
    sources: List[SourceReference]
    processing_time_seconds: float
    created_at: datetime

class ChatSession(BaseModel):
    id: str
    user_id: str
    video_id: str
    summary_id: str
    session_name: str
    messages: List[ChatMessage]
    total_messages: int
    is_active: bool
    created_at: datetime
    updated_at: datetime

class ChatRequest(BaseModel):
    video_id: str
    question: str
    session_id: Optional[str] = None
    include_context: bool = True
    max_sources: int = 5

class ChatResponse(BaseModel):
    session_id: str
    message: ChatMessage
    follow_up_suggestions: List[str]
    context_retrieved: bool
    total_chunks_searched: int

class RAGAnalytics(BaseModel):
    question: str
    retrieval_count: int
    relevance_scores: List[float]
    response_quality_score: float
    processing_time_seconds: float
    user_feedback: Optional[int] = None
Testing Strategy
Unit Tests
- ChromaDB Integration: Connection, storage, and retrieval operations
- Transcript Chunking: Segmentation quality and metadata preservation
- Embedding Generation: Vector quality and consistency
- Semantic Search: Relevance and ranking accuracy
- Source Attribution: Timestamp accuracy and link generation
Integration Tests
- RAG Pipeline: End-to-end question answering workflow
- Chat API: All chat and session management endpoints
- Frontend Integration: Chat interface functionality and state management
- Database Operations: Session and message persistence
Quality Assurance Tests
- Answer Relevance: Semantic accuracy of responses to questions
- Source Attribution: Timestamp precision and link functionality
- Response Quality: Coherence and helpfulness of AI responses
- Performance: Response time and resource usage under load
API Specification
RAG Chat Endpoints
/api/rag/chat/{video_id}:
  post:
    summary: Ask question about video content using RAG
    parameters:
      - name: video_id
        in: path
        required: true
        schema:
          type: string
    requestBody:
      required: true
      content:
        application/json:
          schema:
            $ref: '#/components/schemas/ChatRequest'
    responses:
      200:
        content:
          application/json:
            schema:
              $ref: '#/components/schemas/ChatResponse'

/api/rag/sessions/{user_id}:
  get:
    summary: Get user's chat sessions
    parameters:
      - name: user_id
        in: path
        required: true
        schema:
          type: string
      - name: active_only
        in: query
        schema:
          type: boolean
          default: true
    responses:
      200:
        content:
          application/json:
            schema:
              type: array
              items:
                $ref: '#/components/schemas/ChatSession'

/api/rag/embeddings/{video_id}/generate:
  post:
    summary: Generate embeddings for video transcript
    parameters:
      - name: video_id
        in: path
        required: true
        schema:
          type: string
    responses:
      202:
        content:
          application/json:
            schema:
              type: object
              properties:
                job_id:
                  type: string
                status:
                  type: string
                estimated_completion:
                  type: string
                  format: date-time
Success Criteria
Functional Requirements ✅
- ChromaDB stores transcript embeddings with timestamp metadata
- Semantic search retrieves relevant content chunks for user questions
- Chat interface provides real-time Q&A with timestamp source references
- DeepSeek AI generates contextual responses using retrieved chunks
- Follow-up question suggestions based on video content
- Persistent chat sessions linked to specific videos
Quality Requirements ✅
- Answer relevance >85% for factual questions about video content
- Timestamp references accurate within 10-second tolerance
- Source attribution clearly links responses to specific video segments
- Response quality maintains conversation context across messages
- Follow-up suggestions are relevant and engaging
- Chat interface provides smooth user experience with loading states
Performance Requirements ✅
- Question answering response time under 8 seconds
- ChromaDB search completes in under 2 seconds
- Embedding generation processes 1-hour video in under 5 minutes
- Chat interface supports concurrent conversations without degradation
- Memory usage remains stable during long conversation sessions
Implementation Notes
ChromaDB Integration
- Use existing patterns from /tests/framework-comparison/test_langgraph_chromadb.py
- Implement HuggingFace embeddings for local processing (no API dependencies)
- Configure persistent storage in the ./data/chromadb_rag/ directory
- Use a collection per video, or organize by user/topic as needed
Transcript Chunking Strategy
- Create semantic chunks of 200-400 words with 50-word overlap
- Preserve sentence boundaries and paragraph structure
- Maintain timestamp ranges for each chunk
- Include video context (title, channel) in chunk metadata
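The chunking strategy above can be sketched as follows. This is a simplified word-based version that assumes the transcript arrives as `(start_seconds, text)` segments; it demonstrates the overlap and timestamp-range bookkeeping but omits the sentence/paragraph-boundary handling the strategy calls for.

```python
from typing import Dict, List, Tuple

def chunk_transcript(
    segments: List[Tuple[int, str]],
    target_words: int = 300,
    overlap_words: int = 50,
) -> List[Dict]:
    """Group timestamped segments into ~target_words chunks with overlap."""
    # Flatten to (timestamp, word) pairs so overlap can cross segment borders.
    words = [(ts, w) for ts, text in segments for w in text.split()]
    chunks: List[Dict] = []
    start = 0
    while start < len(words):
        window = words[start:start + target_words]
        chunks.append({
            "chunk_index": len(chunks),
            "text": " ".join(w for _, w in window),
            "start_timestamp": window[0][0],
            "end_timestamp": window[-1][0],
            "word_count": len(window),
        })
        if start + target_words >= len(words):
            break
        start += target_words - overlap_words  # step back for overlap
    return chunks
```

With the story's defaults (300-word chunks, 50-word overlap), each chunk shares its last 50 words with the start of the next, preserving context across chunk boundaries.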
RAG Response Pattern
- Retrieve 3-5 most relevant chunks for context
- Include source timestamps in response format: "According to the video at [00:05:23], ..."
- Provide YouTube deep links for timestamp navigation
- Handle cases where no relevant content is found gracefully
DeepSeek Integration
- Use DeepSeek API for response generation (per user requirement: no Anthropic)
- Configure appropriate model parameters for conversational responses
- Implement cost tracking and usage monitoring
- Add response quality scoring and feedback collection
Risk Mitigation
High Risk: Answer Quality and Relevance
- Risk: RAG responses may be generic or miss important context
- Mitigation: Quality scoring, user feedback collection, continuous prompt optimization
Medium Risk: Timestamp Accuracy
- Risk: Source timestamps may not accurately reflect quoted content
- Mitigation: Chunk boundary validation, timestamp verification, user correction system
Medium Risk: Performance with Large Videos
- Risk: Long videos may cause slow embedding generation and search
- Mitigation: Batch processing, progress tracking, optimized chunking strategies
Story Owner: Development Team
Architecture Reference: BMad Method Epic-Story Structure
Implementation Status: Ready for Development
Last Updated: 2025-08-27