youtube-summarizer/docs/stories/4.6.rag-powered-video-chat.md


# Story 4.6: RAG-Powered Video Chat with ChromaDB
## Story Overview
**Story ID**: 4.6
**Epic**: 4 - Advanced Intelligence & Developer Platform
**Title**: RAG-Powered Video Chat with ChromaDB
**Status**: 📋 READY FOR IMPLEMENTATION
**Priority**: Medium
**Goal**: Implement a RAG (Retrieval Augmented Generation) chatbot interface using ChromaDB for semantic search, enabling users to have interactive Q&A conversations with video content using precise timestamp source references.
**Value Proposition**: Transform passive video consumption into interactive content exploration, allowing users to ask specific questions about video content and receive precise answers with exact timestamp references for verification.
**Dependencies**:
- ✅ Story 4.4 (Custom AI Models) for AI service infrastructure
- ✅ Existing ChromaDB integration patterns from `/tests/framework-comparison/test_langgraph_chromadb.py`
- ✅ Transcript extraction system
**Estimated Effort**: 20 hours
## Technical Requirements
### Core Features
#### 1. ChromaDB Vector Database
- **Semantic Transcript Chunking**: Split transcripts into meaningful chunks with overlap
- **Embedding Storage**: Generate and store embeddings for all transcript segments
- **Metadata Preservation**: Maintain timestamp, video ID, and section information
- **Vector Search**: Semantic similarity search across transcript content
- **Collection Management**: Organize embeddings by video, user, or topic
#### 2. RAG Implementation
- **Context Retrieval**: Fetch relevant transcript chunks based on user questions
- **Retrieval Patterns**: Reuse the existing ChromaDB retrieval patterns from `/tests/framework-comparison/`
- **Context Window**: Optimize context size for AI model limits
- **Relevance Scoring**: Rank retrieved chunks by semantic relevance
- **Source Attribution**: Maintain clear connection between chunks and timestamps
#### 3. Chat Interface
- **Real-time Q&A**: Interactive chat interface for video-specific questions
- **Timestamp References**: Every response includes source timestamps like `[00:05:23]`
- **DeepSeek Integration**: AI responses using DeepSeek models (no Anthropic per user requirements)
- **Context Awareness**: Maintain conversation context and follow-up questions
- **Visual Design**: Clean chat interface integrated with video summary page
#### 4. Enhanced Features
- **Follow-up Suggestions**: AI-generated follow-up questions based on content
- **Conversation History**: Persistent chat sessions linked to video summaries
- **Export Conversations**: Save Q&A sessions as part of video documentation
- **Multi-Video Chat**: Ask questions across multiple videos in a playlist
### Technical Architecture
#### RAG System Components
```python
from typing import List


class RAGService:
    def __init__(self):
        self.vector_db = ChromaVectorDB()
        self.embeddings = HuggingFaceEmbeddings()  # local embeddings
        self.ai_service = DeepSeekService()
        self.chunk_processor = TranscriptChunker()

    async def process_video_for_rag(self, video_id: str, transcript: str) -> bool:
        # Chunk transcript into semantic segments
        # Generate embeddings for each chunk
        # Store in ChromaDB with metadata
        # Return success status
        ...

    async def ask_question(
        self, video_id: str, question: str, chat_history: List["ChatMessage"]
    ) -> "ChatResponse":
        # Retrieve relevant chunks using semantic search
        # Build context from retrieved chunks
        # Generate response with DeepSeek
        # Format response with timestamp references
        ...
```
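The "build context from retrieved chunks" step in `ask_question` can be sketched as a small pure function. The hit shape (`chunk_text` plus `start_timestamp`) and the 4,000-character cap are illustrative assumptions, not fixed interfaces:

```python
from typing import Dict, List


def build_context(hits: List[Dict], max_chars: int = 4000) -> str:
    """Order retrieved chunks by timestamp, drop duplicates, and trim
    the result to the model's context window."""
    seen, parts = set(), []
    for hit in sorted(hits, key=lambda h: h["start_timestamp"]):
        if hit["chunk_text"] in seen:  # overlapping chunks can repeat text
            continue
        seen.add(hit["chunk_text"])
        parts.append(f"[{hit['start_timestamp']}s] {hit['chunk_text']}")
    # Crude truncation; production code might summarize long retrievals instead.
    return "\n\n".join(parts)[:max_chars]
```

Sorting before joining keeps quoted material in playback order, which makes the timestamp citations in the final answer easier to verify.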
#### Database Schema Extensions
```sql
-- Chat sessions for persistent conversations
CREATE TABLE chat_sessions (
    id UUID PRIMARY KEY,
    user_id UUID REFERENCES users(id),
    video_id VARCHAR(20),
    summary_id UUID REFERENCES summaries(id),
    session_name VARCHAR(200),
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    total_messages INTEGER DEFAULT 0,
    is_active BOOLEAN DEFAULT TRUE
);

-- Individual chat messages
CREATE TABLE chat_messages (
    id UUID PRIMARY KEY,
    session_id UUID REFERENCES chat_sessions(id),
    message_type VARCHAR(20), -- 'user', 'assistant', 'system'
    content TEXT,
    sources JSONB, -- Array of {chunk_id, timestamp, relevance_score}
    processing_time_seconds FLOAT,
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);

-- Vector embeddings for RAG (ChromaDB metadata reference)
CREATE TABLE video_chunks (
    id UUID PRIMARY KEY,
    video_id VARCHAR(20),
    chunk_index INTEGER,
    chunk_text TEXT,
    start_timestamp INTEGER, -- seconds
    end_timestamp INTEGER,
    word_count INTEGER,
    embedding_id VARCHAR(100), -- ChromaDB document ID
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);

-- RAG performance tracking
CREATE TABLE rag_analytics (
    id UUID PRIMARY KEY,
    video_id VARCHAR(20),
    question TEXT,
    retrieval_count INTEGER,
    relevance_scores JSONB,
    response_quality_score FLOAT,
    user_feedback INTEGER, -- 1-5 rating
    processing_time_seconds FLOAT,
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);
```
## Implementation Tasks
### Task 4.6.1: ChromaDB Vector Database Setup (6 hours)
#### Subtasks:
1. **ChromaDB Configuration** (2 hours)
- Set up ChromaDB client with persistent storage
- Configure collections for video transcripts
- Implement collection naming and organization strategy
- Add cleanup and maintenance procedures
- Test database initialization and connection
2. **Transcript Chunking Service** (2 hours)
- Create intelligent transcript segmentation algorithm
- Implement overlapping chunks for context preservation
- Extract meaningful chunk boundaries (sentence/paragraph breaks)
- Preserve timestamp information in chunks
- Handle various transcript formats and quality levels
3. **Embedding Generation and Storage** (2 hours)
- Integrate HuggingFace embeddings (sentence-transformers/all-MiniLM-L6-v2)
- Generate embeddings for transcript chunks
- Store embeddings with metadata in ChromaDB
- Implement batch processing for large transcripts
- Add progress tracking for embedding generation
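The chunking subtask above can be sketched as a word-window pass over timed transcript segments. The `(start_seconds, text)` segment shape is an assumption about the transcript format; the 400-word window and 50-word overlap defaults follow the chunking strategy in the implementation notes:

```python
from typing import Dict, List, Tuple


def chunk_transcript(
    segments: List[Tuple[int, str]],
    max_words: int = 400,
    overlap_words: int = 50,
) -> List[Dict]:
    """Group timed segments into overlapping chunks, keeping timestamp ranges."""
    assert overlap_words < max_words, "overlap must be smaller than the window"
    # Flatten into (start_sec, word) pairs so the overlap is word-accurate.
    words: List[Tuple[int, str]] = []
    for start, text in segments:
        words.extend((start, w) for w in text.split())

    chunks, i = [], 0
    while i < len(words):
        window = words[i : i + max_words]
        chunks.append({
            "chunk_index": len(chunks),
            "chunk_text": " ".join(w for _, w in window),
            "start_timestamp": window[0][0],
            "end_timestamp": window[-1][0],
            "word_count": len(window),
        })
        if i + max_words >= len(words):
            break
        i += max_words - overlap_words  # step back to create the overlap
    return chunks
```

A production chunker would additionally snap window edges to sentence boundaries, per the subtask; the word-window keeps the sketch short.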
### Task 4.6.2: RAG Retrieval System (8 hours)
#### Subtasks:
1. **Semantic Search Implementation** (3 hours)
- Implement similarity search across video chunks
- Add relevance scoring and ranking algorithms
- Configure search parameters (number of results, similarity threshold)
- Handle edge cases (no relevant chunks, low similarity scores)
- Test search quality with various question types
2. **Context Building Service** (2 hours)
- Aggregate retrieved chunks into coherent context
- Implement context window management for AI models
- Preserve chunk ordering and timestamp information
- Add context summarization for long retrievals
- Handle overlapping chunks and deduplication
3. **Source Attribution System** (2 hours)
- Link retrieved chunks to specific timestamps
- Generate clickable timestamp references `[00:05:23]`
- Create YouTube deep links for timestamp navigation
- Implement source verification and quality checks
- Add confidence scoring for source attribution
4. **RAG Response Generation** (1 hour)
- Integrate DeepSeek AI service for response generation
- Create RAG-specific prompts with context and question
- Format responses with proper source citations
- Handle cases where no relevant context is found
- Add response quality validation
### Task 4.6.3: Chat Interface Implementation (4 hours)
#### Subtasks:
1. **Chat Frontend Component** (2 hours)
- Create interactive chat interface with message history
- Implement typing indicators and loading states
- Add timestamp link rendering and click handling
- Design responsive chat layout for video summary pages
- Add keyboard shortcuts and accessibility features
2. **Chat Session Management** (1 hour)
- Implement persistent chat sessions linked to videos
- Add session creation, saving, and loading
- Create chat session list and management interface
- Handle session state and conversation context
- Add session export and sharing functionality
3. **Follow-up Question System** (1 hour)
- Generate AI-powered follow-up question suggestions
- Base suggestions on video content and conversation context
- Display suggested questions as clickable options
- Track suggestion effectiveness and user engagement
- Add customizable suggestion preferences
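One way to implement the follow-up suggestion subtask is a dedicated prompt plus a tolerant parser for the model's reply. The prompt wording and the three-question default are assumptions, not a fixed spec:

```python
from typing import List


def build_suggestion_prompt(video_title: str, last_answer: str, count: int = 3) -> str:
    """Ask the model for short follow-up questions grounded in the video."""
    return (
        f"You are helping a viewer explore the video '{video_title}'.\n"
        f"Based on this answer:\n---\n{last_answer}\n---\n"
        f"Suggest {count} short follow-up questions the viewer might ask next, "
        "one per line, with no numbering."
    )


def parse_suggestions(raw: str, count: int = 3) -> List[str]:
    """Keep at most `count` non-empty lines from the model's reply."""
    return [line.strip() for line in raw.splitlines() if line.strip()][:count]
```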
### Task 4.6.4: API Integration and Enhancement (2 hours)
#### Subtasks:
1. **RAG API Endpoints** (1 hour)
- `POST /api/rag/chat/{video_id}` - Ask question about specific video
- `GET /api/rag/sessions/{user_id}` - Get user's chat sessions
- `POST /api/rag/sessions/{session_id}/export` - Export conversation
- `GET /api/rag/suggestions/{video_id}` - Get follow-up suggestions
- Add comprehensive error handling and validation
2. **Performance Optimization** (1 hour)
- Implement caching for frequent questions and responses
- Add batch processing for multiple questions
- Optimize ChromaDB queries and connection management
- Add response streaming for long AI responses
- Monitor and optimize response times and resource usage
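The question/response caching mentioned above could start as an in-process LRU keyed by video ID and normalized question text. This is a sketch; a shared cache such as Redis would replace it once multiple workers are involved:

```python
from collections import OrderedDict
from typing import Optional, Tuple


class ResponseCache:
    """Tiny LRU cache keyed by (video_id, normalized question)."""

    def __init__(self, max_entries: int = 256):
        self._store: "OrderedDict[Tuple[str, str], str]" = OrderedDict()
        self.max_entries = max_entries

    @staticmethod
    def _key(video_id: str, question: str) -> Tuple[str, str]:
        # Collapse whitespace and case so trivially rephrased questions hit.
        return (video_id, " ".join(question.lower().split()))

    def get(self, video_id: str, question: str) -> Optional[str]:
        key = self._key(video_id, question)
        if key not in self._store:
            return None
        self._store.move_to_end(key)  # mark as recently used
        return self._store[key]

    def put(self, video_id: str, question: str, response: str) -> None:
        key = self._key(video_id, question)
        self._store[key] = response
        self._store.move_to_end(key)
        if len(self._store) > self.max_entries:
            self._store.popitem(last=False)  # evict least recently used
```

Note the normalization only catches surface-level duplicates; semantically similar questions would need an embedding-based cache key.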
## Data Models
### RAG Chat Models
```python
from pydantic import BaseModel
from typing import List, Optional
from datetime import datetime
from enum import Enum


class MessageType(str, Enum):
    USER = "user"
    ASSISTANT = "assistant"
    SYSTEM = "system"


class SourceReference(BaseModel):
    chunk_id: str
    timestamp: int  # seconds
    timestamp_formatted: str  # [HH:MM:SS]
    youtube_link: str
    chunk_text: str
    relevance_score: float


class ChatMessage(BaseModel):
    id: str
    message_type: MessageType
    content: str
    sources: List[SourceReference]
    processing_time_seconds: float
    created_at: datetime


class ChatSession(BaseModel):
    id: str
    user_id: str
    video_id: str
    summary_id: str
    session_name: str
    messages: List[ChatMessage]
    total_messages: int
    is_active: bool
    created_at: datetime
    updated_at: datetime


class ChatRequest(BaseModel):
    video_id: str
    question: str
    session_id: Optional[str] = None
    include_context: bool = True
    max_sources: int = 5


class ChatResponse(BaseModel):
    session_id: str
    message: ChatMessage
    follow_up_suggestions: List[str]
    context_retrieved: bool
    total_chunks_searched: int


class RAGAnalytics(BaseModel):
    question: str
    retrieval_count: int
    relevance_scores: List[float]
    response_quality_score: float
    processing_time_seconds: float
    user_feedback: Optional[int] = None
```
## Testing Strategy
### Unit Tests
- **ChromaDB Integration**: Connection, storage, and retrieval operations
- **Transcript Chunking**: Segmentation quality and metadata preservation
- **Embedding Generation**: Vector quality and consistency
- **Semantic Search**: Relevance and ranking accuracy
- **Source Attribution**: Timestamp accuracy and link generation
### Integration Tests
- **RAG Pipeline**: End-to-end question answering workflow
- **Chat API**: All chat and session management endpoints
- **Frontend Integration**: Chat interface functionality and state management
- **Database Operations**: Session and message persistence
### Quality Assurance Tests
- **Answer Relevance**: Semantic accuracy of responses to questions
- **Source Attribution**: Timestamp precision and link functionality
- **Response Quality**: Coherence and helpfulness of AI responses
- **Performance**: Response time and resource usage under load
## API Specification
### RAG Chat Endpoints
```yaml
/api/rag/chat/{video_id}:
  post:
    summary: Ask question about video content using RAG
    parameters:
      - name: video_id
        in: path
        required: true
        schema:
          type: string
    requestBody:
      required: true
      content:
        application/json:
          schema:
            $ref: '#/components/schemas/ChatRequest'
    responses:
      '200':
        description: RAG answer with timestamp source references
        content:
          application/json:
            schema:
              $ref: '#/components/schemas/ChatResponse'

/api/rag/sessions/{user_id}:
  get:
    summary: Get user's chat sessions
    parameters:
      - name: user_id
        in: path
        required: true
        schema:
          type: string
      - name: active_only
        in: query
        schema:
          type: boolean
          default: true
    responses:
      '200':
        description: List of the user's chat sessions
        content:
          application/json:
            schema:
              type: array
              items:
                $ref: '#/components/schemas/ChatSession'

/api/rag/embeddings/{video_id}/generate:
  post:
    summary: Generate embeddings for video transcript
    parameters:
      - name: video_id
        in: path
        required: true
        schema:
          type: string
    responses:
      '202':
        description: Embedding job accepted for background processing
        content:
          application/json:
            schema:
              type: object
              properties:
                job_id:
                  type: string
                status:
                  type: string
                estimated_completion:
                  type: string
                  format: date-time
```
## Success Criteria
### Functional Requirements ✅
- [ ] ChromaDB stores transcript embeddings with timestamp metadata
- [ ] Semantic search retrieves relevant content chunks for user questions
- [ ] Chat interface provides real-time Q&A with timestamp source references
- [ ] DeepSeek AI generates contextual responses using retrieved chunks
- [ ] Follow-up question suggestions are generated from video content
- [ ] Persistent chat sessions linked to specific videos
### Quality Requirements ✅
- [ ] Answer relevance >85% for factual questions about video content
- [ ] Timestamp references accurate within 10-second tolerance
- [ ] Source attribution clearly links responses to specific video segments
- [ ] Response quality maintains conversation context across messages
- [ ] Follow-up suggestions are relevant and engaging
- [ ] Chat interface provides smooth user experience with loading states
### Performance Requirements ✅
- [ ] Question answering response time under 8 seconds
- [ ] ChromaDB search completes in under 2 seconds
- [ ] Embedding generation processes 1-hour video in under 5 minutes
- [ ] Chat interface supports concurrent conversations without degradation
- [ ] Memory usage remains stable during long conversation sessions
## Implementation Notes
### ChromaDB Integration
- Use existing patterns from `/tests/framework-comparison/test_langgraph_chromadb.py`
- Implement HuggingFace embeddings for local processing (no API dependencies)
- Configure persistent storage in `./data/chromadb_rag/` directory
- Use collection per video or organized by user/topic as needed
### Transcript Chunking Strategy
- Create semantic chunks of 200-400 words with 50-word overlap
- Preserve sentence boundaries and paragraph structure
- Maintain timestamp ranges for each chunk
- Include video context (title, channel) in chunk metadata
### RAG Response Pattern
- Retrieve 3-5 most relevant chunks for context
- Include source timestamps in response format: "According to the video at [00:05:23], ..."
- Provide YouTube deep links for timestamp navigation
- Handle cases where no relevant content is found gracefully
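The timestamp citation and deep-link formats described above reduce to two small helpers (`&t=<seconds>s` is YouTube's standard start-time query parameter):

```python
def format_timestamp(seconds: int) -> str:
    """Render seconds as the [HH:MM:SS] citation used in chat responses."""
    h, rem = divmod(seconds, 3600)
    m, s = divmod(rem, 60)
    return f"[{h:02d}:{m:02d}:{s:02d}]"


def youtube_deep_link(video_id: str, seconds: int) -> str:
    """Deep link that opens the video at the cited moment."""
    return f"https://www.youtube.com/watch?v={video_id}&t={seconds}s"
```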
### DeepSeek Integration
- Use DeepSeek API for response generation (per user requirement: no Anthropic)
- Configure appropriate model parameters for conversational responses
- Implement cost tracking and usage monitoring
- Add response quality scoring and feedback collection
## Risk Mitigation
### High Risk: Answer Quality and Relevance
- **Risk**: RAG responses may be generic or miss important context
- **Mitigation**: Quality scoring, user feedback collection, continuous prompt optimization
### Medium Risk: Timestamp Accuracy
- **Risk**: Source timestamps may not accurately reflect quoted content
- **Mitigation**: Chunk boundary validation, timestamp verification, user correction system
### Medium Risk: Performance with Large Videos
- **Risk**: Long videos may cause slow embedding generation and search
- **Mitigation**: Batch processing, progress tracking, optimized chunking strategies
---
**Story Owner**: Development Team
**Architecture Reference**: BMad Method Epic-Story Structure
**Implementation Status**: Ready for Development
**Last Updated**: 2025-08-27