462 lines
16 KiB
Markdown
462 lines
16 KiB
Markdown
# Story 4.6: RAG-Powered Video Chat with ChromaDB
|
|
|
|
## Story Overview
|
|
|
|
**Story ID**: 4.6
|
|
**Epic**: 4 - Advanced Intelligence & Developer Platform
|
|
**Title**: RAG-Powered Video Chat with ChromaDB
|
|
**Status**: 📋 READY FOR IMPLEMENTATION
|
|
**Priority**: Medium
|
|
|
|
**Goal**: Implement a RAG (Retrieval Augmented Generation) chatbot interface using ChromaDB for semantic search, enabling users to have interactive Q&A conversations with video content using precise timestamp source references.
|
|
|
|
**Value Proposition**: Transform passive video consumption into interactive content exploration, allowing users to ask specific questions about video content and receive precise answers with exact timestamp references for verification.
|
|
|
|
**Dependencies**:
|
|
- ✅ Story 4.4 (Custom AI Models) for AI service infrastructure
|
|
- ✅ Existing ChromaDB integration patterns from `/tests/framework-comparison/test_langgraph_chromadb.py`
|
|
- ✅ Transcript extraction system
|
|
|
|
**Estimated Effort**: 20 hours
|
|
|
|
## Technical Requirements
|
|
|
|
### Core Features
|
|
|
|
#### 1. ChromaDB Vector Database
|
|
- **Semantic Transcript Chunking**: Split transcripts into meaningful chunks with overlap
|
|
- **Embedding Storage**: Generate and store embeddings for all transcript segments
|
|
- **Metadata Preservation**: Maintain timestamp, video ID, and section information
|
|
- **Vector Search**: Semantic similarity search across transcript content
|
|
- **Collection Management**: Organize embeddings by video, user, or topic
|
|
|
|
#### 2. RAG Implementation
|
|
- **Context Retrieval**: Fetch relevant transcript chunks based on user questions
|
|
- **Retrieved Chunks**: Use existing test patterns from `/tests/framework-comparison/`
|
|
- **Context Window**: Optimize context size for AI model limits
|
|
- **Relevance Scoring**: Rank retrieved chunks by semantic relevance
|
|
- **Source Attribution**: Maintain clear connection between chunks and timestamps
|
|
|
|
#### 3. Chat Interface
|
|
- **Real-time Q&A**: Interactive chat interface for video-specific questions
|
|
- **Timestamp References**: Every response includes source timestamps like `[00:05:23]`
|
|
- **DeepSeek Integration**: AI responses using DeepSeek models (no Anthropic per user requirements)
|
|
- **Context Awareness**: Maintain conversation context and follow-up questions
|
|
- **Visual Design**: Clean chat interface integrated with video summary page
|
|
|
|
#### 4. Enhanced Features
|
|
- **Follow-up Suggestions**: AI-generated follow-up questions based on content
|
|
- **Conversation History**: Persistent chat sessions linked to video summaries
|
|
- **Export Conversations**: Save Q&A sessions as part of video documentation
|
|
- **Multi-Video Chat**: Ask questions across multiple videos in a playlist
|
|
|
|
### Technical Architecture
|
|
|
|
#### RAG System Components
|
|
|
|
```python
|
|
class RAGService:
|
|
def __init__(self):
|
|
self.vector_db = ChromaVectorDB()
|
|
self.embeddings = HuggingFaceEmbeddings() # Local embeddings
|
|
self.ai_service = DeepSeekService()
|
|
self.chunk_processor = TranscriptChunker()
|
|
|
|
async def process_video_for_rag(self, video_id: str, transcript: str) -> bool:
|
|
# Chunk transcript into semantic segments
|
|
# Generate embeddings for each chunk
|
|
# Store in ChromaDB with metadata
|
|
# Return success status
|
|
|
|
async def ask_question(self, video_id: str, question: str, chat_history: List[ChatMessage]) -> ChatResponse:
|
|
# Retrieve relevant chunks using semantic search
|
|
# Build context from retrieved chunks
|
|
# Generate response with DeepSeek
|
|
# Format response with timestamp references
|
|
```
|
|
|
|
#### Database Schema Extensions
|
|
|
|
```sql
|
|
-- Chat sessions for persistent conversations
|
|
CREATE TABLE chat_sessions (
|
|
id UUID PRIMARY KEY,
|
|
user_id UUID REFERENCES users(id),
|
|
video_id VARCHAR(20),
|
|
summary_id UUID REFERENCES summaries(id),
|
|
session_name VARCHAR(200),
|
|
created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
|
|
updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
|
|
total_messages INTEGER DEFAULT 0,
|
|
is_active BOOLEAN DEFAULT TRUE
|
|
);
|
|
|
|
-- Individual chat messages
|
|
CREATE TABLE chat_messages (
|
|
id UUID PRIMARY KEY,
|
|
session_id UUID REFERENCES chat_sessions(id),
|
|
message_type VARCHAR(20), -- 'user', 'assistant', 'system'
|
|
content TEXT,
|
|
sources JSONB, -- Array of {chunk_id, timestamp, relevance_score}
|
|
processing_time_seconds FLOAT,
|
|
created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
|
|
);
|
|
|
|
-- Vector embeddings for RAG (ChromaDB metadata reference)
|
|
CREATE TABLE video_chunks (
|
|
id UUID PRIMARY KEY,
|
|
video_id VARCHAR(20),
|
|
chunk_index INTEGER,
|
|
chunk_text TEXT,
|
|
start_timestamp INTEGER, -- seconds
|
|
end_timestamp INTEGER,
|
|
word_count INTEGER,
|
|
embedding_id VARCHAR(100), -- ChromaDB document ID
|
|
created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
|
|
);
|
|
|
|
-- RAG performance tracking
|
|
CREATE TABLE rag_analytics (
|
|
id UUID PRIMARY KEY,
|
|
video_id VARCHAR(20),
|
|
question TEXT,
|
|
retrieval_count INTEGER,
|
|
relevance_scores JSONB,
|
|
response_quality_score FLOAT,
|
|
user_feedback INTEGER, -- 1-5 rating
|
|
processing_time_seconds FLOAT,
|
|
created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
|
|
);
|
|
```
|
|
|
|
## Implementation Tasks
|
|
|
|
### Task 4.6.1: ChromaDB Vector Database Setup (6 hours)
|
|
|
|
#### Subtasks:
|
|
1. **ChromaDB Configuration** (2 hours)
|
|
- Set up ChromaDB client with persistent storage
|
|
- Configure collections for video transcripts
|
|
- Implement collection naming and organization strategy
|
|
- Add cleanup and maintenance procedures
|
|
- Test database initialization and connection
|
|
|
|
2. **Transcript Chunking Service** (2 hours)
|
|
- Create intelligent transcript segmentation algorithm
|
|
- Implement overlapping chunks for context preservation
|
|
- Extract meaningful chunk boundaries (sentence/paragraph breaks)
|
|
- Preserve timestamp information in chunks
|
|
- Handle various transcript formats and quality levels
|
|
|
|
3. **Embedding Generation and Storage** (2 hours)
|
|
- Integrate HuggingFace embeddings (sentence-transformers/all-MiniLM-L6-v2)
|
|
- Generate embeddings for transcript chunks
|
|
- Store embeddings with metadata in ChromaDB
|
|
- Implement batch processing for large transcripts
|
|
- Add progress tracking for embedding generation
|
|
|
|
### Task 4.6.2: RAG Retrieval System (8 hours)
|
|
|
|
#### Subtasks:
|
|
1. **Semantic Search Implementation** (3 hours)
|
|
- Implement similarity search across video chunks
|
|
- Add relevance scoring and ranking algorithms
|
|
- Configure search parameters (number of results, similarity threshold)
|
|
- Handle edge cases (no relevant chunks, low similarity scores)
|
|
- Test search quality with various question types
|
|
|
|
2. **Context Building Service** (2 hours)
|
|
- Aggregate retrieved chunks into coherent context
|
|
- Implement context window management for AI models
|
|
- Preserve chunk ordering and timestamp information
|
|
- Add context summarization for long retrievals
|
|
- Handle overlapping chunks and deduplication
|
|
|
|
3. **Source Attribution System** (2 hours)
|
|
- Link retrieved chunks to specific timestamps
|
|
- Generate clickable timestamp references `[00:05:23]`
|
|
- Create YouTube deep links for timestamp navigation
|
|
- Implement source verification and quality checks
|
|
- Add confidence scoring for source attribution
|
|
|
|
4. **RAG Response Generation** (1 hour)
|
|
- Integrate DeepSeek AI service for response generation
|
|
- Create RAG-specific prompts with context and question
|
|
- Format responses with proper source citations
|
|
- Handle cases where no relevant context is found
|
|
- Add response quality validation
|
|
|
|
### Task 4.6.3: Chat Interface Implementation (4 hours)
|
|
|
|
#### Subtasks:
|
|
1. **Chat Frontend Component** (2 hours)
|
|
- Create interactive chat interface with message history
|
|
- Implement typing indicators and loading states
|
|
- Add timestamp link rendering and click handling
|
|
- Design responsive chat layout for video summary pages
|
|
- Add keyboard shortcuts and accessibility features
|
|
|
|
2. **Chat Session Management** (1 hour)
|
|
- Implement persistent chat sessions linked to videos
|
|
- Add session creation, saving, and loading
|
|
- Create chat session list and management interface
|
|
- Handle session state and conversation context
|
|
- Add session export and sharing functionality
|
|
|
|
3. **Follow-up Question System** (1 hour)
|
|
- Generate AI-powered follow-up question suggestions
|
|
- Base suggestions on video content and conversation context
|
|
- Display suggested questions as clickable options
|
|
- Track suggestion effectiveness and user engagement
|
|
- Add customizable suggestion preferences
|
|
|
|
### Task 4.6.4: API Integration and Enhancement (2 hours)
|
|
|
|
#### Subtasks:
|
|
1. **RAG API Endpoints** (1 hour)
|
|
- `POST /api/rag/chat/{video_id}` - Ask question about specific video
|
|
- `GET /api/rag/sessions/{user_id}` - Get user's chat sessions
|
|
- `POST /api/rag/sessions/{session_id}/export` - Export conversation
|
|
- `GET /api/rag/suggestions/{video_id}` - Get follow-up suggestions
|
|
- Add comprehensive error handling and validation
|
|
|
|
2. **Performance Optimization** (1 hour)
|
|
- Implement caching for frequent questions and responses
|
|
- Add batch processing for multiple questions
|
|
- Optimize ChromaDB queries and connection management
|
|
- Add response streaming for long AI responses
|
|
- Monitor and optimize response times and resource usage
|
|
|
|
## Data Models
|
|
|
|
### RAG Chat Models
|
|
|
|
```python
|
|
from pydantic import BaseModel
|
|
from typing import List, Dict, Optional, Any
|
|
from datetime import datetime
|
|
from enum import Enum
|
|
|
|
class MessageType(str, Enum):
|
|
USER = "user"
|
|
ASSISTANT = "assistant"
|
|
SYSTEM = "system"
|
|
|
|
class SourceReference(BaseModel):
|
|
chunk_id: str
|
|
timestamp: int # seconds
|
|
timestamp_formatted: str # [HH:MM:SS]
|
|
youtube_link: str
|
|
chunk_text: str
|
|
relevance_score: float
|
|
|
|
class ChatMessage(BaseModel):
|
|
id: str
|
|
message_type: MessageType
|
|
content: str
|
|
sources: List[SourceReference]
|
|
processing_time_seconds: float
|
|
created_at: datetime
|
|
|
|
class ChatSession(BaseModel):
|
|
id: str
|
|
user_id: str
|
|
video_id: str
|
|
summary_id: str
|
|
session_name: str
|
|
messages: List[ChatMessage]
|
|
total_messages: int
|
|
is_active: bool
|
|
created_at: datetime
|
|
updated_at: datetime
|
|
|
|
class ChatRequest(BaseModel):
|
|
video_id: str
|
|
question: str
|
|
session_id: Optional[str] = None
|
|
include_context: bool = True
|
|
max_sources: int = 5
|
|
|
|
class ChatResponse(BaseModel):
|
|
session_id: str
|
|
message: ChatMessage
|
|
follow_up_suggestions: List[str]
|
|
context_retrieved: bool
|
|
total_chunks_searched: int
|
|
|
|
class RAGAnalytics(BaseModel):
|
|
question: str
|
|
retrieval_count: int
|
|
relevance_scores: List[float]
|
|
response_quality_score: float
|
|
processing_time_seconds: float
|
|
user_feedback: Optional[int] = None
|
|
```
|
|
|
|
## Testing Strategy
|
|
|
|
### Unit Tests
|
|
- **ChromaDB Integration**: Connection, storage, and retrieval operations
|
|
- **Transcript Chunking**: Segmentation quality and metadata preservation
|
|
- **Embedding Generation**: Vector quality and consistency
|
|
- **Semantic Search**: Relevance and ranking accuracy
|
|
- **Source Attribution**: Timestamp accuracy and link generation
|
|
|
|
### Integration Tests
|
|
- **RAG Pipeline**: End-to-end question answering workflow
|
|
- **Chat API**: All chat and session management endpoints
|
|
- **Frontend Integration**: Chat interface functionality and state management
|
|
- **Database Operations**: Session and message persistence
|
|
|
|
### Quality Assurance Tests
|
|
- **Answer Relevance**: Semantic accuracy of responses to questions
|
|
- **Source Attribution**: Timestamp precision and link functionality
|
|
- **Response Quality**: Coherence and helpfulness of AI responses
|
|
- **Performance**: Response time and resource usage under load
|
|
|
|
## API Specification
|
|
|
|
### RAG Chat Endpoints
|
|
|
|
```yaml
|
|
/api/rag/chat/{video_id}:
|
|
post:
|
|
summary: Ask question about video content using RAG
|
|
parameters:
|
|
- name: video_id
|
|
in: path
|
|
required: true
|
|
schema:
|
|
type: string
|
|
requestBody:
|
|
required: true
|
|
content:
|
|
application/json:
|
|
schema:
|
|
$ref: '#/components/schemas/ChatRequest'
|
|
responses:
|
|
200:
|
|
content:
|
|
application/json:
|
|
schema:
|
|
$ref: '#/components/schemas/ChatResponse'
|
|
|
|
/api/rag/sessions/{user_id}:
|
|
get:
|
|
summary: Get user's chat sessions
|
|
parameters:
|
|
- name: user_id
|
|
in: path
|
|
required: true
|
|
schema:
|
|
type: string
|
|
- name: active_only
|
|
in: query
|
|
schema:
|
|
type: boolean
|
|
default: true
|
|
responses:
|
|
200:
|
|
content:
|
|
application/json:
|
|
schema:
|
|
type: array
|
|
items:
|
|
$ref: '#/components/schemas/ChatSession'
|
|
|
|
/api/rag/embeddings/{video_id}/generate:
|
|
post:
|
|
summary: Generate embeddings for video transcript
|
|
parameters:
|
|
- name: video_id
|
|
in: path
|
|
required: true
|
|
schema:
|
|
type: string
|
|
responses:
|
|
202:
|
|
content:
|
|
application/json:
|
|
schema:
|
|
type: object
|
|
properties:
|
|
job_id:
|
|
type: string
|
|
status:
|
|
type: string
|
|
estimated_completion:
|
|
type: string
|
|
format: date-time
|
|
```
|
|
|
|
## Success Criteria
|
|
|
|
### Functional Requirements ✅
|
|
- [ ] ChromaDB stores transcript embeddings with timestamp metadata
|
|
- [ ] Semantic search retrieves relevant content chunks for user questions
|
|
- [ ] Chat interface provides real-time Q&A with timestamp source references
|
|
- [ ] DeepSeek AI generates contextual responses using retrieved chunks
|
|
- [ ] Follow-up question suggestions based on video content
|
|
- [ ] Persistent chat sessions linked to specific videos
|
|
|
|
### Quality Requirements ✅
|
|
- [ ] Answer relevance >85% for factual questions about video content
|
|
- [ ] Timestamp references accurate within 10-second tolerance
|
|
- [ ] Source attribution clearly links responses to specific video segments
|
|
- [ ] Response quality maintains conversation context across messages
|
|
- [ ] Follow-up suggestions are relevant and engaging
|
|
- [ ] Chat interface provides smooth user experience with loading states
|
|
|
|
### Performance Requirements ✅
|
|
- [ ] Question answering response time under 8 seconds
|
|
- [ ] ChromaDB search completes in under 2 seconds
|
|
- [ ] Embedding generation processes 1-hour video in under 5 minutes
|
|
- [ ] Chat interface supports concurrent conversations without degradation
|
|
- [ ] Memory usage remains stable during long conversation sessions
|
|
|
|
## Implementation Notes
|
|
|
|
### ChromaDB Integration
|
|
- Use existing patterns from `/tests/framework-comparison/test_langgraph_chromadb.py`
|
|
- Implement HuggingFace embeddings for local processing (no API dependencies)
|
|
- Configure persistent storage in `./data/chromadb_rag/` directory
|
|
- Use collection per video or organized by user/topic as needed
|
|
|
|
### Transcript Chunking Strategy
|
|
- Create semantic chunks of 200-400 words with 50-word overlap
|
|
- Preserve sentence boundaries and paragraph structure
|
|
- Maintain timestamp ranges for each chunk
|
|
- Include video context (title, channel) in chunk metadata
|
|
|
|
### RAG Response Pattern
|
|
- Retrieve 3-5 most relevant chunks for context
|
|
- Include source timestamps in response format: "According to the video at [00:05:23], ..."
|
|
- Provide YouTube deep links for timestamp navigation
|
|
- Handle cases where no relevant content is found gracefully
|
|
|
|
### DeepSeek Integration
|
|
- Use DeepSeek API for response generation (per user requirement: no Anthropic)
|
|
- Configure appropriate model parameters for conversational responses
|
|
- Implement cost tracking and usage monitoring
|
|
- Add response quality scoring and feedback collection
|
|
|
|
## Risk Mitigation
|
|
|
|
### High Risk: Answer Quality and Relevance
|
|
- **Risk**: RAG responses may be generic or miss important context
|
|
- **Mitigation**: Quality scoring, user feedback collection, continuous prompt optimization
|
|
|
|
### Medium Risk: Timestamp Accuracy
|
|
- **Risk**: Source timestamps may not accurately reflect quoted content
|
|
- **Mitigation**: Chunk boundary validation, timestamp verification, user correction system
|
|
|
|
### Medium Risk: Performance with Large Videos
|
|
- **Risk**: Long videos may cause slow embedding generation and search
|
|
- **Mitigation**: Batch processing, progress tracking, optimized chunking strategies
|
|
|
|
---
|
|
|
|
**Story Owner**: Development Team
|
|
**Architecture Reference**: BMad Method Epic-Story Structure
|
|
**Implementation Status**: Ready for Development
|
|
**Last Updated**: 2025-08-27 |