# Story 2.4: Multi-Model Support

## Status

Draft

## Story

**As a** user,
**I want** the system to support multiple AI models (OpenAI, Anthropic, DeepSeek) with intelligent selection,
**so that** I can choose the best model for my content type and optimize for cost or quality preferences.

## Acceptance Criteria

1. Support for multiple AI providers: OpenAI GPT-4o-mini, Anthropic Claude, DeepSeek V2
2. Intelligent model selection based on content type, length, and user preferences
3. Automatic fallback to alternative models when the primary model fails or is unavailable
4. Cost comparison and optimization recommendations for different model choices
5. Model performance tracking and quality comparison across different content types
6. User preference management for model selection and fallback strategies

## Tasks / Subtasks

- [ ] **Task 1: Multi-Model Service Architecture** (AC: 1, 3)
  - [ ] Create `AIModelRegistry` for managing multiple model providers
  - [ ] Implement provider-specific adapters (OpenAI, Anthropic, DeepSeek)
  - [ ] Create unified interface for model switching and fallback logic
  - [ ] Add model availability monitoring and health checks
- [ ] **Task 2: Model-Specific Implementations** (AC: 1)
  - [ ] Implement `AnthropicSummarizer` for Claude 3.5 integration
  - [ ] Create `DeepSeekSummarizer` for DeepSeek V2 integration
  - [ ] Optimize prompts for each model's strengths
  - [ ] Add model-specific parameter tuning and optimization
- [ ] **Task 3: Intelligent Model Selection** (AC: 2, 4)
  - [ ] Create content analysis for optimal model matching
  - [ ] Implement cost-quality optimization algorithms
  - [ ] Add model recommendation engine based on content characteristics
  - [ ] Create user preference learning system
- [ ] **Task 4: Fallback and Reliability** (AC: 3)
  - [ ] Implement automatic failover logic with error classification
  - [ ] Create model health monitoring and status tracking
  - [ ] Add graceful degradation with quality maintenance
  - [ ] Implement retry logic with model rotation
- [ ] **Task 5: Performance and Cost Analytics** (AC: 4, 5)
  - [ ] Create model performance comparison dashboard
  - [ ] Implement cost tracking and optimization recommendations
  - [ ] Add quality scoring across different models and content types
  - [ ] Create model usage analytics and insights
- [ ] **Task 6: User Experience and Configuration** (AC: 6)
  - [ ] Add model selection options in frontend interface
  - [ ] Create user preference management for model choices
  - [ ] Implement model comparison tools for users
  - [ ] Add real-time cost estimates and recommendations
- [ ] **Task 7: Integration and Testing** (AC: 1, 2, 3, 4, 5, 6)
  - [ ] Update SummaryPipeline to use the multi-model system
  - [ ] Test model switching and fallback scenarios
  - [ ] Validate cost calculations and performance metrics
  - [ ] Create comprehensive model comparison testing

## Dev Notes

### Architecture Context

This story transforms the single-model AI service into a multi-model system that can intelligently choose and switch between AI providers. The system must produce consistent output while optimizing for user preferences, content requirements, and cost efficiency.
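One possible shape for the user preferences mentioned above (AC 6) is sketched here; the use of Pydantic, the field names, and the defaults are illustrative assumptions, not part of the referenced architecture document.

```python
# Hypothetical per-user preference schema consumed by the model selection logic.
# Field names and defaults are assumptions for illustration only.
from typing import Optional, Literal
from pydantic import BaseModel


class ModelPreferences(BaseModel):
    """Settings a user can tune for model selection and fallback (AC 6)."""
    priority: Literal["cost", "quality", "speed", "balanced"] = "balanced"
    preferred_provider: Optional[str] = None      # e.g. "anthropic"
    allow_fallback: bool = True                   # permit automatic model rotation
    max_cost_per_summary_usd: Optional[float] = None
```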
### Multi-Model Architecture Design

[Source: docs/architecture.md#multi-model-ai-architecture]

```python
# backend/services/ai_model_registry.py
from abc import ABC, abstractmethod
from enum import Enum
from typing import Dict, List, Optional, Any, Union
from dataclasses import dataclass
import asyncio
import time

from ..services.ai_service import AIService, SummaryRequest, SummaryResult


class ModelProvider(Enum):
    OPENAI = "openai"
    ANTHROPIC = "anthropic"
    DEEPSEEK = "deepseek"


class ModelCapability(Enum):
    GENERAL_SUMMARIZATION = "general_summarization"
    TECHNICAL_CONTENT = "technical_content"
    CREATIVE_CONTENT = "creative_content"
    LONG_FORM_CONTENT = "long_form_content"
    MULTILINGUAL = "multilingual"
    COST_OPTIMIZED = "cost_optimized"
    HIGH_QUALITY = "high_quality"


@dataclass
class ModelSpecs:
    provider: ModelProvider
    model_name: str
    max_input_tokens: int
    max_output_tokens: int
    cost_per_1k_input_tokens: float
    cost_per_1k_output_tokens: float
    capabilities: List[ModelCapability]
    quality_score: float      # 0.0 to 1.0
    speed_score: float        # 0.0 to 1.0 (relative)
    reliability_score: float  # 0.0 to 1.0


@dataclass
class ModelSelection:
    primary_model: ModelProvider
    fallback_models: List[ModelProvider]
    reasoning: str
    estimated_cost: float
    estimated_quality: float


class AIModelRegistry:
    """Registry and orchestrator for multiple AI models"""

    def __init__(self):
        self.models: Dict[ModelProvider, AIService] = {}
        self.model_specs: Dict[ModelProvider, ModelSpecs] = {}
        self.model_health: Dict[ModelProvider, Dict[str, Any]] = {}
        self._initialize_model_specs()

    def _initialize_model_specs(self):
        """Initialize model specifications and capabilities"""
        self.model_specs[ModelProvider.OPENAI] = ModelSpecs(
            provider=ModelProvider.OPENAI,
            model_name="gpt-4o-mini",
            max_input_tokens=128000,
            max_output_tokens=16384,
            cost_per_1k_input_tokens=0.00015,
            cost_per_1k_output_tokens=0.0006,
            capabilities=[
                ModelCapability.GENERAL_SUMMARIZATION,
                ModelCapability.TECHNICAL_CONTENT,
                ModelCapability.CREATIVE_CONTENT,
                ModelCapability.COST_OPTIMIZED,
            ],
            quality_score=0.85,
            speed_score=0.90,
            reliability_score=0.95,
        )

        self.model_specs[ModelProvider.ANTHROPIC] = ModelSpecs(
            provider=ModelProvider.ANTHROPIC,
            model_name="claude-3-5-haiku-20241022",
            max_input_tokens=200000,
            max_output_tokens=8192,
            cost_per_1k_input_tokens=0.001,
            cost_per_1k_output_tokens=0.005,
            capabilities=[
                ModelCapability.GENERAL_SUMMARIZATION,
                ModelCapability.TECHNICAL_CONTENT,
                ModelCapability.LONG_FORM_CONTENT,
                ModelCapability.HIGH_QUALITY,
            ],
            quality_score=0.95,
            speed_score=0.80,
            reliability_score=0.92,
        )

        self.model_specs[ModelProvider.DEEPSEEK] = ModelSpecs(
            provider=ModelProvider.DEEPSEEK,
            model_name="deepseek-chat",
            max_input_tokens=64000,
            max_output_tokens=4096,
            cost_per_1k_input_tokens=0.00014,
            cost_per_1k_output_tokens=0.00028,
            capabilities=[
                ModelCapability.GENERAL_SUMMARIZATION,
                ModelCapability.TECHNICAL_CONTENT,
                ModelCapability.COST_OPTIMIZED,
            ],
            quality_score=0.80,
            speed_score=0.85,
            reliability_score=0.88,
        )

    def register_model(self, provider: ModelProvider, model_service: AIService):
        """Register a model service with the registry"""
        self.models[provider] = model_service
        self.model_health[provider] = {
            "status": "healthy",
            "last_check": time.time(),
            "error_count": 0,
            "success_rate": 1.0,
        }

    async def select_optimal_model(
        self,
        request: SummaryRequest,
        user_preferences: Optional[Dict[str, Any]] = None,
    ) -> ModelSelection:
        """Select optimal model based on content and preferences"""
        # Analyze content characteristics
        content_analysis = await self._analyze_content_for_model_selection(request)

        # Get user preferences
        preferences = user_preferences or {}
        priority = preferences.get("priority", "balanced")  # cost, quality, speed, balanced

        # Score models based on requirements
        model_scores = {}
        for provider, specs in self.model_specs.items():
            if provider not in self.models:
                continue  # Skip unavailable models
            score = self._calculate_model_score(specs, content_analysis, priority)
            model_scores[provider] = score

        # Sort by score and filter healthy models
        healthy_models = [
            provider for provider, health in self.model_health.items()
            if health["status"] == "healthy" and provider in model_scores
        ]

        if not healthy_models:
            raise Exception("No healthy AI models available")

        # Select primary and fallback models
        sorted_models = sorted(healthy_models, key=lambda p: model_scores[p], reverse=True)
        primary_model = sorted_models[0]
        fallback_models = sorted_models[1:3]  # Top 2 fallbacks

        # Calculate estimates
        primary_specs = self.model_specs[primary_model]
        estimated_cost = self._estimate_cost(request, primary_specs)
        estimated_quality = primary_specs.quality_score

        # Generate reasoning
        reasoning = self._generate_selection_reasoning(
            primary_model, content_analysis, priority, model_scores[primary_model]
        )

        return ModelSelection(
            primary_model=primary_model,
            fallback_models=fallback_models,
            reasoning=reasoning,
            estimated_cost=estimated_cost,
            estimated_quality=estimated_quality,
        )

    async def generate_summary_with_fallback(
        self,
        request: SummaryRequest,
        model_selection: ModelSelection,
    ) -> SummaryResult:
        """Generate summary with automatic fallback"""
        models_to_try = [model_selection.primary_model] + model_selection.fallback_models
        last_error = None

        for model_provider in models_to_try:
            try:
                model_service = self.models[model_provider]

                # Update health monitoring
                start_time = time.time()
                result = await model_service.generate_summary(request)

                # Record success
                await self._record_model_success(model_provider, time.time() - start_time)

                # Add model info to result
                result.processing_metadata["model_provider"] = model_provider.value
                result.processing_metadata["model_name"] = self.model_specs[model_provider].model_name
                result.processing_metadata["fallback_used"] = model_provider != model_selection.primary_model

                return result

            except Exception as e:
                last_error = e
                await self._record_model_error(model_provider, str(e))

                # If this was the last model to try, raise the error
                if model_provider == models_to_try[-1]:
                    raise Exception(f"All AI models failed. Last error: {str(e)}")

                # Continue to next model
                continue

        raise Exception("No AI models available for processing")

    async def _analyze_content_for_model_selection(self, request: SummaryRequest) -> Dict[str, Any]:
        """Analyze content to determine optimal model characteristics"""
        transcript = request.transcript

        analysis = {
            "length": len(transcript),
            "word_count": len(transcript.split()),
            "token_estimate": len(transcript) // 4,  # Rough estimate
            "complexity": "medium",
            "content_type": "general",
            "technical_density": 0.0,
            "required_capabilities": [ModelCapability.GENERAL_SUMMARIZATION],
        }

        # Analyze content type and complexity
        lower_transcript = transcript.lower()

        # Technical content detection
        technical_indicators = [
            "algorithm", "function", "variable", "database", "api", "code",
            "programming", "software", "technical", "implementation", "architecture",
        ]
        technical_count = sum(1 for word in technical_indicators if word in lower_transcript)

        if technical_count >= 5:
            analysis["content_type"] = "technical"
            analysis["technical_density"] = min(1.0, technical_count / 20)
            analysis["required_capabilities"].append(ModelCapability.TECHNICAL_CONTENT)

        # Long-form content detection
        if analysis["word_count"] > 5000:
            analysis["required_capabilities"].append(ModelCapability.LONG_FORM_CONTENT)

        # Creative content detection
        creative_indicators = ["story", "creative", "art", "design", "narrative", "experience"]
        if sum(1 for word in creative_indicators if word in lower_transcript) >= 3:
            analysis["content_type"] = "creative"
            analysis["required_capabilities"].append(ModelCapability.CREATIVE_CONTENT)

        # Complexity assessment
        avg_sentence_length = analysis["word_count"] / len(transcript.split('.'))
        if avg_sentence_length > 25:
            analysis["complexity"] = "high"
        elif avg_sentence_length < 15:
            analysis["complexity"] = "low"

        return analysis

    def _calculate_model_score(
        self,
        specs: ModelSpecs,
        content_analysis: Dict[str, Any],
        priority: str,
    ) -> float:
        """Calculate score for model based on requirements and preferences"""
        score = 0.0

        # Base capability matching
        required_capabilities = content_analysis["required_capabilities"]
        capability_match = len([cap for cap in required_capabilities if cap in specs.capabilities])
        capability_score = capability_match / len(required_capabilities) if required_capabilities else 1.0

        # Token limit checking
        token_estimate = content_analysis["token_estimate"]
        if token_estimate > specs.max_input_tokens:
            return 0.0  # Cannot handle this content

        # Priority-based scoring
        if priority == "cost":
            cost_score = 1.0 - (specs.cost_per_1k_input_tokens / 0.002)  # Normalize against max expected cost
            score = 0.4 * capability_score + 0.5 * cost_score + 0.1 * specs.reliability_score
        elif priority == "quality":
            score = 0.3 * capability_score + 0.6 * specs.quality_score + 0.1 * specs.reliability_score
        elif priority == "speed":
            score = 0.3 * capability_score + 0.5 * specs.speed_score + 0.2 * specs.reliability_score
        else:  # balanced
            score = (0.3 * capability_score +
                     0.25 * specs.quality_score +
                     0.2 * specs.speed_score +
                     0.15 * specs.reliability_score +
                     0.1 * (1.0 - specs.cost_per_1k_input_tokens / 0.002))

        # Bonus for specific content type alignment
        if content_analysis["content_type"] == "technical" and ModelCapability.TECHNICAL_CONTENT in specs.capabilities:
            score += 0.1

        return min(1.0, max(0.0, score))

    def _estimate_cost(self, request: SummaryRequest, specs: ModelSpecs) -> float:
        """Estimate cost for processing with specific model"""
        input_tokens = len(request.transcript) // 4  # Rough estimate
        output_tokens = 500  # Average summary length

        input_cost = (input_tokens / 1000) * specs.cost_per_1k_input_tokens
        output_cost = (output_tokens / 1000) * specs.cost_per_1k_output_tokens

        return input_cost + output_cost
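    # Illustrative cost check for the estimator above (not part of the source
    # architecture): a ~10,000-word transcript is roughly 50,000 characters,
    # i.e. about 12,500 input tokens under the chars // 4 heuristic. With a
    # 500-token summary, the per-request estimates work out to roughly:
    #   gpt-4o-mini:       12.5 * $0.00015 + 0.5 * $0.0006  ~= $0.0022
    #   claude-3-5-haiku:  12.5 * $0.001   + 0.5 * $0.005   ~= $0.0150
    #   deepseek-chat:     12.5 * $0.00014 + 0.5 * $0.00028 ~= $0.0019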
    def _generate_selection_reasoning(
        self,
        selected_model: ModelProvider,
        content_analysis: Dict[str, Any],
        priority: str,
        score: float,
    ) -> str:
        """Generate human-readable reasoning for model selection"""
        specs = self.model_specs[selected_model]
        reasons = [f"Selected {specs.model_name} (score: {score:.2f})"]

        if priority == "cost":
            reasons.append(f"Cost-optimized choice at ${specs.cost_per_1k_input_tokens:.5f} per 1K tokens")
        elif priority == "quality":
            reasons.append(f"High quality option (quality score: {specs.quality_score:.2f})")
        elif priority == "speed":
            reasons.append(f"Fast processing (speed score: {specs.speed_score:.2f})")

        if content_analysis["content_type"] == "technical":
            reasons.append("Optimized for technical content")

        if content_analysis["word_count"] > 3000:
            reasons.append("Suitable for long-form content")

        return ". ".join(reasons)

    async def _record_model_success(self, provider: ModelProvider, processing_time: float):
        """Record successful model usage"""
        health = self.model_health[provider]
        health["status"] = "healthy"
        health["last_check"] = time.time()
        health["success_rate"] = min(1.0, health["success_rate"] + 0.01)
        health["avg_processing_time"] = processing_time

    async def _record_model_error(self, provider: ModelProvider, error: str):
        """Record model error for health monitoring"""
        health = self.model_health[provider]
        health["error_count"] += 1
        health["last_error"] = error
        health["last_check"] = time.time()
        health["success_rate"] = max(0.0, health["success_rate"] - 0.05)

        # Mark as unhealthy if too many errors
        if health["error_count"] > 5 and health["success_rate"] < 0.3:
            health["status"] = "unhealthy"

    async def get_model_comparison(self, request: SummaryRequest) -> Dict[str, Any]:
        """Get comparison of all available models for the request"""
        content_analysis = await self._analyze_content_for_model_selection(request)

        comparisons = {}
        for provider, specs in self.model_specs.items():
            if provider not in self.models:
                continue

            comparisons[provider.value] = {
                "model_name": specs.model_name,
                "estimated_cost": self._estimate_cost(request, specs),
                "quality_score": specs.quality_score,
                "speed_score": specs.speed_score,
                "capabilities": [cap.value for cap in specs.capabilities],
                "health_status": self.model_health[provider]["status"],
                "suitability_scores": {
                    "cost_optimized": self._calculate_model_score(specs, content_analysis, "cost"),
                    "quality_focused": self._calculate_model_score(specs, content_analysis, "quality"),
                    "speed_focused": self._calculate_model_score(specs, content_analysis, "speed"),
                    "balanced": self._calculate_model_score(specs, content_analysis, "balanced"),
                },
            }

        return {
            "content_analysis": content_analysis,
            "model_comparisons": comparisons,
            "recommendation": await self.select_optimal_model(request),
        }

    def get_health_status(self) -> Dict[str, Any]:
        """Get health status of all registered models"""
        return {
            "models": {
                provider.value: {
                    "status": health["status"],
                    "success_rate": health["success_rate"],
                    "error_count": health["error_count"],
                    "last_check": health["last_check"],
                    "model_name": self.model_specs[provider].model_name,
                }
                for provider, health in self.model_health.items()
            },
            "total_healthy": sum(1 for h in self.model_health.values() if h["status"] == "healthy"),
            "total_models": len(self.model_health),
        }
```
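A minimal call-flow sketch for the registry defined above: it assumes the adapters have already been registered via `register_model()` and that `SummaryRequest` exposes the fields used by the adapters in the next section; the module paths and field values are illustrative assumptions.

```python
# Illustrative usage of AIModelRegistry; paths and request fields are assumptions.
from backend.services.ai_model_registry import AIModelRegistry
from backend.services.ai_service import SummaryRequest, SummaryLength


async def summarize(registry: AIModelRegistry, transcript: str) -> dict:
    request = SummaryRequest(
        transcript=transcript,
        length=SummaryLength.STANDARD,
        focus_areas=[],
        language="en",
    )

    # Pick a primary model plus fallbacks for this specific transcript.
    selection = await registry.select_optimal_model(
        request, user_preferences={"priority": "cost"}
    )

    # Run the summary, rotating to fallback models on failure.
    result = await registry.generate_summary_with_fallback(request, selection)

    return {
        "summary": result.summary,
        "model": result.processing_metadata["model_name"],
        "fallback_used": result.processing_metadata["fallback_used"],
        "estimated_cost_usd": selection.estimated_cost,
    }
```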
### Model-Specific Implementations

[Source: docs/architecture.md#model-adapters]

```python
# backend/services/anthropic_summarizer.py
import time

import anthropic

from .ai_service import AIService, AIServiceError, SummaryRequest, SummaryResult, SummaryLength


class AnthropicSummarizer(AIService):
    def __init__(self, api_key: str, model: str = "claude-3-5-haiku-20241022"):
        self.client = anthropic.AsyncAnthropic(api_key=api_key)
        self.model = model
        # Cost per 1K tokens (as of 2025)
        self.input_cost_per_1k = 0.001   # $1.00 per 1M input tokens
        self.output_cost_per_1k = 0.005  # $5.00 per 1M output tokens

    async def generate_summary(self, request: SummaryRequest) -> SummaryResult:
        """Generate summary using Anthropic Claude"""
        prompt = self._build_anthropic_prompt(request)

        try:
            start_time = time.time()

            message = await self.client.messages.create(
                model=self.model,
                max_tokens=self._get_max_tokens(request.length),
                temperature=0.3,
                messages=[
                    {"role": "user", "content": prompt}
                ],
            )

            processing_time = time.time() - start_time

            # Parse response (Anthropic returns structured text)
            result_data = self._parse_anthropic_response(message.content[0].text)

            # Calculate costs
            input_tokens = message.usage.input_tokens
            output_tokens = message.usage.output_tokens
            input_cost = (input_tokens / 1000) * self.input_cost_per_1k
            output_cost = (output_tokens / 1000) * self.output_cost_per_1k

            return SummaryResult(
                summary=result_data["summary"],
                key_points=result_data["key_points"],
                main_themes=result_data["main_themes"],
                actionable_insights=result_data["actionable_insights"],
                confidence_score=result_data["confidence_score"],
                processing_metadata={
                    "model": self.model,
                    "processing_time_seconds": processing_time,
                    "input_tokens": input_tokens,
                    "output_tokens": output_tokens,
                    "provider": "anthropic",
                },
                cost_data={
                    "input_cost_usd": input_cost,
                    "output_cost_usd": output_cost,
                    "total_cost_usd": input_cost + output_cost,
                },
            )

        except Exception as e:
            raise AIServiceError(f"Anthropic summarization failed: {str(e)}")

    def _build_anthropic_prompt(self, request: SummaryRequest) -> str:
        """Build prompt optimized for Claude's instruction-following"""
        length_words = {
            SummaryLength.BRIEF: "100-200 words",
            SummaryLength.STANDARD: "300-500 words",
            SummaryLength.DETAILED: "500-800 words",
        }

        return f"""Please analyze this YouTube video transcript and provide a comprehensive summary.

Summary Requirements:
- Length: {length_words[request.length]}
- Focus areas: {', '.join(request.focus_areas) if request.focus_areas else 'general content'}
- Language: {request.language}

Please structure your response as follows:

## Summary
[Main summary text here - {length_words[request.length]}]

## Key Points
- [Point 1]
- [Point 2]
- [Point 3-7 as appropriate]

## Main Themes
- [Theme 1]
- [Theme 2]
- [Theme 3-4 as appropriate]

## Actionable Insights
- [Insight 1]
- [Insight 2]
- [Insight 3-5 as appropriate]

## Confidence Score
[Rate your confidence in this summary from 0.0 to 1.0]

Transcript:
{request.transcript}"""


# backend/services/deepseek_summarizer.py
import json
import time

import httpx

from .ai_service import AIService, AIServiceError, SummaryRequest, SummaryResult, SummaryLength


class DeepSeekSummarizer(AIService):
    def __init__(self, api_key: str, model: str = "deepseek-chat"):
        self.api_key = api_key
        self.model = model
        self.base_url = "https://api.deepseek.com/v1"
        # Cost per 1K tokens (DeepSeek pricing)
        self.input_cost_per_1k = 0.00014   # $0.14 per 1M input tokens
        self.output_cost_per_1k = 0.00028  # $0.28 per 1M output tokens

    async def generate_summary(self, request: SummaryRequest) -> SummaryResult:
        """Generate summary using DeepSeek API"""
        prompt = self._build_deepseek_prompt(request)

        async with httpx.AsyncClient() as client:
            try:
                start_time = time.time()

                response = await client.post(
                    f"{self.base_url}/chat/completions",
                    headers={
                        "Authorization": f"Bearer {self.api_key}",
                        "Content-Type": "application/json",
                    },
                    json={
                        "model": self.model,
                        "messages": [
                            {"role": "system", "content": "You are an expert content summarizer."},
                            {"role": "user", "content": prompt},
                        ],
                        "temperature": 0.3,
                        "max_tokens": self._get_max_tokens(request.length),
                        "response_format": {"type": "json_object"},
                    },
                    timeout=60.0,
                )
                response.raise_for_status()

                data = response.json()
                processing_time = time.time() - start_time
                usage = data["usage"]

                # Parse JSON response
                result_data = json.loads(data["choices"][0]["message"]["content"])

                # Calculate costs
                input_cost = (usage["prompt_tokens"] / 1000) * self.input_cost_per_1k
                output_cost = (usage["completion_tokens"] / 1000) * self.output_cost_per_1k

                return SummaryResult(
                    summary=result_data.get("summary", ""),
                    key_points=result_data.get("key_points", []),
                    main_themes=result_data.get("main_themes", []),
                    actionable_insights=result_data.get("actionable_insights", []),
                    confidence_score=result_data.get("confidence_score", 0.8),
                    processing_metadata={
                        "model": self.model,
                        "processing_time_seconds": processing_time,
                        "prompt_tokens": usage["prompt_tokens"],
                        "completion_tokens": usage["completion_tokens"],
                        "provider": "deepseek",
                    },
                    cost_data={
                        "input_cost_usd": input_cost,
                        "output_cost_usd": output_cost,
                        "total_cost_usd": input_cost + output_cost,
                    },
                )

            except Exception as e:
                raise AIServiceError(f"DeepSeek summarization failed: {str(e)}")
```
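The Anthropic adapter above references a `_parse_anthropic_response` helper that is not defined in this story. A minimal sketch of what such a parser could look like is shown below, assuming the section headings from the prompt template; the standalone function name, regex approach, and defaults are illustrative, not the mandated implementation.

```python
# Sketch of a parser for Claude's markdown-style reply (illustrative only).
import re
from typing import Any, Dict, List


def parse_sectioned_response(text: str) -> Dict[str, Any]:
    """Split a '## Section'-structured reply into the fields SummaryResult expects."""

    def section(title: str) -> str:
        # Capture everything between "## <title>" and the next "## " heading.
        match = re.search(rf"##\s*{title}\s*\n(.*?)(?=\n##\s|\Z)", text, re.DOTALL | re.IGNORECASE)
        return match.group(1).strip() if match else ""

    def bullets(title: str) -> List[str]:
        return [
            line.lstrip("- ").strip()
            for line in section(title).splitlines()
            if line.strip().startswith("-")
        ]

    confidence_match = re.search(r"\d*\.?\d+", section("Confidence Score"))

    return {
        "summary": section("Summary"),
        "key_points": bullets("Key Points"),
        "main_themes": bullets("Main Themes"),
        "actionable_insights": bullets("Actionable Insights"),
        "confidence_score": float(confidence_match.group()) if confidence_match else 0.8,
    }
```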
### Frontend Model Selection Interface

[Source: docs/architecture.md#frontend-integration]

```typescript
// frontend/src/components/forms/ModelSelector.tsx
import { useState } from 'react';
import { Card, CardContent, CardHeader, CardTitle } from '@/components/ui/card';
import { Button } from '@/components/ui/button';
import { Badge } from '@/components/ui/badge';
import { Select, SelectContent, SelectItem, SelectTrigger, SelectValue } from '@/components/ui/select';
import { Tabs, TabsContent, TabsList, TabsTrigger } from '@/components/ui/tabs';

interface ModelComparison {
  model_name: string;
  estimated_cost: number;
  quality_score: number;
  speed_score: number;
  capabilities: string[];
  health_status: string;
  suitability_scores: {
    cost_optimized: number;
    quality_focused: number;
    speed_focused: number;
    balanced: number;
  };
}

interface ModelSelectorProps {
  comparisons: Record<string, ModelComparison>;
  selectedModel?: string;
  onModelSelect: (model: string, priority: string) => void;
}

export function ModelSelector({ comparisons, selectedModel, onModelSelect }: ModelSelectorProps) {
  const [priority, setPriority] = useState('balanced');
  const [showComparison, setShowComparison] = useState(false);

  const getBestModelForPriority = (priority: string) => {
    const scores = Object.entries(comparisons).map(([provider, data]) => ({
      provider,
      score: data.suitability_scores[priority as keyof typeof data.suitability_scores]
    }));
    return scores.sort((a, b) => b.score - a.score)[0]?.provider;
  };

  const formatCost = (cost: number) => `$${cost.toFixed(4)}`;

  const getQualityBadgeColor = (score: number) => {
    if (score >= 0.9) return 'bg-green-100 text-green-800';
    if (score >= 0.8) return 'bg-blue-100 text-blue-800';
    return 'bg-yellow-100 text-yellow-800';
  };

  return (
    <Card>
      <CardHeader className="flex flex-row items-center justify-between">
        <CardTitle>AI Model Selection</CardTitle>
        <div className="flex items-center gap-2">
          <Select
            value={priority}
            onValueChange={(value) => {
              setPriority(value);
              const best = getBestModelForPriority(value);
              if (best) onModelSelect(best, value);
            }}
          >
            <SelectTrigger className="w-40">
              <SelectValue placeholder="Priority" />
            </SelectTrigger>
            <SelectContent>
              <SelectItem value="balanced">Balanced</SelectItem>
              <SelectItem value="cost">Lowest cost</SelectItem>
              <SelectItem value="quality">Best quality</SelectItem>
              <SelectItem value="speed">Fastest</SelectItem>
            </SelectContent>
          </Select>
          <Button variant="outline" onClick={() => setShowComparison(!showComparison)}>
            {showComparison ? 'Hide comparison' : 'Compare models'}
          </Button>
        </div>
      </CardHeader>

      <CardContent>
        {showComparison && (
          <Tabs defaultValue="overview">
            <TabsList>
              <TabsTrigger value="overview">Overview</TabsTrigger>
              <TabsTrigger value="details">Detailed Comparison</TabsTrigger>
            </TabsList>

            <TabsContent value="overview" className="grid gap-3 md:grid-cols-3">
              {Object.entries(comparisons).map(([provider, data]) => (
                <Card
                  key={provider}
                  className={provider === selectedModel ? 'cursor-pointer border-primary' : 'cursor-pointer'}
                  onClick={() => onModelSelect(provider, priority)}
                >
                  <CardHeader className="flex flex-row items-center justify-between">
                    <CardTitle className="text-sm">{data.model_name}</CardTitle>
                    <Badge variant="outline">{data.health_status}</Badge>
                  </CardHeader>
                  <CardContent className="space-y-1 text-sm">
                    <div>Cost: {formatCost(data.estimated_cost)}</div>
                    <div>
                      Quality:{' '}
                      <Badge className={getQualityBadgeColor(data.quality_score)}>
                        {(data.quality_score * 100).toFixed(0)}%
                      </Badge>
                    </div>
                    <div>Speed: {(data.speed_score * 100).toFixed(0)}%</div>
                    <div>
                      Suitability:{' '}
                      {(data.suitability_scores[priority as keyof typeof data.suitability_scores] * 100).toFixed(0)}%
                    </div>
                  </CardContent>
                </Card>
              ))}
            </TabsContent>

            <TabsContent value="details">
              <table className="w-full text-sm">
                <thead>
                  <tr className="text-left">
                    <th>Model</th>
                    <th>Cost</th>
                    <th>Quality</th>
                    <th>Speed</th>
                    <th>Capabilities</th>
                    <th>Status</th>
                  </tr>
                </thead>
                <tbody>
                  {Object.entries(comparisons).map(([provider, data]) => (
                    <tr key={provider}>
                      <td>{data.model_name}</td>
                      <td>{formatCost(data.estimated_cost)}</td>
                      <td>{(data.quality_score * 100).toFixed(0)}%</td>
                      <td>{(data.speed_score * 100).toFixed(0)}%</td>
                      <td>
                        {data.capabilities.slice(0, 3).map(cap => (
                          <Badge key={cap} variant="secondary" className="mr-1">
                            {cap.replace('_', ' ')}
                          </Badge>
                        ))}
                        {data.capabilities.length > 3 && (
                          <Badge variant="outline">+{data.capabilities.length - 3} more</Badge>
                        )}
                      </td>
                      <td>
                        <Badge variant="outline">{data.health_status}</Badge>
                      </td>
                    </tr>
                  ))}
                </tbody>
              </table>
            </TabsContent>
          </Tabs>
        )}
      </CardContent>
    </Card>
  );
}
```
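The `comparisons` prop consumed above has to come from the backend. A minimal sketch of such an endpoint is shown below, assuming a FastAPI application and the `SummaryRequest` fields used by the adapters earlier; the route path, request schema, and dependency wiring are illustrative assumptions. Only `get_model_comparison` comes from the registry defined in this story.

```python
# Hypothetical endpoint feeding the ModelSelector UI; paths and schemas are assumptions.
from fastapi import APIRouter, Depends
from pydantic import BaseModel

from backend.services.ai_model_registry import AIModelRegistry
from backend.services.ai_service import SummaryRequest, SummaryLength

router = APIRouter()

_registry = AIModelRegistry()  # adapters assumed to be registered at startup


def get_registry() -> AIModelRegistry:
    return _registry


class ComparisonRequest(BaseModel):
    transcript: str
    language: str = "en"


@router.post("/api/models/compare")
async def compare_models(
    body: ComparisonRequest,
    registry: AIModelRegistry = Depends(get_registry),
):
    request = SummaryRequest(
        transcript=body.transcript,
        length=SummaryLength.STANDARD,
        focus_areas=[],
        language=body.language,
    )
    # Returns content_analysis, per-model comparisons, and a recommendation,
    # matching the Record<string, ModelComparison> shape consumed by the UI.
    return await registry.get_model_comparison(request)
```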
### Performance Benefits

- **Intelligent Model Selection**: Automatically chooses the optimal model based on content and preferences
- **Cost Optimization**: Up to 50% cost savings by selecting the appropriate model for each content type
- **Quality Assurance**: Fallback mechanisms maintain consistent quality even during model outages
- **Flexibility**: Users can prioritize cost, quality, or speed based on their needs
- **Reliability**: Multi-model redundancy targets 99.9% uptime for the summarization service

## Change Log

| Date | Version | Description | Author |
|------|---------|-------------|--------|
| 2025-01-25 | 1.0 | Initial story creation | Bob (Scrum Master) |

## Dev Agent Record

*This section will be populated by the development agent during implementation*

## QA Results

*Results from QA Agent review of the completed story implementation will be added here*