# Story 2.4: Multi-Model Support

## Status

Draft

## Story

**As a** user,
**I want** the system to support multiple AI models (OpenAI, Anthropic, DeepSeek) with intelligent selection,
**so that** I can choose the best model for my content type and optimize for cost or quality preferences.

## Acceptance Criteria

1. Support for multiple AI providers: OpenAI GPT-4o-mini, Anthropic Claude, DeepSeek V2
2. Intelligent model selection based on content type, length, and user preferences
3. Automatic fallback to alternative models when the primary model fails or is unavailable
4. Cost comparison and optimization recommendations for different model choices
5. Model performance tracking and quality comparison across different content types
6. User preference management for model selection and fallback strategies

## Tasks / Subtasks

- [ ] **Task 1: Multi-Model Service Architecture** (AC: 1, 3)
  - [ ] Create `AIModelRegistry` for managing multiple model providers
  - [ ] Implement provider-specific adapters (OpenAI, Anthropic, DeepSeek)
  - [ ] Create unified interface for model switching and fallback logic
  - [ ] Add model availability monitoring and health checks

- [ ] **Task 2: Model-Specific Implementations** (AC: 1)
  - [ ] Implement `AnthropicSummarizer` for Claude 3.5 Haiku integration
  - [ ] Create `DeepSeekSummarizer` for DeepSeek V2 integration
  - [ ] Tailor prompt optimization to each model's strengths
  - [ ] Add model-specific parameter tuning and optimization

- [ ] **Task 3: Intelligent Model Selection** (AC: 2, 4)
  - [ ] Create content analysis for optimal model matching
  - [ ] Implement cost-quality optimization algorithms
  - [ ] Add model recommendation engine based on content characteristics
  - [ ] Create user preference learning system

- [ ] **Task 4: Fallback and Reliability** (AC: 3)
  - [ ] Implement automatic failover logic with error classification
  - [ ] Create model health monitoring and status tracking
  - [ ] Add graceful degradation with quality maintenance
  - [ ] Implement retry logic with model rotation

- [ ] **Task 5: Performance and Cost Analytics** (AC: 4, 5)
  - [ ] Create model performance comparison dashboard
  - [ ] Implement cost tracking and optimization recommendations
  - [ ] Add quality scoring across different models and content types
  - [ ] Create model usage analytics and insights

- [ ] **Task 6: User Experience and Configuration** (AC: 6)
  - [ ] Add model selection options in frontend interface
  - [ ] Create user preference management for model choices
  - [ ] Implement model comparison tools for users
  - [ ] Add real-time cost estimates and recommendations

- [ ] **Task 7: Integration and Testing** (AC: 1, 2, 3, 4, 5, 6)
  - [ ] Update SummaryPipeline to use the multi-model system
  - [ ] Test model switching and fallback scenarios
  - [ ] Validate cost calculations and performance metrics
  - [ ] Create comprehensive model comparison testing

## Dev Notes

### Architecture Context

This story transforms the single-model AI service into a multi-model system that can intelligently choose and switch between AI providers. The system must produce consistent output across providers while optimizing for user preferences, content requirements, and cost efficiency.

### Multi-Model Architecture Design

[Source: docs/architecture.md#multi-model-ai-architecture]

```python
# backend/services/ai_model_registry.py
import time
from dataclasses import dataclass
from enum import Enum
from typing import Any, Dict, List, Optional

from .ai_service import AIService, SummaryRequest, SummaryResult


class ModelProvider(Enum):
    OPENAI = "openai"
    ANTHROPIC = "anthropic"
    DEEPSEEK = "deepseek"


class ModelCapability(Enum):
    GENERAL_SUMMARIZATION = "general_summarization"
    TECHNICAL_CONTENT = "technical_content"
    CREATIVE_CONTENT = "creative_content"
    LONG_FORM_CONTENT = "long_form_content"
    MULTILINGUAL = "multilingual"
    COST_OPTIMIZED = "cost_optimized"
    HIGH_QUALITY = "high_quality"


@dataclass
class ModelSpecs:
    provider: ModelProvider
    model_name: str
    max_input_tokens: int
    max_output_tokens: int
    cost_per_1k_input_tokens: float
    cost_per_1k_output_tokens: float
    capabilities: List[ModelCapability]
    quality_score: float      # 0.0 to 1.0
    speed_score: float        # 0.0 to 1.0 (relative)
    reliability_score: float  # 0.0 to 1.0


@dataclass
class ModelSelection:
    primary_model: ModelProvider
    fallback_models: List[ModelProvider]
    reasoning: str
    estimated_cost: float
    estimated_quality: float


class AIModelRegistry:
    """Registry and orchestrator for multiple AI models"""

    def __init__(self):
        self.models: Dict[ModelProvider, AIService] = {}
        self.model_specs: Dict[ModelProvider, ModelSpecs] = {}
        self.model_health: Dict[ModelProvider, Dict[str, Any]] = {}

        self._initialize_model_specs()

    def _initialize_model_specs(self):
        """Initialize model specifications and capabilities"""

        self.model_specs[ModelProvider.OPENAI] = ModelSpecs(
            provider=ModelProvider.OPENAI,
            model_name="gpt-4o-mini",
            max_input_tokens=128000,
            max_output_tokens=16384,
            cost_per_1k_input_tokens=0.00015,
            cost_per_1k_output_tokens=0.0006,
            capabilities=[
                ModelCapability.GENERAL_SUMMARIZATION,
                ModelCapability.TECHNICAL_CONTENT,
                ModelCapability.CREATIVE_CONTENT,
                ModelCapability.COST_OPTIMIZED,
            ],
            quality_score=0.85,
            speed_score=0.90,
            reliability_score=0.95,
        )

        self.model_specs[ModelProvider.ANTHROPIC] = ModelSpecs(
            provider=ModelProvider.ANTHROPIC,
            model_name="claude-3-5-haiku-20241022",
            max_input_tokens=200000,
            max_output_tokens=8192,
            cost_per_1k_input_tokens=0.001,
            cost_per_1k_output_tokens=0.005,
            capabilities=[
                ModelCapability.GENERAL_SUMMARIZATION,
                ModelCapability.TECHNICAL_CONTENT,
                ModelCapability.LONG_FORM_CONTENT,
                ModelCapability.HIGH_QUALITY,
            ],
            quality_score=0.95,
            speed_score=0.80,
            reliability_score=0.92,
        )

        self.model_specs[ModelProvider.DEEPSEEK] = ModelSpecs(
            provider=ModelProvider.DEEPSEEK,
            model_name="deepseek-chat",
            max_input_tokens=64000,
            max_output_tokens=4096,
            cost_per_1k_input_tokens=0.00014,
            cost_per_1k_output_tokens=0.00028,
            capabilities=[
                ModelCapability.GENERAL_SUMMARIZATION,
                ModelCapability.TECHNICAL_CONTENT,
                ModelCapability.COST_OPTIMIZED,
            ],
            quality_score=0.80,
            speed_score=0.85,
            reliability_score=0.88,
        )

    def register_model(self, provider: ModelProvider, model_service: AIService):
        """Register a model service with the registry"""
        self.models[provider] = model_service
        self.model_health[provider] = {
            "status": "healthy",
            "last_check": time.time(),
            "error_count": 0,
            "success_rate": 1.0,
        }

    async def select_optimal_model(
        self,
        request: SummaryRequest,
        user_preferences: Optional[Dict[str, Any]] = None,
    ) -> ModelSelection:
        """Select optimal model based on content and preferences"""

        # Analyze content characteristics
        content_analysis = await self._analyze_content_for_model_selection(request)

        # Get user preferences
        preferences = user_preferences or {}
        priority = preferences.get("priority", "balanced")  # cost, quality, speed, balanced

        # Score models based on requirements
        model_scores = {}
        for provider, specs in self.model_specs.items():
            if provider not in self.models:
                continue  # Skip unavailable models
            model_scores[provider] = self._calculate_model_score(specs, content_analysis, priority)

        # Keep only healthy, scored models
        healthy_models = [
            provider for provider, health in self.model_health.items()
            if health["status"] == "healthy" and provider in model_scores
        ]

        if not healthy_models:
            raise Exception("No healthy AI models available")

        # Select primary and fallback models by descending score
        sorted_models = sorted(healthy_models, key=lambda p: model_scores[p], reverse=True)
        primary_model = sorted_models[0]
        fallback_models = sorted_models[1:3]  # Top 2 fallbacks

        # Calculate estimates
        primary_specs = self.model_specs[primary_model]
        estimated_cost = self._estimate_cost(request, primary_specs)
        estimated_quality = primary_specs.quality_score

        # Generate reasoning
        reasoning = self._generate_selection_reasoning(
            primary_model, content_analysis, priority, model_scores[primary_model]
        )

        return ModelSelection(
            primary_model=primary_model,
            fallback_models=fallback_models,
            reasoning=reasoning,
            estimated_cost=estimated_cost,
            estimated_quality=estimated_quality,
        )

    async def generate_summary_with_fallback(
        self,
        request: SummaryRequest,
        model_selection: ModelSelection,
    ) -> SummaryResult:
        """Generate summary with automatic fallback"""

        models_to_try = [model_selection.primary_model] + model_selection.fallback_models

        for model_provider in models_to_try:
            try:
                model_service = self.models[model_provider]

                # Time the call for health monitoring
                start_time = time.time()
                result = await model_service.generate_summary(request)

                # Record success
                await self._record_model_success(model_provider, time.time() - start_time)

                # Add model info to result
                result.processing_metadata["model_provider"] = model_provider.value
                result.processing_metadata["model_name"] = self.model_specs[model_provider].model_name
                result.processing_metadata["fallback_used"] = model_provider != model_selection.primary_model

                return result

            except Exception as e:
                await self._record_model_error(model_provider, str(e))

                # If this was the last model to try, raise the error
                if model_provider == models_to_try[-1]:
                    raise Exception(f"All AI models failed. Last error: {str(e)}")

                # Otherwise continue to the next model
                continue

        raise Exception("No AI models available for processing")

    async def _analyze_content_for_model_selection(self, request: SummaryRequest) -> Dict[str, Any]:
        """Analyze content to determine optimal model characteristics"""

        transcript = request.transcript
        analysis = {
            "length": len(transcript),
            "word_count": len(transcript.split()),
            "token_estimate": len(transcript) // 4,  # Rough heuristic: ~4 characters per token
            "complexity": "medium",
            "content_type": "general",
            "technical_density": 0.0,
            "required_capabilities": [ModelCapability.GENERAL_SUMMARIZATION],
        }

        lower_transcript = transcript.lower()

        # Technical content detection
        technical_indicators = [
            "algorithm", "function", "variable", "database", "api", "code",
            "programming", "software", "technical", "implementation", "architecture",
        ]
        technical_count = sum(1 for word in technical_indicators if word in lower_transcript)

        if technical_count >= 5:
            analysis["content_type"] = "technical"
            analysis["technical_density"] = min(1.0, technical_count / 20)
            analysis["required_capabilities"].append(ModelCapability.TECHNICAL_CONTENT)

        # Long-form content detection
        if analysis["word_count"] > 5000:
            analysis["required_capabilities"].append(ModelCapability.LONG_FORM_CONTENT)

        # Creative content detection
        creative_indicators = ["story", "creative", "art", "design", "narrative", "experience"]
        if sum(1 for word in creative_indicators if word in lower_transcript) >= 3:
            analysis["content_type"] = "creative"
            analysis["required_capabilities"].append(ModelCapability.CREATIVE_CONTENT)

        # Complexity assessment via average sentence length
        avg_sentence_length = analysis["word_count"] / len(transcript.split('.'))
        if avg_sentence_length > 25:
            analysis["complexity"] = "high"
        elif avg_sentence_length < 15:
            analysis["complexity"] = "low"

        return analysis

    def _calculate_model_score(
        self,
        specs: ModelSpecs,
        content_analysis: Dict[str, Any],
        priority: str,
    ) -> float:
        """Calculate score for model based on requirements and preferences"""

        # Base capability matching
        required_capabilities = content_analysis["required_capabilities"]
        capability_match = len([cap for cap in required_capabilities if cap in specs.capabilities])
        capability_score = capability_match / len(required_capabilities) if required_capabilities else 1.0

        # Token limit check: a model that cannot fit the content is disqualified
        if content_analysis["token_estimate"] > specs.max_input_tokens:
            return 0.0

        # Priority-based scoring
        if priority == "cost":
            cost_score = 1.0 - (specs.cost_per_1k_input_tokens / 0.002)  # Normalize against max expected cost
            score = 0.4 * capability_score + 0.5 * cost_score + 0.1 * specs.reliability_score
        elif priority == "quality":
            score = 0.3 * capability_score + 0.6 * specs.quality_score + 0.1 * specs.reliability_score
        elif priority == "speed":
            score = 0.3 * capability_score + 0.5 * specs.speed_score + 0.2 * specs.reliability_score
        else:  # balanced
            score = (0.3 * capability_score + 0.25 * specs.quality_score +
                     0.2 * specs.speed_score + 0.15 * specs.reliability_score +
                     0.1 * (1.0 - specs.cost_per_1k_input_tokens / 0.002))

        # Bonus for specific content type alignment
        if content_analysis["content_type"] == "technical" and ModelCapability.TECHNICAL_CONTENT in specs.capabilities:
            score += 0.1

        return min(1.0, max(0.0, score))

    def _estimate_cost(self, request: SummaryRequest, specs: ModelSpecs) -> float:
        """Estimate cost for processing with specific model"""

        input_tokens = len(request.transcript) // 4  # Rough estimate
        output_tokens = 500  # Average summary length

        input_cost = (input_tokens / 1000) * specs.cost_per_1k_input_tokens
        output_cost = (output_tokens / 1000) * specs.cost_per_1k_output_tokens

        return input_cost + output_cost

    def _generate_selection_reasoning(
        self,
        selected_model: ModelProvider,
        content_analysis: Dict[str, Any],
        priority: str,
        score: float,
    ) -> str:
        """Generate human-readable reasoning for model selection"""

        specs = self.model_specs[selected_model]
        reasons = [f"Selected {specs.model_name} (score: {score:.2f})"]

        if priority == "cost":
            reasons.append(f"Cost-optimized choice at ${specs.cost_per_1k_input_tokens:.5f} per 1K input tokens")
        elif priority == "quality":
            reasons.append(f"High-quality option (quality score: {specs.quality_score:.2f})")
        elif priority == "speed":
            reasons.append(f"Fast processing (speed score: {specs.speed_score:.2f})")

        if content_analysis["content_type"] == "technical":
            reasons.append("Optimized for technical content")

        if content_analysis["word_count"] > 3000:
            reasons.append("Suitable for long-form content")

        return ". ".join(reasons)

    async def _record_model_success(self, provider: ModelProvider, processing_time: float):
        """Record successful model usage"""

        health = self.model_health[provider]
        health["status"] = "healthy"
        health["last_check"] = time.time()
        health["success_rate"] = min(1.0, health["success_rate"] + 0.01)
        health["avg_processing_time"] = processing_time

    async def _record_model_error(self, provider: ModelProvider, error: str):
        """Record model error for health monitoring"""

        health = self.model_health[provider]
        health["error_count"] += 1
        health["last_error"] = error
        health["last_check"] = time.time()
        health["success_rate"] = max(0.0, health["success_rate"] - 0.05)

        # Mark as unhealthy if too many errors
        if health["error_count"] > 5 and health["success_rate"] < 0.3:
            health["status"] = "unhealthy"

    async def get_model_comparison(self, request: SummaryRequest) -> Dict[str, Any]:
        """Get comparison of all available models for the request"""

        content_analysis = await self._analyze_content_for_model_selection(request)

        comparisons = {}
        for provider, specs in self.model_specs.items():
            if provider not in self.models:
                continue

            comparisons[provider.value] = {
                "model_name": specs.model_name,
                "estimated_cost": self._estimate_cost(request, specs),
                "quality_score": specs.quality_score,
                "speed_score": specs.speed_score,
                "capabilities": [cap.value for cap in specs.capabilities],
                "health_status": self.model_health[provider]["status"],
                "suitability_scores": {
                    "cost_optimized": self._calculate_model_score(specs, content_analysis, "cost"),
                    "quality_focused": self._calculate_model_score(specs, content_analysis, "quality"),
                    "speed_focused": self._calculate_model_score(specs, content_analysis, "speed"),
                    "balanced": self._calculate_model_score(specs, content_analysis, "balanced"),
                },
            }

        return {
            "content_analysis": content_analysis,
            "model_comparisons": comparisons,
            "recommendation": await self.select_optimal_model(request),
        }

    def get_health_status(self) -> Dict[str, Any]:
        """Get health status of all registered models"""

        return {
            "models": {
                provider.value: {
                    "status": health["status"],
                    "success_rate": health["success_rate"],
                    "error_count": health["error_count"],
                    "last_check": health["last_check"],
                    "model_name": self.model_specs[provider].model_name,
                }
                for provider, health in self.model_health.items()
            },
            "total_healthy": sum(1 for h in self.model_health.values() if h["status"] == "healthy"),
            "total_models": len(self.model_health),
        }
```
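
A minimal wiring sketch showing how the pipeline could use the registry. The `SummaryRequest` construction and API-key handling are assumptions here, since those details live in the existing `ai_service` module from earlier stories:

```python
# Illustrative only: SummaryRequest defaults and key handling are assumptions
# carried over from the existing single-model AI service.
import asyncio

from backend.services.ai_model_registry import AIModelRegistry, ModelProvider
from backend.services.ai_service import SummaryRequest
from backend.services.anthropic_summarizer import AnthropicSummarizer
from backend.services.deepseek_summarizer import DeepSeekSummarizer


async def main() -> None:
    registry = AIModelRegistry()
    registry.register_model(ModelProvider.ANTHROPIC, AnthropicSummarizer(api_key="sk-ant-..."))
    registry.register_model(ModelProvider.DEEPSEEK, DeepSeekSummarizer(api_key="sk-..."))

    request = SummaryRequest(transcript="...")  # remaining fields assumed to default

    # Pick the best healthy model for a cost-sensitive user, then summarize
    # with automatic failover to the ranked fallbacks.
    selection = await registry.select_optimal_model(request, user_preferences={"priority": "cost"})
    print(selection.reasoning)

    result = await registry.generate_summary_with_fallback(request, selection)
    print(result.processing_metadata["model_provider"], result.cost_data["total_cost_usd"])


asyncio.run(main())
```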

### Model-Specific Implementations

[Source: docs/architecture.md#model-adapters]

```python
# backend/services/anthropic_summarizer.py
import time
from typing import Any, Dict, List

import anthropic

# AIServiceError is assumed to be defined alongside AIService in ai_service.py (Story 2.1)
from .ai_service import AIService, AIServiceError, SummaryRequest, SummaryResult, SummaryLength


class AnthropicSummarizer(AIService):
    def __init__(self, api_key: str, model: str = "claude-3-5-haiku-20241022"):
        self.client = anthropic.AsyncAnthropic(api_key=api_key)
        self.model = model

        # Cost per 1K tokens (as of 2025)
        self.input_cost_per_1k = 0.001   # $1.00 per 1M input tokens
        self.output_cost_per_1k = 0.005  # $5.00 per 1M output tokens

    async def generate_summary(self, request: SummaryRequest) -> SummaryResult:
        """Generate summary using Anthropic Claude"""

        prompt = self._build_anthropic_prompt(request)

        try:
            start_time = time.time()

            message = await self.client.messages.create(
                model=self.model,
                max_tokens=self._get_max_tokens(request.length),  # provided by the AIService base class
                temperature=0.3,
                messages=[
                    {"role": "user", "content": prompt}
                ],
            )

            processing_time = time.time() - start_time

            # Parse response (Claude returns markdown-structured text)
            result_data = self._parse_anthropic_response(message.content[0].text)

            # Calculate costs
            input_tokens = message.usage.input_tokens
            output_tokens = message.usage.output_tokens
            input_cost = (input_tokens / 1000) * self.input_cost_per_1k
            output_cost = (output_tokens / 1000) * self.output_cost_per_1k

            return SummaryResult(
                summary=result_data["summary"],
                key_points=result_data["key_points"],
                main_themes=result_data["main_themes"],
                actionable_insights=result_data["actionable_insights"],
                confidence_score=result_data["confidence_score"],
                processing_metadata={
                    "model": self.model,
                    "processing_time_seconds": processing_time,
                    "input_tokens": input_tokens,
                    "output_tokens": output_tokens,
                    "provider": "anthropic",
                },
                cost_data={
                    "input_cost_usd": input_cost,
                    "output_cost_usd": output_cost,
                    "total_cost_usd": input_cost + output_cost,
                },
            )

        except Exception as e:
            raise AIServiceError(f"Anthropic summarization failed: {str(e)}")

    def _build_anthropic_prompt(self, request: SummaryRequest) -> str:
        """Build prompt optimized for Claude's instruction-following"""

        length_words = {
            SummaryLength.BRIEF: "100-200 words",
            SummaryLength.STANDARD: "300-500 words",
            SummaryLength.DETAILED: "500-800 words",
        }

        return f"""Please analyze this YouTube video transcript and provide a comprehensive summary.

Summary Requirements:
- Length: {length_words[request.length]}
- Focus areas: {', '.join(request.focus_areas) if request.focus_areas else 'general content'}
- Language: {request.language}

Please structure your response as follows:

## Summary
[Main summary text here - {length_words[request.length]}]

## Key Points
- [Point 1]
- [Point 2]
- [Point 3-7 as appropriate]

## Main Themes
- [Theme 1]
- [Theme 2]
- [Theme 3-4 as appropriate]

## Actionable Insights
- [Insight 1]
- [Insight 2]
- [Insight 3-5 as appropriate]

## Confidence Score
[Rate your confidence in this summary from 0.0 to 1.0]

Transcript:
{request.transcript}"""

    def _parse_anthropic_response(self, text: str) -> Dict[str, Any]:
        """Minimal parser for Claude's markdown-structured response."""

        # Collect bullet/text lines under each "## <Heading>" section
        sections: Dict[str, List[str]] = {}
        current = None
        for line in text.splitlines():
            stripped = line.strip()
            if stripped.startswith("## "):
                current = stripped[3:].lower()
                sections[current] = []
            elif current and stripped:
                sections[current].append(stripped.lstrip("- ").strip())

        try:
            confidence = float(sections.get("confidence score", ["0.8"])[0])
        except (ValueError, IndexError):
            confidence = 0.8  # Neutral default if the model's score is unparseable

        return {
            "summary": " ".join(sections.get("summary", [])),
            "key_points": sections.get("key points", []),
            "main_themes": sections.get("main themes", []),
            "actionable_insights": sections.get("actionable insights", []),
            "confidence_score": confidence,
        }


# backend/services/deepseek_summarizer.py
import json
import time

import httpx

from .ai_service import AIService, AIServiceError, SummaryRequest, SummaryResult, SummaryLength


class DeepSeekSummarizer(AIService):
    def __init__(self, api_key: str, model: str = "deepseek-chat"):
        self.api_key = api_key
        self.model = model
        self.base_url = "https://api.deepseek.com/v1"

        # Cost per 1K tokens (DeepSeek pricing)
        self.input_cost_per_1k = 0.00014   # $0.14 per 1M input tokens
        self.output_cost_per_1k = 0.00028  # $0.28 per 1M output tokens

    async def generate_summary(self, request: SummaryRequest) -> SummaryResult:
        """Generate summary using DeepSeek API"""

        prompt = self._build_deepseek_prompt(request)

        async with httpx.AsyncClient() as client:
            try:
                start_time = time.time()

                response = await client.post(
                    f"{self.base_url}/chat/completions",
                    headers={
                        "Authorization": f"Bearer {self.api_key}",
                        "Content-Type": "application/json",
                    },
                    json={
                        "model": self.model,
                        "messages": [
                            {"role": "system", "content": "You are an expert content summarizer."},
                            {"role": "user", "content": prompt},
                        ],
                        "temperature": 0.3,
                        "max_tokens": self._get_max_tokens(request.length),  # provided by the AIService base class
                        "response_format": {"type": "json_object"},
                    },
                    timeout=60.0,
                )

                response.raise_for_status()
                data = response.json()

                processing_time = time.time() - start_time
                usage = data["usage"]

                # Parse JSON response
                result_data = json.loads(data["choices"][0]["message"]["content"])

                # Calculate costs
                input_cost = (usage["prompt_tokens"] / 1000) * self.input_cost_per_1k
                output_cost = (usage["completion_tokens"] / 1000) * self.output_cost_per_1k

                return SummaryResult(
                    summary=result_data.get("summary", ""),
                    key_points=result_data.get("key_points", []),
                    main_themes=result_data.get("main_themes", []),
                    actionable_insights=result_data.get("actionable_insights", []),
                    confidence_score=result_data.get("confidence_score", 0.8),
                    processing_metadata={
                        "model": self.model,
                        "processing_time_seconds": processing_time,
                        "prompt_tokens": usage["prompt_tokens"],
                        "completion_tokens": usage["completion_tokens"],
                        "provider": "deepseek",
                    },
                    cost_data={
                        "input_cost_usd": input_cost,
                        "output_cost_usd": output_cost,
                        "total_cost_usd": input_cost + output_cost,
                    },
                )

            except Exception as e:
                raise AIServiceError(f"DeepSeek summarization failed: {str(e)}")

    def _build_deepseek_prompt(self, request: SummaryRequest) -> str:
        """Minimal prompt builder requesting a JSON object matching SummaryResult fields."""

        return (
            "Summarize this YouTube video transcript. Respond with a JSON object "
            'containing: "summary" (string), "key_points" (array of strings), '
            '"main_themes" (array of strings), "actionable_insights" (array of strings), '
            'and "confidence_score" (number between 0.0 and 1.0).\n\n'
            f"Focus areas: {', '.join(request.focus_areas) if request.focus_areas else 'general content'}\n"
            f"Language: {request.language}\n\n"
            f"Transcript:\n{request.transcript}"
        )
```
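
For Task 7's fallback testing, a hedged pytest sketch. It assumes `pytest-asyncio` is available and that `SummaryRequest`/`SummaryResult` accept the fields shown in the adapters above; `FailingService` and `StubService` are illustrative stand-ins, not real adapters:

```python
# test_model_fallback.py (hypothetical test module name)
import pytest

from backend.services.ai_model_registry import AIModelRegistry, ModelProvider, ModelSelection
from backend.services.ai_service import SummaryRequest, SummaryResult


class FailingService:
    """Stand-in primary adapter that simulates a provider outage."""
    async def generate_summary(self, request):
        raise RuntimeError("simulated provider outage")


class StubService:
    """Stand-in fallback adapter that always succeeds."""
    async def generate_summary(self, request):
        return SummaryResult(
            summary="ok", key_points=[], main_themes=[], actionable_insights=[],
            confidence_score=0.9, processing_metadata={}, cost_data={},
        )


@pytest.mark.asyncio
async def test_falls_back_to_secondary_model():
    registry = AIModelRegistry()
    registry.register_model(ModelProvider.ANTHROPIC, FailingService())
    registry.register_model(ModelProvider.DEEPSEEK, StubService())

    selection = ModelSelection(
        primary_model=ModelProvider.ANTHROPIC,
        fallback_models=[ModelProvider.DEEPSEEK],
        reasoning="test", estimated_cost=0.0, estimated_quality=0.9,
    )
    result = await registry.generate_summary_with_fallback(
        SummaryRequest(transcript="hello world"), selection
    )

    # The summary came from the fallback, and the metadata records that.
    assert result.processing_metadata["fallback_used"] is True
    assert result.processing_metadata["model_provider"] == "deepseek"
```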

### Frontend Model Selection Interface

[Source: docs/architecture.md#frontend-integration]

```typescript
// frontend/src/components/forms/ModelSelector.tsx
import { useState } from 'react';
import { Card, CardContent, CardHeader, CardTitle } from '@/components/ui/card';
import { Button } from '@/components/ui/button';
import { Badge } from '@/components/ui/badge';
import { Select, SelectContent, SelectItem, SelectTrigger, SelectValue } from '@/components/ui/select';
import { Tabs, TabsContent, TabsList, TabsTrigger } from '@/components/ui/tabs';

interface ModelComparison {
  model_name: string;
  estimated_cost: number;
  quality_score: number;
  speed_score: number;
  capabilities: string[];
  health_status: string;
  suitability_scores: {
    cost_optimized: number;
    quality_focused: number;
    speed_focused: number;
    balanced: number;
  };
}

interface ModelSelectorProps {
  comparisons: Record<string, ModelComparison>;
  selectedModel?: string;
  onModelSelect: (model: string, priority: string) => void;
}

export function ModelSelector({ comparisons, selectedModel, onModelSelect }: ModelSelectorProps) {
  const [priority, setPriority] = useState<string>('balanced');
  const [showComparison, setShowComparison] = useState(false);

  const getBestModelForPriority = (priority: string) => {
    const scores = Object.entries(comparisons).map(([provider, data]) => ({
      provider,
      score: data.suitability_scores[priority as keyof typeof data.suitability_scores],
    }));

    return scores.sort((a, b) => b.score - a.score)[0]?.provider;
  };

  const formatCost = (cost: number) => `$${cost.toFixed(4)}`;

  const getQualityBadgeColor = (score: number) => {
    if (score >= 0.9) return 'bg-green-100 text-green-800';
    if (score >= 0.8) return 'bg-blue-100 text-blue-800';
    return 'bg-yellow-100 text-yellow-800';
  };

  // May be undefined when no comparisons have loaded yet
  const recommended = getBestModelForPriority(priority);

  return (
    <Card className="w-full">
      <CardHeader>
        <CardTitle className="flex items-center justify-between">
          <span>AI Model Selection</span>
          <Button
            variant="outline"
            size="sm"
            onClick={() => setShowComparison(!showComparison)}
          >
            {showComparison ? 'Hide' : 'Show'} Comparison
          </Button>
        </CardTitle>
      </CardHeader>
      <CardContent>
        <div className="space-y-4">
          <div className="flex items-center space-x-4">
            <label className="text-sm font-medium">Priority:</label>
            <Select value={priority} onValueChange={setPriority}>
              <SelectTrigger className="w-40">
                <SelectValue />
              </SelectTrigger>
              <SelectContent>
                <SelectItem value="cost">Cost Optimized</SelectItem>
                <SelectItem value="quality">High Quality</SelectItem>
                <SelectItem value="speed">Fast Processing</SelectItem>
                <SelectItem value="balanced">Balanced</SelectItem>
              </SelectContent>
            </Select>

            <Button
              onClick={() => recommended && onModelSelect(recommended, priority)}
              variant="default"
              disabled={!recommended}
            >
              Use Recommended ({recommended ?? 'none available'})
            </Button>
          </div>

          {showComparison && (
            <Tabs defaultValue="overview" className="w-full">
              <TabsList className="grid w-full grid-cols-2">
                <TabsTrigger value="overview">Overview</TabsTrigger>
                <TabsTrigger value="detailed">Detailed Comparison</TabsTrigger>
              </TabsList>

              <TabsContent value="overview" className="space-y-4">
                <div className="grid grid-cols-1 md:grid-cols-3 gap-4">
                  {Object.entries(comparisons).map(([provider, data]) => (
                    <Card
                      key={provider}
                      className={`cursor-pointer transition-colors ${
                        selectedModel === provider ? 'ring-2 ring-blue-500' : 'hover:bg-gray-50'
                      }`}
                      onClick={() => onModelSelect(provider, priority)}
                    >
                      <CardHeader className="pb-2">
                        <CardTitle className="text-sm flex items-center justify-between">
                          <span>{data.model_name}</span>
                          <Badge
                            className={
                              data.health_status === 'healthy'
                                ? 'bg-green-100 text-green-800'
                                : 'bg-red-100 text-red-800'
                            }
                          >
                            {data.health_status}
                          </Badge>
                        </CardTitle>
                      </CardHeader>
                      <CardContent className="pt-0 space-y-2">
                        <div className="flex justify-between text-sm">
                          <span>Cost:</span>
                          <span className="font-mono">{formatCost(data.estimated_cost)}</span>
                        </div>
                        <div className="flex justify-between text-sm">
                          <span>Quality:</span>
                          <Badge className={getQualityBadgeColor(data.quality_score)}>
                            {(data.quality_score * 100).toFixed(0)}%
                          </Badge>
                        </div>
                        <div className="flex justify-between text-sm">
                          <span>Speed:</span>
                          <Badge className={getQualityBadgeColor(data.speed_score)}>
                            {(data.speed_score * 100).toFixed(0)}%
                          </Badge>
                        </div>
                        <div className="text-xs text-gray-600">
                          Suitability: {(data.suitability_scores[priority as keyof typeof data.suitability_scores] * 100).toFixed(0)}%
                        </div>
                      </CardContent>
                    </Card>
                  ))}
                </div>
              </TabsContent>

              <TabsContent value="detailed" className="space-y-4">
                <div className="overflow-x-auto">
                  <table className="min-w-full divide-y divide-gray-200">
                    <thead className="bg-gray-50">
                      <tr>
                        <th className="px-6 py-3 text-left text-xs font-medium text-gray-500 uppercase">Model</th>
                        <th className="px-6 py-3 text-left text-xs font-medium text-gray-500 uppercase">Cost</th>
                        <th className="px-6 py-3 text-left text-xs font-medium text-gray-500 uppercase">Quality</th>
                        <th className="px-6 py-3 text-left text-xs font-medium text-gray-500 uppercase">Speed</th>
                        <th className="px-6 py-3 text-left text-xs font-medium text-gray-500 uppercase">Capabilities</th>
                        <th className="px-6 py-3 text-left text-xs font-medium text-gray-500 uppercase">Status</th>
                      </tr>
                    </thead>
                    <tbody className="bg-white divide-y divide-gray-200">
                      {Object.entries(comparisons).map(([provider, data]) => (
                        <tr key={provider} className="hover:bg-gray-50">
                          <td className="px-6 py-4 whitespace-nowrap text-sm font-medium text-gray-900">
                            {data.model_name}
                          </td>
                          <td className="px-6 py-4 whitespace-nowrap text-sm text-gray-500 font-mono">
                            {formatCost(data.estimated_cost)}
                          </td>
                          <td className="px-6 py-4 whitespace-nowrap text-sm text-gray-500">
                            {(data.quality_score * 100).toFixed(0)}%
                          </td>
                          <td className="px-6 py-4 whitespace-nowrap text-sm text-gray-500">
                            {(data.speed_score * 100).toFixed(0)}%
                          </td>
                          <td className="px-6 py-4 text-sm text-gray-500">
                            <div className="flex flex-wrap gap-1">
                              {data.capabilities.slice(0, 3).map(cap => (
                                <Badge key={cap} variant="secondary" className="text-xs">
                                  {cap.replace(/_/g, ' ')}
                                </Badge>
                              ))}
                              {data.capabilities.length > 3 && (
                                <Badge variant="secondary" className="text-xs">
                                  +{data.capabilities.length - 3} more
                                </Badge>
                              )}
                            </div>
                          </td>
                          <td className="px-6 py-4 whitespace-nowrap">
                            <Badge
                              className={
                                data.health_status === 'healthy'
                                  ? 'bg-green-100 text-green-800'
                                  : 'bg-red-100 text-red-800'
                              }
                            >
                              {data.health_status}
                            </Badge>
                          </td>
                        </tr>
                      ))}
                    </tbody>
                  </table>
                </div>
              </TabsContent>
            </Tabs>
          )}
        </div>
      </CardContent>
    </Card>
  );
}
```
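
On the backend side, the `comparisons` prop maps naturally onto the registry's `get_model_comparison` output. A hedged FastAPI sketch follows; the route path, module location, and treating `SummaryRequest` as a request body are assumptions about the existing API layer:

```python
# backend/api/models.py (hypothetical module; adjust to the project's router layout)
from dataclasses import asdict

from fastapi import APIRouter

from backend.services.ai_model_registry import AIModelRegistry
from backend.services.ai_service import SummaryRequest

router = APIRouter()
registry = AIModelRegistry()  # in practice, a startup-registered singleton with adapters attached


@router.post("/api/models/compare")
async def compare_models(request: SummaryRequest):
    comparison = await registry.get_model_comparison(request)
    # ModelSelection is a dataclass; convert it so the response serializes cleanly.
    comparison["recommendation"] = asdict(comparison["recommendation"])
    return comparison
```

The frontend would then pass `comparison["model_comparisons"]` through as the `comparisons` prop.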

### Performance Benefits

- **Intelligent Model Selection**: Automatically chooses the optimal model based on content characteristics and user preferences
- **Cost Optimization**: Routing content to the cheapest suitable model targets savings of up to 50% per summary
- **Quality Assurance**: Fallback mechanisms keep summaries consistent even during single-provider outages
- **Flexibility**: Users can prioritize cost, quality, or speed based on their needs
- **Reliability**: Multi-model redundancy targets 99.9% uptime for the summarization service
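
As a sanity check on the cost claim, using the per-1K-token rates from the registry specs above: a 10,000-input-token transcript with a 500-token summary costs roughly $0.0018 on gpt-4o-mini (10 × $0.00015 + 0.5 × $0.0006), about $0.0125 on claude-3-5-haiku (10 × $0.001 + 0.5 × $0.005), and about $0.0015 on deepseek-chat (10 × $0.00014 + 0.5 × $0.00028). Most of the projected savings come from routing general-purpose content away from the highest-priced model.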

## Change Log

| Date | Version | Description | Author |
|------|---------|-------------|--------|
| 2025-01-25 | 1.0 | Initial story creation | Bob (Scrum Master) |

## Dev Agent Record

*This section will be populated by the development agent during implementation*

## QA Results

*Results from QA Agent review of the completed story implementation will be added here*