Epic 2: AI Summarization Engine
Epic Overview
Goal: Implement the core AI-powered summarization functionality that transforms video transcripts into concise, structured summaries. This epic establishes the intelligence layer of the application with support for multiple AI providers and intelligent caching.
Priority: High - Core product functionality
Epic Dependencies: Epic 1 (Foundation & Core YouTube Integration)
Estimated Complexity: High (AI integration and optimization)
Epic Success Criteria
Upon completion of this epic, the YouTube Summarizer will:
- Intelligent Summary Generation
  - High-quality AI-generated summaries using OpenAI GPT-4o-mini
  - Structured output with overview, key points, and chapters
  - Cost-optimized processing (~$0.001-0.005 per summary)
- Multi-Model AI Support
  - Support for OpenAI, Anthropic, and DeepSeek models
  - Automatic failover between models
  - User model selection with cost transparency
- Performance Optimization
  - Intelligent caching system (24-hour TTL)
  - Background processing for long videos
  - Cost tracking and optimization
- Export Capabilities
  - Multiple export formats (Markdown, PDF, plain text)
  - Copy-to-clipboard functionality
  - Batch export support
Stories in Epic 2
Story 2.1: Single AI Model Integration
As a user
I want AI-generated summaries of video transcripts
So that I can quickly understand video content without watching
Acceptance Criteria
- Successfully integrates with OpenAI GPT-4o-mini API for summary generation
- Implements proper prompt engineering for consistent summary quality
- Handles token limits by chunking long transcripts intelligently at sentence boundaries
- Returns structured summary with overview, key points, and conclusion sections
- Includes error handling for API failures with user-friendly messages
- Tracks token usage and estimated cost per summary for monitoring
Status: Ready for story creation
Dependencies: Story 1.4 (Basic Web Interface)
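The sentence-boundary chunking criterion in Story 2.1 could be sketched as follows. The 4,000-token budget and the characters-per-token heuristic are illustrative assumptions, not part of the story; a production implementation would use the provider's tokenizer.

```python
import re

def estimate_tokens(text: str) -> int:
    # Rough heuristic: ~4 characters per token for English (assumption;
    # use the provider's tokenizer for exact counts).
    return max(1, len(text) // 4)

def chunk_transcript(transcript: str, max_tokens: int = 4000) -> list[str]:
    """Split a transcript into chunks at sentence boundaries, never mid-sentence."""
    sentences = re.split(r"(?<=[.!?])\s+", transcript.strip())
    chunks, current, current_tokens = [], [], 0
    for sentence in sentences:
        tokens = estimate_tokens(sentence)
        if current and current_tokens + tokens > max_tokens:
            chunks.append(" ".join(current))
            current, current_tokens = [], 0
        current.append(sentence)
        current_tokens += tokens
    if current:
        chunks.append(" ".join(current))
    return chunks
```

Each chunk can then be summarized independently and the partial summaries merged in a final pass.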
Story 2.2: Summary Generation Pipeline
As a user
I want high-quality summaries that capture the essence of videos
So that I can trust the summaries for decision-making
Acceptance Criteria
- Pipeline processes transcript through cleaning and preprocessing steps
- Removes filler words, repeated phrases, and transcript artifacts
- Identifies and preserves important quotes and specific claims
- Generates hierarchical summary with main points and supporting details
- Summary length is proportional to video length (approximately 10% of transcript)
- Processing completes within 30 seconds for videos under 30 minutes
Status: Ready for story creation
Dependencies: Story 2.1 (Single AI Model Integration)
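The cleaning and preprocessing step in Story 2.2 might look like the sketch below; the filler-word list is an illustrative assumption and would be tuned against real transcripts.

```python
import re

# Illustrative filler list (assumption); extend from observed transcript artifacts.
FILLER_WORDS = ["um", "uh", "you know", "i mean"]

def clean_transcript(text: str) -> str:
    """Remove filler words and collapse immediately repeated words."""
    pattern = r"\b(?:" + "|".join(re.escape(w) for w in FILLER_WORDS) + r")\b,?\s*"
    text = re.sub(pattern, "", text, flags=re.IGNORECASE)
    # Collapse stutters like "the the" -> "the".
    text = re.sub(r"\b(\w+)(\s+\1\b)+", r"\1", text, flags=re.IGNORECASE)
    return re.sub(r"\s{2,}", " ", text).strip()
```

Quote preservation would run before this pass, masking quoted spans so they are never altered.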
Story 2.3: Caching System Implementation
As a system operator
I want summaries cached to reduce costs and improve performance
So that the system remains economically viable
Acceptance Criteria
- Redis cache stores summaries with composite key (video_id + model + params)
- Cache TTL set to 24 hours with option to configure
- Cache hit returns summary in under 200ms
- Cache invalidation API endpoint for administrative use
- Implements cache warming for popular videos during low-traffic periods
- Dashboard displays cache hit rate and cost savings metrics
Status: Ready for story creation
Dependencies: Story 2.2 (Summary Generation Pipeline)
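A minimal sketch of the cache contract from Story 2.3, using an in-memory store as a stand-in for Redis (the criteria specify Redis in production; the dict backing here is for illustration only):

```python
import hashlib
import json
import time

class SummaryCache:
    """Cache summaries under a composite key (video_id + model + params), 24 h TTL."""

    def __init__(self, ttl_seconds: int = 24 * 3600):
        self.ttl = ttl_seconds
        self._store = {}  # key -> (expires_at, summary); swap for Redis in production

    def _key(self, video_id: str, model: str, options: dict) -> str:
        raw = f"{video_id}:{model}:{json.dumps(options, sort_keys=True)}"
        return hashlib.sha256(raw.encode()).hexdigest()

    def get(self, video_id: str, model: str, options: dict):
        entry = self._store.get(self._key(video_id, model, options))
        if entry and entry[0] > time.time():
            return entry[1]
        return None

    def set(self, video_id: str, model: str, options: dict, summary: dict) -> None:
        key = self._key(video_id, model, options)
        self._store[key] = (time.time() + self.ttl, summary)
```

With Redis, `set` maps onto `SET key value EX ttl` and expiry is handled server-side.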
Story 2.4: Multi-Model Support
As a user
I want to choose between different AI models
So that I can balance cost, speed, and quality based on my needs
Acceptance Criteria
- Supports OpenAI, Anthropic Claude, and DeepSeek models
- Model selection dropdown appears when multiple models are configured
- Each model has optimized prompts for best performance
- Fallback chain activates when primary model fails
- Model performance metrics tracked for comparison
- Cost per summary displayed before generation
Status: Ready for story creation
Dependencies: Story 2.3 (Caching System Implementation)
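The fallback chain in Story 2.4 could be sketched as below; `services` is an ordered list of provider clients exposing a hypothetical `generate_summary` coroutine, and the broad `except` is for illustration (production code would catch provider-specific errors).

```python
async def summarize_with_fallback(transcript: str, services: list):
    """Try each configured AI service in order; raise only if all fail."""
    last_error = None
    for service in services:
        try:
            return await service.generate_summary(transcript, video_metadata={})
        except Exception as exc:  # assumption: catch provider-specific errors in practice
            last_error = exc  # record and fall through to the next provider
    raise RuntimeError("All AI providers failed") from last_error
```

Ordering the list by cost (cheapest first for retries, or quality first for primary use) is a product decision the model-selection UI would drive.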
Story 2.5: Export Functionality
As a user
I want to export summaries in various formats
So that I can integrate them into my workflow
Acceptance Criteria
- Export available in Markdown, PDF, and plain text formats
- Exported files include metadata (video title, URL, date, model used)
- Markdown export preserves formatting and structure
- PDF export is properly formatted with headers and sections
- Copy-to-clipboard works for entire summary or individual sections
- Batch export available for multiple summaries from history
Status: Ready for story creation
Dependencies: Story 2.4 (Multi-Model Support)
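The Markdown export with metadata from Story 2.5 might be rendered like this; the summary field names mirror the SummaryResponse shape but are assumptions at this stage.

```python
def export_markdown(summary: dict) -> str:
    """Render a summary as Markdown with a metadata block (field names assumed)."""
    lines = [
        f"# {summary['video']['title']}",
        "",
        f"- URL: {summary['video']['url']}",
        f"- Date: {summary['date']}",
        f"- Model: {summary['model_used']}",
        "",
        "## Overview",
        summary["text"],
        "",
        "## Key Points",
    ]
    lines += [f"- {point}" for point in summary["key_points"]]
    return "\n".join(lines)
```

PDF export could reuse this output by rendering the Markdown through an HTML-to-PDF step, keeping one source of truth for structure.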
Technical Architecture Context
AI Integration Architecture
┌─────────────────┐ ┌─────────────────┐ ┌─────────────────┐
│ Frontend │ │ Backend │ │ AI Services │
│ │ │ │ │ │
│ • Model Select │◄──►│ • AI Service │◄──►│ • OpenAI API │
│ • Progress UI │ │ • Prompt Mgmt │ │ • Anthropic API │
│ • Export UI │ │ • Token Tracking│ │ • DeepSeek API │
│ │ │ • Cost Monitor │ │ │
└─────────────────┘ └─────────────────┘ └─────────────────┘
│
▼
┌─────────────────┐
│ Cache Layer │
│ │
│ • Memory Cache │
│ • DB Cache │
│ • Smart Keys │
└─────────────────┘
Key Services for Epic 2
AI Service Architecture
from typing import Any, Dict, Optional

class AIService:
    def __init__(self, provider: str, api_key: str):
        self.provider = provider
        self.client = self._get_client(provider, api_key)

    async def generate_summary(
        self,
        transcript: str,
        video_metadata: Dict[str, Any],
        options: Optional[Dict[str, Any]] = None,
    ) -> Dict[str, Any]:
        """Generate structured summary with cost tracking."""
        ...
Caching Strategy
import hashlib
import json

def get_cache_key(video_id: str, model: str, options: dict) -> str:
    """Generate cache key: sha256(video_id + model + options)."""
    key_data = f"{video_id}:{model}:{json.dumps(options, sort_keys=True)}"
    return hashlib.sha256(key_data.encode()).hexdigest()
Cost Optimization Strategy
Target Cost Structure
- Primary Model: OpenAI GPT-4o-mini (~$0.001/1K tokens)
- Typical Video Cost: $0.001-0.005 per 30-minute video
- Caching Benefit: ~80% reduction for repeat requests
- Monthly Budget: ~$0.10/month for hobby usage
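Pre-generation cost estimation (required by Story 2.4) could follow the target structure above; the per-model rates here are illustrative placeholders, not quoted provider prices.

```python
# Illustrative per-1K-token rates in USD (assumptions, not quoted prices).
RATES_PER_1K_TOKENS = {"openai": 0.001, "anthropic": 0.003, "deepseek": 0.0005}

def estimate_cost(token_count: int, model: str) -> float:
    """Estimate summary cost in USD before generation."""
    return round(token_count / 1000 * RATES_PER_1K_TOKENS[model], 6)
```

Displaying this estimate in the model-selection UI gives users the cost transparency the epic calls for.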
Token Optimization Techniques
- Intelligent Chunking: Split long transcripts at sentence boundaries
- Prompt Optimization: Efficient prompts for consistent output
- Preprocessing: Remove transcript artifacts and filler words
- Fallback Strategy: Use cheaper models when primary fails
Non-Functional Requirements for Epic 2
Performance
- Summary generation within 30 seconds for videos under 30 minutes
- Cache hits return results in under 200ms
- Background processing for videos over 1 hour
Cost Management
- Token usage tracking with alerts
- Cost estimation before processing
- Monthly budget monitoring and warnings
Quality Assurance
- Consistent summary structure across all models
- Quality metrics tracking (summary length, key points extraction)
- A/B testing capability for prompt optimization
Reliability
- Multi-model fallback chain
- Retry logic with exponential backoff
- Graceful degradation when AI services are unavailable
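The retry-with-exponential-backoff requirement could be sketched as a small async helper; the attempt count, base delay, and jitter range are illustrative defaults.

```python
import asyncio
import random

async def with_retries(call, max_attempts: int = 4, base_delay: float = 0.5):
    """Retry an async call with exponential backoff and jitter (defaults assumed)."""
    for attempt in range(max_attempts):
        try:
            return await call()
        except Exception:
            if attempt == max_attempts - 1:
                raise
            # 0.5 s, 1 s, 2 s, ... plus jitter to avoid synchronized retries.
            await asyncio.sleep(base_delay * (2 ** attempt) + random.uniform(0, 0.1))
```

Wrapping each provider call in `with_retries` before falling back to the next provider combines both reliability mechanisms.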
Definition of Done for Epic 2
- All 5 stories completed and validated
- User can generate AI summaries from video transcripts
- Multiple AI models supported with fallback
- Caching system operational with cost savings visible
- Export functionality working for all formats
- Cost tracking under $0.10/month target for typical usage
- Performance targets met (30s generation, 200ms cache)
- Error handling graceful for all AI service failures
API Endpoints Introduced in Epic 2
POST /api/summarize
interface SummarizeRequest {
  url: string;
  model?: "openai" | "anthropic" | "deepseek";
  options?: {
    length?: "brief" | "standard" | "detailed";
    focus?: string;
  };
}
GET /api/summary/{id}
interface SummaryResponse {
  id: string;
  video: VideoMetadata;
  summary: {
    text: string;
    key_points: string[];
    chapters: Chapter[];
    model_used: string;
  };
  metadata: {
    processing_time: number;
    token_count: number;
    cost_estimate: number;
  };
}
POST /api/export/{id}
interface ExportRequest {
  format: "markdown" | "pdf" | "txt";
  options?: ExportOptions;
}
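For illustration, a client could build and send a SummarizeRequest like this (stdlib-only sketch; the `base_url` and response handling are deployment-specific assumptions):

```python
import json
import urllib.request

def build_summarize_request(url: str, model: str = "openai",
                            length: str = "standard") -> dict:
    """Build a request body matching the SummarizeRequest interface."""
    return {"url": url, "model": model, "options": {"length": length}}

def request_summary(base_url: str, video_url: str, model: str = "openai") -> dict:
    """POST to /api/summarize; base_url is deployment-specific (assumption)."""
    body = json.dumps(build_summarize_request(video_url, model)).encode()
    req = urllib.request.Request(
        f"{base_url}/api/summarize",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())
```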
Risks and Mitigation
AI Service Risks
- API Rate Limits: Multi-model fallback and intelligent queuing
- Cost Overruns: Usage monitoring and budget alerts
- Quality Degradation: A/B testing and quality metrics
Technical Risks
- Token Limit Exceeded: Intelligent chunking and preprocessing
- Cache Invalidation: Smart cache key generation and TTL management
- Export Failures: Robust file generation with error recovery
Business Risks
- User Experience: Background processing and progress indicators
- Cost Scaling: Caching strategy and cost optimization
- Model Availability: Multi-provider architecture
Success Metrics
Quality Metrics
- Summary Accuracy: User satisfaction feedback
- Consistency: Structured output compliance across models
- Coverage: Key points extraction rate
Performance Metrics
- Generation Time: < 30 seconds for 30-minute videos
- Cache Hit Rate: > 70% for popular content
- Cost Efficiency: < $0.005 per summary average
Technical Metrics
- API Reliability: > 99% successful requests
- Error Recovery: < 5% failed summaries
- Export Success: > 98% successful exports
Epic Status: Ready for Implementation
Dependencies: Epic 1 must be completed first
Next Action: Create Story 2.1 (Single AI Model Integration)
Epic Owner: Bob (Scrum Master)
Last Updated: 2025-01-25