# Epic 2: AI Summarization Engine
## Epic Overview
**Goal**: Implement the core AI-powered summarization functionality that transforms transcripts into valuable, concise summaries. This epic establishes the intelligence layer of the application with support for multiple AI providers and intelligent caching.
**Priority**: High - Core product functionality
**Epic Dependencies**: Epic 1 (Foundation & Core YouTube Integration)
**Estimated Complexity**: High (AI integration and optimization)
## Epic Success Criteria
Upon completion of this epic, the YouTube Summarizer will:
1. **Intelligent Summary Generation**
- High-quality AI-generated summaries using OpenAI GPT-4o-mini
- Structured output with overview, key points, and chapters
- Cost-optimized processing (~$0.001-0.005 per summary)
2. **Multi-Model AI Support**
- Support for OpenAI, Anthropic, and DeepSeek models
- Automatic failover between models
- User model selection with cost transparency
3. **Performance Optimization**
- Intelligent caching system (24-hour TTL)
- Background processing for long videos
- Cost tracking and optimization
4. **Export Capabilities**
- Multiple export formats (Markdown, PDF, plain text)
- Copy-to-clipboard functionality
- Batch export support
## Stories in Epic 2
### Story 2.1: Single AI Model Integration
**As a** user
**I want** AI-generated summaries of video transcripts
**So that** I can quickly understand video content without watching
#### Acceptance Criteria
1. Successfully integrates with OpenAI GPT-4o-mini API for summary generation
2. Implements proper prompt engineering for consistent summary quality
3. Handles token limits by chunking long transcripts intelligently at sentence boundaries
4. Returns structured summary with overview, key points, and conclusion sections
5. Includes error handling for API failures with user-friendly messages
6. Tracks token usage and estimated cost per summary for monitoring
**Status**: Ready for story creation
**Dependencies**: Story 1.4 (Basic Web Interface)
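Criterion 6 (token usage and cost tracking) can be sketched as a small helper that converts token counts into a dollar estimate. The per-1K-token rates below are illustrative assumptions, not current OpenAI pricing; in practice they would come from configuration.

```python
# Illustrative per-1K-token rates (assumed values, not live pricing).
ASSUMED_RATES = {
    "gpt-4o-mini": {"input": 0.00015, "output": 0.0006},  # USD per 1K tokens
}

def estimate_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Estimate the USD cost of one summary from prompt and completion token counts."""
    rates = ASSUMED_RATES[model]
    cost = (input_tokens / 1000) * rates["input"] + (output_tokens / 1000) * rates["output"]
    return round(cost, 6)
```

At these assumed rates, a 10K-token transcript with a 1K-token summary lands around $0.002, consistent with the epic's per-summary cost target.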
### Story 2.2: Summary Generation Pipeline
**As a** user
**I want** high-quality summaries that capture the essence of videos
**So that** I can trust the summaries for decision-making
#### Acceptance Criteria
1. Pipeline processes transcript through cleaning and preprocessing steps
2. Removes filler words, repeated phrases, and transcript artifacts
3. Identifies and preserves important quotes and specific claims
4. Generates hierarchical summary with main points and supporting details
5. Summary length is proportional to video length (approximately 10% of transcript length)
6. Processing completes within 30 seconds for videos under 30 minutes
**Status**: Ready for story creation
**Dependencies**: Story 2.1 (Single AI Model Integration)
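Criterion 2 (removing filler words and repeated phrases) can be sketched with a couple of regex passes. The filler list here is a small illustrative sample that would be tuned against real transcripts.

```python
import re

# Illustrative filler tokens; a production list would be tuned against real transcripts.
FILLERS = r"\b(um|uh|you know|i mean)\b,?\s*"

def clean_transcript(text: str) -> str:
    """Remove filler words, collapse immediate word repetitions, and normalize whitespace."""
    text = re.sub(FILLERS, "", text, flags=re.IGNORECASE)
    # Collapse immediate repeats such as "the the" -> "the".
    text = re.sub(r"\b(\w+)( \1\b)+", r"\1", text, flags=re.IGNORECASE)
    return re.sub(r"\s+", " ", text).strip()
```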
### Story 2.3: Caching System Implementation
**As a** system operator
**I want** summaries cached to reduce costs and improve performance
**So that** the system remains economically viable
#### Acceptance Criteria
1. Redis cache stores summaries with composite key (video_id + model + params)
2. Cache TTL set to 24 hours with option to configure
3. Cache hit returns summary in under 200ms
4. Cache invalidation API endpoint for administrative use
5. Implements cache warming for popular videos during low-traffic periods
6. Dashboard displays cache hit rate and cost savings metrics
**Status**: Ready for story creation
**Dependencies**: Story 2.2 (Summary Generation Pipeline)
### Story 2.4: Multi-Model Support
**As a** user
**I want** to choose between different AI models
**So that** I can balance cost, speed, and quality based on my needs
#### Acceptance Criteria
1. Supports OpenAI, Anthropic Claude, and DeepSeek models
2. Model selection dropdown appears when multiple models are configured
3. Each model has optimized prompts for best performance
4. Fallback chain activates when primary model fails
5. Model performance metrics tracked for comparison
6. Cost per summary displayed before generation
**Status**: Ready for story creation
**Dependencies**: Story 2.3 (Caching System Implementation)
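The fallback chain in criterion 4 can be sketched provider-agnostically: try each configured provider in order and return the first success. The provider callables here are stubs; real ones would wrap the OpenAI, Anthropic, and DeepSeek clients.

```python
from typing import Callable

class AllProvidersFailed(Exception):
    """Raised when every provider in the chain fails."""

def summarize_with_fallback(
    transcript: str,
    providers: list[tuple[str, Callable[[str], str]]],
) -> tuple[str, str]:
    """Try each (name, generate) provider in order; return (name, summary) from the first success."""
    errors = []
    for name, generate in providers:
        try:
            return name, generate(transcript)
        except Exception as exc:  # in practice, catch provider-specific error types
            errors.append(f"{name}: {exc}")
    raise AllProvidersFailed("; ".join(errors))
```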
### Story 2.5: Export Functionality
**As a** user
**I want** to export summaries in various formats
**So that** I can integrate them into my workflow
#### Acceptance Criteria
1. Export available in Markdown, PDF, and plain text formats
2. Exported files include metadata (video title, URL, date, model used)
3. Markdown export preserves formatting and structure
4. PDF export is properly formatted with headers and sections
5. Copy-to-clipboard works for entire summary or individual sections
6. Batch export available for multiple summaries from history
**Status**: Ready for story creation
**Dependencies**: Story 2.4 (Multi-Model Support)
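The Markdown export path (criteria 1-3) can be sketched as a pure renderer over a summary dict; the field names below are assumptions mirroring the `SummaryResponse` shape later in this document.

```python
def export_markdown(summary: dict) -> str:
    """Render a summary dict as Markdown with a metadata header (criterion 2)."""
    lines = [
        f"# {summary['title']}",
        "",
        f"- **URL**: {summary['url']}",
        f"- **Date**: {summary['date']}",
        f"- **Model**: {summary['model']}",
        "",
        "## Summary",
        summary["text"],
        "",
        "## Key Points",
    ]
    lines += [f"- {point}" for point in summary["key_points"]]
    return "\n".join(lines)
```

PDF export could reuse this output by rendering the Markdown through a converter, keeping a single source of truth for structure.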
## Technical Architecture Context
### AI Integration Architecture
```
┌─────────────────┐ ┌─────────────────┐ ┌─────────────────┐
│ Frontend │ │ Backend │ │ AI Services │
│ │ │ │ │ │
│ • Model Select │◄──►│ • AI Service │◄──►│ • OpenAI API │
│ • Progress UI │ │ • Prompt Mgmt │ │ • Anthropic API │
│ • Export UI │ │ • Token Tracking│ │ • DeepSeek API │
│ │ │ • Cost Monitor │ │ │
└─────────────────┘ └─────────────────┘ └─────────────────┘
┌─────────────────┐
│ Cache Layer │
│ │
│ • Memory Cache │
│ • DB Cache │
│ • Smart Keys │
└─────────────────┘
```
### Key Services for Epic 2
#### AI Service Architecture
```python
from typing import Any, Dict, Optional

class AIService:
    def __init__(self, provider: str, api_key: str):
        self.provider = provider
        self.client = self._get_client(provider, api_key)

    async def generate_summary(
        self,
        transcript: str,
        video_metadata: Dict[str, Any],
        options: Optional[Dict[str, Any]] = None,
    ) -> Dict[str, Any]:
        """Generate structured summary with cost tracking"""
```
#### Caching Strategy
```python
import hashlib
import json

def get_cache_key(video_id: str, model: str, options: dict) -> str:
    """Generate cache key: hash(video_id + model + options)"""
    key_data = f"{video_id}:{model}:{json.dumps(options, sort_keys=True)}"
    return hashlib.sha256(key_data.encode()).hexdigest()
```
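A minimal get-or-generate wrapper around such a key might look like the sketch below. An in-memory dict stands in for Redis here; with `redis-py`, the same shape maps onto `get`/`setex`.

```python
import time

# In-memory stand-in for Redis, purely for illustration.
CACHE: dict[str, tuple[float, str]] = {}  # key -> (expires_at, summary)
TTL_SECONDS = 24 * 60 * 60  # 24-hour TTL per Story 2.3

def get_or_generate(key: str, generate) -> str:
    """Return the cached summary for key, or call generate() and cache the result."""
    entry = CACHE.get(key)
    if entry and entry[0] > time.time():
        return entry[1]  # cache hit
    summary = generate()
    CACHE[key] = (time.time() + TTL_SECONDS, summary)
    return summary
```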
### Cost Optimization Strategy
#### Target Cost Structure
- **Primary Model**: OpenAI GPT-4o-mini (~$0.001/1K tokens)
- **Typical Video Cost**: $0.001-0.005 per 30-minute video
- **Caching Benefit**: ~80% reduction for repeat requests
- **Monthly Budget**: ~$0.10 for hobby usage
#### Token Optimization Techniques
1. **Intelligent Chunking**: Split long transcripts at sentence boundaries
2. **Prompt Optimization**: Efficient prompts for consistent output
3. **Preprocessing**: Remove transcript artifacts and filler words
4. **Fallback Strategy**: Use cheaper models when primary fails
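Technique 1 (intelligent chunking at sentence boundaries) can be sketched as follows. A character budget stands in for tokens here; a real implementation would count tokens with the provider's tokenizer (e.g. tiktoken).

```python
import re

def chunk_transcript(text: str, max_chars: int = 4000) -> list[str]:
    """Split a transcript into chunks at sentence boundaries, each at most max_chars long."""
    # Split after sentence-ending punctuation followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks, current = [], ""
    for sentence in sentences:
        if current and len(current) + len(sentence) + 1 > max_chars:
            chunks.append(current)
            current = sentence
        else:
            current = f"{current} {sentence}".strip()
    if current:
        chunks.append(current)
    return chunks
```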
## Non-Functional Requirements for Epic 2
### Performance
- Summary generation within 30 seconds for videos under 30 minutes
- Cache hits return results in under 200ms
- Background processing for videos over 1 hour
### Cost Management
- Token usage tracking with alerts
- Cost estimation before processing
- Monthly budget monitoring and warnings
### Quality Assurance
- Consistent summary structure across all models
- Quality metrics tracking (summary length, key points extraction)
- A/B testing capability for prompt optimization
### Reliability
- Multi-model fallback chain
- Retry logic with exponential backoff
- Graceful degradation when AI services unavailable
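The retry requirement can be sketched as a small wrapper with exponential backoff; the injectable `sleep` parameter is an assumption added here to make the backoff schedule testable.

```python
import time

def with_retries(fn, max_attempts: int = 3, base_delay: float = 1.0, sleep=time.sleep):
    """Call fn, retrying on failure with exponential backoff (1s, 2s, 4s, ...)."""
    for attempt in range(max_attempts):
        try:
            return fn()
        except Exception:
            if attempt == max_attempts - 1:
                raise  # out of attempts; surface the last error
            sleep(base_delay * (2 ** attempt))
```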
## Definition of Done for Epic 2
- [ ] All 5 stories completed and validated
- [ ] User can generate AI summaries from video transcripts
- [ ] Multiple AI models supported with fallback
- [ ] Caching system operational with cost savings visible
- [ ] Export functionality working for all formats
- [ ] Cost tracking under $0.10/month target for typical usage
- [ ] Performance targets met (30s generation, 200ms cache)
- [ ] Error handling graceful for all AI service failures
## API Endpoints Introduced in Epic 2
### POST /api/summarize
```typescript
interface SummarizeRequest {
  url: string;
  model?: "openai" | "anthropic" | "deepseek";
  options?: {
    length?: "brief" | "standard" | "detailed";
    focus?: string;
  };
}
```
### GET /api/summary/{id}
```typescript
interface SummaryResponse {
  id: string;
  video: VideoMetadata;
  summary: {
    text: string;
    key_points: string[];
    chapters: Chapter[];
    model_used: string;
  };
  metadata: {
    processing_time: number;
    token_count: number;
    cost_estimate: number;
  };
}
```
### POST /api/export/{id}
```typescript
interface ExportRequest {
  format: "markdown" | "pdf" | "txt";
  options?: ExportOptions;
}
```
## Risks and Mitigation
### AI Service Risks
1. **API Rate Limits**: Multi-model fallback and intelligent queuing
2. **Cost Overruns**: Usage monitoring and budget alerts
3. **Quality Degradation**: A/B testing and quality metrics
### Technical Risks
1. **Token Limit Exceeded**: Intelligent chunking and preprocessing
2. **Cache Invalidation**: Smart cache key generation and TTL management
3. **Export Failures**: Robust file generation with error recovery
### Business Risks
1. **User Experience**: Background processing and progress indicators
2. **Cost Scaling**: Caching strategy and cost optimization
3. **Model Availability**: Multi-provider architecture
## Success Metrics
### Quality Metrics
- **Summary Accuracy**: User satisfaction feedback
- **Consistency**: Structured output compliance across models
- **Coverage**: Key points extraction rate
### Performance Metrics
- **Generation Time**: < 30 seconds for 30-minute videos
- **Cache Hit Rate**: > 70% for popular content
- **Cost Efficiency**: < $0.005 per summary average
### Technical Metrics
- **API Reliability**: > 99% successful requests
- **Error Recovery**: < 5% failed summaries
- **Export Success**: > 98% successful exports
---
**Epic Status**: Ready for Implementation
**Dependencies**: Epic 1 must be completed first
**Next Action**: Create Story 2.1 (Single AI Model Integration)
**Epic Owner**: Bob (Scrum Master)
**Last Updated**: 2025-01-25