# Epic 2: AI Summarization Engine

## Epic Overview

**Goal**: Implement the core AI-powered summarization functionality that transforms transcripts into valuable, concise summaries. This epic establishes the intelligence layer of the application, with support for multiple AI providers and intelligent caching.

**Priority**: High - Core product functionality

**Epic Dependencies**: Epic 1 (Foundation & Core YouTube Integration)

**Estimated Complexity**: High (AI integration and optimization)

## Epic Success Criteria

Upon completion of this epic, the YouTube Summarizer will provide:

1. **Intelligent Summary Generation**
   - High-quality AI-generated summaries using OpenAI GPT-4o-mini
   - Structured output with overview, key points, and chapters
   - Cost-optimized processing (~$0.001-0.005 per summary)

2. **Multi-Model AI Support**
   - Support for OpenAI, Anthropic, and DeepSeek models
   - Automatic failover between models
   - User model selection with cost transparency

3. **Performance Optimization**
   - Intelligent caching system (24-hour TTL)
   - Background processing for long videos
   - Cost tracking and optimization

4. **Export Capabilities**
   - Multiple export formats (Markdown, PDF, plain text)
   - Copy-to-clipboard functionality
   - Batch export support

## Stories in Epic 2

### Story 2.1: Single AI Model Integration

**As a** user
**I want** AI-generated summaries of video transcripts
**So that** I can quickly understand video content without watching

#### Acceptance Criteria

1. Successfully integrates with the OpenAI GPT-4o-mini API for summary generation
2. Implements proper prompt engineering for consistent summary quality
3. Handles token limits by chunking long transcripts intelligently at sentence boundaries
4. Returns a structured summary with overview, key points, and conclusion sections
5. Includes error handling for API failures with user-friendly messages
6. Tracks token usage and estimated cost per summary for monitoring

**Status**: Ready for story creation
**Dependencies**: Story 1.4 (Basic Web Interface)

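Criterion 3 of Story 2.1 (chunking long transcripts at sentence boundaries) can be sketched as follows. This is a minimal illustration, not the final implementation: the ~4-characters-per-token estimate and the 3,000-token chunk budget are assumptions, and a real version would count tokens with the model's own tokenizer (e.g. tiktoken).

```python
import re


def chunk_transcript(text: str, max_tokens: int = 3000) -> list[str]:
    """Split a transcript into chunks at sentence boundaries.

    Token counts are approximated as len(sentence) // 4 + 1; swap in a
    real tokenizer before relying on this for billing or limits.
    """
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks: list[str] = []
    current: list[str] = []
    current_tokens = 0
    for sentence in sentences:
        sentence_tokens = len(sentence) // 4 + 1
        # Flush the current chunk before this sentence would overflow it.
        if current and current_tokens + sentence_tokens > max_tokens:
            chunks.append(" ".join(current))
            current, current_tokens = [], 0
        current.append(sentence)
        current_tokens += sentence_tokens
    if current:
        chunks.append(" ".join(current))
    return chunks
```

Because splits only happen between sentences, no chunk ever begins or ends mid-sentence, which keeps each chunk coherent enough to summarize on its own.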
### Story 2.2: Summary Generation Pipeline

**As a** user
**I want** high-quality summaries that capture the essence of videos
**So that** I can trust the summaries for decision-making

#### Acceptance Criteria

1. Pipeline processes transcript through cleaning and preprocessing steps
2. Removes filler words, repeated phrases, and transcript artifacts
3. Identifies and preserves important quotes and specific claims
4. Generates hierarchical summary with main points and supporting details
5. Summary length is proportional to video length (approximately 10% of transcript)
6. Processing completes within 30 seconds for videos under 30 minutes

**Status**: Ready for story creation
**Dependencies**: Story 2.1 (Single AI Model Integration)

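The cleaning step in criteria 1-2 of Story 2.2 might look like the sketch below. The filler list, caption-artifact patterns, and repeat detection are illustrative assumptions and would need tuning against real YouTube transcripts:

```python
import re

# Illustrative patterns; a production list would be data-driven.
FILLERS = re.compile(r"\b(?:um+|uh+|you know|i mean)\b,?\s*", re.IGNORECASE)
ARTIFACTS = re.compile(r"\[(?:Music|Applause|Laughter)\]", re.IGNORECASE)
REPEATS = re.compile(r"\b(\w+)( \1\b)+", re.IGNORECASE)


def clean_transcript(text: str) -> str:
    """Remove caption artifacts, filler words, and immediate word repeats."""
    text = ARTIFACTS.sub("", text)
    text = FILLERS.sub("", text)
    text = REPEATS.sub(r"\1", text)          # "the the" -> "the"
    return re.sub(r"\s+", " ", text).strip()  # collapse leftover whitespace
```

Preprocessing like this reduces tokens sent to the model, so it serves both quality (criterion 2) and cost (the token-optimization goals later in this epic).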
### Story 2.3: Caching System Implementation

**As a** system operator
**I want** summaries cached to reduce costs and improve performance
**So that** the system remains economically viable

#### Acceptance Criteria

1. Redis cache stores summaries with composite key (video_id + model + params)
2. Cache TTL set to 24 hours with option to configure
3. Cache hit returns summary in under 200ms
4. Cache invalidation API endpoint for administrative use
5. Implements cache warming for popular videos during low-traffic periods
6. Dashboard displays cache hit rate and cost savings metrics

**Status**: Ready for story creation
**Dependencies**: Story 2.2 (Summary Generation Pipeline)

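The get-or-generate pattern behind criteria 1-3 of Story 2.3 can be illustrated with an in-memory stand-in. A production version would use Redis (`GET`/`SETEX`) against the composite key from criterion 1, but the control flow is the same:

```python
import time
from typing import Any, Callable, Dict, Tuple


class SummaryCache:
    """In-memory stand-in for the Redis summary cache (default 24-hour TTL)."""

    def __init__(self, ttl_seconds: int = 24 * 3600):
        self.ttl = ttl_seconds
        self._store: Dict[str, Tuple[float, Any]] = {}

    def get_or_generate(self, key: str, generate: Callable[[], Any]) -> Any:
        entry = self._store.get(key)
        if entry is not None:
            expires_at, value = entry
            if time.monotonic() < expires_at:
                return value            # cache hit: no AI call, no cost
        value = generate()              # cache miss: pay for one AI call
        self._store[key] = (time.monotonic() + self.ttl, value)
        return value
```

The key point for the cost model: a repeat request within the TTL never touches the AI provider, which is where the ~80% savings figure later in this epic comes from.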
### Story 2.4: Multi-Model Support

**As a** user
**I want** to choose between different AI models
**So that** I can balance cost, speed, and quality based on my needs

#### Acceptance Criteria

1. Supports OpenAI, Anthropic Claude, and DeepSeek models
2. Model selection dropdown appears when multiple models are configured
3. Each model has optimized prompts for best performance
4. Fallback chain activates when primary model fails
5. Model performance metrics tracked for comparison
6. Cost per summary displayed before generation

**Status**: Ready for story creation
**Dependencies**: Story 2.3 (Caching System Implementation)

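The fallback chain in criterion 4 of Story 2.4 is essentially a loop over providers in priority order. A sketch, with `providers` standing in for the real per-provider client calls (which are not defined here):

```python
from typing import Callable, Dict, Tuple


class AllModelsFailed(Exception):
    """Raised when every provider in the chain has failed."""


def summarize_with_fallback(
    transcript: str,
    providers: Dict[str, Callable[[str], str]],
    chain: Tuple[str, ...] = ("openai", "anthropic", "deepseek"),
) -> Tuple[str, str]:
    """Try each provider in order; return (provider, summary) from the first success."""
    errors: Dict[str, Exception] = {}
    for name in chain:
        try:
            return name, providers[name](transcript)
        except Exception as exc:  # rate limits, timeouts, missing provider, ...
            errors[name] = exc
    raise AllModelsFailed(f"All providers failed: {list(errors)}")
```

Returning the provider name alongside the summary lets the UI disclose which model actually ran, which matters for the cost-transparency requirement above.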
### Story 2.5: Export Functionality

**As a** user
**I want** to export summaries in various formats
**So that** I can integrate them into my workflow

#### Acceptance Criteria

1. Export available in Markdown, PDF, and plain text formats
2. Exported files include metadata (video title, URL, date, model used)
3. Markdown export preserves formatting and structure
4. PDF export is properly formatted with headers and sections
5. Copy-to-clipboard works for entire summary or individual sections
6. Batch export available for multiple summaries from history

**Status**: Ready for story creation
**Dependencies**: Story 2.4 (Multi-Model Support)

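A minimal sketch of the Markdown exporter implied by criteria 1-3 of Story 2.5. The dictionary field names are hypothetical, loosely based on the `SummaryResponse` shape in the API section of this document:

```python
from datetime import date


def export_markdown(summary: dict) -> str:
    """Render a summary dict as Markdown with a metadata header (criterion 2)."""
    lines = [
        f"# {summary['title']}",
        "",
        f"- **URL**: {summary['url']}",
        f"- **Exported**: {date.today().isoformat()}",
        f"- **Model**: {summary['model_used']}",
        "",
        "## Summary",
        summary["text"],
        "",
        "## Key Points",
    ]
    lines += [f"- {point}" for point in summary["key_points"]]
    return "\n".join(lines) + "\n"
```

Plain-text export can reuse this output with the Markdown syntax stripped, and PDF export can render it via an HTML/PDF pipeline, so Markdown is a reasonable canonical format for all three targets.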
## Technical Architecture Context

### AI Integration Architecture

```
┌─────────────────┐    ┌─────────────────┐    ┌─────────────────┐
│    Frontend     │    │     Backend     │    │   AI Services   │
│                 │    │                 │    │                 │
│ • Model Select  │◄──►│ • AI Service    │◄──►│ • OpenAI API    │
│ • Progress UI   │    │ • Prompt Mgmt   │    │ • Anthropic API │
│ • Export UI     │    │ • Token Tracking│    │ • DeepSeek API  │
│                 │    │ • Cost Monitor  │    │                 │
└─────────────────┘    └─────────────────┘    └─────────────────┘
                                │
                                ▼
                       ┌─────────────────┐
                       │   Cache Layer   │
                       │                 │
                       │ • Memory Cache  │
                       │ • DB Cache      │
                       │ • Smart Keys    │
                       └─────────────────┘
```

### Key Services for Epic 2

#### AI Service Architecture

```python
from typing import Any, Dict, Optional


class AIService:
    def __init__(self, provider: str, api_key: str):
        self.provider = provider
        self.client = self._get_client(provider, api_key)

    async def generate_summary(
        self,
        transcript: str,
        video_metadata: Dict[str, Any],
        options: Optional[Dict[str, Any]] = None,
    ) -> Dict[str, Any]:
        """Generate a structured summary with cost tracking."""
```

#### Caching Strategy

```python
import hashlib
import json


def get_cache_key(video_id: str, model: str, options: dict) -> str:
    """Generate a deterministic cache key: hash(video_id + model + options)."""
    key_data = f"{video_id}:{model}:{json.dumps(options, sort_keys=True)}"
    return hashlib.sha256(key_data.encode()).hexdigest()
```

### Cost Optimization Strategy

#### Target Cost Structure

- **Primary Model**: OpenAI GPT-4o-mini (~$0.001/1K tokens)
- **Typical Video Cost**: $0.001-0.005 per 30-minute video
- **Caching Benefit**: ~80% reduction for repeat requests
- **Monthly Budget**: ~$0.10/month for hobby usage

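These figures also give the pre-generation cost estimate required later in this epic ("Cost estimation before processing"). The sketch below uses the ~$0.001/1K-token rate from above and a rough ~4-characters-per-token heuristic; both are assumptions to be replaced with real per-model pricing and tokenizer counts:

```python
def estimate_cost(transcript: str, rate_per_1k_tokens: float = 0.001) -> float:
    """Rough pre-generation cost estimate from transcript length.

    Assumes ~4 characters per token; the default rate mirrors the
    GPT-4o-mini figure above and should come from configuration.
    """
    tokens = len(transcript) // 4
    return round(tokens / 1000 * rate_per_1k_tokens, 6)
```

Showing this number before generation (as Story 2.4 requires) costs nothing, since it needs only the transcript length already fetched in Epic 1.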
#### Token Optimization Techniques

1. **Intelligent Chunking**: Split long transcripts at sentence boundaries
2. **Prompt Optimization**: Efficient prompts for consistent output
3. **Preprocessing**: Remove transcript artifacts and filler words
4. **Fallback Strategy**: Use cheaper models when the primary model fails

## Non-Functional Requirements for Epic 2

### Performance

- Summary generation within 30 seconds for videos under 30 minutes
- Cache hits return results in under 200ms
- Background processing for videos over 1 hour

### Cost Management

- Token usage tracking with alerts
- Cost estimation before processing
- Monthly budget monitoring and warnings

### Quality Assurance

- Consistent summary structure across all models
- Quality metrics tracking (summary length, key point extraction)
- A/B testing capability for prompt optimization

### Reliability

- Multi-model fallback chain
- Retry logic with exponential backoff
- Graceful degradation when AI services are unavailable

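The retry requirement above can be sketched as a small wrapper around any AI call. The attempt count and base delay are illustrative defaults, and the injectable `sleep` keeps the helper testable without real waits:

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def with_retries(call: Callable[[], T], max_attempts: int = 4,
                 base_delay: float = 0.5, sleep=time.sleep) -> T:
    """Retry a flaky call with exponential backoff and jitter."""
    for attempt in range(max_attempts):
        try:
            return call()
        except Exception:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the last error
            # 0.5s, 1s, 2s, ... plus up to 100ms of jitter
            sleep(base_delay * (2 ** attempt) + random.uniform(0, 0.1))
    raise RuntimeError("unreachable")
```

In practice this wrapper sits below the multi-model fallback chain: retries absorb transient errors within one provider, and only persistent failure triggers a provider switch.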
## Definition of Done for Epic 2

- [ ] All 5 stories completed and validated
- [ ] User can generate AI summaries from video transcripts
- [ ] Multiple AI models supported with fallback
- [ ] Caching system operational with cost savings visible
- [ ] Export functionality working for all formats
- [ ] Cost tracking under the $0.10/month target for typical usage
- [ ] Performance targets met (30s generation, 200ms cache hits)
- [ ] Graceful error handling for all AI service failures

## API Endpoints Introduced in Epic 2

### POST /api/summarize

```typescript
interface SummarizeRequest {
  url: string;
  model?: "openai" | "anthropic" | "deepseek";
  options?: {
    length?: "brief" | "standard" | "detailed";
    focus?: string;
  };
}
```

### GET /api/summary/{id}

```typescript
interface SummaryResponse {
  id: string;
  video: VideoMetadata;
  summary: {
    text: string;
    key_points: string[];
    chapters: Chapter[];
    model_used: string;
  };
  metadata: {
    processing_time: number;
    token_count: number;
    cost_estimate: number;
  };
}
```

### POST /api/export/{id}

```typescript
interface ExportRequest {
  format: "markdown" | "pdf" | "txt";
  options?: ExportOptions;
}
```

## Risks and Mitigation

### AI Service Risks

1. **API Rate Limits**: Multi-model fallback and intelligent queuing
2. **Cost Overruns**: Usage monitoring and budget alerts
3. **Quality Degradation**: A/B testing and quality metrics

### Technical Risks

1. **Token Limit Exceeded**: Intelligent chunking and preprocessing
2. **Cache Invalidation**: Smart cache key generation and TTL management
3. **Export Failures**: Robust file generation with error recovery

### Business Risks

1. **User Experience**: Background processing and progress indicators
2. **Cost Scaling**: Caching strategy and cost optimization
3. **Model Availability**: Multi-provider architecture

## Success Metrics

### Quality Metrics

- **Summary Accuracy**: User satisfaction feedback
- **Consistency**: Structured output compliance across models
- **Coverage**: Key point extraction rate

### Performance Metrics

- **Generation Time**: < 30 seconds for 30-minute videos
- **Cache Hit Rate**: > 70% for popular content
- **Cost Efficiency**: < $0.005 average per summary

### Technical Metrics

- **API Reliability**: > 99% successful requests
- **Error Recovery**: < 5% failed summaries
- **Export Success**: > 98% successful exports

---

**Epic Status**: Ready for Implementation
**Dependencies**: Epic 1 must be completed first
**Next Action**: Create Story 2.1 (Single AI Model Integration)
**Epic Owner**: Bob (Scrum Master)
**Last Updated**: 2025-01-25