Epic 2: AI Summarization Engine

Epic Overview

Goal: Implement the core AI-powered summarization functionality that transforms transcripts into valuable, concise summaries. This epic establishes the intelligence layer of the application with support for multiple AI providers and intelligent caching.

Priority: High - Core product functionality
Epic Dependencies: Epic 1 (Foundation & Core YouTube Integration)
Estimated Complexity: High (AI integration and optimization)

Epic Success Criteria

Upon completion of this epic, the YouTube Summarizer will:

  1. Intelligent Summary Generation

    • High-quality AI-generated summaries using OpenAI GPT-4o-mini
    • Structured output with overview, key points, and chapters
    • Cost-optimized processing (~$0.001-0.005 per summary)
  2. Multi-Model AI Support

    • Support for OpenAI, Anthropic, and DeepSeek models
    • Automatic failover between models
    • User model selection with cost transparency
  3. Performance Optimization

    • Intelligent caching system (24-hour TTL)
    • Background processing for long videos
    • Cost tracking and optimization
  4. Export Capabilities

    • Multiple export formats (Markdown, PDF, plain text)
    • Copy-to-clipboard functionality
    • Batch export support

Stories in Epic 2

Story 2.1: Single AI Model Integration

As a user
I want AI-generated summaries of video transcripts
So that I can quickly understand video content without watching

Acceptance Criteria

  1. Successfully integrates with OpenAI GPT-4o-mini API for summary generation
  2. Implements proper prompt engineering for consistent summary quality
  3. Handles token limits by chunking long transcripts intelligently at sentence boundaries
  4. Returns structured summary with overview, key points, and conclusion sections
  5. Includes error handling for API failures with user-friendly messages
  6. Tracks token usage and estimated cost per summary for monitoring

Status: Ready for story creation
Dependencies: Story 1.4 (Basic Web Interface)
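The prompt engineering called for in AC 2 can be sketched as a small helper that builds a chat-completions message list requesting the structured sections from AC 4. This is a minimal illustration, not the final design: the `build_summary_messages` name and the system-prompt wording are assumptions.

```python
# Hypothetical helper for AC 2: build chat messages asking for a structured
# summary (overview, key points, conclusion). Prompt wording is illustrative.
SYSTEM_PROMPT = (
    "You are a summarization assistant. Return a summary with three sections: "
    "Overview, Key Points (bulleted), and Conclusion."
)

def build_summary_messages(transcript: str, title: str) -> list[dict]:
    """Build chat messages for a GPT-4o-mini style completions call."""
    return [
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": f"Video title: {title}\n\nTranscript:\n{transcript}"},
    ]
```

Keeping the prompt in one place makes it easy to version and A/B test later, as the quality-assurance requirements below anticipate.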

Story 2.2: Summary Generation Pipeline

As a user
I want high-quality summaries that capture the essence of videos
So that I can trust the summaries for decision-making

Acceptance Criteria

  1. Pipeline processes transcript through cleaning and preprocessing steps
  2. Removes filler words, repeated phrases, and transcript artifacts
  3. Identifies and preserves important quotes and specific claims
  4. Generates hierarchical summary with main points and supporting details
  5. Summary length is proportional to video length (approximately 10% of the transcript length)
  6. Processing completes within 30 seconds for videos under 30 minutes

Status: Ready for story creation
Dependencies: Story 2.1 (Single AI Model Integration)
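The cleaning step in AC 1-2 can be sketched as a pure preprocessing function. The filler-word list and `clean_transcript` name are assumptions for illustration; a production list would be tuned against real transcripts.

```python
import re

# Illustrative preprocessing for AC 1-2: strip common filler words and
# collapse immediate word repetitions. The filler list is an assumption.
FILLERS = {"um", "uh", "like"}

def clean_transcript(text: str) -> str:
    """Remove filler words and repeated words from a raw transcript."""
    # Drop the multi-word filler "you know" first, then single-word fillers.
    text = re.sub(r"\byou know\b,?\s*", "", text, flags=re.IGNORECASE)
    words = []
    for word in text.split():
        bare = word.strip(",.").lower()
        if bare in FILLERS:
            continue
        if words and bare == words[-1].strip(",.").lower():
            continue  # collapse immediate repetition ("the the")
        words.append(word)
    return " ".join(words)
```

Running this before summarization also reduces token count, which feeds directly into the cost targets below.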

Story 2.3: Caching System Implementation

As a system operator
I want summaries cached to reduce costs and improve performance
So that the system remains economically viable

Acceptance Criteria

  1. Redis cache stores summaries with composite key (video_id + model + params)
  2. Cache TTL set to 24 hours with option to configure
  3. Cache hit returns summary in under 200ms
  4. Cache invalidation API endpoint for administrative use
  5. Implements cache warming for popular videos during low-traffic periods
  6. Dashboard displays cache hit rate and cost savings metrics

Status: Ready for story creation
Dependencies: Story 2.2 (Summary Generation Pipeline)
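A minimal sketch of the "Memory Cache" tier from the architecture diagram below, with the 24-hour default TTL from AC 2; production would back this with Redis per AC 1. The `TTLCache` class is illustrative, not a committed interface.

```python
import time

# In-memory TTL cache sketch (the "Memory Cache" tier). Default TTL mirrors
# AC 2's 24 hours; entries are lazily evicted on read.
class TTLCache:
    def __init__(self, ttl_seconds: float = 24 * 3600):
        self.ttl = ttl_seconds
        self._store: dict[str, tuple[float, object]] = {}

    def set(self, key: str, value: object) -> None:
        self._store[key] = (time.monotonic() + self.ttl, value)

    def get(self, key: str):
        entry = self._store.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if time.monotonic() > expires_at:
            del self._store[key]  # expired: evict and report a miss
            return None
        return value
```

The same get/set semantics map onto Redis (`SET` with `EX`, `GET`), so swapping the backend should not change callers.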

Story 2.4: Multi-Model Support

As a user
I want to choose between different AI models
So that I can balance cost, speed, and quality based on my needs

Acceptance Criteria

  1. Supports OpenAI, Anthropic Claude, and DeepSeek models
  2. Model selection dropdown appears when multiple models are configured
  3. Each model has optimized prompts for best performance
  4. Fallback chain activates when primary model fails
  5. Model performance metrics tracked for comparison
  6. Cost per summary displayed before generation

Status: Ready for story creation
Dependencies: Story 2.3 (Caching System Implementation)
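The fallback chain from AC 4 can be sketched as trying providers in priority order and returning the first success. The `generators` mapping (provider name to callable) is an assumption about how the eventual AIService will expose per-provider generation.

```python
# Sketch of AC 4: try providers in order, return the first success, and
# surface an aggregate error only when every provider fails.
def generate_with_fallback(transcript: str, generators: dict) -> tuple[str, str]:
    """Return (provider_name, summary); raise if all providers fail."""
    errors = {}
    for name, generate in generators.items():
        try:
            return name, generate(transcript)
        except Exception as exc:  # real code would catch provider-specific errors
            errors[name] = exc
    raise RuntimeError(f"All providers failed: {list(errors)}")
```

Recording which provider actually served each request also gives the per-model performance metrics AC 5 asks for.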

Story 2.5: Export Functionality

As a user
I want to export summaries in various formats
So that I can integrate them into my workflow

Acceptance Criteria

  1. Export available in Markdown, PDF, and plain text formats
  2. Exported files include metadata (video title, URL, date, model used)
  3. Markdown export preserves formatting and structure
  4. PDF export is properly formatted with headers and sections
  5. Copy-to-clipboard works for entire summary or individual sections
  6. Batch export available for multiple summaries from history

Status: Ready for story creation
Dependencies: Story 2.4 (Multi-Model Support)
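The Markdown export from AC 1-3 can be sketched as rendering a metadata header followed by the summary body. The field names on the `summary` record are assumptions about its eventual shape.

```python
from datetime import date

# Illustrative Markdown export per AC 1-3: metadata header (title, URL,
# date, model) plus overview and key points. Field names are assumptions.
def export_markdown(summary: dict) -> str:
    header = (
        f"# {summary['title']}\n\n"
        f"- URL: {summary['url']}\n"
        f"- Date: {summary.get('date', date.today().isoformat())}\n"
        f"- Model: {summary['model']}\n\n"
    )
    points = "\n".join(f"- {p}" for p in summary["key_points"])
    return header + summary["overview"] + "\n\n## Key Points\n" + points + "\n"
```

PDF and plain-text exports could share this renderer, converting the Markdown output rather than duplicating the layout logic.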

Technical Architecture Context

AI Integration Architecture

┌─────────────────┐    ┌─────────────────┐    ┌─────────────────┐
│   Frontend      │    │   Backend       │    │   AI Services   │
│                 │    │                 │    │                 │
│ • Model Select  │◄──►│ • AI Service    │◄──►│ • OpenAI API    │
│ • Progress UI   │    │ • Prompt Mgmt   │    │ • Anthropic API │
│ • Export UI     │    │ • Token Tracking│    │ • DeepSeek API  │
│                 │    │ • Cost Monitor  │    │                 │
└─────────────────┘    └─────────────────┘    └─────────────────┘
                              │
                              ▼
                      ┌─────────────────┐
                      │  Cache Layer    │
                      │                 │
                      │ • Memory Cache  │
                      │ • DB Cache      │
                      │ • Smart Keys    │
                      └─────────────────┘

Key Services for Epic 2

AI Service Architecture

from typing import Any, Dict, Optional

class AIService:
    def __init__(self, provider: str, api_key: str):
        self.provider = provider
        self.client = self._get_client(provider, api_key)

    async def generate_summary(
        self,
        transcript: str,
        video_metadata: Dict[str, Any],
        options: Optional[Dict[str, Any]] = None,
    ) -> Dict[str, Any]:
        """Generate structured summary with cost tracking"""

Caching Strategy

import hashlib
import json

def get_cache_key(video_id: str, model: str, options: dict) -> str:
    """Generate cache key: hash(video_id + model + options)"""
    key_data = f"{video_id}:{model}:{json.dumps(options, sort_keys=True)}"
    return hashlib.sha256(key_data.encode()).hexdigest()

Cost Optimization Strategy

Target Cost Structure

  • Primary Model: OpenAI GPT-4o-mini (~$0.001/1K tokens)
  • Typical Video Cost: $0.001-0.005 per 30-minute video
  • Caching Benefit: ~80% reduction for repeat requests
  • Monthly Budget: ~$0.10/month for hobby usage
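The arithmetic behind these figures, assuming a flat ~$0.001 per 1K tokens: a 30-minute video transcript of roughly 1,000-5,000 tokens lands in the $0.001-0.005 range. The `estimate_cost` helper is illustrative.

```python
# Cost arithmetic for the target structure above: tokens / 1000 * rate.
def estimate_cost(token_count: int, usd_per_1k_tokens: float = 0.001) -> float:
    """Estimated USD cost for a summarization call."""
    return round(token_count / 1000 * usd_per_1k_tokens, 6)
```

Surfacing this estimate before generation satisfies the cost-transparency criteria in Stories 2.1 and 2.4.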

Token Optimization Techniques

  1. Intelligent Chunking: Split long transcripts at sentence boundaries
  2. Prompt Optimization: Efficient prompts for consistent output
  3. Preprocessing: Remove transcript artifacts and filler words
  4. Fallback Strategy: Use cheaper models when primary fails
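Technique 1 can be sketched as splitting on sentence boundaries and packing sentences into chunks under a budget. The word count here is a crude proxy for tokens; a real implementation would use the model's tokenizer.

```python
import re

# Sketch of intelligent chunking: split at sentence boundaries so no chunk
# exceeds a budget. Word count stands in for token count (an assumption).
def chunk_transcript(text: str, max_words: int = 3000) -> list[str]:
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks, current, count = [], [], 0
    for sentence in sentences:
        words = len(sentence.split())
        if current and count + words > max_words:
            chunks.append(" ".join(current))  # budget exceeded: start a new chunk
            current, count = [], 0
        current.append(sentence)
        count += words
    if current:
        chunks.append(" ".join(current))
    return chunks
```

Because splits always fall between sentences, no chunk ever cuts a sentence in half, which keeps per-chunk summaries coherent.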

Non-Functional Requirements for Epic 2

Performance

  • Summary generation within 30 seconds for videos under 30 minutes
  • Cache hits return results in under 200ms
  • Background processing for videos over 1 hour

Cost Management

  • Token usage tracking with alerts
  • Cost estimation before processing
  • Monthly budget monitoring and warnings

Quality Assurance

  • Consistent summary structure across all models
  • Quality metrics tracking (summary length, key points extraction)
  • A/B testing capability for prompt optimization

Reliability

  • Multi-model fallback chain
  • Retry logic with exponential backoff
  • Graceful degradation when AI services are unavailable
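The retry requirement can be sketched as a bounded loop with exponentially increasing delays. The attempt count and base delay here are placeholder values, not tuned settings.

```python
import time

# Retry sketch: exponential backoff (base_delay, 2x per attempt) with a
# bounded number of attempts; the last failure is re-raised.
def retry_with_backoff(fn, attempts: int = 3, base_delay: float = 0.5):
    for attempt in range(attempts):
        try:
            return fn()
        except Exception:
            if attempt == attempts - 1:
                raise  # out of retries: surface the error
            time.sleep(base_delay * (2 ** attempt))
```

Wrapping each provider call this way, before the multi-model fallback kicks in, keeps transient rate-limit errors from triggering an unnecessary provider switch.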

Definition of Done for Epic 2

  • All 5 stories completed and validated
  • User can generate AI summaries from video transcripts
  • Multiple AI models supported with fallback
  • Caching system operational with cost savings visible
  • Export functionality working for all formats
  • Cost tracking under $0.10/month target for typical usage
  • Performance targets met (30s generation, 200ms cache)
  • Error handling graceful for all AI service failures

API Endpoints Introduced in Epic 2

POST /api/summarize

interface SummarizeRequest {
  url: string;
  model?: "openai" | "anthropic" | "deepseek";
  options?: {
    length?: "brief" | "standard" | "detailed";
    focus?: string;
  };
}
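Server-side validation of this request shape could look like the following sketch. The field and enum names come from the interface above; the `validate_summarize_request` function itself is a hypothetical helper.

```python
# Hypothetical validation mirroring SummarizeRequest: url required, model
# and options.length restricted to the enums defined in the interface.
ALLOWED_MODELS = {"openai", "anthropic", "deepseek"}
ALLOWED_LENGTHS = {"brief", "standard", "detailed"}

def validate_summarize_request(payload: dict) -> list[str]:
    """Return a list of validation errors (empty when the payload is valid)."""
    errors = []
    if not payload.get("url"):
        errors.append("url is required")
    if "model" in payload and payload["model"] not in ALLOWED_MODELS:
        errors.append(f"unknown model: {payload['model']}")
    length = (payload.get("options") or {}).get("length")
    if length is not None and length not in ALLOWED_LENGTHS:
        errors.append(f"unknown length: {length}")
    return errors
```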

GET /api/summary/{id}

interface SummaryResponse {
  id: string;
  video: VideoMetadata;
  summary: {
    text: string;
    key_points: string[];
    chapters: Chapter[];
    model_used: string;
  };
  metadata: {
    processing_time: number;
    token_count: number;
    cost_estimate: number;
  };
}

POST /api/export/{id}

interface ExportRequest {
  format: "markdown" | "pdf" | "txt";
  options?: ExportOptions;
}

Risks and Mitigation

AI Service Risks

  1. API Rate Limits: Multi-model fallback and intelligent queuing
  2. Cost Overruns: Usage monitoring and budget alerts
  3. Quality Degradation: A/B testing and quality metrics

Technical Risks

  1. Token Limit Exceeded: Intelligent chunking and preprocessing
  2. Cache Invalidation: Smart cache key generation and TTL management
  3. Export Failures: Robust file generation with error recovery

Business Risks

  1. User Experience: Background processing and progress indicators
  2. Cost Scaling: Caching strategy and cost optimization
  3. Model Availability: Multi-provider architecture

Success Metrics

Quality Metrics

  • Summary Accuracy: User satisfaction feedback
  • Consistency: Structured output compliance across models
  • Coverage: Key points extraction rate

Performance Metrics

  • Generation Time: < 30 seconds for 30-minute videos
  • Cache Hit Rate: > 70% for popular content
  • Cost Efficiency: < $0.005 per summary average

Technical Metrics

  • API Reliability: > 99% successful requests
  • Error Recovery: < 5% failed summaries
  • Export Success: > 98% successful exports

Epic Status: Ready for Implementation
Dependencies: Epic 1 must be completed first
Next Action: Create Story 2.1 (Single AI Model Integration)
Epic Owner: Bob (Scrum Master)
Last Updated: 2025-01-25