# YouTube Summarizer Product Requirements Document (PRD)

## Goals and Background Context

### Goals

- Enable users to obtain concise, accurate AI-generated summaries of YouTube videos within 30 seconds
- Reduce time spent consuming long-form video content by 80% while retaining key information
- Support multiple AI models to ensure high availability and cost optimization
- Create a sustainable, cache-optimized architecture that minimizes API costs below $100/month
- Provide seamless export functionality for integration with existing knowledge management workflows
- Build a responsive web application accessible across all devices and platforms
- Establish a foundation for future features including batch processing and collaborative features

### Background Context

The exponential growth of video content has created an information overload challenge for students, professionals, and content creators. With millions of hours of educational and informational content uploaded daily to YouTube, users struggle to efficiently extract value from long-form videos. Current solutions either require manual note-taking or provide inadequate summaries that miss critical insights.

This YouTube Summarizer addresses this gap by leveraging state-of-the-art AI models to provide intelligent, context-aware summaries that preserve the essence of video content while dramatically reducing consumption time. By supporting multiple AI providers and implementing intelligent caching, the solution ensures both reliability and cost-effectiveness for users ranging from individual learners to professional research teams.
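The caching goal depends on identical requests resolving to the same cache entry. The following is a minimal sketch of one way to derive such a key, assuming the composite of video ID, model, and parameters that FR7 specifies. The function name `summary_cache_key`, the `summary:` prefix, and the 16-character digest length are illustrative assumptions, not part of this PRD.

```python
import hashlib
import json

# 24-hour TTL taken from FR7; the constant name is hypothetical.
CACHE_TTL_SECONDS = 24 * 60 * 60


def summary_cache_key(video_id: str, model: str, params: dict) -> str:
    """Build a deterministic cache key from video ID, model, and parameters.

    Illustrative only: the key layout is an assumption, not a requirement.
    """
    # Sort keys so logically equal parameter dicts serialize identically.
    canonical = json.dumps(params, sort_keys=True, separators=(",", ":"))
    # Hash the parameters so the key stays short regardless of option count.
    digest = hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:16]
    return f"summary:{video_id}:{model}:{digest}"


if __name__ == "__main__":
    key = summary_cache_key(
        "dQw4w9WgXcQ", "gpt-4o-mini", {"length": "brief", "focus": "general"}
    )
    print(key, CACHE_TTL_SECONDS)
```

Hashing the sorted parameters means two requests that differ only in option order share one entry, while any change to length or focus area produces a distinct key. A cache client would store the summary under this key with `CACHE_TTL_SECONDS` as the expiry.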
### Change Log

| Date | Version | Description | Author |
|------|---------|-------------|--------|
| 2025-01-25 | 1.0 | Initial PRD creation | System |
| 2025-01-25 | 2.0 | Refined to BMad standards with FR/NFR format | System |

## Requirements

### Functional Requirements

- **FR1**: System shall accept YouTube URLs in all standard formats (youtube.com/watch, youtu.be, embed URLs) and validate them before processing
- **FR2**: System shall extract video metadata including title, duration, channel, and publication date using YouTube APIs
- **FR3**: System shall retrieve transcripts using YouTube Transcript API as the primary method, with fallback to auto-generated captions
- **FR4**: System shall generate AI-powered summaries using at least one configured model (OpenAI, Anthropic, or DeepSeek)
- **FR5**: System shall display summaries with extracted key points, main topics, and actionable insights
- **FR6**: System shall provide one-click copy-to-clipboard functionality for all summary sections
- **FR7**: System shall cache summaries for 24 hours using video ID and model parameters as cache key
- **FR8**: System shall allow users to select the AI model when multiple models are configured
- **FR9**: System shall provide summary customization options for length (brief/standard/detailed) and focus area
- **FR10**: System shall generate timestamped chapters based on content structure and topic changes
- **FR11**: System shall export summaries in Markdown, PDF, and plain text formats
- **FR12**: System shall maintain summary history for the session with retrieval capability
- **FR13**: System shall implement rate limiting at 30 requests per minute per IP address
- **FR14**: System shall support batch processing of multiple video URLs with queue management
- **FR15**: System shall provide real-time progress updates during summary generation via WebSocket
- **FR16**: System shall expose RESTful API endpoints for programmatic access to summarization features
- **FR17**: System shall handle videos up to 3 hours in duration with automatic transcript chunking
- **FR18**: System shall detect video language and provide summaries in the same language when possible
- **FR19**: System shall implement automatic retry with exponential backoff for transient failures
- **FR20**: System shall provide detailed error messages with actionable recovery suggestions

### Non-Functional Requirements

- **NFR1**: System shall generate summaries within 30 seconds for videos under 30 minutes in length
- **NFR2**: System shall support 100 concurrent users without performance degradation
- **NFR3**: System shall maintain 99% uptime, excluding planned maintenance
- **NFR4**: System shall return cached content within 200ms
- **NFR5**: System shall optimize token usage to keep AI API costs under $100/month for 10,000 summaries
- **NFR6**: System shall implement secure storage of API keys using environment variables and secrets management
- **NFR7**: System shall sanitize all user inputs to prevent XSS and injection attacks
- **NFR8**: System shall implement CORS policies restricting access to authorized domains
- **NFR9**: System shall comply with WCAG 2.1 Level AA accessibility standards
- **NFR10**: System shall provide responsive design supporting viewport widths from 320px to 4K displays
- **NFR11**: System shall log all errors with correlation IDs for debugging and monitoring
- **NFR12**: System shall implement database connection pooling with a maximum of 20 connections
- **NFR13**: System shall use PostgreSQL for production and SQLite for development environments
- **NFR14**: System shall maintain automated tests with a minimum of 80% code coverage
- **NFR15**: System shall respect YouTube Terms of Service and API quotas with appropriate throttling

## User Interface Design Goals

### Overall UX Vision

Create a minimalist, distraction-free interface that prioritizes content clarity and rapid information retrieval.
The design should feel instantly familiar to users of modern web applications while providing powerful features through progressive disclosure. Every interaction should feel fast, responsive, and purposeful.

### Key Interaction Paradigms

- **Single Input Focus**: URL input field as the primary call-to-action on landing
- **Progressive Disclosure**: Advanced options hidden until needed
- **Real-time Feedback**: Immediate validation and progress indicators
- **Keyboard Navigation**: Full keyboard accessibility for power users
- **Mobile-First Responsive**: Touch-optimized with swipe gestures on mobile
- **Dark/Light Mode**: Automatic theme based on system preference with manual override

### Core Screens and Views

- **Landing Page**: Hero section with URL input, recent summaries carousel
- **Processing View**: Real-time progress with transcript preview as it loads
- **Summary Display**: Multi-section layout with key points, full summary, chapters
- **Export Modal**: Format selection with preview before download
- **History Sidebar**: Searchable list of recent summaries with filters
- **Settings Panel**: API configuration, model selection, preferences
- **Error State**: Clear error message with troubleshooting steps
- **Empty State**: Helpful onboarding when no summaries exist

### Accessibility: WCAG AA

- Full keyboard navigation with visible focus indicators
- Screen reader optimized with ARIA labels and landmarks
- Minimum 4.5:1 color contrast ratios
- Resizable text up to 200% without horizontal scrolling
- Alternative text for all visual elements

### Branding

- Clean, modern aesthetic with generous whitespace
- Primary color: Electric blue (#0066FF) for CTAs
- Typography: System fonts for fast loading (SF Pro, Segoe UI, Roboto)
- Subtle animations for state transitions (200ms ease-out)
- Consistent 8px grid system for spacing

### Target Device and Platforms: Web Responsive

- Progressive Web App capable
- Optimized for Chrome, Safari, Firefox, Edge (latest 2 versions)
- Responsive breakpoints: Mobile (320-767px), Tablet (768-1023px), Desktop (1024px+)
- Touch-optimized with appropriate tap targets (minimum 44x44px)

## Technical Assumptions

### Repository Structure: Monorepo

The project uses a monorepo to keep all application components in a single repository, which simplifies dependency management and enables atomic commits across the full stack. This approach facilitates easier CI/CD setup and consistent tooling across the project.

### Service Architecture: Modular Monolith

The application starts as a modular monolith built on FastAPI, with clear service boundaries that can be extracted into microservices if scaling demands it. This provides the simplicity of monolithic deployment while maintaining the flexibility to evolve the architecture as needs grow.

### Testing Requirements: Full Testing Pyramid

- **Unit Tests**: Minimum 80% coverage for all business logic
- **Integration Tests**: API endpoint testing with mocked external services
- **E2E Tests**: Critical user flows using Playwright
- **Performance Tests**: Load testing for concurrent user scenarios
- **Manual Testing Conveniences**: Swagger UI for API exploration

### Additional Technical Assumptions and Requests

- **Python 3.11+** as primary backend language for AI library compatibility
- **FastAPI** for high-performance async API development
- **PostgreSQL** for production data persistence with JSON support
- **Redis** for caching layer and session management
- **SQLAlchemy 2.0** with async support for ORM
- **Pydantic V2** for data validation and settings management
- **React 18** with TypeScript for type-safe frontend development
- **Tailwind CSS** for utility-first styling approach
- **Docker** for containerization and consistent environments
- **GitHub Actions** for CI/CD pipeline automation
- **Sentry** for error tracking and performance monitoring
- **Prometheus + Grafana** for metrics and observability

## Epic List

### Epic 1: Foundation & Core YouTube Integration

Establish project infrastructure, implement YouTube URL processing, transcript extraction, and create the basic web interface for URL input and display.

### Epic 2: AI Summarization Engine

Build the core AI integration with initial model support, implement intelligent summary generation, caching system, and multi-model capability with export functionality.

### Epic 3: Enhanced User Experience

Add user authentication, summary history management, batch processing capabilities, real-time updates, and public API endpoints for third-party integration.

## Epic 1: Foundation & Core YouTube Integration

**Goal**: Establish the foundational infrastructure and core YouTube integration capabilities that all subsequent features will build upon. This epic delivers a functional system that can accept YouTube URLs, extract transcripts, and display them through a basic but polished web interface.

### Story 1.1: Project Setup and Infrastructure

**As a** developer
**I want** a fully configured project with all necessary dependencies and development tooling
**So that** the team can begin development with consistent environments and automated quality checks

#### Acceptance Criteria

1. FastAPI application structure created with proper package organization (api/, services/, models/, utils/)
2. Development environment configured with hot-reload, debugging, and environment variable management
3. Docker configuration enables single-command local development startup
4. Pre-commit hooks enforce code formatting (Black), linting (Ruff), and type checking (mypy)
5. GitHub Actions workflow runs tests and quality checks on every push
6. README includes clear setup instructions and architecture overview

### Story 1.2: YouTube URL Validation and Parsing

**As a** user
**I want** the system to accept any valid YouTube URL format
**So that** I can paste URLs directly from my browser without modification

#### Acceptance Criteria

1. System correctly parses video IDs from youtube.com/watch?v=, youtu.be/, and embed URL formats
2. Invalid URLs return clear error messages specifying the expected format
3. System extracts video IDs and validates that they are exactly 11 characters
4. Playlist URLs are detected and the user is informed they're not yet supported
5. URL validation happens client-side for instant feedback and server-side for security

### Story 1.3: Transcript Extraction Service

**As a** user
**I want** the system to automatically retrieve video transcripts
**So that** I can get summaries without manual transcription

#### Acceptance Criteria

1. Successfully retrieves transcripts using youtube-transcript-api for videos with captions
2. Falls back to auto-generated captions when manual captions are unavailable
3. Returns clear error message for videos without any captions
4. Extracts metadata including video title, duration, channel name, and publish date
5. Handles multiple languages with preference for English when available
6. Implements retry logic with exponential backoff for transient API failures

### Story 1.4: Basic Web Interface

**As a** user
**I want** a clean web interface to input URLs and view transcripts
**So that** I can interact with the system through my browser

#### Acceptance Criteria

1. Landing page displays prominent URL input field with placeholder text
2. Submit button is disabled until valid URL is entered
3. Loading spinner appears during transcript extraction with elapsed time counter
4. Extracted transcript displays in scrollable, readable format with timestamps
5. Error messages appear inline with suggestions for resolution
6. Interface is responsive and works on mobile devices (320px minimum width)

## Epic 2: AI Summarization Engine

**Goal**: Implement the core AI-powered summarization functionality that transforms transcripts into valuable, concise summaries.
This epic establishes the intelligence layer of the application with support for multiple AI providers and intelligent caching.

### Story 2.1: Single AI Model Integration

**As a** user
**I want** AI-generated summaries of video transcripts
**So that** I can quickly understand video content without watching

#### Acceptance Criteria

1. Successfully integrates with OpenAI GPT-4o-mini API for summary generation
2. Implements proper prompt engineering for consistent summary quality
3. Handles token limits by chunking long transcripts intelligently at sentence boundaries
4. Returns structured summary with overview, key points, and conclusion sections
5. Includes error handling for API failures with user-friendly messages
6. Tracks token usage and estimated cost per summary for monitoring

### Story 2.2: Summary Generation Pipeline

**As a** user
**I want** high-quality summaries that capture the essence of videos
**So that** I can trust the summaries for decision-making

#### Acceptance Criteria

1. Pipeline processes transcript through cleaning and preprocessing steps
2. Removes filler words, repeated phrases, and transcript artifacts
3. Identifies and preserves important quotes and specific claims
4. Generates hierarchical summary with main points and supporting details
5. Summary length is proportional to video length (approximately 10% of transcript)
6. Processing completes within 30 seconds for videos under 30 minutes

### Story 2.3: Caching System Implementation

**As a** system operator
**I want** summaries cached to reduce costs and improve performance
**So that** the system remains economically viable

#### Acceptance Criteria

1. Redis cache stores summaries with composite key (video_id + model + params)
2. Cache TTL set to 24 hours with option to configure
3. Cache hit returns summary in under 200ms
4. Cache invalidation API endpoint for administrative use
5. Implements cache warming for popular videos during low-traffic periods
6. Dashboard displays cache hit rate and cost savings metrics

### Story 2.4: Multi-Model Support

**As a** user
**I want** to choose between different AI models
**So that** I can balance cost, speed, and quality based on my needs

#### Acceptance Criteria

1. Supports OpenAI, Anthropic Claude, and DeepSeek models
2. Model selection dropdown appears when multiple models are configured
3. Each model has optimized prompts for best performance
4. Fallback chain activates when primary model fails
5. Model performance metrics tracked for comparison
6. Cost per summary displayed before generation

### Story 2.5: Export Functionality

**As a** user
**I want** to export summaries in various formats
**So that** I can integrate them into my workflow

#### Acceptance Criteria

1. Export available in Markdown, PDF, and plain text formats
2. Exported files include metadata (video title, URL, date, model used)
3. Markdown export preserves formatting and structure
4. PDF export is properly formatted with headers and sections
5. Copy-to-clipboard works for entire summary or individual sections
6. Batch export available for multiple summaries from history

## Epic 3: Enhanced User Experience

**Goal**: Transform the application from a simple tool to a comprehensive platform with user accounts, advanced features, and API access. This epic enables power users and developers to integrate the summarizer into their workflows.

### Story 3.1: User Authentication System

**As a** user
**I want** to create an account and log in
**So that** I can access my summary history across devices

#### Acceptance Criteria

1. Email/password registration with verification email
2. Secure password requirements enforced (minimum 8 characters, complexity rules)
3. JWT-based authentication with refresh tokens
4. Password reset functionality via email
5. Optional OAuth integration with Google for single sign-on
6. Session management with automatic logout after inactivity

### Story 3.2: Summary History Management

**As an** authenticated user
**I want** to view and manage my summary history
**So that** I can reference previous summaries

#### Acceptance Criteria

1. Summary history displays in reverse chronological order
2. Search functionality filters by video title, content, or date range
3. Summaries can be starred for quick access
4. Bulk delete operations with confirmation dialog
5. Summary sharing via unique URL (public or private)
6. Export entire history as JSON or CSV

### Story 3.3: Batch Processing

**As a** power user
**I want** to summarize multiple videos at once
**So that** I can process entire playlists or video series efficiently

#### Acceptance Criteria

1. Accepts multiple URLs via textarea (one per line) or file upload
2. Queue system processes videos sequentially with progress indicator
3. Partial results available as each video completes
4. Failed videos don't block subsequent processing
5. Batch results downloadable as ZIP with all formats
6. Email notification when batch processing completes

### Story 3.4: Real-time Updates

**As a** user
**I want** live progress updates during processing
**So that** I know the system is working and how long to wait

#### Acceptance Criteria

1. WebSocket connection provides real-time status updates
2. Progress stages shown: Validating → Extracting → Summarizing → Complete
3. Percentage complete based on transcript chunks processed
4. Estimated time remaining calculated from similar videos
5. Cancel button allows aborting long-running operations
6. Connection loss handled gracefully with automatic reconnection

### Story 3.5: API Endpoints

**As a** developer
**I want** RESTful API access to summarization features
**So that** I can integrate the service into my applications

#### Acceptance Criteria

1. API key generation and management in user settings
2. RESTful endpoints follow OpenAPI 3.0 specification
3. Rate limiting enforced per API key (100 requests/hour default)
4. Comprehensive API documentation with examples
5. SDKs provided for Python and JavaScript
6. Webhook support for async processing notifications

## Checklist Results Report

*To be completed after PM checklist execution*

## Next Steps

### UX Expert Prompt

Create a comprehensive front-end specification for the YouTube Summarizer web application based on this PRD. Focus on designing an intuitive, accessible interface that makes video summarization effortless for users ranging from students to professionals. Consider mobile-first responsive design, progressive disclosure of advanced features, and clear visual feedback during processing states. Emphasize speed and simplicity in the core workflow while providing power features for advanced users.

### Architect Prompt

Design the technical architecture for the YouTube Summarizer application based on this PRD. Create a scalable, maintainable system using FastAPI, PostgreSQL, and Redis, with clear separation of concerns and well-defined service boundaries. Address critical concerns including: transcript extraction reliability with multiple fallback methods, AI model integration with provider abstraction, caching strategy for cost optimization, and concurrent request handling. Ensure the architecture supports future migration to microservices if needed.

---

*End of Product Requirements Document v2.0*