# AGENTS.md - YouTube Summarizer Development Standards

This document defines development workflows, standards, and best practices for the YouTube Summarizer project. It serves as a guide for both human developers and AI agents working on this codebase.

## 🚨 CRITICAL: Server Status Checking Protocol

**MANDATORY**: Check server status before ANY testing or debugging:

```bash
# 1. ALWAYS CHECK server status FIRST
lsof -i :3002 | grep LISTEN  # Check frontend (expected port)
lsof -i :8000 | grep LISTEN  # Check backend (expected port)

# 2. If servers NOT running, RESTART them
cd /Users/enias/projects/my-ai-projects/apps/youtube-summarizer
./scripts/restart-frontend.sh  # After frontend changes
./scripts/restart-backend.sh   # After backend changes
./scripts/restart-both.sh      # After changes to both

# 3. VERIFY restart was successful
lsof -i :3002 | grep LISTEN  # Should show node process
lsof -i :8000 | grep LISTEN  # Should show python process

# 4. ONLY THEN proceed with testing
```

**Server Checking Rules**:

- ✅ ALWAYS check server status before testing
- ✅ ALWAYS restart servers after code changes
- ✅ ALWAYS verify restart was successful
- ❌ NEVER assume servers are running
- ❌ NEVER test without confirming server status
- ❌ NEVER debug "errors" without checking if the server is running

## 🚨 CRITICAL: Documentation Preservation Rule

**MANDATORY**: Preserve critical documentation sections:

- ❌ **NEVER** remove critical sections from CLAUDE.md or AGENTS.md
- ❌ **NEVER** delete server checking protocols or development standards
- ❌ **NEVER** remove established workflows or troubleshooting guides
- ❌ **NEVER** delete testing procedures or quality standards
- ✅ **ONLY** remove sections when explicitly instructed by the user
- ✅ **ALWAYS** preserve and enhance existing documentation

## 🚩 CRITICAL: Directory Awareness Protocol

**MANDATORY BEFORE ANY COMMAND**: ALWAYS verify your current working directory before running any command.

```bash
# ALWAYS run this first before ANY command
pwd

# Expected result for YouTube Summarizer:
# /Users/enias/projects/my-ai-projects/apps/youtube-summarizer
```

#### Critical Directory Rules

- **NEVER assume** you're in the correct directory
- **ALWAYS verify** with `pwd` before running commands
- **YouTube Summarizer development** requires being in `/Users/enias/projects/my-ai-projects/apps/youtube-summarizer`
- **Backend server** (`python3 backend/main.py`) must be run from the YouTube Summarizer root
- **Frontend development** (`npm run dev`) must be run from the YouTube Summarizer root
- **Database operations** and migrations will fail if run from the wrong directory

#### YouTube Summarizer Directory Verification

```bash
# ❌ WRONG - Running from main project or apps directory
cd /Users/enias/projects/my-ai-projects
python3 backend/main.py  # Will fail - backend/ doesn't exist here

cd /Users/enias/projects/my-ai-projects/apps
python3 main.py  # Will fail - no main.py in apps/

# ✅ CORRECT - Always navigate to YouTube Summarizer
cd /Users/enias/projects/my-ai-projects/apps/youtube-summarizer
pwd  # Verify: /Users/enias/projects/my-ai-projects/apps/youtube-summarizer
python3 backend/main.py  # Backend server
# OR
python3 main.py          # Alternative entry point
```

## 🚀 Quick Start for Developers

**All stories are created and ready for implementation!**

1. **Start Here**: [Developer Handoff Guide](docs/DEVELOPER_HANDOFF.md)
2. **Sprint Plan**: [Sprint Planning Document](docs/SPRINT_PLANNING.md)
3. **First Story**: [Story 1.2 - URL Validation](docs/stories/1.2.youtube-url-validation-parsing.md)
**Total Implementation Time**: ~6 weeks (3 sprints)

- Sprint 1: Epic 1 (Foundation) - Stories 1.2-1.4
- Sprint 2: Epic 2 Core - Stories 2.1-2.3
- Sprint 3: Epic 2 Advanced - Stories 2.4-2.5

## Table of Contents

1. [Development Workflow](#1-development-workflow)
2. [Code Standards](#2-code-standards)
3. [Testing Requirements](#3-testing-requirements)
4. [Documentation Standards](#4-documentation-standards)
5. [Git Workflow](#5-git-workflow)
6. [API Design Standards](#6-api-design-standards)
7. [Database Operations](#7-database-operations)
8. [Performance Guidelines](#8-performance-guidelines)
9. [Security Protocols](#9-security-protocols)
10. [Deployment Process](#10-deployment-process)

## 🚨 CRITICAL: Documentation Update Rule

**MANDATORY**: After completing significant coding work, automatically update ALL documentation:

### Documentation Update Protocol

1. **After Feature Implementation** → Update relevant documentation files:
   - **CLAUDE.md** - Development guidance and protocols
   - **AGENTS.md** (this file) - Development standards and workflows
   - **README.md** - User-facing features and setup instructions
   - **CHANGELOG.md** - Version history and changes
   - **FILE_STRUCTURE.md** - Directory structure and file organization

### When to Update Documentation

- ✅ **After implementing new features** → Update all relevant docs
- ✅ **After fixing significant bugs** → Update troubleshooting guides
- ✅ **After changing architecture** → Update CLAUDE.md, AGENTS.md, FILE_STRUCTURE.md
- ✅ **After adding new tools/scripts** → Update CLAUDE.md, AGENTS.md, README.md
- ✅ **After configuration changes** → Update setup documentation
- ✅ **At end of development sessions** → Comprehensive doc review

### Documentation Workflow Integration

```bash
# After completing significant code changes:

# 1. Test changes work
./scripts/restart-backend.sh   # Test backend changes
./scripts/restart-frontend.sh  # Test frontend changes (if needed)

# 2. Update relevant documentation files

# 3. Commit documentation with code changes
git add CLAUDE.md AGENTS.md README.md CHANGELOG.md FILE_STRUCTURE.md
git commit -m "feat: implement feature X with documentation updates"
```

### Documentation Standards

- **Format**: Use clear headings, code blocks, and examples
- **Timeliness**: Update immediately after code changes
- **Completeness**: Cover all user-facing and developer-facing changes
- **Consistency**: Maintain the same format across all documentation files

## 1. Development Workflow

### Story-Driven Development (BMad Method)

All development follows the BMad Method epic and story workflow:

**Current Development Status: READY FOR IMPLEMENTATION**

- **Epic 1**: Foundation & Core YouTube Integration (Story 1.1 ✅ Complete, Stories 1.2-1.4 📋 Ready)
- **Epic 2**: AI Summarization Engine (Stories 2.1-2.5 📋 All Created and Ready)
- **Epic 3**: Enhanced User Experience (Future - Ready for story creation)

**Developer Handoff Complete**: All Epic 1 & 2 stories created with comprehensive Dev Notes.

- See [Developer Handoff Guide](docs/DEVELOPER_HANDOFF.md) for implementation start
- See [Sprint Planning](docs/SPRINT_PLANNING.md) for the 6-week development schedule
#### Story-Based Implementation Process

```bash
# 1. Start with Developer Handoff
cat docs/DEVELOPER_HANDOFF.md  # Complete implementation guide
cat docs/SPRINT_PLANNING.md    # Sprint breakdown

# 2. Get Your Next Story (All stories ready!)
# Sprint 1: Stories 1.2, 1.3, 1.4
# Sprint 2: Stories 2.1, 2.2, 2.3
# Sprint 3: Stories 2.4, 2.5

# 3. Review Story Implementation Requirements
# Read: docs/stories/{story-number}.{name}.md
# Example: docs/stories/1.2.youtube-url-validation-parsing.md
# Study: Dev Notes section with complete code examples
# Check: All tasks and subtasks with time estimates

# 4. Implement Story
# Option A: Use Development Agent
/BMad:agents:dev
# Follow story specifications exactly

# Option B: Direct implementation
# Use code examples from Dev Notes
# Follow file structure specified in story
# Implement tasks in order

# 5. Test Implementation (Comprehensive Test Runner)
./run_tests.sh run-unit --fail-fast            # Ultra-fast feedback (229 tests)
./run_tests.sh run-specific "test_{module}.py" # Test specific modules
./run_tests.sh run-integration                 # Integration & API tests
./run_tests.sh run-all --coverage              # Full validation with coverage
cd frontend && npm test

# 6. Server Restart Protocol (CRITICAL FOR BACKEND CHANGES)
# ALWAYS restart backend after modifying Python files:
./scripts/restart-backend.sh   # After backend code changes
./scripts/restart-frontend.sh  # After npm installs or config changes
./scripts/restart-both.sh      # Full stack restart
# Frontend HMR handles React changes automatically - no restart needed

# 7. Update Story Progress
# In story file, mark tasks complete:
# - [x] **Task 1: Completed task**
# Update story status: Draft → In Progress → Review → Done

# 8. Move to Next Story
# Check Sprint Planning for next priority
# Repeat process with next story file
```

#### Alternative: Direct Development (Without BMad Agents)

```bash
# 1. Read current story specification
cat docs/stories/1.2.youtube-url-validation-parsing.md

# 2. Follow Dev Notes and architecture references
cat docs/architecture.md    # Technical specifications
cat docs/front-end-spec.md  # UI requirements

# 3. Implement systematically
# Follow tasks/subtasks exactly as specified
# Use provided code examples and patterns

# 4. Test and validate (Test Runner System)
./run_tests.sh run-unit --fail-fast  # Fast feedback during development
./run_tests.sh run-all --coverage    # Complete validation before story completion
cd frontend && npm test
```
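For orientation, the sketch below shows the rough shape of the helper Story 1.2 targets. Everything here — module layout, patterns, and signature — is an illustrative assumption, not the story's actual specification; always implement from the Dev Notes in the story file.

```python
# Hypothetical sketch only; the authoritative spec is
# docs/stories/1.2.youtube-url-validation-parsing.md
import re
from typing import Optional

# Patterns are illustrative, not exhaustive
_YOUTUBE_ID_PATTERNS = [
    re.compile(r"(?:v=|/embed/|/shorts/)([A-Za-z0-9_-]{11})"),
    re.compile(r"youtu\.be/([A-Za-z0-9_-]{11})"),
]


def extract_video_id(url: str) -> Optional[str]:
    """Return the 11-character YouTube video ID, or None if none is found."""
    for pattern in _YOUTUBE_ID_PATTERNS:
        match = pattern.search(url)
        if match:
            return match.group(1)
    return None
```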
### Story Implementation Checklist (BMad Method)

- [ ] **Review Story Requirements**
  - [ ] Read complete story file (`docs/stories/{epic}.{story}.{name}.md`)
  - [ ] Study Dev Notes section with architecture references
  - [ ] Understand all acceptance criteria
  - [ ] Review all tasks and subtasks
- [ ] **Follow Architecture Specifications**
  - [ ] Reference `docs/architecture.md` for technical patterns
  - [ ] Use exact file locations specified in story
  - [ ] Follow error handling patterns from architecture
  - [ ] Implement according to database schema specifications
- [ ] **Write Tests First (TDD)**
  - [ ] Create unit tests based on story testing requirements
  - [ ] Write integration tests for API endpoints
  - [ ] Add frontend component tests where specified
  - [ ] Ensure test coverage meets story requirements
- [ ] **Implement Features Systematically**
  - [ ] Complete tasks in order specified in story
  - [ ] Follow code examples and patterns from Dev Notes
  - [ ] Use exact imports and dependencies specified
  - [ ] Implement error handling as architecturally defined
- [ ] **Validate Implementation**
  - [ ] All acceptance criteria met
  - [ ] All tasks/subtasks completed
  - [ ] Full test suite passes
  - [ ] Integration testing successful
- [ ] **Update Story Progress**
  - [ ] Mark tasks complete in story markdown file
  - [ ] Update story status from "Draft" to "Done"
  - [ ] Add completion notes to Dev Agent Record section
  - [ ] Update epic progress in `docs/prd/index.md`
- [ ] **Commit Changes**
  - [ ] Use story-based commit message format
  - [ ] Reference story number in commit
  - [ ] Include brief implementation summary

## FILE LENGTH - Keep All Files Modular and Focused

### 300 Lines of Code Limit

**CRITICAL RULE**: We must keep all files under 300 LOC.

- **Current Status**: Many files in our codebase break this rule
- **Requirement**: Files must be modular & single-purpose
- **Enforcement**: Before adding any significant functionality, check file length
- **Action Required**: Refactor any file approaching or exceeding 300 lines

```bash
# Check file lengths across project
find . -name "*.py" -not -path "*/venv*/*" -not -path "*/__pycache__/*" -exec wc -l {} + | awk '$1 > 300'
find . \( -name "*.ts" -o -name "*.tsx" \) -not -path "*/node_modules/*" -exec wc -l {} + | awk '$1 > 300'
```

**Modularization Strategies**:

- Extract utility functions into separate modules
- Split large classes into focused, single-responsibility classes
- Move constants and configuration to dedicated files
- Separate concerns: logic, data models, API handlers
- Use composition over inheritance to reduce file complexity

**Examples of Files Needing Refactoring** (see the sketch after this list):

- Large service files → Split into focused service modules
- Complex API routers → Extract handlers to separate modules
- Monolithic components → Break into smaller, composable components
- Combined model files → Separate by entity or domain
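The sketch below illustrates what such a split looks like in practice. The module names are hypothetical; map the idea onto the actual files being refactored.

```python
# BEFORE: one 400+ line backend/services/video_service.py mixing
# URL parsing, transcript extraction, caching, and orchestration.

# AFTER: focused modules, each comfortably under 300 lines:
#   backend/services/url_parser.py    - URL validation/parsing helpers
#   backend/services/transcript.py    - transcript extraction
#   backend/services/constants.py     - shared constants
#   backend/services/video_service.py - thin orchestration layer

# video_service.py then composes the extracted pieces:
from backend.services.url_parser import extract_video_id
from backend.services.transcript import fetch_transcript


async def process(url: str) -> str:
    """Orchestrate the pipeline, delegating details to the focused modules."""
    video_id = extract_video_id(url)
    return await fetch_transcript(video_id)
```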
## READING FILES - Never Make Assumptions

### Always Read Files in Full Before Changes

**CRITICAL RULE**: Always read the file in full; do not be lazy.

- **Before making ANY code changes**: Start by finding & reading ALL relevant files
- **Never make changes without reading the entire file**: Understand context, existing patterns, dependencies
- **Read related files**: Check imports, dependencies, and related modules
- **Understand existing architecture**: Follow established patterns and conventions

```bash
# Investigation checklist before any code changes:
# 1. Read the target file completely
# 2. Read all imported modules
# 3. Check related test files
# 4. Review configuration files
# 5. Understand data models and schemas
```

**File Reading Protocol**:

1. **Target File**: Read entire file to understand current implementation
2. **Dependencies**: Read all imported modules and their interfaces
3. **Tests**: Check existing test coverage and patterns
4. **Related Files**: Review files in same directory/module
5. **Configuration**: Check relevant config files and environment variables
6. **Documentation**: Read any related documentation or comments

**Common Mistakes to Avoid**:

- ❌ Making changes based on file names alone
- ❌ Assuming function behavior without reading implementation
- ❌ Not understanding existing error handling patterns
- ❌ Missing important configuration or environment dependencies
- ❌ Ignoring existing test patterns and coverage

## EGO - Engineering Humility and Best Practices

### Do Not Make Assumptions - Consider Multiple Approaches

**CRITICAL MINDSET**: Do not make assumptions. Do not jump to conclusions.

- **Reality Check**: You are just a Large Language Model; you are very limited
- **Engineering Approach**: Always consider multiple different approaches, just like a senior engineer
- **Validate Assumptions**: Test your understanding against the actual codebase
- **Seek Understanding**: When unclear, read more files and investigate thoroughly

**Senior Engineer Mindset**:

1. **Multiple Solutions**: Always consider 2-3 different approaches
2. **Trade-off Analysis**: Evaluate pros/cons of each approach
3. **Existing Patterns**: Follow established codebase patterns
4. **Future Maintenance**: Consider long-term maintainability
5. **Performance Impact**: Consider resource and performance implications
6. **Testing Strategy**: Plan testing approach before implementation

**Before Implementation, Ask**:

- What are 2-3 different ways to solve this?
- What are the trade-offs of each approach?
- How does this fit with existing architecture patterns?
- What could break if this implementation is wrong?
- How would a senior engineer approach this problem?
- What edge cases am I not considering?

**Decision Process**:

1. **Gather Information**: Read all relevant files and understand context
2. **Generate Options**: Consider multiple implementation approaches
3. **Evaluate Trade-offs**: Analyze pros/cons of each option
4. **Check Patterns**: Ensure consistency with existing codebase
5. **Plan Testing**: Design test strategy to validate approach
6. **Implement Incrementally**: Start small, verify, then expand
**Remember Your Limitations**:

- Cannot execute code to verify behavior
- Cannot access external documentation beyond what's provided
- Cannot make network requests or test integrations
- Cannot guarantee code will work without testing
- Limited understanding of complex business logic

**Compensation Strategies**:

- Read more files when uncertain
- Follow established patterns rigorously
- Provide multiple implementation options
- Document assumptions and limitations
- Suggest verification steps for humans
- Request feedback on complex architectural decisions

## Class Library Integration and Usage

### AI Assistant Class Library Reference

This project uses the shared AI Assistant Class Library (`/lib/`), which provides foundational components for AI applications. Always check the class library first before implementing common functionality.

#### Core Library Components Used:

**Service Framework** (`/lib/services/`):

```python
from ai_assistant_lib import BaseService, BaseAIService, ServiceStatus

# Backend services inherit from library base classes
class VideoService(BaseService):
    async def _initialize_impl(self) -> None:
        # Service-specific initialization with lifecycle management
        pass

class AnthropicSummarizer(BaseAIService):
    # Inherits retry logic, caching, rate limiting from library
    async def _make_prediction(self, request: AIRequest) -> AIResponse:
        pass
```

**Repository Pattern** (`/lib/data/repositories/`):

```python
from typing import Optional

from ai_assistant_lib import BaseRepository, TimestampedModel

# Database models use library base classes
class Summary(TimestampedModel):
    # Automatic created_at, updated_at fields
    __tablename__ = 'summaries'

class SummaryRepository(BaseRepository[Summary]):
    # Inherits CRUD operations, filtering, pagination
    async def find_by_video_id(self, video_id: str) -> Optional[Summary]:
        filters = {"video_id": video_id}
        results = await self.find_all(filters=filters, limit=1)
        return results[0] if results else None
```

**Error Handling** (`/lib/core/exceptions/`):

```python
from fastapi import HTTPException

from ai_assistant_lib import ServiceError, RetryableError, ValidationError

# Consistent error handling across the application
async def summarize_safely(transcript: str):
    try:
        return await summarizer.generate_summary(transcript)
    except RetryableError:
        # Automatic retry handled by library
        pass
    except ValidationError as e:
        raise HTTPException(status_code=400, detail=str(e))
```

**Async Utilities** (`/lib/utils/helpers/`):

```python
from ai_assistant_lib import with_retry, with_cache, MemoryCache

# Automatic retry for external API calls
@with_retry(max_attempts=3)
async def extract_youtube_transcript(video_id: str) -> str:
    # Implementation with automatic exponential backoff
    pass

# Caching for expensive operations
cache = MemoryCache(max_size=1000, default_ttl=3600)

@with_cache(cache=cache, key_prefix="transcript")
async def get_cached_transcript(video_id: str) -> str:
    # Expensive transcript extraction cached automatically
    pass
```

#### Project-Specific Usage Patterns:

**Backend API Services** (`backend/services/`):

- `summary_pipeline.py` - Uses `BaseService` for pipeline orchestration
- `anthropic_summarizer.py` - Extends `BaseAIService` for AI integration
- `cache_manager.py` - Uses library caching utilities
- `video_service.py` - Implements service framework patterns

**Data Layer** (`backend/models/`, `backend/core/`):

- `summary.py` - Uses `TimestampedModel` from library
- `user.py` - Inherits from library base models
- `database_registry.py` - Extends library database patterns

**API Layer** (`backend/api/`):

- Exception handling uses library error hierarchy
- Request/response models extend library schemas
- Dependency injection follows library patterns
#### Library Integration Checklist:

Before implementing new functionality:

- [ ] **Check Library First**: Review `/lib/` for existing solutions
- [ ] **Follow Patterns**: Use established library patterns and base classes
- [ ] **Extend, Don't Duplicate**: Extend library classes instead of creating from scratch
- [ ] **Error Handling**: Use library exception hierarchy for consistency
- [ ] **Testing**: Use library test utilities and patterns

#### Common Integration Patterns:

```python
# Service initialization with library framework
async def create_service() -> VideoService:
    service = VideoService("video_processor")
    await service.initialize()  # Lifecycle managed by BaseService
    return service

# Repository operations with library patterns
async def get_summary_data(video_id: str) -> Optional[Summary]:
    repo = SummaryRepository(session, Summary)
    return await repo.find_by_video_id(video_id)

# AI service with library retry and caching
summarizer = AnthropicSummarizer(
    api_key=settings.ANTHROPIC_API_KEY,
    cache_manager=cache_manager,              # From library
    retry_config=RetryConfig(max_attempts=3)  # From library
)
```

## 2. Code Standards

### Python Style Guide

```python
"""
Module docstring describing purpose and usage
"""
from typing import List, Optional, Dict, Any
import asyncio
from datetime import datetime

# Constants in UPPER_CASE
DEFAULT_TIMEOUT = 30
MAX_RETRIES = 3


class YouTubeSummarizer:
    """
    Class for summarizing YouTube videos.

    Attributes:
        model: AI model to use for summarization
        cache: Cache service instance
    """

    def __init__(self, model: str = "openai"):
        """Initialize summarizer with specified model."""
        self.model = model
        self.cache = CacheService()  # project cache service (see Performance Guidelines)

    async def summarize(
        self,
        video_url: str,
        options: Optional[Dict[str, Any]] = None
    ) -> Dict[str, Any]:
        """
        Summarize a YouTube video.

        Args:
            video_url: YouTube video URL
            options: Optional summarization parameters

        Returns:
            Dictionary containing summary and metadata

        Raises:
            YouTubeError: If video cannot be accessed
            AIServiceError: If summarization fails
        """
        # Implementation here
        pass
```

### Type Hints

Always use type hints for better code quality:

```python
from typing import Union, List, Optional, Dict, Any, Tuple
from pydantic import BaseModel, HttpUrl

async def process_video(
    url: HttpUrl,
    models: List[str],
    max_length: Optional[int] = None
) -> Tuple[str, Dict[str, Any]]:
    """Process video with type safety."""
    pass
```

### Async/Await Pattern

Use async for all I/O operations:

```python
import asyncio

import aiohttp

async def fetch_transcript(video_id: str) -> str:
    """Fetch transcript asynchronously."""
    url = f"https://example.com/transcripts/{video_id}"  # endpoint is illustrative
    async with aiohttp.ClientSession() as session:
        async with session.get(url) as response:
            return await response.text()

# Use asyncio.gather for parallel operations (inside an async context)
results = await asyncio.gather(
    fetch_transcript(id1),
    fetch_transcript(id2),
    fetch_transcript(id3)
)
```

## 3. Testing Requirements

### Test Runner System

The project includes a production-ready test runner system with **229 discovered unit tests** and intelligent test categorization.

```bash
# Primary Testing Commands
./run_tests.sh run-unit --fail-fast  # Ultra-fast feedback (0.2s discovery)
./run_tests.sh run-all --coverage    # Complete test suite
./run_tests.sh run-integration       # Integration & API tests
cd frontend && npm test              # Frontend tests
```

### Test Coverage Requirements

- Minimum 80% code coverage
- 100% coverage for critical paths
- All edge cases tested
- Error conditions covered
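For reference, a unit test in the style the runner discovers might look like the following; the module under test and the test path are illustrative, not taken from the actual suite.

```python
# tests/unit/test_url_parser.py -- illustrative example
import pytest

# Hypothetical module; substitute the real module under test
from backend.services.url_parser import extract_video_id


@pytest.mark.parametrize("url,expected", [
    ("https://youtube.com/watch?v=dQw4w9WgXcQ", "dQw4w9WgXcQ"),  # standard URL
    ("https://youtu.be/dQw4w9WgXcQ", "dQw4w9WgXcQ"),             # short URL
    ("https://example.com/not-youtube", None),                   # error condition
])
def test_extract_video_id(url: str, expected):
    assert extract_video_id(url) == expected
```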
**📖 Complete Testing Guide**: See [TESTING-INSTRUCTIONS.md](TESTING-INSTRUCTIONS.md) for comprehensive testing standards, procedures, examples, and troubleshooting.

## 4. Documentation Standards

### Code Documentation

Every module, class, and function must have docstrings:

```python
"""
Module: YouTube Transcript Extractor

This module provides functionality to extract transcripts
from YouTube videos using multiple fallback methods.

Example:
    >>> extractor = TranscriptExtractor()
    >>> transcript = await extractor.extract("video_id")
"""

def extract_transcript(
    video_id: str,
    language: str = "en",
    include_auto_generated: bool = True
) -> List[Dict[str, Any]]:
    """
    Extract transcript from YouTube video.

    This function attempts to extract transcripts using the
    following priority:
    1. Manual captions in specified language
    2. Auto-generated captions if allowed
    3. Translated captions as fallback

    Args:
        video_id: YouTube video identifier
        language: ISO 639-1 language code (default: "en")
        include_auto_generated: Whether to use auto-generated captions

    Returns:
        List of transcript segments with text, start time, and duration

    Raises:
        TranscriptNotAvailable: If no transcript can be extracted

    Example:
        >>> transcript = extract_transcript("dQw4w9WgXcQ", "en")
        >>> print(transcript[0])
        {"text": "Never gonna give you up", "start": 0.0, "duration": 3.5}
    """
    pass
```

### API Documentation

Use FastAPI's automatic documentation features:

```python
from typing import Optional

from fastapi import APIRouter, HTTPException, status
from pydantic import BaseModel, Field

router = APIRouter()

class SummarizeRequest(BaseModel):
    """Request model for video summarization."""

    url: str = Field(
        ...,
        description="YouTube video URL",
        example="https://youtube.com/watch?v=dQw4w9WgXcQ"
    )
    model: str = Field(
        "auto",
        description="AI model to use (openai, anthropic, deepseek, auto)",
        example="openai"
    )
    max_length: Optional[int] = Field(
        None,
        description="Maximum summary length in words",
        ge=50,
        le=5000
    )

@router.post(
    "/summarize",
    response_model=SummarizeResponse,
    status_code=status.HTTP_200_OK,
    summary="Summarize YouTube Video",
    description="Submit a YouTube video URL for AI-powered summarization"
)
async def summarize_video(request: SummarizeRequest):
    """
    Summarize a YouTube video using AI.

    This endpoint accepts a YouTube URL and returns a job ID
    for tracking the summarization progress. Use the
    /summary/{job_id} endpoint to retrieve the completed summary.
    """
    pass
```
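The `SummarizeResponse` model referenced above is not defined in this guide; a minimal sketch consistent with the job-based workflow described in the docstring (field names assumed) might be:

```python
class SummarizeResponse(BaseModel):
    """Response model for the summarize endpoint (illustrative fields)."""

    job_id: str = Field(..., description="Identifier for tracking the summarization job")
    status: str = Field("pending", description="Current job status")
```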
## 5. Git Workflow

### Branch Naming

```bash
# Feature branches
feature/task-2-youtube-extraction
feature/task-3-ai-summarization

# Bugfix branches
bugfix/transcript-encoding-error
bugfix/rate-limit-handling

# Hotfix branches
hotfix/critical-api-error
```

### Commit Messages

Follow conventional commits:

```bash
# Format: <type>(<scope>): <subject>

# Examples:
feat(youtube): add transcript extraction service
fix(api): handle rate limiting correctly
docs(readme): update installation instructions
test(youtube): add edge case tests
refactor(cache): optimize cache key generation
perf(summarizer): implement parallel processing
chore(deps): update requirements.txt
```

### Pull Request Template

```markdown
## Task Reference
- Task ID: #3
- Task Title: Develop AI Summary Generation Service

## Description
Brief description of changes made

## Changes Made
- [ ] Implemented YouTube transcript extraction
- [ ] Added multi-model AI support
- [ ] Created caching layer
- [ ] Added comprehensive tests

## Testing
- [ ] Unit tests pass
- [ ] Integration tests pass
- [ ] Manual testing completed
- [ ] Coverage > 80%

## Documentation
- [ ] Code documented
- [ ] API docs updated
- [ ] README updated if needed

## Screenshots (if applicable)
[Add screenshots here]
```

## 6. API Design Standards

### RESTful Principles

```python
# Good API design
GET    /api/summaries       # List all summaries
GET    /api/summaries/{id}  # Get specific summary
POST   /api/summaries       # Create new summary
PUT    /api/summaries/{id}  # Update summary
DELETE /api/summaries/{id}  # Delete summary

# Status codes
200 OK                     # Successful GET/PUT
201 Created                # Successful POST
202 Accepted               # Processing async request
204 No Content             # Successful DELETE
400 Bad Request            # Invalid input
401 Unauthorized           # Missing/invalid auth
403 Forbidden              # No permission
404 Not Found              # Resource doesn't exist
429 Too Many Requests      # Rate limited
500 Internal Server Error  # Server error
```

### Response Format

```python
# Success response
{
    "success": true,
    "data": {
        "id": "uuid",
        "video_id": "abc123",
        "summary": "...",
        "metadata": {}
    },
    "timestamp": "2025-01-25T10:00:00Z"
}

# Error response
{
    "success": false,
    "error": {
        "code": "TRANSCRIPT_NOT_AVAILABLE",
        "message": "Could not extract transcript from video",
        "details": "No captions available in requested language"
    },
    "timestamp": "2025-01-25T10:00:00Z"
}
```

### Pagination

```python
import math

from fastapi import Query

@router.get("/summaries")
async def list_summaries(
    page: int = Query(1, ge=1),
    limit: int = Query(20, ge=1, le=100),
    sort: str = Query("created_at", regex="^(created_at|updated_at|title)$"),
    order: str = Query("desc", regex="^(asc|desc)$")
):
    """List summaries with pagination."""
    # summaries and total_count come from the data layer
    return {
        "data": summaries,
        "pagination": {
            "page": page,
            "limit": limit,
            "total": total_count,
            "pages": math.ceil(total_count / limit)
        }
    }
```
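One way to produce the error envelope shown above consistently is a global exception handler. A minimal sketch, assuming a hypothetical project-specific `AppError` that carries the envelope fields:

```python
from datetime import datetime, timezone

from fastapi import FastAPI, Request
from fastapi.responses import JSONResponse

class AppError(Exception):
    """Hypothetical application error carrying an envelope-friendly payload."""

    def __init__(self, code: str, message: str, details: str = "", status_code: int = 400):
        self.code = code
        self.message = message
        self.details = details
        self.status_code = status_code

app = FastAPI()

@app.exception_handler(AppError)
async def app_error_handler(request: Request, exc: AppError):
    """Render every AppError in the standard error envelope."""
    return JSONResponse(
        status_code=exc.status_code,
        content={
            "success": False,
            "error": {"code": exc.code, "message": exc.message, "details": exc.details},
            "timestamp": datetime.now(timezone.utc).isoformat(),
        },
    )
```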
## 7. Database Operations

### SQLAlchemy Models

```python
import uuid
from datetime import datetime

from sqlalchemy import Column, String, Text, DateTime, Float, JSON
from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy.dialects.postgresql import UUID

Base = declarative_base()

class Summary(Base):
    __tablename__ = "summaries"

    id = Column(UUID(as_uuid=True), primary_key=True, default=uuid.uuid4)
    video_id = Column(String(20), nullable=False, index=True)
    video_url = Column(Text, nullable=False)
    video_title = Column(Text)
    transcript = Column(Text)
    summary = Column(Text)
    key_points = Column(JSON)
    chapters = Column(JSON)
    model_used = Column(String(50))
    processing_time = Column(Float)
    created_at = Column(DateTime, default=datetime.utcnow)
    updated_at = Column(DateTime, default=datetime.utcnow, onupdate=datetime.utcnow)

    def to_dict(self):
        """Convert to dictionary for API responses."""
        return {
            "id": str(self.id),
            "video_id": self.video_id,
            "video_title": self.video_title,
            "summary": self.summary,
            "key_points": self.key_points,
            "chapters": self.chapters,
            "model_used": self.model_used,
            "created_at": self.created_at.isoformat()
        }
```

### Database Migrations

Use Alembic for migrations:

```bash
# Create new migration
alembic revision --autogenerate -m "Add chapters column"

# Apply migrations
alembic upgrade head

# Rollback
alembic downgrade -1
```

### Query Optimization

```python
from sqlalchemy import select, and_
from sqlalchemy.orm import selectinload

# Efficient querying with joins
async def get_summaries_with_metadata(session, user_id: str):
    stmt = (
        select(Summary)
        .options(selectinload(Summary.metadata))
        .where(Summary.user_id == user_id)
        .order_by(Summary.created_at.desc())
        .limit(10)
    )
    result = await session.execute(stmt)
    return result.scalars().all()
```

## 8. Performance Guidelines

### Caching Strategy

```python
import hashlib
import json
from typing import Optional

import redis

class CacheService:
    def __init__(self):
        self.redis = redis.Redis(decode_responses=True)
        self.ttl = 3600  # 1 hour default

    def get_key(self, prefix: str, **kwargs) -> str:
        """Generate cache key from parameters."""
        data = json.dumps(kwargs, sort_keys=True)
        hash_digest = hashlib.md5(data.encode()).hexdigest()
        return f"{prefix}:{hash_digest}"

    async def get_or_set(self, key: str, func, ttl: Optional[int] = None):
        """Get from cache or compute and set."""
        # Try cache first
        cached = self.redis.get(key)
        if cached:
            return json.loads(cached)

        # Compute result
        result = await func()

        # Cache result
        self.redis.setex(
            key,
            ttl or self.ttl,
            json.dumps(result)
        )
        return result
```

### Async Processing

```python
import asyncio
from typing import Any, Dict

from celery import Celery

celery_app = Celery('youtube_summarizer')

@celery_app.task
def process_video_task(video_url: str, options: Dict[str, Any]):
    """Background task for video processing (Celery tasks must be synchronous)."""
    return asyncio.run(_process_video(video_url, options))

async def _process_video(video_url: str, options: Dict[str, Any]) -> Dict[str, Any]:
    try:
        # Extract transcript
        transcript = await extract_transcript(video_url)

        # Generate summary
        summary = await generate_summary(transcript, options)

        # Save to database
        await save_summary(video_url, summary)

        return {"status": "completed", "summary_id": summary.id}
    except Exception as e:
        return {"status": "failed", "error": str(e)}
```

### Performance Monitoring

```python
import logging
import time
from functools import wraps

logger = logging.getLogger(__name__)

def measure_performance(func):
    """Decorator to measure function performance."""
    @wraps(func)
    async def wrapper(*args, **kwargs):
        start = time.perf_counter()
        try:
            result = await func(*args, **kwargs)
            elapsed = time.perf_counter() - start
            logger.info(f"{func.__name__} took {elapsed:.3f}s")
            return result
        except Exception as e:
            elapsed = time.perf_counter() - start
            logger.error(f"{func.__name__} failed after {elapsed:.3f}s: {e}")
            raise
    return wrapper
```
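Applying the decorator is then a one-liner above any async function; for instance (the wrapped function is illustrative):

```python
import asyncio

@measure_performance
async def generate_summary(transcript: str) -> str:
    """Illustrative workload; timing and failures are logged by the decorator."""
    await asyncio.sleep(0.1)  # stand-in for the real AI call
    return transcript[:100]

# Logs something like: "generate_summary took 0.101s"
asyncio.run(generate_summary("some transcript text"))
```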
{elapsed:.3f}s") return result except Exception as e: elapsed = time.perf_counter() - start logger.error(f"{func.__name__} failed after {elapsed:.3f}s: {e}") raise return wrapper ``` ## 9. Security Protocols ### Input Validation ```python from pydantic import BaseModel, validator, HttpUrl import re class VideoURLValidator(BaseModel): url: HttpUrl @validator('url') def validate_youtube_url(cls, v): youtube_regex = re.compile( r'(https?://)?(www\.)?(youtube\.com|youtu\.be)/.+' ) if not youtube_regex.match(str(v)): raise ValueError('Invalid YouTube URL') return v ``` ### API Key Management ```python from pydantic import BaseSettings class Settings(BaseSettings): """Application settings with validation.""" # API Keys (never hardcode!) openai_api_key: str anthropic_api_key: str youtube_api_key: Optional[str] = None # Security secret_key: str allowed_origins: List[str] = ["http://localhost:3000"] class Config: env_file = ".env" env_file_encoding = "utf-8" case_sensitive = False settings = Settings() ``` ### Rate Limiting ```python from fastapi import Request, HTTPException from fastapi_limiter import FastAPILimiter from fastapi_limiter.depends import RateLimiter import redis.asyncio as redis # Initialize rate limiter async def init_rate_limiter(): redis_client = redis.from_url("redis://localhost:6379", encoding="utf-8", decode_responses=True) await FastAPILimiter.init(redis_client) # Apply rate limiting @router.post("/summarize", dependencies=[Depends(RateLimiter(times=10, seconds=60))]) async def summarize_video(request: SummarizeRequest): """Rate limited to 10 requests per minute.""" pass ``` ## 10. Deployment Process ### Docker Configuration ```dockerfile # Dockerfile FROM python:3.11-slim WORKDIR /app # Install dependencies COPY requirements.txt . RUN pip install --no-cache-dir -r requirements.txt # Copy application COPY . . # Run application CMD ["uvicorn", "src.main:app", "--host", "0.0.0.0", "--port", "8082"] ``` ### Environment Management ```bash # .env.development DEBUG=true DATABASE_URL=sqlite:///./dev.db LOG_LEVEL=DEBUG # .env.production DEBUG=false DATABASE_URL=postgresql://user:pass@db:5432/youtube_summarizer LOG_LEVEL=INFO ``` ### Health Checks ```python @router.get("/health") async def health_check(): """Health check endpoint for monitoring.""" checks = { "api": "healthy", "database": await check_database(), "cache": await check_cache(), "ai_service": await check_ai_service() } all_healthy = all(v == "healthy" for v in checks.values()) return { "status": "healthy" if all_healthy else "degraded", "checks": checks, "timestamp": datetime.utcnow().isoformat() } ``` ### Monitoring ```python from prometheus_client import Counter, Histogram, generate_latest # Metrics request_count = Counter('youtube_requests_total', 'Total requests') request_duration = Histogram('youtube_request_duration_seconds', 'Request duration') summary_generation_time = Histogram('summary_generation_seconds', 'Summary generation time') @router.get("/metrics") async def metrics(): """Prometheus metrics endpoint.""" return Response(generate_latest(), media_type="text/plain") ``` ## Agent-Specific Instructions ### For AI Agents When working on this codebase: 1. **Always check Task Master first**: `task-master next` 2. **Follow TDD**: Write tests before implementation 3. **Use type hints**: All functions must have type annotations 4. **Document changes**: Update docstrings and comments 5. **Test thoroughly**: Run full test suite before marking complete 6. 
### Quality Checklist

Before marking any task as complete:

- [ ] All tests pass (`./run_tests.sh run-all`)
- [ ] Code coverage > 80% (`./run_tests.sh run-all --coverage`)
- [ ] Unit tests pass with fast feedback (`./run_tests.sh run-unit --fail-fast`)
- [ ] Integration tests validated (`./run_tests.sh run-integration`)
- [ ] Frontend tests pass (`cd frontend && npm test`)
- [ ] No linting errors (`ruff check src/`)
- [ ] Type checking passes (`mypy src/`)
- [ ] Documentation updated
- [ ] Task Master updated
- [ ] Changes committed with proper message

**📖 Testing Details**: See [TESTING-INSTRUCTIONS.md](TESTING-INSTRUCTIONS.md) for complete testing procedures and standards.

## Conclusion

This guide ensures consistent, high-quality development across all contributors to the YouTube Summarizer project. Follow these standards to maintain code quality, performance, and security.

---

*Last Updated: 2025-01-25*
*Version: 1.0.0*