# AGENTS.md - YouTube Summarizer Development Standards

This document defines development workflows, standards, and best practices for the YouTube Summarizer project. It serves as a guide for both human developers and AI agents working on this codebase.

## 🚀 Quick Start for Developers

**All stories are created and ready for implementation!**

1. **Start Here**: [Developer Handoff Guide](docs/DEVELOPER_HANDOFF.md)
2. **Sprint Plan**: [Sprint Planning Document](docs/SPRINT_PLANNING.md)
3. **First Story**: [Story 1.2 - URL Validation](docs/stories/1.2.youtube-url-validation-parsing.md)

**Total Implementation Time**: ~6 weeks (3 sprints)

- Sprint 1: Epic 1 (Foundation) - Stories 1.2-1.4
- Sprint 2: Epic 2 Core - Stories 2.1-2.3
- Sprint 3: Epic 2 Advanced - Stories 2.4-2.5

## Table of Contents

1. [Development Workflow](#1-development-workflow)
2. [Code Standards](#2-code-standards)
3. [Testing Requirements](#3-testing-requirements)
4. [Documentation Standards](#4-documentation-standards)
5. [Git Workflow](#5-git-workflow)
6. [API Design Standards](#6-api-design-standards)
7. [Database Operations](#7-database-operations)
8. [Performance Guidelines](#8-performance-guidelines)
9. [Security Protocols](#9-security-protocols)
10. [Deployment Process](#10-deployment-process)

## 1. Development Workflow

### Story-Driven Development (BMad Method)

All development follows the BMad Method epic and story workflow:

**Current Development Status: READY FOR IMPLEMENTATION**

- **Epic 1**: Foundation & Core YouTube Integration (Story 1.1 ✅ Complete, Stories 1.2-1.4 📋 Ready)
- **Epic 2**: AI Summarization Engine (Stories 2.1-2.5 📋 All Created and Ready)
- **Epic 3**: Enhanced User Experience (Future - Ready for story creation)

**Developer Handoff Complete**: All Epic 1 & 2 stories created with comprehensive Dev Notes.

- See [Developer Handoff Guide](docs/DEVELOPER_HANDOFF.md) for implementation start
- See [Sprint Planning](docs/SPRINT_PLANNING.md) for the 6-week development schedule

#### Story-Based Implementation Process

```bash
# 1. Start with Developer Handoff
cat docs/DEVELOPER_HANDOFF.md   # Complete implementation guide
cat docs/SPRINT_PLANNING.md     # Sprint breakdown

# 2. Get Your Next Story (All stories ready!)
# Sprint 1: Stories 1.2, 1.3, 1.4
# Sprint 2: Stories 2.1, 2.2, 2.3
# Sprint 3: Stories 2.4, 2.5

# 3. Review Story Implementation Requirements
# Read:    docs/stories/{story-number}.{name}.md
# Example: docs/stories/1.2.youtube-url-validation-parsing.md
# Study:   Dev Notes section with complete code examples
# Check:   All tasks and subtasks with time estimates

# 4. Implement Story
# Option A: Use Development Agent
/BMad:agents:dev
# Follow story specifications exactly

# Option B: Direct implementation
# Use code examples from Dev Notes
# Follow file structure specified in story
# Implement tasks in order

# 5. Test Implementation
pytest backend/tests/unit/test_{module}.py
pytest backend/tests/integration/
cd frontend && npm test

# 6. Update Story Progress (see the sketch below)
# In story file, mark tasks complete:
# - [x] **Task 1: Completed task**
# Update story status: Draft → In Progress → Review → Done

# 7. Move to Next Story
# Check Sprint Planning for next priority
# Repeat process with next story file
```
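For step 6, a story-file update might look like the sketch below. The exact section headings come from your BMad story template; the ones shown here are illustrative:

```markdown
## Status
Review   <!-- was: In Progress -->

## Tasks / Subtasks
- [x] **Task 1: Implement URL validation** (AC: 1)
  - [x] Add regex patterns for watch/short/embed URL formats
  - [x] Return normalized video ID
- [ ] **Task 2: Add error responses** (AC: 2)
```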
#### Alternative: Direct Development (Without BMad Agents)

```bash
# 1. Read current story specification
cat docs/stories/1.2.youtube-url-validation-parsing.md

# 2. Follow Dev Notes and architecture references
cat docs/architecture.md      # Technical specifications
cat docs/front-end-spec.md    # UI requirements

# 3. Implement systematically
# Follow tasks/subtasks exactly as specified
# Use provided code examples and patterns

# 4. Test and validate
pytest backend/tests/ -v
cd frontend && npm test
```

### Story Implementation Checklist (BMad Method)

- [ ] **Review Story Requirements**
  - [ ] Read complete story file (`docs/stories/{epic}.{story}.{name}.md`)
  - [ ] Study Dev Notes section with architecture references
  - [ ] Understand all acceptance criteria
  - [ ] Review all tasks and subtasks
- [ ] **Follow Architecture Specifications**
  - [ ] Reference `docs/architecture.md` for technical patterns
  - [ ] Use exact file locations specified in story
  - [ ] Follow error handling patterns from architecture
  - [ ] Implement according to database schema specifications
- [ ] **Write Tests First (TDD)**
  - [ ] Create unit tests based on story testing requirements
  - [ ] Write integration tests for API endpoints
  - [ ] Add frontend component tests where specified
  - [ ] Ensure test coverage meets story requirements
- [ ] **Implement Features Systematically**
  - [ ] Complete tasks in order specified in story
  - [ ] Follow code examples and patterns from Dev Notes
  - [ ] Use exact imports and dependencies specified
  - [ ] Implement error handling as architecturally defined
- [ ] **Validate Implementation**
  - [ ] All acceptance criteria met
  - [ ] All tasks/subtasks completed
  - [ ] Full test suite passes
  - [ ] Integration testing successful
- [ ] **Update Story Progress**
  - [ ] Mark tasks complete in story markdown file
  - [ ] Update story status from "Draft" to "Done"
  - [ ] Add completion notes to Dev Agent Record section
  - [ ] Update epic progress in `docs/prd/index.md`
- [ ] **Commit Changes**
  - [ ] Use story-based commit message format (see Section 5)
  - [ ] Reference story number in commit
  - [ ] Include brief implementation summary

## 2. Code Standards

### Python Style Guide

```python
"""
Module docstring describing purpose and usage
"""
from typing import List, Optional, Dict, Any
import asyncio
from datetime import datetime

# Constants in UPPER_CASE
DEFAULT_TIMEOUT = 30
MAX_RETRIES = 3


class YouTubeSummarizer:
    """
    Class for summarizing YouTube videos.

    Attributes:
        model: AI model to use for summarization
        cache: Cache service instance
    """

    def __init__(self, model: str = "openai"):
        """Initialize summarizer with specified model."""
        self.model = model
        self.cache = CacheService()  # see Section 8 for CacheService

    async def summarize(
        self,
        video_url: str,
        options: Optional[Dict[str, Any]] = None
    ) -> Dict[str, Any]:
        """
        Summarize a YouTube video.

        Args:
            video_url: YouTube video URL
            options: Optional summarization parameters

        Returns:
            Dictionary containing summary and metadata

        Raises:
            YouTubeError: If video cannot be accessed
            AIServiceError: If summarization fails
        """
        # Implementation here
        pass
```

### Type Hints

Always use type hints for better code quality:

```python
from typing import Union, List, Optional, Dict, Any, Tuple
from pydantic import BaseModel, HttpUrl


async def process_video(
    url: HttpUrl,
    models: List[str],
    max_length: Optional[int] = None
) -> Tuple[str, Dict[str, Any]]:
    """Process video with type safety."""
    pass
```

### Async/Await Pattern

Use async for all I/O operations:

```python
import aiohttp


async def fetch_transcript(video_id: str) -> str:
    """Fetch transcript asynchronously."""
    url = f"https://example.com/transcripts/{video_id}"  # illustrative transcript endpoint
    async with aiohttp.ClientSession() as session:
        async with session.get(url) as response:
            return await response.text()


# Use asyncio.gather for parallel operations
results = await asyncio.gather(
    fetch_transcript(id1),
    fetch_transcript(id2),
    fetch_transcript(id3)
)
```
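One caveat with `asyncio.gather`: by default the first exception propagates to the caller and the other results are lost. When fetching many transcripts, a single unavailable video should not sink the whole batch. A minimal sketch using `return_exceptions=True` (it reuses the `fetch_transcript` helper above):

```python
import asyncio
import logging
from typing import Dict, List

logger = logging.getLogger(__name__)


async def fetch_all_transcripts(video_ids: List[str]) -> Dict[str, str]:
    """Fetch several transcripts in parallel, tolerating individual failures."""
    results = await asyncio.gather(
        *(fetch_transcript(video_id) for video_id in video_ids),
        return_exceptions=True,  # failed fetches come back as exception objects
    )
    transcripts: Dict[str, str] = {}
    for video_id, result in zip(video_ids, results):
        if isinstance(result, Exception):
            # Log and skip videos whose transcript could not be fetched
            logger.warning("Transcript fetch failed for %s: %s", video_id, result)
        else:
            transcripts[video_id] = result
    return transcripts
```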
## 3. Testing Requirements

### Test Structure

```
tests/
├── unit/
│   ├── test_youtube_service.py
│   ├── test_summarizer_service.py
│   └── test_cache_service.py
├── integration/
│   ├── test_api_endpoints.py
│   └── test_database.py
├── fixtures/
│   ├── sample_transcripts.json
│   └── mock_responses.py
└── conftest.py
```

### Unit Test Example

```python
# tests/unit/test_youtube_service.py
import pytest
from unittest.mock import Mock, patch, AsyncMock
from src.services.youtube import YouTubeService


class TestYouTubeService:
    @pytest.fixture
    def youtube_service(self):
        return YouTubeService()

    @pytest.fixture
    def mock_transcript(self):
        return [
            {"text": "Hello world", "start": 0.0, "duration": 2.0},
            {"text": "This is a test", "start": 2.0, "duration": 3.0}
        ]

    @pytest.mark.asyncio
    async def test_extract_transcript_success(
        self, youtube_service, mock_transcript
    ):
        with patch('youtube_transcript_api.YouTubeTranscriptApi.get_transcript') as mock_get:
            mock_get.return_value = mock_transcript

            result = await youtube_service.extract_transcript("test_id")

            assert result == mock_transcript
            mock_get.assert_called_once_with("test_id")

    def test_extract_video_id_various_formats(self, youtube_service):
        test_cases = [
            ("https://www.youtube.com/watch?v=abc123", "abc123"),
            ("https://youtu.be/xyz789", "xyz789"),
            ("https://youtube.com/embed/qwe456", "qwe456"),
            ("https://www.youtube.com/watch?v=test&t=123", "test")
        ]
        for url, expected_id in test_cases:
            assert youtube_service.extract_video_id(url) == expected_id
```

### Integration Test Example

```python
# tests/integration/test_api_endpoints.py
import pytest
from fastapi.testclient import TestClient
from src.main import app


@pytest.fixture
def client():
    return TestClient(app)


class TestSummarizationAPI:
    # TestClient is synchronous, so these tests need no asyncio marker
    def test_summarize_endpoint(self, client):
        response = client.post("/api/summarize", json={
            "url": "https://youtube.com/watch?v=test123",
            "model": "openai",
            "options": {"max_length": 500}
        })

        assert response.status_code == 200
        data = response.json()
        assert "job_id" in data
        assert data["status"] == "processing"

    def test_get_summary(self, client):
        # First create a summary
        create_response = client.post("/api/summarize", json={
            "url": "https://youtube.com/watch?v=test123"
        })
        job_id = create_response.json()["job_id"]

        # Then retrieve it
        get_response = client.get(f"/api/summary/{job_id}")
        assert get_response.status_code in [200, 202]  # 202 if still processing
```

### Test Coverage Requirements

- Minimum 80% code coverage
- 100% coverage for critical paths
- All edge cases tested
- Error conditions covered

```bash
# Run tests with coverage
pytest tests/ --cov=src --cov-report=html --cov-report=term

# Coverage report should show:
# src/services/youtube.py      95%
# src/services/summarizer.py   88%
# src/api/routes.py            92%
```
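To make the 80% floor enforceable rather than aspirational, the threshold can be checked on every run — a sketch assuming `pytest-cov` is installed:

```bash
# Fail the run (non-zero exit code) when total coverage drops below 80%
pytest tests/ --cov=src --cov-fail-under=80

# The equivalent persistent setting lives in pyproject.toml under
# [tool.coverage.report] as: fail_under = 80
```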
## 4. Documentation Standards

### Code Documentation

Every module, class, and function must have docstrings:

```python
"""
Module: YouTube Transcript Extractor

This module provides functionality to extract transcripts
from YouTube videos using multiple fallback methods.

Example:
    >>> extractor = TranscriptExtractor()
    >>> transcript = await extractor.extract("video_id")
"""


def extract_transcript(
    video_id: str,
    language: str = "en",
    include_auto_generated: bool = True
) -> List[Dict[str, Any]]:
    """
    Extract transcript from YouTube video.

    This function attempts to extract transcripts using the
    following priority:
    1. Manual captions in specified language
    2. Auto-generated captions if allowed
    3. Translated captions as fallback

    Args:
        video_id: YouTube video identifier
        language: ISO 639-1 language code (default: "en")
        include_auto_generated: Whether to use auto-generated captions

    Returns:
        List of transcript segments with text, start time, and duration

    Raises:
        TranscriptNotAvailable: If no transcript can be extracted

    Example:
        >>> transcript = extract_transcript("dQw4w9WgXcQ", "en")
        >>> print(transcript[0])
        {"text": "Never gonna give you up", "start": 0.0, "duration": 3.5}
    """
    pass
```

### API Documentation

Use FastAPI's automatic documentation features:

```python
from typing import Optional

from fastapi import APIRouter, HTTPException, status
from pydantic import BaseModel, Field

router = APIRouter()


class SummarizeRequest(BaseModel):
    """Request model for video summarization."""

    url: str = Field(
        ...,
        description="YouTube video URL",
        example="https://youtube.com/watch?v=dQw4w9WgXcQ"
    )
    model: str = Field(
        "auto",
        description="AI model to use (openai, anthropic, deepseek, auto)",
        example="openai"
    )
    max_length: Optional[int] = Field(
        None,
        description="Maximum summary length in words",
        ge=50,
        le=5000
    )


@router.post(
    "/summarize",
    response_model=SummarizeResponse,
    status_code=status.HTTP_200_OK,
    summary="Summarize YouTube Video",
    description="Submit a YouTube video URL for AI-powered summarization"
)
async def summarize_video(request: SummarizeRequest):
    """
    Summarize a YouTube video using AI.

    This endpoint accepts a YouTube URL and returns a job ID
    for tracking the summarization progress. Use the /summary/{job_id}
    endpoint to retrieve the completed summary.
    """
    pass
```

## 5. Git Workflow

### Branch Naming

```bash
# Feature branches
feature/task-2-youtube-extraction
feature/task-3-ai-summarization

# Bugfix branches
bugfix/transcript-encoding-error
bugfix/rate-limit-handling

# Hotfix branches
hotfix/critical-api-error
```

### Commit Messages

Follow conventional commits:

```bash
# Format: <type>(<scope>): <subject>

# Examples:
feat(youtube): add transcript extraction service
fix(api): handle rate limiting correctly
docs(readme): update installation instructions
test(youtube): add edge case tests
refactor(cache): optimize cache key generation
perf(summarizer): implement parallel processing
chore(deps): update requirements.txt
```

### Pull Request Template

```markdown
## Task Reference
- Task ID: #3
- Task Title: Develop AI Summary Generation Service

## Description
Brief description of changes made

## Changes Made
- [ ] Implemented YouTube transcript extraction
- [ ] Added multi-model AI support
- [ ] Created caching layer
- [ ] Added comprehensive tests

## Testing
- [ ] Unit tests pass
- [ ] Integration tests pass
- [ ] Manual testing completed
- [ ] Coverage > 80%

## Documentation
- [ ] Code documented
- [ ] API docs updated
- [ ] README updated if needed

## Screenshots (if applicable)
[Add screenshots here]
```
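The checklist in Section 1 calls for a story-based commit message. A hedged example combining the conventional-commit format above with a story reference (the footer wording is a project convention, not part of the conventional-commits spec):

```bash
feat(youtube): add URL validation and video ID parsing

Implements regex validation for watch/short/embed URL formats
and normalizes them to a canonical video ID.

Story: 1.2 (docs/stories/1.2.youtube-url-validation-parsing.md)
```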
## 6. API Design Standards

### RESTful Principles

```
# Good API design
GET    /api/summaries          # List all summaries
GET    /api/summaries/{id}     # Get specific summary
POST   /api/summaries          # Create new summary
PUT    /api/summaries/{id}     # Update summary
DELETE /api/summaries/{id}     # Delete summary

# Status codes
200 OK                  # Successful GET/PUT
201 Created             # Successful POST
202 Accepted            # Processing async request
204 No Content          # Successful DELETE
400 Bad Request         # Invalid input
401 Unauthorized        # Missing/invalid auth
403 Forbidden           # No permission
404 Not Found           # Resource doesn't exist
429 Too Many Requests   # Rate limited
500 Internal Error      # Server error
```

### Response Format

```
# Success response
{
    "success": true,
    "data": {
        "id": "uuid",
        "video_id": "abc123",
        "summary": "...",
        "metadata": {}
    },
    "timestamp": "2025-01-25T10:00:00Z"
}

# Error response
{
    "success": false,
    "error": {
        "code": "TRANSCRIPT_NOT_AVAILABLE",
        "message": "Could not extract transcript from video",
        "details": "No captions available in requested language"
    },
    "timestamp": "2025-01-25T10:00:00Z"
}
```

### Pagination

```python
import math

from fastapi import Query


@router.get("/summaries")
async def list_summaries(
    page: int = Query(1, ge=1),
    limit: int = Query(20, ge=1, le=100),
    sort: str = Query("created_at", regex="^(created_at|updated_at|title)$"),
    order: str = Query("desc", regex="^(asc|desc)$")
):
    """List summaries with pagination."""
    # `summaries` and `total_count` come from the data layer
    return {
        "data": summaries,
        "pagination": {
            "page": page,
            "limit": limit,
            "total": total_count,
            "pages": math.ceil(total_count / limit)
        }
    }
```

## 7. Database Operations

### SQLAlchemy Models

```python
import uuid
from datetime import datetime

from sqlalchemy import Column, String, Text, DateTime, Float, JSON
from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy.dialects.postgresql import UUID

Base = declarative_base()


class Summary(Base):
    __tablename__ = "summaries"

    id = Column(UUID(as_uuid=True), primary_key=True, default=uuid.uuid4)
    video_id = Column(String(20), nullable=False, index=True)
    video_url = Column(Text, nullable=False)
    video_title = Column(Text)
    transcript = Column(Text)
    summary = Column(Text)
    key_points = Column(JSON)
    chapters = Column(JSON)
    model_used = Column(String(50))
    processing_time = Column(Float)
    created_at = Column(DateTime, default=datetime.utcnow)
    updated_at = Column(DateTime, default=datetime.utcnow, onupdate=datetime.utcnow)

    def to_dict(self):
        """Convert to dictionary for API responses."""
        return {
            "id": str(self.id),
            "video_id": self.video_id,
            "video_title": self.video_title,
            "summary": self.summary,
            "key_points": self.key_points,
            "chapters": self.chapters,
            "model_used": self.model_used,
            "created_at": self.created_at.isoformat()
        }
```

### Database Migrations

Use Alembic for migrations:

```bash
# Create new migration
alembic revision --autogenerate -m "Add chapters column"

# Apply migrations
alembic upgrade head

# Rollback
alembic downgrade -1
```

### Query Optimization

```python
from sqlalchemy import select, and_
from sqlalchemy.orm import selectinload


# Efficient querying with joins
async def get_summaries_with_metadata(session, user_id: str):
    # selectinload needs a mapped relationship; `metadata` would clash with
    # SQLAlchemy's reserved Base.metadata, so the related attribute is
    # assumed to be named `video_metadata` here.
    stmt = (
        select(Summary)
        .options(selectinload(Summary.video_metadata))
        .where(Summary.user_id == user_id)
        .order_by(Summary.created_at.desc())
        .limit(10)
    )
    result = await session.execute(stmt)
    return result.scalars().all()
```
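The query above expects an async session. A minimal setup sketch, assuming SQLAlchemy 2.x's asyncio extension and an `asyncpg` driver (the connection URL is illustrative):

```python
from sqlalchemy.ext.asyncio import async_sessionmaker, create_async_engine

engine = create_async_engine(
    "postgresql+asyncpg://user:pass@db:5432/youtube_summarizer",  # illustrative URL
    pool_pre_ping=True,  # discard dead connections before handing them out
)
SessionLocal = async_sessionmaker(engine, expire_on_commit=False)


async def get_session():
    """FastAPI dependency yielding one database session per request."""
    async with SessionLocal() as session:
        yield session
```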
## 8. Performance Guidelines

### Caching Strategy

```python
import hashlib
import json

import redis


class CacheService:
    def __init__(self):
        self.redis = redis.Redis(decode_responses=True)
        self.ttl = 3600  # 1 hour default

    def get_key(self, prefix: str, **kwargs) -> str:
        """Generate cache key from parameters."""
        data = json.dumps(kwargs, sort_keys=True)
        hash_digest = hashlib.md5(data.encode()).hexdigest()
        return f"{prefix}:{hash_digest}"

    async def get_or_set(self, key: str, func, ttl: int = None):
        """Get from cache or compute and set."""
        # Try cache first
        cached = self.redis.get(key)
        if cached:
            return json.loads(cached)

        # Compute result
        result = await func()

        # Cache result
        self.redis.setex(
            key,
            ttl or self.ttl,
            json.dumps(result)
        )
        return result
```

### Async Processing

```python
import asyncio
from typing import Any, Dict

from celery import Celery

celery_app = Celery('youtube_summarizer')


@celery_app.task
def process_video_task(video_url: str, options: Dict[str, Any]):
    """Background task for video processing.

    Celery task functions are synchronous entry points, so the async
    pipeline is driven with asyncio.run() inside the worker.
    """
    async def _process():
        # Extract transcript
        transcript = await extract_transcript(video_url)

        # Generate summary
        summary = await generate_summary(transcript, options)

        # Save to database
        await save_summary(video_url, summary)
        return summary

    try:
        summary = asyncio.run(_process())
        return {"status": "completed", "summary_id": summary.id}
    except Exception as e:
        return {"status": "failed", "error": str(e)}
```

### Performance Monitoring

```python
import time
from functools import wraps
import logging

logger = logging.getLogger(__name__)


def measure_performance(func):
    """Decorator to measure function performance."""
    @wraps(func)
    async def wrapper(*args, **kwargs):
        start = time.perf_counter()
        try:
            result = await func(*args, **kwargs)
            elapsed = time.perf_counter() - start
            logger.info(f"{func.__name__} took {elapsed:.3f}s")
            return result
        except Exception as e:
            elapsed = time.perf_counter() - start
            logger.error(f"{func.__name__} failed after {elapsed:.3f}s: {e}")
            raise
    return wrapper
```

## 9. Security Protocols

### Input Validation

```python
from pydantic import BaseModel, validator, HttpUrl
import re


class VideoURLValidator(BaseModel):
    url: HttpUrl

    @validator('url')
    def validate_youtube_url(cls, v):
        youtube_regex = re.compile(
            r'(https?://)?(www\.)?(youtube\.com|youtu\.be)/.+'
        )
        if not youtube_regex.match(str(v)):
            raise ValueError('Invalid YouTube URL')
        return v
```

### API Key Management

```python
from typing import List, Optional

from pydantic import BaseSettings


class Settings(BaseSettings):
    """Application settings with validation."""

    # API Keys (never hardcode!)
    openai_api_key: str
    anthropic_api_key: str
    youtube_api_key: Optional[str] = None

    # Security
    secret_key: str
    allowed_origins: List[str] = ["http://localhost:3000"]

    class Config:
        env_file = ".env"
        env_file_encoding = "utf-8"
        case_sensitive = False


settings = Settings()
```

### Rate Limiting

```python
import redis.asyncio as redis
from fastapi import Depends, HTTPException, Request
from fastapi_limiter import FastAPILimiter
from fastapi_limiter.depends import RateLimiter


# Initialize rate limiter
async def init_rate_limiter():
    redis_client = redis.from_url(
        "redis://localhost:6379", encoding="utf-8", decode_responses=True
    )
    await FastAPILimiter.init(redis_client)


# Apply rate limiting
@router.post("/summarize", dependencies=[Depends(RateLimiter(times=10, seconds=60))])
async def summarize_video(request: SummarizeRequest):
    """Rate limited to 10 requests per minute."""
    pass
```
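`Settings.allowed_origins` above only takes effect once it is wired into the application. A short sketch using FastAPI's built-in CORS middleware, assuming the `settings` instance from the API Key Management example:

```python
from fastapi import FastAPI
from fastapi.middleware.cors import CORSMiddleware

app = FastAPI(title="YouTube Summarizer")

# Restrict browser clients to the origins configured in Settings
app.add_middleware(
    CORSMiddleware,
    allow_origins=settings.allowed_origins,
    allow_credentials=True,
    allow_methods=["GET", "POST", "PUT", "DELETE"],
    allow_headers=["*"],
)
```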
## 10. Deployment Process

### Docker Configuration

```dockerfile
# Dockerfile
FROM python:3.11-slim

WORKDIR /app

# Install dependencies
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

# Copy application
COPY . .

# Run application
CMD ["uvicorn", "src.main:app", "--host", "0.0.0.0", "--port", "8082"]
```

### Environment Management

```bash
# .env.development
DEBUG=true
DATABASE_URL=sqlite:///./dev.db
LOG_LEVEL=DEBUG

# .env.production
DEBUG=false
DATABASE_URL=postgresql://user:pass@db:5432/youtube_summarizer
LOG_LEVEL=INFO
```

### Health Checks

```python
@router.get("/health")
async def health_check():
    """Health check endpoint for monitoring."""
    checks = {
        "api": "healthy",
        "database": await check_database(),
        "cache": await check_cache(),
        "ai_service": await check_ai_service()
    }

    all_healthy = all(v == "healthy" for v in checks.values())

    return {
        "status": "healthy" if all_healthy else "degraded",
        "checks": checks,
        "timestamp": datetime.utcnow().isoformat()
    }
```

### Monitoring

```python
from fastapi import Response
from prometheus_client import Counter, Histogram, generate_latest

# Metrics
request_count = Counter('youtube_requests_total', 'Total requests')
request_duration = Histogram('youtube_request_duration_seconds', 'Request duration')
summary_generation_time = Histogram('summary_generation_seconds', 'Summary generation time')


@router.get("/metrics")
async def metrics():
    """Prometheus metrics endpoint."""
    return Response(generate_latest(), media_type="text/plain")
```

## Agent-Specific Instructions

### For AI Agents

When working on this codebase:

1. **Always check Task Master first**: `task-master next`
2. **Follow TDD**: Write tests before implementation
3. **Use type hints**: All functions must have type annotations
4. **Document changes**: Update docstrings and comments
5. **Test thoroughly**: Run full test suite before marking complete
6. **Update task status**: Keep Task Master updated with progress

### Quality Checklist

Before marking any task as complete:

- [ ] All tests pass (`pytest tests/`)
- [ ] Code coverage > 80% (`pytest --cov=src`)
- [ ] No linting errors (`ruff check src/`)
- [ ] Type checking passes (`mypy src/`)
- [ ] Documentation updated
- [ ] Task Master updated
- [ ] Changes committed with proper message

## Conclusion

This guide ensures consistent, high-quality development across all contributors to the YouTube Summarizer project. Follow these standards to maintain code quality, performance, and security.

---

*Last Updated: 2025-01-25*
*Version: 1.0.0*