# AGENTS.md - YouTube Summarizer Backend This file provides guidance for AI agents working with the YouTube Summarizer backend implementation. ## Agent Development Context The backend has been implemented following Story-Driven Development patterns with comprehensive testing and production-ready patterns. Agents should understand the existing architecture and extend it following established conventions. ## Current Implementation Status ### ✅ Completed Stories - **Story 1.1**: Project Setup and Infrastructure - DONE - **Story 2.1**: Single AI Model Integration (Anthropic) - DONE - **Story 2.2**: Summary Generation Pipeline - DONE ⬅️ Just completed with full QA ### 🔄 Ready for Implementation - **Story 1.2**: YouTube URL Validation and Parsing - **Story 1.3**: Transcript Extraction Service - **Story 1.4**: Basic Web Interface - **Story 2.3**: Caching System Implementation - **Story 2.4**: Multi-Model Support - **Story 2.5**: Export Functionality ## Architecture Principles for Agents ### 1. Service Layer Pattern All business logic lives in the `services/` directory with clear interfaces: ```python # Follow this pattern for new services class VideoService: async def extract_video_id(self, url: str) -> str: ... async def get_video_metadata(self, video_id: str) -> Dict[str, Any]: ... async def validate_url(self, url: str) -> bool: ... ``` ### 2. Dependency Injection Pattern Use FastAPI's dependency injection for loose coupling: ```python def get_video_service() -> VideoService: return VideoService() @router.post("/api/endpoint") async def endpoint(service: VideoService = Depends(get_video_service)): return await service.process() ``` ### 3. Async-First Development All I/O operations must be async to prevent blocking: ```python # Correct async pattern async def process_video(self, url: str) -> Result: metadata = await self.video_service.get_metadata(url) transcript = await self.transcript_service.extract(url) summary = await self.ai_service.summarize(transcript) return Result(metadata=metadata, summary=summary) ``` ### 4. Error Handling Standards Use custom exceptions with proper HTTP status codes: ```python from backend.core.exceptions import ValidationError, AIServiceError try: result = await service.process(data) except ValidationError as e: raise HTTPException(status_code=400, detail=e.message) except AIServiceError as e: raise HTTPException(status_code=500, detail=e.message) ``` ## Implementation Patterns for Agents ### Adding New API Endpoints 1. **Create the endpoint in appropriate API module**: ```python # backend/api/videos.py from fastapi import APIRouter, HTTPException, Depends from ..services.video_service import VideoService router = APIRouter(prefix="/api/videos", tags=["videos"]) @router.post("/validate") async def validate_video_url( request: ValidateVideoRequest, service: VideoService = Depends(get_video_service) ): try: is_valid = await service.validate_url(request.url) return {"valid": is_valid} except ValidationError as e: raise HTTPException(status_code=400, detail=e.message) ``` 2. **Register router in main.py**: ```python from backend.api.videos import router as videos_router app.include_router(videos_router) ``` 3. **Add comprehensive tests**: ```python # tests/unit/test_video_service.py @pytest.mark.asyncio async def test_validate_url_success(): service = VideoService() result = await service.validate_url("https://youtube.com/watch?v=abc123") assert result is True # tests/integration/test_videos_api.py def test_validate_video_endpoint(client): response = client.post("/api/videos/validate", json={"url": "https://youtube.com/watch?v=test"}) assert response.status_code == 200 assert response.json()["valid"] is True ``` ### Extending the Pipeline When adding new pipeline stages, follow the established pattern: ```python # Add new stage to PipelineStage enum class PipelineStage(Enum): # ... existing stages ... NEW_STAGE = "new_stage" # Add stage processing to SummaryPipeline async def _execute_pipeline(self, job_id: str, config: PipelineConfig): # ... existing stages ... # New stage await self._update_progress(job_id, PipelineStage.NEW_STAGE, 85, "Processing new stage...") new_result = await self._process_new_stage(result, config) result.new_field = new_result # Add progress percentage mapping stage_percentages = { # ... existing mappings ... PipelineStage.NEW_STAGE: 85, } ``` ### Database Integration Pattern When adding database models, follow the repository pattern: ```python # backend/models/video.py from sqlalchemy import Column, String, DateTime, Text from .base import Base class Video(Base): __tablename__ = "videos" id = Column(String, primary_key=True) url = Column(String, nullable=False) title = Column(String) metadata = Column(Text) # JSON field created_at = Column(DateTime, default=datetime.utcnow) # backend/repositories/video_repository.py class VideoRepository: def __init__(self, session: AsyncSession): self.session = session async def create_video(self, video: Video) -> Video: self.session.add(video) await self.session.commit() return video async def get_by_id(self, video_id: str) -> Optional[Video]: result = await self.session.execute( select(Video).where(Video.id == video_id) ) return result.scalar_one_or_none() ``` ## Testing Guidelines for Agents ### Unit Test Structure ```python # tests/unit/test_new_service.py import pytest from unittest.mock import Mock, AsyncMock from backend.services.new_service import NewService class TestNewService: @pytest.fixture def service(self): return NewService() @pytest.mark.asyncio async def test_process_success(self, service): # Arrange input_data = "test_input" expected_output = "expected_result" # Act result = await service.process(input_data) # Assert assert result == expected_output @pytest.mark.asyncio async def test_process_error_handling(self, service): with pytest.raises(ServiceError): await service.process("invalid_input") ``` ### Integration Test Structure ```python # tests/integration/test_new_api.py from fastapi.testclient import TestClient from unittest.mock import patch, AsyncMock class TestNewAPI: def test_endpoint_success(self, client): with patch('backend.api.new.get_new_service') as mock_get_service: mock_service = Mock() mock_service.process = AsyncMock(return_value="result") mock_get_service.return_value = mock_service response = client.post("/api/new/process", json={"input": "test"}) assert response.status_code == 200 assert response.json() == {"result": "result"} ``` ## Code Quality Standards ### Documentation Requirements ```python class NewService: """Service for handling new functionality. This service integrates with external APIs and provides processed results for the application. """ async def process(self, input_data: str) -> Dict[str, Any]: """Process input data and return structured results. Args: input_data: Raw input string to process Returns: Processed results dictionary Raises: ValidationError: If input_data is invalid ProcessingError: If processing fails """ ``` ### Type Hints and Validation ```python from typing import Dict, List, Optional, Union from pydantic import BaseModel, Field class ProcessRequest(BaseModel): """Request model for processing endpoint.""" input_data: str = Field(..., description="Data to process") options: Optional[Dict[str, Any]] = Field(None, description="Processing options") class Config: schema_extra = { "example": { "input_data": "sample input", "options": {"format": "json"} } } ``` ### Error Handling Patterns ```python from backend.core.exceptions import BaseAPIException, ErrorCode class ProcessingError(BaseAPIException): """Raised when processing fails.""" def __init__(self, message: str, details: Optional[Dict] = None): super().__init__( message=message, error_code=ErrorCode.PROCESSING_ERROR, status_code=500, details=details, recoverable=True ) ``` ## Integration with Existing Services ### Using the Pipeline Service ```python # Get pipeline instance pipeline = get_summary_pipeline() # Start processing job_id = await pipeline.process_video( video_url="https://youtube.com/watch?v=abc123", config=PipelineConfig(summary_length="detailed") ) # Monitor progress result = await pipeline.get_pipeline_result(job_id) print(f"Status: {result.status}") ``` ### Using the AI Service ```python from backend.services.anthropic_summarizer import AnthropicSummarizer from backend.services.ai_service import SummaryRequest, SummaryLength ai_service = AnthropicSummarizer(api_key=api_key) summary_result = await ai_service.generate_summary( SummaryRequest( transcript="Video transcript text...", length=SummaryLength.STANDARD, focus_areas=["key insights", "actionable items"] ) ) print(f"Summary: {summary_result.summary}") print(f"Key Points: {summary_result.key_points}") ``` ### Using WebSocket Updates ```python from backend.core.websocket_manager import websocket_manager # Send progress update await websocket_manager.send_progress_update(job_id, { "stage": "processing", "percentage": 50, "message": "Halfway complete" }) # Send completion notification await websocket_manager.send_completion_notification(job_id, { "status": "completed", "result": result_data }) ``` ## Performance Patterns ### Caching Integration ```python from backend.services.cache_manager import CacheManager cache = CacheManager() # Cache expensive operations cache_key = f"expensive_operation:{input_hash}" cached_result = await cache.get_cached_result(cache_key) if not cached_result: result = await expensive_operation(input_data) await cache.cache_result(cache_key, result, ttl=3600) else: result = cached_result ``` ### Background Processing ```python import asyncio from fastapi import BackgroundTasks async def long_running_task(task_id: str, data: Dict): """Background task for processing.""" try: result = await process_data(data) await store_result(task_id, result) await notify_completion(task_id) except Exception as e: await store_error(task_id, str(e)) @router.post("/api/process-async") async def start_processing( request: ProcessRequest, background_tasks: BackgroundTasks ): task_id = str(uuid.uuid4()) background_tasks.add_task(long_running_task, task_id, request.dict()) return {"task_id": task_id, "status": "processing"} ``` ## Security Guidelines ### Input Validation ```python from pydantic import BaseModel, validator import re class VideoUrlRequest(BaseModel): url: str @validator('url') def validate_youtube_url(cls, v): youtube_pattern = r'^https?://(www\.)?(youtube\.com|youtu\.be)/.+' if not re.match(youtube_pattern, v): raise ValueError('Must be a valid YouTube URL') return v ``` ### API Key Management ```python import os from fastapi import HTTPException def get_api_key() -> str: api_key = os.getenv("ANTHROPIC_API_KEY") if not api_key: raise HTTPException( status_code=500, detail="API key not configured" ) return api_key ``` ## Deployment Considerations ### Environment Configuration ```python from pydantic import BaseSettings class Settings(BaseSettings): anthropic_api_key: str database_url: str = "sqlite:///./data/app.db" redis_url: Optional[str] = None log_level: str = "INFO" cors_origins: List[str] = ["http://localhost:3000"] class Config: env_file = ".env" settings = Settings() ``` ### Health Checks ```python @router.get("/health") async def health_check(): """Health check endpoint for load balancers.""" checks = { "database": await check_database_connection(), "cache": await check_cache_connection(), "ai_service": await check_ai_service(), } all_healthy = all(checks.values()) status_code = 200 if all_healthy else 503 return {"status": "healthy" if all_healthy else "unhealthy", "checks": checks} ``` ## Migration Patterns When extending existing functionality, maintain backward compatibility: ```python # Version 1 API @router.post("/api/summarize") async def summarize_v1(request: SummarizeRequest): # Legacy implementation pass # Version 2 API (new functionality) @router.post("/api/v2/summarize") async def summarize_v2(request: SummarizeRequestV2): # Enhanced implementation pass ``` This backend follows production-ready patterns and is designed for extensibility. Agents should maintain these standards when adding new functionality.