docs: Add comprehensive CLAUDE.md and AGENTS.md documentation

- Created CLAUDE.md with project-specific guidance for Claude Code
- Added detailed implementation patterns and code examples
- Included Task Master workflow integration
- Created AGENTS.md with development standards and workflows
- Added testing requirements, API design patterns, and security protocols
- Provided comprehensive guidelines for TDD, performance, and deployment
This commit is contained in:
enias 2025-08-24 22:26:54 -04:00
parent b7b6378a40
commit 1aa76ee0d5
2 changed files with 1253 additions and 4 deletions

# AGENTS.md - YouTube Summarizer Development Standards
This document defines development workflows, standards, and best practices for the YouTube Summarizer project. It serves as a guide for both human developers and AI agents working on this codebase.
## Table of Contents
1. [Development Workflow](#1-development-workflow)
2. [Code Standards](#2-code-standards)
3. [Testing Requirements](#3-testing-requirements)
4. [Documentation Standards](#4-documentation-standards)
5. [Git Workflow](#5-git-workflow)
6. [API Design Standards](#6-api-design-standards)
7. [Database Operations](#7-database-operations)
8. [Performance Guidelines](#8-performance-guidelines)
9. [Security Protocols](#9-security-protocols)
10. [Deployment Process](#10-deployment-process)
## 1. Development Workflow
### Task-Driven Development
All development follows the Task Master workflow:
```bash
# 1. Get next task
task-master next
# 2. Review task details
task-master show <id>
# 3. Expand if needed
task-master expand --id=<id> --research
# 4. Set to in-progress
task-master set-status --id=<id> --status=in-progress
# 5. Implement feature
# ... code implementation ...
# 6. Test implementation
pytest tests/
# 7. Update task with notes
task-master update-subtask --id=<id> --prompt="Implementation details..."
# 8. Mark complete
task-master set-status --id=<id> --status=done
```
### Feature Implementation Checklist
- [ ] Review task requirements
- [ ] Write unit tests first (TDD)
- [ ] Implement feature
- [ ] Add integration tests
- [ ] Update documentation
- [ ] Run full test suite
- [ ] Update task status
- [ ] Commit with descriptive message
## 2. Code Standards
### Python Style Guide
```python
"""
Module docstring describing purpose and usage
"""
from typing import List, Optional, Dict, Any
import asyncio
from datetime import datetime
# Constants in UPPER_CASE
DEFAULT_TIMEOUT = 30
MAX_RETRIES = 3
class YouTubeSummarizer:
"""
Class for summarizing YouTube videos.
Attributes:
model: AI model to use for summarization
cache: Cache service instance
"""
def __init__(self, model: str = "openai"):
"""Initialize summarizer with specified model."""
self.model = model
self.cache = CacheService()
async def summarize(
self,
video_url: str,
options: Optional[Dict[str, Any]] = None
) -> Dict[str, Any]:
"""
Summarize a YouTube video.
Args:
video_url: YouTube video URL
options: Optional summarization parameters
Returns:
Dictionary containing summary and metadata
Raises:
YouTubeError: If video cannot be accessed
AIServiceError: If summarization fails
"""
# Implementation here
pass
```
### Type Hints
Always use type hints for better code quality:
```python
from typing import Union, List, Optional, Dict, Any, Tuple
from pydantic import BaseModel, HttpUrl
async def process_video(
url: HttpUrl,
models: List[str],
max_length: Optional[int] = None
) -> Tuple[str, Dict[str, Any]]:
"""Process video with type safety."""
pass
```
### Async/Await Pattern
Use async for all I/O operations:
```python
async def fetch_transcript(video_id: str) -> str:
"""Fetch transcript asynchronously."""
async with aiohttp.ClientSession() as session:
async with session.get(url) as response:
return await response.text()
# Use asyncio.gather for parallel operations
results = await asyncio.gather(
fetch_transcript(id1),
fetch_transcript(id2),
fetch_transcript(id3)
)
```
## 3. Testing Requirements
### Test Structure
```
tests/
├── unit/
│ ├── test_youtube_service.py
│ ├── test_summarizer_service.py
│ └── test_cache_service.py
├── integration/
│ ├── test_api_endpoints.py
│ └── test_database.py
├── fixtures/
│ ├── sample_transcripts.json
│ └── mock_responses.py
└── conftest.py
```
### Unit Test Example
```python
# tests/unit/test_youtube_service.py
import pytest
from unittest.mock import Mock, patch, AsyncMock
from src.services.youtube import YouTubeService
class TestYouTubeService:
@pytest.fixture
def youtube_service(self):
return YouTubeService()
@pytest.fixture
def mock_transcript(self):
return [
{"text": "Hello world", "start": 0.0, "duration": 2.0},
{"text": "This is a test", "start": 2.0, "duration": 3.0}
]
@pytest.mark.asyncio
async def test_extract_transcript_success(
self,
youtube_service,
mock_transcript
):
with patch('youtube_transcript_api.YouTubeTranscriptApi.get_transcript') as mock_get:
mock_get.return_value = mock_transcript
result = await youtube_service.extract_transcript("test_id")
assert result == mock_transcript
mock_get.assert_called_once_with("test_id")
def test_extract_video_id_various_formats(self, youtube_service):
test_cases = [
("https://www.youtube.com/watch?v=abc123", "abc123"),
("https://youtu.be/xyz789", "xyz789"),
("https://youtube.com/embed/qwe456", "qwe456"),
("https://www.youtube.com/watch?v=test&t=123", "test")
]
for url, expected_id in test_cases:
assert youtube_service.extract_video_id(url) == expected_id
```
### Integration Test Example
```python
# tests/integration/test_api_endpoints.py
import pytest
from fastapi.testclient import TestClient
from src.main import app
@pytest.fixture
def client():
return TestClient(app)
class TestSummarizationAPI:
@pytest.mark.asyncio
async def test_summarize_endpoint(self, client):
response = client.post("/api/summarize", json={
"url": "https://youtube.com/watch?v=test123",
"model": "openai",
"options": {"max_length": 500}
})
assert response.status_code == 200
data = response.json()
assert "job_id" in data
assert data["status"] == "processing"
@pytest.mark.asyncio
async def test_get_summary(self, client):
# First create a summary
create_response = client.post("/api/summarize", json={
"url": "https://youtube.com/watch?v=test123"
})
job_id = create_response.json()["job_id"]
# Then retrieve it
get_response = client.get(f"/api/summary/{job_id}")
assert get_response.status_code in [200, 202] # 202 if still processing
```
### Test Coverage Requirements
- Minimum 80% code coverage
- 100% coverage for critical paths
- All edge cases tested
- Error conditions covered
```bash
# Run tests with coverage
pytest tests/ --cov=src --cov-report=html --cov-report=term
# Coverage report should show:
# src/services/youtube.py 95%
# src/services/summarizer.py 88%
# src/api/routes.py 92%
```
## 4. Documentation Standards
### Code Documentation
Every module, class, and function must have docstrings:
```python
"""
Module: YouTube Transcript Extractor
This module provides functionality to extract transcripts from YouTube videos
using multiple fallback methods.
Example:
>>> extractor = TranscriptExtractor()
>>> transcript = await extractor.extract("video_id")
"""
def extract_transcript(
video_id: str,
language: str = "en",
include_auto_generated: bool = True
) -> List[Dict[str, Any]]:
"""
Extract transcript from YouTube video.
This function attempts to extract transcripts using the following priority:
1. Manual captions in specified language
2. Auto-generated captions if allowed
3. Translated captions as fallback
Args:
video_id: YouTube video identifier
language: ISO 639-1 language code (default: "en")
include_auto_generated: Whether to use auto-generated captions
Returns:
List of transcript segments with text, start time, and duration
Raises:
TranscriptNotAvailable: If no transcript can be extracted
Example:
>>> transcript = extract_transcript("dQw4w9WgXcQ", "en")
>>> print(transcript[0])
{"text": "Never gonna give you up", "start": 0.0, "duration": 3.5}
"""
pass
```
### API Documentation
Use FastAPI's automatic documentation features:
```python
from typing import Optional
from fastapi import APIRouter, HTTPException, status
from pydantic import BaseModel, Field
router = APIRouter()
class SummarizeRequest(BaseModel):
"""Request model for video summarization."""
url: str = Field(
...,
description="YouTube video URL",
example="https://youtube.com/watch?v=dQw4w9WgXcQ"
)
model: str = Field(
"auto",
description="AI model to use (openai, anthropic, deepseek, auto)",
example="openai"
)
max_length: Optional[int] = Field(
None,
description="Maximum summary length in words",
ge=50,
le=5000
)
class SummarizeResponse(BaseModel):
    """Minimal response model (sketch) so response_model below resolves."""
    job_id: str
    status: str

@router.post(
"/summarize",
response_model=SummarizeResponse,
status_code=status.HTTP_200_OK,
summary="Summarize YouTube Video",
description="Submit a YouTube video URL for AI-powered summarization"
)
async def summarize_video(request: SummarizeRequest):
"""
Summarize a YouTube video using AI.
This endpoint accepts a YouTube URL and returns a job ID for tracking
the summarization progress. Use the /summary/{job_id} endpoint to
retrieve the completed summary.
"""
pass
```
## 5. Git Workflow
### Branch Naming
```bash
# Feature branches
feature/task-2-youtube-extraction
feature/task-3-ai-summarization
# Bugfix branches
bugfix/transcript-encoding-error
bugfix/rate-limit-handling
# Hotfix branches
hotfix/critical-api-error
```
### Commit Messages
Follow conventional commits:
```bash
# Format: <type>(<scope>): <subject>
# Examples:
feat(youtube): add transcript extraction service
fix(api): handle rate limiting correctly
docs(readme): update installation instructions
test(youtube): add edge case tests
refactor(cache): optimize cache key generation
perf(summarizer): implement parallel processing
chore(deps): update requirements.txt
```
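The format above can be enforced mechanically. A minimal sketch of a check usable from a `commit-msg` hook; the allowed types mirror the examples above, but the hook wiring and regex details are assumptions, not established project conventions:

```python
"""Minimal Conventional Commits check, callable from a commit-msg hook."""
import re

# <type>(<scope>)?: <subject> -- the scope is optional
_COMMIT_RE = re.compile(
    r"^(feat|fix|docs|test|refactor|perf|chore)(\([a-z0-9.-]+\))?: .+"
)

def is_conventional(message: str) -> bool:
    """Return True if the message's first line follows <type>(<scope>): <subject>."""
    first_line = message.splitlines()[0] if message.strip() else ""
    return bool(_COMMIT_RE.match(first_line))
```

Dropping this into `.git/hooks/commit-msg` (read the message file, exit non-zero when `is_conventional` fails) rejects malformed messages before they reach the history.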
### Pull Request Template
```markdown
## Task Reference
- Task ID: #3
- Task Title: Develop AI Summary Generation Service
## Description
Brief description of changes made
## Changes Made
- [ ] Implemented YouTube transcript extraction
- [ ] Added multi-model AI support
- [ ] Created caching layer
- [ ] Added comprehensive tests
## Testing
- [ ] Unit tests pass
- [ ] Integration tests pass
- [ ] Manual testing completed
- [ ] Coverage > 80%
## Documentation
- [ ] Code documented
- [ ] API docs updated
- [ ] README updated if needed
## Screenshots (if applicable)
[Add screenshots here]
```
## 6. API Design Standards
### RESTful Principles
```text
# Good API design
GET /api/summaries # List all summaries
GET /api/summaries/{id} # Get specific summary
POST /api/summaries # Create new summary
PUT /api/summaries/{id} # Update summary
DELETE /api/summaries/{id} # Delete summary
# Status codes
200 OK # Successful GET/PUT
201 Created # Successful POST
202 Accepted # Processing async request
204 No Content # Successful DELETE
400 Bad Request # Invalid input
401 Unauthorized # Missing/invalid auth
403 Forbidden # No permission
404 Not Found # Resource doesn't exist
429 Too Many Requests # Rate limited
500 Internal Error # Server error
```
### Response Format
**Success response:**

```json
{
    "success": true,
    "data": {
        "id": "uuid",
        "video_id": "abc123",
        "summary": "...",
        "metadata": {}
    },
    "timestamp": "2025-01-25T10:00:00Z"
}
```

**Error response:**

```json
{
    "success": false,
    "error": {
        "code": "TRANSCRIPT_NOT_AVAILABLE",
        "message": "Could not extract transcript from video",
        "details": "No captions available in requested language"
    },
    "timestamp": "2025-01-25T10:00:00Z"
}
```
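A small helper keeps this envelope consistent across endpoints. A sketch; the helper names are illustrative, not part of the codebase:

```python
from datetime import datetime, timezone
from typing import Any, Dict, Optional

def success_envelope(data: Dict[str, Any]) -> Dict[str, Any]:
    """Wrap a payload in the standard success envelope."""
    return {
        "success": True,
        "data": data,
        "timestamp": datetime.now(timezone.utc).isoformat(),
    }

def error_envelope(code: str, message: str, details: Optional[str] = None) -> Dict[str, Any]:
    """Wrap an error in the standard error envelope."""
    return {
        "success": False,
        "error": {"code": code, "message": message, "details": details},
        "timestamp": datetime.now(timezone.utc).isoformat(),
    }
```

Routes then return `success_envelope(summary.to_dict())` instead of hand-building the dict each time.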
### Pagination
```python
@router.get("/summaries")
async def list_summaries(
page: int = Query(1, ge=1),
limit: int = Query(20, ge=1, le=100),
sort: str = Query("created_at", regex="^(created_at|updated_at|title)$"),
order: str = Query("desc", regex="^(asc|desc)$")
):
"""List summaries with pagination."""
return {
"data": summaries,
"pagination": {
"page": page,
"limit": limit,
"total": total_count,
"pages": math.ceil(total_count / limit)
}
}
```
## 7. Database Operations
### SQLAlchemy Models
```python
from sqlalchemy import Column, String, Text, DateTime, Float, JSON
from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy.dialects.postgresql import UUID
from datetime import datetime
import uuid
Base = declarative_base()
class Summary(Base):
__tablename__ = "summaries"
id = Column(UUID(as_uuid=True), primary_key=True, default=uuid.uuid4)
video_id = Column(String(20), nullable=False, index=True)
video_url = Column(Text, nullable=False)
video_title = Column(Text)
transcript = Column(Text)
summary = Column(Text)
key_points = Column(JSON)
chapters = Column(JSON)
model_used = Column(String(50))
processing_time = Column(Float)
created_at = Column(DateTime, default=datetime.utcnow)
updated_at = Column(DateTime, default=datetime.utcnow, onupdate=datetime.utcnow)
def to_dict(self):
"""Convert to dictionary for API responses."""
return {
"id": str(self.id),
"video_id": self.video_id,
"video_title": self.video_title,
"summary": self.summary,
"key_points": self.key_points,
"chapters": self.chapters,
"model_used": self.model_used,
"created_at": self.created_at.isoformat()
}
```
### Database Migrations
Use Alembic for migrations:
```bash
# Create new migration
alembic revision --autogenerate -m "Add chapters column"
# Apply migrations
alembic upgrade head
# Rollback
alembic downgrade -1
```
### Query Optimization
```python
from sqlalchemy import select, and_
from sqlalchemy.orm import selectinload
# Efficient querying with joins
async def get_summaries_with_metadata(session, user_id: str):
stmt = (
select(Summary)
        .options(selectinload(Summary.video_metadata))  # "metadata" is reserved by SQLAlchemy's declarative base
.where(Summary.user_id == user_id)
.order_by(Summary.created_at.desc())
.limit(10)
)
result = await session.execute(stmt)
return result.scalars().all()
```
## 8. Performance Guidelines
### Caching Strategy
```python
import hashlib
import json

import redis
class CacheService:
def __init__(self):
self.redis = redis.Redis(decode_responses=True)
self.ttl = 3600 # 1 hour default
def get_key(self, prefix: str, **kwargs) -> str:
"""Generate cache key from parameters."""
data = json.dumps(kwargs, sort_keys=True)
hash_digest = hashlib.md5(data.encode()).hexdigest()
return f"{prefix}:{hash_digest}"
async def get_or_set(self, key: str, func, ttl: int = None):
"""Get from cache or compute and set."""
# Try cache first
cached = self.redis.get(key)
if cached:
return json.loads(cached)
# Compute result
result = await func()
# Cache result
self.redis.setex(
key,
ttl or self.ttl,
json.dumps(result)
)
return result
```
### Async Processing
```python
import asyncio
from celery import Celery
from typing import Dict, Any

celery_app = Celery('youtube_summarizer')

async def _process_video(video_url: str, options: Dict[str, Any]) -> Dict[str, Any]:
    """Async pipeline: extract, summarize, persist."""
    transcript = await extract_transcript(video_url)       # Extract transcript
    summary = await generate_summary(transcript, options)  # Generate summary
    await save_summary(video_url, summary)                 # Save to database
    return {"status": "completed", "summary_id": summary.id}

@celery_app.task
def process_video_task(video_url: str, options: Dict[str, Any]) -> Dict[str, Any]:
    """Background task for video processing.

    Celery tasks run synchronously, so wrap the async pipeline in asyncio.run().
    """
    try:
        return asyncio.run(_process_video(video_url, options))
    except Exception as e:
        return {"status": "failed", "error": str(e)}
```
### Performance Monitoring
```python
import time
from functools import wraps
import logging
logger = logging.getLogger(__name__)
def measure_performance(func):
"""Decorator to measure function performance."""
@wraps(func)
async def wrapper(*args, **kwargs):
start = time.perf_counter()
try:
result = await func(*args, **kwargs)
elapsed = time.perf_counter() - start
logger.info(f"{func.__name__} took {elapsed:.3f}s")
return result
except Exception as e:
elapsed = time.perf_counter() - start
logger.error(f"{func.__name__} failed after {elapsed:.3f}s: {e}")
raise
return wrapper
```
## 9. Security Protocols
### Input Validation
```python
from pydantic import BaseModel, validator, HttpUrl
import re
class VideoURLValidator(BaseModel):
url: HttpUrl
@validator('url')
def validate_youtube_url(cls, v):
youtube_regex = re.compile(
r'(https?://)?(www\.)?(youtube\.com|youtu\.be)/.+'
)
if not youtube_regex.match(str(v)):
raise ValueError('Invalid YouTube URL')
return v
```
### API Key Management
```python
from typing import List, Optional
from pydantic import BaseSettings  # Pydantic v1; in v2 use pydantic_settings.BaseSettings
class Settings(BaseSettings):
"""Application settings with validation."""
# API Keys (never hardcode!)
openai_api_key: str
anthropic_api_key: str
youtube_api_key: Optional[str] = None
# Security
secret_key: str
allowed_origins: List[str] = ["http://localhost:3000"]
class Config:
env_file = ".env"
env_file_encoding = "utf-8"
case_sensitive = False
settings = Settings()
```
### Rate Limiting
```python
from fastapi import Depends, Request, HTTPException
from fastapi_limiter import FastAPILimiter
from fastapi_limiter.depends import RateLimiter
import redis.asyncio as redis
# Initialize rate limiter
async def init_rate_limiter():
redis_client = redis.from_url("redis://localhost:6379", encoding="utf-8", decode_responses=True)
await FastAPILimiter.init(redis_client)
# Apply rate limiting
@router.post("/summarize", dependencies=[Depends(RateLimiter(times=10, seconds=60))])
async def summarize_video(request: SummarizeRequest):
"""Rate limited to 10 requests per minute."""
pass
```
## 10. Deployment Process
### Docker Configuration
```dockerfile
# Dockerfile
FROM python:3.11-slim
WORKDIR /app
# Install dependencies
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
# Copy application
COPY . .
# Run application
CMD ["uvicorn", "src.main:app", "--host", "0.0.0.0", "--port", "8082"]
```
### Environment Management
```bash
# .env.development
DEBUG=true
DATABASE_URL=sqlite:///./dev.db
LOG_LEVEL=DEBUG
# .env.production
DEBUG=false
DATABASE_URL=postgresql://user:pass@db:5432/youtube_summarizer
LOG_LEVEL=INFO
```
### Health Checks
```python
@router.get("/health")
async def health_check():
"""Health check endpoint for monitoring."""
checks = {
"api": "healthy",
"database": await check_database(),
"cache": await check_cache(),
"ai_service": await check_ai_service()
}
all_healthy = all(v == "healthy" for v in checks.values())
return {
"status": "healthy" if all_healthy else "degraded",
"checks": checks,
"timestamp": datetime.utcnow().isoformat()
}
```
### Monitoring
```python
from fastapi import Response
from prometheus_client import Counter, Histogram, generate_latest
# Metrics
request_count = Counter('youtube_requests_total', 'Total requests')
request_duration = Histogram('youtube_request_duration_seconds', 'Request duration')
summary_generation_time = Histogram('summary_generation_seconds', 'Summary generation time')
@router.get("/metrics")
async def metrics():
"""Prometheus metrics endpoint."""
return Response(generate_latest(), media_type="text/plain")
```
## Agent-Specific Instructions
### For AI Agents
When working on this codebase:
1. **Always check Task Master first**: `task-master next`
2. **Follow TDD**: Write tests before implementation
3. **Use type hints**: All functions must have type annotations
4. **Document changes**: Update docstrings and comments
5. **Test thoroughly**: Run full test suite before marking complete
6. **Update task status**: Keep Task Master updated with progress
### Quality Checklist
Before marking any task as complete:
- [ ] All tests pass (`pytest tests/`)
- [ ] Code coverage > 80% (`pytest --cov=src`)
- [ ] No linting errors (`ruff check src/`)
- [ ] Type checking passes (`mypy src/`)
- [ ] Documentation updated
- [ ] Task Master updated
- [ ] Changes committed with proper message
## Conclusion
This guide ensures consistent, high-quality development across all contributors to the YouTube Summarizer project. Follow these standards to maintain code quality, performance, and security.
---
*Last Updated: 2025-01-25*
*Version: 1.0.0*

# CLAUDE.md - YouTube Summarizer
## Task Master AI Instructions
**Import Task Master's development workflow commands and guidelines; treat them as if the import were inlined in this CLAUDE.md file.**
@./.taskmaster/CLAUDE.md
This file provides guidance to Claude Code (claude.ai/code) when working with the YouTube Summarizer project.
## Project Overview
An AI-powered web application that automatically extracts, transcribes, and summarizes YouTube videos. The application supports multiple AI models (OpenAI, Anthropic, DeepSeek), provides various export formats, and includes intelligent caching for efficiency.
**Status**: Development Phase - 12 tasks (0% complete) managed via Task Master
## Quick Start Commands
```bash
# Development
cd apps/youtube-summarizer
source venv/bin/activate # Activate virtual environment
python src/main.py # Run the application (port 8082)
# Task Management
task-master list # View all tasks
task-master next # Get next task to work on
task-master show <id> # View task details
task-master set-status --id=<id> --status=done # Mark task complete
# Testing
pytest tests/ -v # Run tests
pytest tests/ --cov=src # Run with coverage
# Git Operations
git add .
git commit -m "feat: implement task X.Y"
git push origin main
```
## Architecture
```
YouTube Summarizer
├── API Layer (FastAPI)
│ ├── /api/summarize - Submit URL for summarization
│ ├── /api/summary/{id} - Retrieve summary
│ └── /api/export/{id} - Export in various formats
├── Service Layer
│ ├── YouTube Service - Transcript extraction
│ ├── AI Service - Summary generation
│ └── Cache Service - Performance optimization
└── Data Layer
├── SQLite/PostgreSQL - Summary storage
└── Redis (optional) - Caching layer
```
## Development Workflow
### 1. Check Current Task
```bash
task-master next
task-master show <id>
```
### 2. Implement Feature
Follow the task details and implement in appropriate modules:
- API endpoints → `src/api/`
- Business logic → `src/services/`
- Utilities → `src/utils/`
### 3. Test Implementation
```bash
# Unit tests
pytest tests/unit/test_<module>.py -v
# Integration tests
pytest tests/integration/ -v
# Manual testing
python src/main.py
# Visit http://localhost:8082/docs for API testing
```
### 4. Update Task Status
```bash
# Log progress
task-master update-subtask --id=<id> --prompt="Implemented X, tested Y"
# Mark complete
task-master set-status --id=<id> --status=done
```
## Key Implementation Areas
### YouTube Integration (`src/services/youtube.py`)
```python
# Primary: youtube-transcript-api
from youtube_transcript_api import YouTubeTranscriptApi
# Fallback: yt-dlp for metadata
import yt_dlp
# Extract video ID from various URL formats
# Handle multiple subtitle languages
# Implement retry logic for failures
```
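The video-ID extraction mentioned above can be sketched with the standard library alone. The URL shapes handled here are assumptions based on the test cases elsewhere in this document:

```python
import re
from typing import Optional
from urllib.parse import urlparse, parse_qs

def extract_video_id(url: str) -> Optional[str]:
    """Extract the video ID from common YouTube URL formats, or None."""
    parsed = urlparse(url)
    host = parsed.netloc.lower()
    if host.startswith("www."):
        host = host[4:]
    if host == "youtu.be":                      # short links: youtu.be/<id>
        return parsed.path.lstrip("/") or None
    if host in ("youtube.com", "m.youtube.com"):
        if parsed.path == "/watch":             # watch?v=<id>&t=...
            return parse_qs(parsed.query).get("v", [None])[0]
        match = re.match(r"^/(embed|shorts|v)/([^/?#]+)", parsed.path)
        if match:                               # embed/<id>, shorts/<id>, v/<id>
            return match.group(2)
    return None
```

Parsing with `urlparse` rather than one big regex keeps query parameters (`&t=123`) from leaking into the ID.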
### AI Summarization (`src/services/summarizer.py`)
```python
# Multi-model support
class SummarizerService:
def __init__(self):
self.models = {
'openai': OpenAISummarizer(),
'anthropic': AnthropicSummarizer(),
'deepseek': DeepSeekSummarizer()
}
async def summarize(self, transcript, model='auto'):
# Implement model selection logic
# Handle token limits
# Generate structured summaries
```
### Caching Strategy (`src/services/cache.py`)
```python
# Cache at multiple levels:
# 1. Transcript cache (by video_id)
# 2. Summary cache (by video_id + model + params)
# 3. Export cache (by summary_id + format)
# Use hashes for cache keys
import hashlib
import json
def get_cache_key(video_id: str, model: str, params: dict) -> str:
key_data = f"{video_id}:{model}:{json.dumps(params, sort_keys=True)}"
return hashlib.sha256(key_data.encode()).hexdigest()
```
## API Endpoint Patterns
### FastAPI Best Practices
```python
from fastapi import APIRouter, HTTPException, BackgroundTasks
from pydantic import BaseModel, HttpUrl
router = APIRouter(prefix="/api", tags=["summarization"])
class SummarizeRequest(BaseModel):
url: HttpUrl
model: str = "auto"
options: dict = {}
@router.post("/summarize")
async def summarize_video(
request: SummarizeRequest,
background_tasks: BackgroundTasks
):
# Validate URL
# Extract video ID
# Check cache
# Queue for processing if needed
# Return job ID for status checking
```
## Database Schema
```sql
-- Main summaries table
CREATE TABLE summaries (
id UUID PRIMARY KEY,
video_id VARCHAR(20) NOT NULL,
video_title TEXT,
video_url TEXT NOT NULL,
transcript TEXT,
summary TEXT,
key_points JSONB,
chapters JSONB,
model_used VARCHAR(50),
processing_time FLOAT,
created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);
-- Cache for performance
CREATE INDEX idx_video_id ON summaries(video_id);
CREATE INDEX idx_created_at ON summaries(created_at);
```
## Error Handling
```python
class YouTubeError(Exception):
"""Base exception for YouTube-related errors"""
pass
class TranscriptNotAvailable(YouTubeError):
"""Raised when transcript cannot be extracted"""
pass
class AIServiceError(Exception):
"""Base exception for AI service errors"""
pass
class TokenLimitExceeded(AIServiceError):
"""Raised when content exceeds model token limit"""
pass
# Global error handler
@app.exception_handler(YouTubeError)
async def youtube_error_handler(request, exc):
return JSONResponse(
status_code=400,
content={"error": str(exc), "type": "youtube_error"}
)
```
## Environment Variables
```bash
# Required
OPENAI_API_KEY=sk-... # At least one AI key required
ANTHROPIC_API_KEY=sk-ant-...
DEEPSEEK_API_KEY=sk-...
DATABASE_URL=sqlite:///./data/youtube_summarizer.db
SECRET_KEY=your-secret-key
# Optional but recommended
YOUTUBE_API_KEY=AIza... # For metadata and quota
REDIS_URL=redis://localhost:6379/0
RATE_LIMIT_PER_MINUTE=30
MAX_VIDEO_LENGTH_MINUTES=180
```
## Testing Guidelines
### Unit Test Structure
```python
# tests/unit/test_youtube_service.py
import pytest
from unittest.mock import Mock, patch
from src.services.youtube import YouTubeService
@pytest.fixture
def youtube_service():
return YouTubeService()
def test_extract_video_id(youtube_service):
urls = [
("https://youtube.com/watch?v=abc123", "abc123"),
("https://youtu.be/xyz789", "xyz789"),
("https://www.youtube.com/embed/qwe456", "qwe456")
]
for url, expected_id in urls:
assert youtube_service.extract_video_id(url) == expected_id
```
### Integration Test Pattern
```python
# tests/integration/test_api.py
from fastapi.testclient import TestClient
from src.main import app
client = TestClient(app)
def test_summarize_endpoint():
response = client.post("/api/summarize", json={
"url": "https://youtube.com/watch?v=test123",
"model": "openai"
})
assert response.status_code == 200
assert "job_id" in response.json()
```
## Performance Optimization
1. **Async Everything**: Use async/await for all I/O operations
2. **Background Tasks**: Process summaries in background
3. **Caching Layers**:
- Memory cache for hot data
- Database cache for persistence
- CDN for static exports
4. **Rate Limiting**: Implement per-IP and per-user limits
5. **Token Optimization**:
- Chunk long transcripts
- Use map-reduce for summaries
- Implement progressive summarization
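The map-reduce idea in point 5 can be sketched model-free: summarize each chunk (map), then summarize the concatenated partial summaries (reduce). `summarize_text` is an injected callable here so the control flow is testable without an AI provider; that indirection is an illustration, not the project's actual API:

```python
from typing import Callable, List

def map_reduce_summary(
    chunks: List[str],
    summarize_text: Callable[[str], str],
    max_combined_chars: int = 8000,
) -> str:
    """Summarize each chunk, then summarize the joined results.

    Re-reduces a bounded number of times if the combined text is still too long.
    """
    partials = [summarize_text(chunk) for chunk in chunks]  # map step
    combined = "\n".join(partials)
    for _ in range(3):  # reduce step, bounded to avoid looping forever
        if len(combined) <= max_combined_chars:
            break
        combined = summarize_text(combined)
    return summarize_text(combined)  # final polish pass
```

In production, `summarize_text` would wrap a model call with its own token accounting and retries.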
## Security Considerations
1. **Input Validation**: Validate all YouTube URLs
2. **API Key Management**: Use environment variables, never commit keys
3. **Rate Limiting**: Prevent abuse and API exhaustion
4. **CORS Configuration**: Restrict to known domains in production
5. **SQL Injection Prevention**: Use parameterized queries
6. **XSS Protection**: Sanitize all user inputs
7. **Authentication**: Implement JWT for user sessions (Phase 3)
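For point 5, parameterized queries with the standard `sqlite3` driver look like this; the driver binds the value, so user input never becomes part of the SQL text (table and column names here mirror the schema above but are illustrative):

```python
import sqlite3
from typing import List, Tuple

def find_summaries(conn: sqlite3.Connection, video_id: str) -> List[Tuple]:
    """Fetch summaries for a video using a bound parameter, never string formatting."""
    cursor = conn.execute(
        "SELECT id, summary FROM summaries WHERE video_id = ?",  # ? placeholder, not an f-string
        (video_id,),
    )
    return cursor.fetchall()
```

SQLAlchemy queries (as in section 7) get the same protection automatically when you pass values through column expressions rather than interpolating them.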
## Common Issues and Solutions
### Issue: Transcript Not Available
```python
# Solution: Implement fallback chain
try:
transcript = await get_youtube_transcript(video_id)
except TranscriptNotAvailable:
# Try auto-generated captions
transcript = await get_auto_captions(video_id)
if not transcript:
# Use audio transcription as last resort
transcript = await transcribe_audio(video_id)
```
### Issue: Token Limit Exceeded
```python
# Solution: implement chunking (count_tokens is an assumed tokenizer helper, e.g. via tiktoken)
def chunk_transcript(transcript, max_tokens=3000):
chunks = []
current_chunk = []
current_tokens = 0
for segment in transcript:
segment_tokens = count_tokens(segment)
if current_tokens + segment_tokens > max_tokens:
chunks.append(current_chunk)
current_chunk = [segment]
current_tokens = segment_tokens
else:
current_chunk.append(segment)
current_tokens += segment_tokens
if current_chunk:
chunks.append(current_chunk)
return chunks
```
### Issue: Rate Limiting
```python
# Solution: Implement exponential backoff
import asyncio
from typing import Any, Optional

class RateLimitError(Exception):
    """Raised when an upstream API reports rate limiting."""

async def retry_with_backoff(
    func,
    max_retries: int = 3,
    initial_delay: float = 1.0
) -> Optional[Any]:
delay = initial_delay
for attempt in range(max_retries):
try:
return await func()
except RateLimitError:
if attempt == max_retries - 1:
raise
await asyncio.sleep(delay)
delay *= 2 # Exponential backoff
```
## Development Tips
1. **Start with Task 1**: Setup and environment configuration
2. **Test Early**: Write tests as you implement features
3. **Use Type Hints**: Improve code quality and IDE support
4. **Document APIs**: Use FastAPI's automatic documentation
5. **Log Everything**: Implement comprehensive logging for debugging
6. **Cache Aggressively**: Reduce API calls and improve response times
7. **Handle Errors Gracefully**: Provide helpful error messages to users
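For tip 5, a minimal logging setup with the standard library; the logger name and format string are illustrative choices, not fixed project conventions:

```python
import logging
import sys

def configure_logging(level: str = "INFO") -> logging.Logger:
    """Configure a console logger with timestamps for the application."""
    logger = logging.getLogger("youtube_summarizer")
    logger.setLevel(level)
    if not logger.handlers:  # avoid stacking duplicate handlers on re-import
        handler = logging.StreamHandler(sys.stdout)
        handler.setFormatter(logging.Formatter(
            "%(asctime)s %(levelname)s %(name)s: %(message)s"
        ))
        logger.addHandler(handler)
    return logger
```

Call `configure_logging()` once at startup (e.g. in `src/main.py`), then use `logging.getLogger("youtube_summarizer")` from services to inherit the configuration.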
## Task Master Integration
This project uses Task Master for task management. Key commands:
```bash
# View current progress
task-master list
# Get detailed task info
task-master show 1
# Expand task into subtasks
task-master expand --id=1 --research
# Update task with progress
task-master update-task --id=1 --prompt="Completed API structure"
# Complete task
task-master set-status --id=1 --status=done
```
## Related Documentation
- [Project README](README.md) - General project information
- [AGENTS.md](AGENTS.md) - Development workflow and standards
- [Task Master Guide](.taskmaster/CLAUDE.md) - Task management details
- [API Documentation](http://localhost:8082/docs) - Interactive API docs (when running)
## Current Focus Areas (Based on Task Master)
1. **Task 1**: Setup Project Structure and Environment ⬅️ Start here
2. **Task 2**: Implement YouTube Transcript Extraction
3. **Task 3**: Develop AI Summary Generation Service
4. **Task 4**: Create Basic Frontend Interface
5. **Task 5**: Implement FastAPI Backend Endpoints
Remember to check task dependencies and complete prerequisites before moving to dependent tasks.
---
*This guide is specifically tailored for Claude Code development on the YouTube Summarizer project.*