AGENTS.md - YouTube Summarizer Development Standards
This document defines development workflows, standards, and best practices for the YouTube Summarizer project. It serves as a guide for both human developers and AI agents working on this codebase.
🚨 CRITICAL: Server Status Checking Protocol
MANDATORY: Check server status before ANY testing or debugging:
# 1. ALWAYS CHECK server status FIRST
lsof -i :3002 | grep LISTEN # Check frontend (expected port)
lsof -i :8000 | grep LISTEN # Check backend (expected port)
# 2. If servers NOT running, RESTART them
cd /Users/enias/projects/my-ai-projects/apps/youtube-summarizer
./scripts/restart-frontend.sh # After frontend changes
./scripts/restart-backend.sh # After backend changes
./scripts/restart-both.sh # After changes to both
# 3. VERIFY restart was successful
lsof -i :3002 | grep LISTEN # Should show node process
lsof -i :8000 | grep LISTEN # Should show python process
# 4. ONLY THEN proceed with testing
Server Checking Rules:
- ✅ ALWAYS check server status before testing
- ✅ ALWAYS restart servers after code changes
- ✅ ALWAYS verify restart was successful
- ❌ NEVER assume servers are running
- ❌ NEVER test without confirming server status
- ❌ NEVER debug "errors" without checking if server is running
🚨 CRITICAL: Documentation Preservation Rule
MANDATORY: Preserve critical documentation sections:
- ❌ NEVER remove critical sections from CLAUDE.md or AGENTS.md
- ❌ NEVER delete server checking protocols or development standards
- ❌ NEVER remove established workflows or troubleshooting guides
- ❌ NEVER delete testing procedures or quality standards
- ✅ ONLY remove sections when explicitly instructed by the user
- ✅ ALWAYS preserve and enhance existing documentation
🚩 CRITICAL: Directory Awareness Protocol
MANDATORY BEFORE ANY COMMAND: ALWAYS verify your current working directory before running any command.
# ALWAYS run this first before ANY command
pwd
# Expected result for YouTube Summarizer:
# /Users/enias/projects/my-ai-projects/apps/youtube-summarizer
Critical Directory Rules
- NEVER assume you're in the correct directory
- ALWAYS verify with `pwd` before running commands
- YouTube Summarizer development requires being in `/Users/enias/projects/my-ai-projects/apps/youtube-summarizer`
- Backend server (`python3 backend/main.py`) must be run from YouTube Summarizer root
- Frontend development (`npm run dev`) must be run from YouTube Summarizer root
- Database operations and migrations will fail if run from wrong directory
YouTube Summarizer Directory Verification
# ❌ WRONG - Running from main project or apps directory
cd /Users/enias/projects/my-ai-projects
python3 backend/main.py # Will fail - backend/ doesn't exist here
cd /Users/enias/projects/my-ai-projects/apps
python3 main.py # Will fail - no main.py in apps/
# ✅ CORRECT - Always navigate to YouTube Summarizer
cd /Users/enias/projects/my-ai-projects/apps/youtube-summarizer
pwd # Verify: /Users/enias/projects/my-ai-projects/apps/youtube-summarizer
python3 backend/main.py # Backend server
# OR
python3 main.py # Alternative entry point
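The directory check above can be wrapped in a small guard function. This is a hypothetical helper (not one of the repo's scripts): `require_dir` is an illustrative name, and you would source it into your shell session yourself.

```shell
# Hypothetical helper, not part of scripts/: fail fast when commands would
# run from the wrong directory.
require_dir() {
  local expected="$1"
  if [ "$PWD" != "$expected" ]; then
    echo "Wrong directory: $PWD (expected: $expected)" >&2
    return 1
  fi
}

# Usage sketch:
# require_dir /Users/enias/projects/my-ai-projects/apps/youtube-summarizer \
#   && python3 backend/main.py
```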
🚀 Quick Start for Developers
All stories are created and ready for implementation!
- Start Here: Developer Handoff Guide
- Sprint Plan: Sprint Planning Document
- First Story: Story 1.2 - URL Validation
Total Implementation Time: ~6 weeks (3 sprints)
- Sprint 1: Epic 1 (Foundation) - Stories 1.2-1.4
- Sprint 2: Epic 2 Core - Stories 2.1-2.3
- Sprint 3: Epic 2 Advanced - Stories 2.4-2.5
Table of Contents
- Development Workflow
- Code Standards
- Testing Requirements
- Documentation Standards
- Git Workflow
- API Design Standards
- Database Operations
- Performance Guidelines
- Security Protocols
- Deployment Process
🚨 CRITICAL: Documentation Update Rule
MANDATORY: After completing significant coding work, automatically update ALL documentation:
Documentation Update Protocol
- After Feature Implementation → Update relevant documentation files:
- CLAUDE.md - Development guidance and protocols
- AGENTS.md (this file) - Development standards and workflows
- README.md - User-facing features and setup instructions
- CHANGELOG.md - Version history and changes
- FILE_STRUCTURE.md - Directory structure and file organization
When to Update Documentation
- ✅ After implementing new features → Update all relevant docs
- ✅ After fixing significant bugs → Update troubleshooting guides
- ✅ After changing architecture → Update CLAUDE.md, AGENTS.md, FILE_STRUCTURE.md
- ✅ After adding new tools/scripts → Update CLAUDE.md, AGENTS.md, README.md
- ✅ After configuration changes → Update setup documentation
- ✅ At end of development sessions → Comprehensive doc review
Documentation Workflow Integration
# After completing significant code changes:
# 1. Test changes work
./scripts/restart-backend.sh # Test backend changes
./scripts/restart-frontend.sh # Test frontend changes (if needed)
# 2. Update relevant documentation files
# 3. Commit documentation with code changes
git add CLAUDE.md AGENTS.md README.md CHANGELOG.md FILE_STRUCTURE.md
git commit -m "feat: implement feature X with documentation updates"
Documentation Standards
- Format: Use clear headings, code blocks, and examples
- Timeliness: Update immediately after code changes
- Completeness: Cover all user-facing and developer-facing changes
- Consistency: Maintain same format across all documentation files
1. Development Workflow
Story-Driven Development (BMad Method)
All development follows the BMad Method epic and story workflow:
Current Development Status: READY FOR IMPLEMENTATION
- Epic 1: Foundation & Core YouTube Integration (Story 1.1 ✅ Complete, Stories 1.2-1.4 📋 Ready)
- Epic 2: AI Summarization Engine (Stories 2.1-2.5 📋 All Created and Ready)
- Epic 3: Enhanced User Experience (Future - Ready for story creation)
Developer Handoff Complete: All Epic 1 & 2 stories created with comprehensive Dev Notes.
- See Developer Handoff Guide for implementation start
- See Sprint Planning for 6-week development schedule
Story-Based Implementation Process
# 1. Start with Developer Handoff
cat docs/DEVELOPER_HANDOFF.md # Complete implementation guide
cat docs/SPRINT_PLANNING.md # Sprint breakdown
# 2. Get Your Next Story (All stories ready!)
# Sprint 1: Stories 1.2, 1.3, 1.4
# Sprint 2: Stories 2.1, 2.2, 2.3
# Sprint 3: Stories 2.4, 2.5
# 3. Review Story Implementation Requirements
# Read: docs/stories/{story-number}.{name}.md
# Example: docs/stories/1.2.youtube-url-validation-parsing.md
# Study: Dev Notes section with complete code examples
# Check: All tasks and subtasks with time estimates
# 4. Implement Story
# Option A: Use Development Agent
/BMad:agents:dev
# Follow story specifications exactly
# Option B: Direct implementation
# Use code examples from Dev Notes
# Follow file structure specified in story
# Implement tasks in order
# 5. Test Implementation (Comprehensive Test Runner)
./run_tests.sh run-unit --fail-fast # Ultra-fast feedback (229 tests)
./run_tests.sh run-specific "test_{module}.py" # Test specific modules
./run_tests.sh run-integration # Integration & API tests
./run_tests.sh run-all --coverage # Full validation with coverage
cd frontend && npm test
# 6. Server Restart Protocol (CRITICAL FOR BACKEND CHANGES)
# ALWAYS restart backend after modifying Python files:
./scripts/restart-backend.sh # After backend code changes
./scripts/restart-frontend.sh # After npm installs or config changes
./scripts/restart-both.sh # Full stack restart
# Frontend HMR handles React changes automatically - no restart needed
# 7. Update Story Progress
# In story file, mark tasks complete:
# - [x] **Task 1: Completed task**
# Update story status: Draft → In Progress → Review → Done
# 8. Move to Next Story
# Check Sprint Planning for next priority
# Repeat process with next story file
Alternative: Direct Development (Without BMad Agents)
# 1. Read current story specification
cat docs/stories/1.2.youtube-url-validation-parsing.md
# 2. Follow Dev Notes and architecture references
cat docs/architecture.md # Technical specifications
cat docs/front-end-spec.md # UI requirements
# 3. Implement systematically
# Follow tasks/subtasks exactly as specified
# Use provided code examples and patterns
# 4. Test and validate (Test Runner System)
./run_tests.sh run-unit --fail-fast # Fast feedback during development
./run_tests.sh run-all --coverage # Complete validation before story completion
cd frontend && npm test
Story Implementation Checklist (BMad Method)
- [ ] **Review Story Requirements**
  - Read complete story file (`docs/stories/{epic}.{story}.{name}.md`)
  - Study Dev Notes section with architecture references
  - Understand all acceptance criteria
  - Review all tasks and subtasks
- [ ] **Follow Architecture Specifications**
  - Reference `docs/architecture.md` for technical patterns
  - Use exact file locations specified in story
  - Follow error handling patterns from architecture
  - Implement according to database schema specifications
- [ ] **Write Tests First (TDD)**
  - Create unit tests based on story testing requirements
  - Write integration tests for API endpoints
  - Add frontend component tests where specified
  - Ensure test coverage meets story requirements
- [ ] **Implement Features Systematically**
  - Complete tasks in order specified in story
  - Follow code examples and patterns from Dev Notes
  - Use exact imports and dependencies specified
  - Implement error handling as architecturally defined
- [ ] **Validate Implementation**
  - All acceptance criteria met
  - All tasks/subtasks completed
  - Full test suite passes
  - Integration testing successful
- [ ] **Update Story Progress**
  - Mark tasks complete in story markdown file
  - Update story status from "Draft" to "Done"
  - Add completion notes to Dev Agent Record section
  - Update epic progress in `docs/prd/index.md`
- [ ] **Commit Changes**
  - Use story-based commit message format
  - Reference story number in commit
  - Include brief implementation summary
FILE LENGTH - Keep All Files Modular and Focused
300 Lines of Code Limit
CRITICAL RULE: We must keep all files under 300 LOC.
- Current Status: Many files in our codebase break this rule
- Requirement: Files must be modular & single-purpose
- Enforcement: Before adding any significant functionality, check file length
- Action Required: Refactor any file approaching or exceeding 300 lines
# Check file lengths across project
find . -name "*.py" -not -path "*/venv*/*" -not -path "*/__pycache__/*" -exec wc -l {} + | awk '$1 > 300'
find . \( -name "*.ts" -o -name "*.tsx" \) -not -path "*/node_modules/*" -exec wc -l {} + | awk '$1 > 300'
Modularization Strategies:
- Extract utility functions into separate modules
- Split large classes into focused, single-responsibility classes
- Move constants and configuration to dedicated files
- Separate concerns: logic, data models, API handlers
- Use composition over inheritance to reduce file complexity
Examples of Files Needing Refactoring:
- Large service files → Split into focused service modules
- Complex API routers → Extract handlers to separate modules
- Monolithic components → Break into smaller, composable components
- Combined model files → Separate by entity or domain
READING FILES - Never Make Assumptions
Always Read Files in Full Before Changes
CRITICAL RULE: Always read the file in full, do not be lazy.
- Before making ANY code changes: Start by finding & reading ALL relevant files
- Never make changes without reading the entire file: Understand context, existing patterns, dependencies
- Read related files: Check imports, dependencies, and related modules
- Understand existing architecture: Follow established patterns and conventions
# Investigation checklist before any code changes:
# 1. Read the target file completely
# 2. Read all imported modules
# 3. Check related test files
# 4. Review configuration files
# 5. Understand data models and schemas
File Reading Protocol:
- Target File: Read entire file to understand current implementation
- Dependencies: Read all imported modules and their interfaces
- Tests: Check existing test coverage and patterns
- Related Files: Review files in same directory/module
- Configuration: Check relevant config files and environment variables
- Documentation: Read any related documentation or comments
Common Mistakes to Avoid:
- ❌ Making changes based on file names alone
- ❌ Assuming function behavior without reading implementation
- ❌ Not understanding existing error handling patterns
- ❌ Missing important configuration or environment dependencies
- ❌ Ignoring existing test patterns and coverage
EGO - Engineering Humility and Best Practices
Do Not Make Assumptions - Consider Multiple Approaches
CRITICAL MINDSET: Do not make assumptions. Do not jump to conclusions.
- Reality Check: You are just a Large Language Model, you are very limited
- Engineering Approach: Always consider multiple different approaches, just like a senior engineer
- Validate Assumptions: Test your understanding against the actual codebase
- Seek Understanding: When unclear, read more files and investigate thoroughly
Senior Engineer Mindset:
1. **Multiple Solutions**: Always consider 2-3 different approaches
2. **Trade-off Analysis**: Evaluate pros/cons of each approach
3. **Existing Patterns**: Follow established codebase patterns
4. **Future Maintenance**: Consider long-term maintainability
5. **Performance Impact**: Consider resource and performance implications
6. **Testing Strategy**: Plan testing approach before implementation
Before Implementation, Ask:
- What are 2-3 different ways to solve this?
- What are the trade-offs of each approach?
- How does this fit with existing architecture patterns?
- What could break if this implementation is wrong?
- How would a senior engineer approach this problem?
- What edge cases am I not considering?
Decision Process:
- Gather Information: Read all relevant files and understand context
- Generate Options: Consider multiple implementation approaches
- Evaluate Trade-offs: Analyze pros/cons of each option
- Check Patterns: Ensure consistency with existing codebase
- Plan Testing: Design test strategy to validate approach
- Implement Incrementally: Start small, verify, then expand
Remember Your Limitations:
- Cannot execute code to verify behavior
- Cannot access external documentation beyond what's provided
- Cannot make network requests or test integrations
- Cannot guarantee code will work without testing
- Limited understanding of complex business logic
Compensation Strategies:
- Read more files when uncertain
- Follow established patterns rigorously
- Provide multiple implementation options
- Document assumptions and limitations
- Suggest verification steps for humans
- Request feedback on complex architectural decisions
Class Library Integration and Usage
AI Assistant Class Library Reference
This project uses the shared AI Assistant Class Library (/lib/) which provides foundational components for AI applications. Always check the class library first before implementing common functionality.
Core Library Components Used:
Service Framework (/lib/services/):
from ai_assistant_lib import BaseService, BaseAIService, ServiceStatus
# Backend services inherit from library base classes
class VideoService(BaseService):
async def _initialize_impl(self) -> None:
# Service-specific initialization with lifecycle management
pass
class AnthropicSummarizer(BaseAIService):
# Inherits retry logic, caching, rate limiting from library
async def _make_prediction(self, request: AIRequest) -> AIResponse:
pass
Repository Pattern (/lib/data/repositories/):
from ai_assistant_lib import BaseRepository, TimestampedModel
# Database models use library base classes
class Summary(TimestampedModel):
# Automatic created_at, updated_at fields
__tablename__ = 'summaries'
class SummaryRepository(BaseRepository[Summary]):
# Inherits CRUD operations, filtering, pagination
async def find_by_video_id(self, video_id: str) -> Optional[Summary]:
filters = {"video_id": video_id}
results = await self.find_all(filters=filters, limit=1)
return results[0] if results else None
Error Handling (/lib/core/exceptions/):
from ai_assistant_lib import ServiceError, RetryableError, ValidationError
# Consistent error handling across the application
try:
result = await summarizer.generate_summary(transcript)
except RetryableError:
# Automatic retry handled by library
pass
except ValidationError as e:
raise HTTPException(status_code=400, detail=str(e))
Async Utilities (/lib/utils/helpers/):
from ai_assistant_lib import with_retry, with_cache, MemoryCache
# Automatic retry for external API calls
@with_retry(max_attempts=3)
async def extract_youtube_transcript(video_id: str) -> str:
# Implementation with automatic exponential backoff
pass
# Caching for expensive operations
cache = MemoryCache(max_size=1000, default_ttl=3600)
@with_cache(cache=cache, key_prefix="transcript")
async def get_cached_transcript(video_id: str) -> str:
# Expensive transcript extraction cached automatically
pass
Project-Specific Usage Patterns:
Backend API Services (backend/services/):
- `summary_pipeline.py` - Uses `BaseService` for pipeline orchestration
- `anthropic_summarizer.py` - Extends `BaseAIService` for AI integration
- `cache_manager.py` - Uses library caching utilities
- `video_service.py` - Implements service framework patterns
Data Layer (backend/models/, backend/core/):
- `summary.py` - Uses `TimestampedModel` from library
- `user.py` - Inherits from library base models
- `database_registry.py` - Extends library database patterns
API Layer (backend/api/):
- Exception handling uses library error hierarchy
- Request/response models extend library schemas
- Dependency injection follows library patterns
Library Integration Checklist:
Before implementing new functionality:
- Check Library First: Review `/lib/` for existing solutions
- Follow Patterns: Use established library patterns and base classes
- Extend, Don't Duplicate: Extend library classes instead of creating from scratch
- Error Handling: Use library exception hierarchy for consistency
- Testing: Use library test utilities and patterns
Common Integration Patterns:
# Service initialization with library framework
async def create_service() -> VideoService:
service = VideoService("video_processor")
await service.initialize() # Lifecycle managed by BaseService
return service
# Repository operations with library patterns
async def get_summary_data(video_id: str) -> Optional[Summary]:
repo = SummaryRepository(session, Summary)
return await repo.find_by_video_id(video_id)
# AI service with library retry and caching
summarizer = AnthropicSummarizer(
api_key=settings.ANTHROPIC_API_KEY,
cache_manager=cache_manager, # From library
retry_config=RetryConfig(max_attempts=3) # From library
)
2. Code Standards
Python Style Guide
"""
Module docstring describing purpose and usage
"""
from typing import List, Optional, Dict, Any
import asyncio
from datetime import datetime
# Constants in UPPER_CASE
DEFAULT_TIMEOUT = 30
MAX_RETRIES = 3
class YouTubeSummarizer:
"""
Class for summarizing YouTube videos.
Attributes:
model: AI model to use for summarization
cache: Cache service instance
"""
def __init__(self, model: str = "openai"):
"""Initialize summarizer with specified model."""
self.model = model
self.cache = CacheService()
async def summarize(
self,
video_url: str,
options: Optional[Dict[str, Any]] = None
) -> Dict[str, Any]:
"""
Summarize a YouTube video.
Args:
video_url: YouTube video URL
options: Optional summarization parameters
Returns:
Dictionary containing summary and metadata
Raises:
YouTubeError: If video cannot be accessed
AIServiceError: If summarization fails
"""
# Implementation here
pass
Type Hints
Always use type hints for better code quality:
from typing import Union, List, Optional, Dict, Any, Tuple
from pydantic import BaseModel, HttpUrl
async def process_video(
url: HttpUrl,
models: List[str],
max_length: Optional[int] = None
) -> Tuple[str, Dict[str, Any]]:
"""Process video with type safety."""
pass
Async/Await Pattern
Use async for all I/O operations:
import aiohttp

async def fetch_transcript(video_id: str) -> str:
    """Fetch transcript asynchronously."""
    url = f"https://example.invalid/transcripts/{video_id}"  # placeholder endpoint for illustration
    async with aiohttp.ClientSession() as session:
        async with session.get(url) as response:
            return await response.text()
# Use asyncio.gather for parallel operations
results = await asyncio.gather(
fetch_transcript(id1),
fetch_transcript(id2),
fetch_transcript(id3)
)
3. Testing Requirements
Test Runner System
The project includes a production-ready test runner system with 229 discovered unit tests and intelligent test categorization.
# Primary Testing Commands
./run_tests.sh run-unit --fail-fast # Ultra-fast feedback (0.2s discovery)
./run_tests.sh run-all --coverage # Complete test suite
./run_tests.sh run-integration # Integration & API tests
cd frontend && npm test # Frontend tests
Test Coverage Requirements
- Minimum 80% code coverage
- 100% coverage for critical paths
- All edge cases tested
- Error conditions covered
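To make "all edge cases tested" and "error conditions covered" concrete, here is a minimal pytest-style sketch covering a happy path, an edge case, and an error path. `parse_video_id` is a hypothetical helper for illustration, not the project's actual API; plain `assert`/`try` is used so the sketch is self-contained.

```python
import re

# Hypothetical helper under test (illustrative only, not the project API).
def parse_video_id(url: str) -> str:
    match = re.search(r"(?:v=|youtu\.be/)([\w-]{11})", url)
    if not match:
        raise ValueError("not a recognizable YouTube URL")
    return match.group(1)

def test_standard_watch_url():
    # Happy path: standard watch URL
    assert parse_video_id("https://www.youtube.com/watch?v=dQw4w9WgXcQ") == "dQw4w9WgXcQ"

def test_short_url_edge_case():
    # Edge case: youtu.be short link
    assert parse_video_id("https://youtu.be/dQw4w9WgXcQ") == "dQw4w9WgXcQ"

def test_invalid_url_raises():
    # Error condition: non-YouTube URL must raise
    try:
        parse_video_id("https://example.com/not-a-video")
        assert False, "expected ValueError"
    except ValueError:
        pass
```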
📖 Complete Testing Guide: See TESTING-INSTRUCTIONS.md for comprehensive testing standards, procedures, examples, and troubleshooting.
4. Documentation Standards
Code Documentation
Every module, class, and function must have docstrings:
"""
Module: YouTube Transcript Extractor
This module provides functionality to extract transcripts from YouTube videos
using multiple fallback methods.
Example:
>>> extractor = TranscriptExtractor()
>>> transcript = await extractor.extract("video_id")
"""
def extract_transcript(
video_id: str,
language: str = "en",
include_auto_generated: bool = True
) -> List[Dict[str, Any]]:
"""
Extract transcript from YouTube video.
This function attempts to extract transcripts using the following priority:
1. Manual captions in specified language
2. Auto-generated captions if allowed
3. Translated captions as fallback
Args:
video_id: YouTube video identifier
language: ISO 639-1 language code (default: "en")
include_auto_generated: Whether to use auto-generated captions
Returns:
List of transcript segments with text, start time, and duration
Raises:
TranscriptNotAvailable: If no transcript can be extracted
Example:
>>> transcript = extract_transcript("dQw4w9WgXcQ", "en")
>>> print(transcript[0])
{"text": "Never gonna give you up", "start": 0.0, "duration": 3.5}
"""
pass
API Documentation
Use FastAPI's automatic documentation features:
from typing import Optional
from fastapi import APIRouter, HTTPException, status
from pydantic import BaseModel, Field
router = APIRouter()
class SummarizeRequest(BaseModel):
"""Request model for video summarization."""
url: str = Field(
...,
description="YouTube video URL",
example="https://youtube.com/watch?v=dQw4w9WgXcQ"
)
model: str = Field(
"auto",
description="AI model to use (openai, anthropic, deepseek, auto)",
example="openai"
)
max_length: Optional[int] = Field(
None,
description="Maximum summary length in words",
ge=50,
le=5000
)
@router.post(
"/summarize",
response_model=SummarizeResponse,
status_code=status.HTTP_200_OK,
summary="Summarize YouTube Video",
description="Submit a YouTube video URL for AI-powered summarization"
)
async def summarize_video(request: SummarizeRequest):
"""
Summarize a YouTube video using AI.
This endpoint accepts a YouTube URL and returns a job ID for tracking
the summarization progress. Use the /summary/{job_id} endpoint to
retrieve the completed summary.
"""
pass
5. Git Workflow
Branch Naming
# Feature branches
feature/task-2-youtube-extraction
feature/task-3-ai-summarization
# Bugfix branches
bugfix/transcript-encoding-error
bugfix/rate-limit-handling
# Hotfix branches
hotfix/critical-api-error
Commit Messages
Follow conventional commits:
# Format: <type>(<scope>): <subject>
# Examples:
feat(youtube): add transcript extraction service
fix(api): handle rate limiting correctly
docs(readme): update installation instructions
test(youtube): add edge case tests
refactor(cache): optimize cache key generation
perf(summarizer): implement parallel processing
chore(deps): update requirements.txt
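The subject-line format above can be checked with a short regex. This is an illustrative sketch, not an installed commit hook in this repo; the type and scope rules mirror the examples listed.

```python
import re

# Matches "<type>(<scope>): <subject>" using the commit types shown above.
# Illustrative only; this repo does not ship this as a commit hook.
COMMIT_RE = re.compile(r"^(feat|fix|docs|test|refactor|perf|chore)\([a-z0-9-]+\): .+")

def is_conventional(subject: str) -> bool:
    """Return True if the commit subject follows the conventional format."""
    return bool(COMMIT_RE.match(subject))
```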
Pull Request Template
## Task Reference
- Task ID: #3
- Task Title: Develop AI Summary Generation Service
## Description
Brief description of changes made
## Changes Made
- [ ] Implemented YouTube transcript extraction
- [ ] Added multi-model AI support
- [ ] Created caching layer
- [ ] Added comprehensive tests
## Testing
- [ ] Unit tests pass
- [ ] Integration tests pass
- [ ] Manual testing completed
- [ ] Coverage > 80%
## Documentation
- [ ] Code documented
- [ ] API docs updated
- [ ] README updated if needed
## Screenshots (if applicable)
[Add screenshots here]
6. API Design Standards
RESTful Principles
# Good API design
GET /api/summaries # List all summaries
GET /api/summaries/{id} # Get specific summary
POST /api/summaries # Create new summary
PUT /api/summaries/{id} # Update summary
DELETE /api/summaries/{id} # Delete summary
# Status codes
200 OK # Successful GET/PUT
201 Created # Successful POST
202 Accepted # Processing async request
204 No Content # Successful DELETE
400 Bad Request # Invalid input
401 Unauthorized # Missing/invalid auth
403 Forbidden # No permission
404 Not Found # Resource doesn't exist
429 Too Many Requests # Rate limited
500 Internal Error # Server error
Response Format
# Success response
{
"success": true,
"data": {
"id": "uuid",
"video_id": "abc123",
"summary": "...",
"metadata": {}
},
"timestamp": "2025-01-25T10:00:00Z"
}
# Error response
{
"success": false,
"error": {
"code": "TRANSCRIPT_NOT_AVAILABLE",
"message": "Could not extract transcript from video",
"details": "No captions available in requested language"
},
"timestamp": "2025-01-25T10:00:00Z"
}
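The envelopes above can be produced by small helper functions. The names below (`success_response`, `error_response`) are illustrative, not existing project code:

```python
from datetime import datetime, timezone
from typing import Any, Dict, Optional

def _timestamp() -> str:
    # UTC timestamp in the ISO 8601 "Z" form used by the envelopes above.
    return datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")

def success_response(data: Dict[str, Any]) -> Dict[str, Any]:
    """Wrap payload data in the standard success envelope."""
    return {"success": True, "data": data, "timestamp": _timestamp()}

def error_response(code: str, message: str, details: Optional[str] = None) -> Dict[str, Any]:
    """Wrap an error in the standard error envelope."""
    return {
        "success": False,
        "error": {"code": code, "message": message, "details": details},
        "timestamp": _timestamp(),
    }
```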
Pagination
@router.get("/summaries")
async def list_summaries(
page: int = Query(1, ge=1),
limit: int = Query(20, ge=1, le=100),
sort: str = Query("created_at", regex="^(created_at|updated_at|title)$"),
order: str = Query("desc", regex="^(asc|desc)$")
):
"""List summaries with pagination."""
return {
"data": summaries,
"pagination": {
"page": page,
"limit": limit,
"total": total_count,
"pages": math.ceil(total_count / limit)
}
}
7. Database Operations
SQLAlchemy Models
from datetime import datetime
from sqlalchemy import Column, String, Text, DateTime, Float, JSON
from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy.dialects.postgresql import UUID
import uuid
Base = declarative_base()
class Summary(Base):
__tablename__ = "summaries"
id = Column(UUID(as_uuid=True), primary_key=True, default=uuid.uuid4)
video_id = Column(String(20), nullable=False, index=True)
video_url = Column(Text, nullable=False)
video_title = Column(Text)
transcript = Column(Text)
summary = Column(Text)
key_points = Column(JSON)
chapters = Column(JSON)
model_used = Column(String(50))
processing_time = Column(Float)
created_at = Column(DateTime, default=datetime.utcnow)
updated_at = Column(DateTime, default=datetime.utcnow, onupdate=datetime.utcnow)
def to_dict(self):
"""Convert to dictionary for API responses."""
return {
"id": str(self.id),
"video_id": self.video_id,
"video_title": self.video_title,
"summary": self.summary,
"key_points": self.key_points,
"chapters": self.chapters,
"model_used": self.model_used,
"created_at": self.created_at.isoformat()
}
Database Migrations
Use Alembic for migrations:
# Create new migration
alembic revision --autogenerate -m "Add chapters column"
# Apply migrations
alembic upgrade head
# Rollback
alembic downgrade -1
Query Optimization
from sqlalchemy import select, and_
from sqlalchemy.orm import selectinload
# Efficient querying with joins
async def get_summaries_with_metadata(session, user_id: str):
stmt = (
select(Summary)
.options(selectinload(Summary.metadata))
.where(Summary.user_id == user_id)
.order_by(Summary.created_at.desc())
.limit(10)
)
result = await session.execute(stmt)
return result.scalars().all()
8. Performance Guidelines
Caching Strategy
from functools import lru_cache
import redis
import hashlib
import json
class CacheService:
def __init__(self):
self.redis = redis.Redis(decode_responses=True)
self.ttl = 3600 # 1 hour default
def get_key(self, prefix: str, **kwargs) -> str:
"""Generate cache key from parameters."""
data = json.dumps(kwargs, sort_keys=True)
hash_digest = hashlib.md5(data.encode()).hexdigest()
return f"{prefix}:{hash_digest}"
async def get_or_set(self, key: str, func, ttl: int = None):
"""Get from cache or compute and set."""
# Try cache first
cached = self.redis.get(key)
if cached:
return json.loads(cached)
# Compute result
result = await func()
# Cache result
self.redis.setex(
key,
ttl or self.ttl,
json.dumps(result)
)
return result
Async Processing
import asyncio
from typing import Any, Dict
from celery import Celery

celery_app = Celery('youtube_summarizer')

@celery_app.task
def process_video_task(video_url: str, options: Dict[str, Any]):
    """Background task for video processing.
    Celery task functions are synchronous, so the async pipeline is
    wrapped with asyncio.run()."""
    try:
        return asyncio.run(_process_video(video_url, options))
    except Exception as e:
        return {"status": "failed", "error": str(e)}

async def _process_video(video_url: str, options: Dict[str, Any]):
    """Async pipeline: extract transcript, generate summary, persist it."""
    transcript = await extract_transcript(video_url)
    summary = await generate_summary(transcript, options)
    await save_summary(video_url, summary)
    return {"status": "completed", "summary_id": summary.id}
Performance Monitoring
import time
from functools import wraps
import logging
logger = logging.getLogger(__name__)
def measure_performance(func):
"""Decorator to measure function performance."""
@wraps(func)
async def wrapper(*args, **kwargs):
start = time.perf_counter()
try:
result = await func(*args, **kwargs)
elapsed = time.perf_counter() - start
logger.info(f"{func.__name__} took {elapsed:.3f}s")
return result
except Exception as e:
elapsed = time.perf_counter() - start
logger.error(f"{func.__name__} failed after {elapsed:.3f}s: {e}")
raise
return wrapper
9. Security Protocols
Input Validation
from pydantic import BaseModel, validator, HttpUrl
import re
class VideoURLValidator(BaseModel):
url: HttpUrl
@validator('url')
def validate_youtube_url(cls, v):
youtube_regex = re.compile(
r'(https?://)?(www\.)?(youtube\.com|youtu\.be)/.+'
)
if not youtube_regex.match(str(v)):
raise ValueError('Invalid YouTube URL')
return v
API Key Management
from typing import List, Optional
from pydantic import BaseSettings
class Settings(BaseSettings):
"""Application settings with validation."""
# API Keys (never hardcode!)
openai_api_key: str
anthropic_api_key: str
youtube_api_key: Optional[str] = None
# Security
secret_key: str
allowed_origins: List[str] = ["http://localhost:3000"]
class Config:
env_file = ".env"
env_file_encoding = "utf-8"
case_sensitive = False
settings = Settings()
Rate Limiting
from fastapi import Depends, HTTPException, Request
from fastapi_limiter import FastAPILimiter
from fastapi_limiter.depends import RateLimiter
import redis.asyncio as redis
# Initialize rate limiter
async def init_rate_limiter():
redis_client = redis.from_url("redis://localhost:6379", encoding="utf-8", decode_responses=True)
await FastAPILimiter.init(redis_client)
# Apply rate limiting
@router.post("/summarize", dependencies=[Depends(RateLimiter(times=10, seconds=60))])
async def summarize_video(request: SummarizeRequest):
"""Rate limited to 10 requests per minute."""
pass
10. Deployment Process
Docker Configuration
# Dockerfile
FROM python:3.11-slim
WORKDIR /app
# Install dependencies
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
# Copy application
COPY . .
# Run application
CMD ["uvicorn", "src.main:app", "--host", "0.0.0.0", "--port", "8082"]
Environment Management
# .env.development
DEBUG=true
DATABASE_URL=sqlite:///./dev.db
LOG_LEVEL=DEBUG
# .env.production
DEBUG=false
DATABASE_URL=postgresql://user:pass@db:5432/youtube_summarizer
LOG_LEVEL=INFO
Health Checks
@router.get("/health")
async def health_check():
"""Health check endpoint for monitoring."""
checks = {
"api": "healthy",
"database": await check_database(),
"cache": await check_cache(),
"ai_service": await check_ai_service()
}
all_healthy = all(v == "healthy" for v in checks.values())
return {
"status": "healthy" if all_healthy else "degraded",
"checks": checks,
"timestamp": datetime.utcnow().isoformat()
}
Monitoring
from fastapi import Response
from prometheus_client import Counter, Histogram, generate_latest
# Metrics
request_count = Counter('youtube_requests_total', 'Total requests')
request_duration = Histogram('youtube_request_duration_seconds', 'Request duration')
summary_generation_time = Histogram('summary_generation_seconds', 'Summary generation time')
@router.get("/metrics")
async def metrics():
"""Prometheus metrics endpoint."""
return Response(generate_latest(), media_type="text/plain")
Agent-Specific Instructions
For AI Agents
When working on this codebase:
- Always check Task Master first: `task-master next`
- Follow TDD: Write tests before implementation
- Use type hints: All functions must have type annotations
- Document changes: Update docstrings and comments
- Test thoroughly: Run full test suite before marking complete
- Update task status: Keep Task Master updated with progress
Quality Checklist
Before marking any task as complete:
- [ ] All tests pass (`./run_tests.sh run-all`)
- [ ] Code coverage > 80% (`./run_tests.sh run-all --coverage`)
- [ ] Unit tests pass with fast feedback (`./run_tests.sh run-unit --fail-fast`)
- [ ] Integration tests validated (`./run_tests.sh run-integration`)
- [ ] Frontend tests pass (`cd frontend && npm test`)
- [ ] No linting errors (`ruff check src/`)
- [ ] Type checking passes (`mypy src/`)
- [ ] Documentation updated
- [ ] Task Master updated
- [ ] Changes committed with proper message
📖 Testing Details: See TESTING-INSTRUCTIONS.md for complete testing procedures and standards.
Conclusion
This guide ensures consistent, high-quality development across all contributors to the YouTube Summarizer project. Follow these standards to maintain code quality, performance, and security.
Last Updated: 2025-01-25 Version: 1.0.0