CLAUDE.md - YouTube Summarizer Backend
This file provides guidance to Claude Code when working with the YouTube Summarizer backend services.
Backend Architecture Overview
The backend is built with FastAPI and follows a clean architecture pattern with clear separation of concerns:
backend/
├── api/ # API endpoints and request/response models
├── services/ # Business logic and external integrations
├── models/ # Data models and database schemas
├── core/ # Core utilities, exceptions, and configurations
└── tests/ # Unit and integration tests
Key Services and Components
Authentication System (Story 3.1 - COMPLETE ✅)
Architecture: Production-ready JWT-based authentication with Database Registry singleton pattern
AuthService (services/auth_service.py)
- JWT token generation and validation (access + refresh tokens)
- Password hashing with bcrypt and strength validation
- User registration with email verification workflow
- Password reset with secure token generation
- Session management and token refresh logic
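The access/refresh token split can be illustrated with a minimal HS256 JWT built from the standard library alone. This is a sketch of the token structure only; the real AuthService presumably uses a dedicated JWT library, and `make_token`, the secret, and the TTLs here are illustrative:

```python
import base64
import hashlib
import hmac
import json
import time

def b64url(data: bytes) -> str:
    """URL-safe base64 without padding, as JWT requires."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode()

def make_token(user_id: str, secret: str, ttl_seconds: int) -> str:
    """Build a minimal HS256 JWT: header.payload.signature."""
    header = b64url(json.dumps({"alg": "HS256", "typ": "JWT"}).encode())
    payload = b64url(json.dumps(
        {"sub": user_id, "exp": int(time.time()) + ttl_seconds}
    ).encode())
    signing_input = f"{header}.{payload}".encode()
    sig = b64url(hmac.new(secret.encode(), signing_input, hashlib.sha256).digest())
    return f"{header}.{payload}.{sig}"

# Access tokens are short-lived; refresh tokens last much longer.
access = make_token("user-123", "dev-secret", ttl_seconds=900)
refresh = make_token("user-123", "dev-secret", ttl_seconds=7 * 24 * 3600)
```

Refreshing then amounts to validating the long-lived token's signature and expiry, and minting a fresh short-lived access token.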
Database Registry Pattern (core/database_registry.py)
- CRITICAL FIX: Resolves SQLAlchemy "Multiple classes found for path" errors
- Singleton pattern ensuring single Base instance across application
- Automatic model registration preventing table redefinition conflicts
- Thread-safe model management with registry cleanup for testing
- Production-ready architecture preventing relationship resolver issues
Authentication Models (models/user.py)
- User, RefreshToken, APIKey, EmailVerificationToken, PasswordResetToken
- Fully qualified relationship paths preventing SQLAlchemy conflicts
- String UUID fields for SQLite compatibility
- Proper model inheritance using Database Registry Base
Authentication API (api/auth.py)
- Complete endpoint coverage: register, login, logout, refresh, verify email, reset password
- Comprehensive input validation and error handling
- Protected route dependencies and middleware
- Async/await patterns throughout
Dual Transcript Services ✅ NEW
DualTranscriptService (services/dual_transcript_service.py)
- Orchestrates between YouTube captions and Whisper AI transcription
- Supports three extraction modes: `youtube`, `whisper`, and `both`
- Parallel processing for comparison mode with real-time progress updates
- Advanced quality comparison with punctuation/capitalization analysis
- Processing time estimation and intelligent recommendation engine
- Seamless integration with existing TranscriptService
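The mode dispatch can be sketched as follows; `extract_youtube` and `extract_whisper` are hypothetical stand-ins for the real extractor calls, and `both` runs them concurrently as the comparison mode describes:

```python
import asyncio

# Hypothetical stand-ins for the real extractor services.
async def extract_youtube(video_id: str) -> dict:
    return {"source": "youtube", "text": f"captions for {video_id}"}

async def extract_whisper(video_id: str) -> dict:
    return {"source": "whisper", "text": f"whisper transcript for {video_id}"}

async def extract(video_id: str, mode: str) -> list:
    """Dispatch on extraction mode; 'both' runs both sources in parallel."""
    if mode == "youtube":
        return [await extract_youtube(video_id)]
    if mode == "whisper":
        return [await extract_whisper(video_id)]
    if mode == "both":
        return list(await asyncio.gather(
            extract_youtube(video_id), extract_whisper(video_id)
        ))
    raise ValueError(f"unknown mode: {mode}")

results = asyncio.run(extract("abc123", "both"))
```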
FasterWhisperTranscriptService (services/faster_whisper_transcript_service.py) ✅ UPGRADED
- 20-32x Speed Improvement: Powered by faster-whisper (CTranslate2 optimization engine)
- Large-v3-Turbo Model: Best accuracy/speed balance with advanced AI capabilities
- Intelligent Optimizations: Voice Activity Detection (VAD), int8 quantization, GPU acceleration
- Native MP3 Support: No audio conversion needed, direct processing
- Advanced Configuration: Fully configurable via VideoDownloadConfig with environment variables
- Production Features: Async processing, intelligent chunking, comprehensive metadata
- Performance Metrics: Real-time speed ratios, processing time tracking, quality scoring
Core Pipeline Services
IntelligentVideoDownloader (services/intelligent_video_downloader.py) ✅ NEW
- 9-Tier Transcript Extraction Fallback Chain:
1. YouTube Transcript API - Primary method using the official API
2. Auto-generated Captions - YouTube's automatic captions fallback
3. Whisper AI Transcription - OpenAI Whisper for high-quality audio transcription
4. PyTubeFix Downloader - Alternative YouTube library
5. YT-DLP Downloader - Robust video/audio extraction tool
6. Playwright Browser - Browser automation for JavaScript-rendered content
7. External Tools - 4K Video Downloader CLI integration
8. Web Services - Third-party transcript API services
9. Transcript-Only - Metadata without full transcript as the final fallback
- Audio Retention System for re-transcription capability
- Intelligent method selection based on success rates
- Comprehensive error handling with detailed logging
- Performance telemetry and health monitoring
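The tiered fallback above reduces to an ordered loop that records each tier's failure and returns the first success. The method names and error shape here are an illustrative sketch, not the actual IntelligentVideoDownloader implementation:

```python
import asyncio

async def extract_with_fallback(video_url: str, methods) -> dict:
    """Try each (name, coroutine) extraction method in order.

    Returns the first success; raises only after every tier has failed,
    with per-tier errors preserved for logging.
    """
    errors = {}
    for name, method in methods:
        try:
            return {"method": name, "result": await method(video_url)}
        except Exception as e:  # a tier's failure is recorded, not fatal
            errors[name] = str(e)
    raise RuntimeError(f"all extraction methods failed: {errors}")

# Two fake tiers: the primary fails, the fallback succeeds.
async def failing(url):
    raise ConnectionError("captions unavailable")

async def succeeding(url):
    return f"transcript for {url}"

out = asyncio.run(extract_with_fallback(
    "https://youtu.be/x",
    [("youtube_api", failing), ("yt_dlp", succeeding)],
))
```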
SummaryPipeline (services/summary_pipeline.py)
- Main orchestration service for end-to-end video processing
- 7-stage async pipeline: URL validation → metadata extraction → transcript → analysis → summarization → quality validation → completion
- Integrates with IntelligentVideoDownloader for robust transcript extraction
- Intelligent content analysis and configuration optimization
- Real-time progress tracking via WebSocket
- Automatic retry logic with exponential backoff
- Quality scoring and validation system
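The retry-with-exponential-backoff behavior can be sketched as a generic helper (not the pipeline's actual code; the delay schedule and exception policy are illustrative):

```python
import asyncio

async def with_retries(fn, max_retries: int = 2, base_delay: float = 1.0):
    """Run fn; on failure, wait base_delay * 2**attempt and retry."""
    for attempt in range(max_retries + 1):
        try:
            return await fn()
        except Exception:
            if attempt == max_retries:
                raise  # retries exhausted; surface the final error
            await asyncio.sleep(base_delay * (2 ** attempt))

# Demo: a stage that fails twice, then succeeds on the third attempt.
calls = {"n": 0}

async def flaky_stage():
    calls["n"] += 1
    if calls["n"] < 3:
        raise TimeoutError("transient failure")
    return "summary complete"

result = asyncio.run(with_retries(flaky_stage, max_retries=2, base_delay=0.0))
```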
AnthropicSummarizer (services/anthropic_summarizer.py)
- AI service integration using Claude 3.5 Haiku for cost efficiency
- Structured JSON output with fallback text parsing
- Token counting and cost estimation
- Intelligent chunking for long transcripts (up to 200k context)
- Comprehensive error handling and retry logic
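Chunking for long transcripts might look like the sketch below. The real service presumably budgets by tokens rather than characters, so the character limit, overlap, and sentence-boundary heuristic here are illustrative assumptions:

```python
def chunk_transcript(text: str, max_chars: int = 8000, overlap: int = 200) -> list:
    """Split text into chunks of at most max_chars, preferring sentence
    boundaries, with a small overlap so context carries across chunks."""
    chunks, start = [], 0
    while start < len(text):
        end = min(start + max_chars, len(text))
        if end < len(text):
            # Back up to the nearest sentence break inside the window.
            cut = text.rfind(". ", start, end)
            if cut > start:
                end = cut + 1
        chunks.append(text[start:end])
        if end >= len(text):
            break
        # Overlap with the previous chunk, but always make progress.
        start = max(end - overlap, start + 1)
    return chunks

transcript = "This is a sentence. " * 200  # ~4,000 characters
chunks = chunk_transcript(transcript, max_chars=500, overlap=50)
```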
CacheManager (services/cache_manager.py)
- Multi-level caching for pipeline results, transcripts, and metadata
- TTL-based expiration with automatic cleanup
- Redis-ready architecture for production scaling
- Configurable cache keys with collision prevention
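TTL-based expiration with lazy cleanup can be sketched as a minimal in-memory stand-in for the real CacheManager (class and method names are illustrative):

```python
import time

class TTLCache:
    """Minimal in-memory cache with per-entry TTL, Redis-style semantics."""

    def __init__(self, default_ttl: float = 3600):
        self.default_ttl = default_ttl
        self._store = {}  # key -> (value, expires_at)

    def set(self, key, value, ttl=None):
        expires_at = time.monotonic() + (ttl if ttl is not None else self.default_ttl)
        self._store[key] = (value, expires_at)

    def get(self, key, default=None):
        entry = self._store.get(key)
        if entry is None:
            return default
        value, expires_at = entry
        if time.monotonic() >= expires_at:
            del self._store[key]  # lazy expiration on read
            return default
        return value

cache = TTLCache(default_ttl=3600)
cache.set("transcript:abc123", "cached transcript")
cache.set("ephemeral", "short lived", ttl=0.01)
```

A production deployment would swap this for Redis with `SETEX`, which the Redis-ready architecture above anticipates.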
WebSocketManager (core/websocket_manager.py)
- Singleton pattern for WebSocket connection management
- Job-specific connection tracking and broadcasting
- Real-time progress updates and completion notifications
- Heartbeat mechanism and stale connection cleanup
NotificationService (services/notification_service.py)
- Multi-type notifications (completion, error, progress, system)
- Notification history and statistics tracking
- Email/webhook integration ready architecture
- Configurable filtering and management
API Layer
Pipeline API (api/pipeline.py)
- Complete pipeline management endpoints
- Process video with configuration options
- Status monitoring and job history
- Pipeline cancellation and cleanup
- Health checks and system statistics
Summarization API (api/summarization.py)
- Direct AI summarization endpoints
- Sync and async processing options
- Cost estimation and validation
- Background job management
Dual Transcript API (api/transcripts.py) ✅ NEW
- POST /api/transcripts/dual/extract - Start dual transcript extraction
- GET /api/transcripts/dual/jobs/{job_id} - Monitor extraction progress
- POST /api/transcripts/dual/estimate - Get processing time estimates
- GET /api/transcripts/dual/compare/{video_id} - Force comparison analysis
- Background job processing with real-time progress updates
- YouTube captions, Whisper AI, or both sources simultaneously
Development Patterns
Service Dependency Injection
```python
def get_summary_pipeline(
    video_service: VideoService = Depends(get_video_service),
    transcript_service: TranscriptService = Depends(get_transcript_service),
    ai_service: AnthropicSummarizer = Depends(get_ai_service),
    cache_manager: CacheManager = Depends(get_cache_manager),
    notification_service: NotificationService = Depends(get_notification_service)
) -> SummaryPipeline:
    return SummaryPipeline(...)
```
Database Registry Pattern (CRITICAL ARCHITECTURE)
Problem Solved: SQLAlchemy "Multiple classes found for path" relationship resolver errors
```python
# Always use the registry for model creation
from backend.core.database_registry import registry
from backend.models.base import Model

# Models inherit from Model (which uses registry.Base)
class User(Model):
    __tablename__ = "users"

    # Use fully qualified relationship paths to prevent conflicts
    summaries = relationship("backend.models.summary.Summary", back_populates="user")

# Registry ensures single Base instance and safe model registration
registry.create_all_tables(engine)   # For table creation
registry.register_model(ModelClass)  # Automatic via BaseModel mixin
```
Key Benefits:
- Prevents SQLAlchemy table redefinition conflicts
- Thread-safe singleton pattern
- Automatic model registration and deduplication
- Production-ready architecture
- Clean testing with registry reset capabilities
Authentication Pattern
```python
# Protected endpoint with user dependency
@router.post("/api/protected")
async def protected_endpoint(
    current_user: User = Depends(get_current_user),
    db: Session = Depends(get_db)
):
    return {"user_id": current_user.id}

# JWT token validation and refresh
from backend.services.auth_service import AuthService

auth_service = AuthService()
user = await auth_service.authenticate_user(email, password)
tokens = auth_service.create_access_token(user)
```
Async Pipeline Pattern
```python
async def process_video(self, video_url: str, config: PipelineConfig = None) -> str:
    job_id = str(uuid.uuid4())
    result = PipelineResult(job_id=job_id, video_url=video_url, ...)
    self.active_jobs[job_id] = result
    # Start background processing
    asyncio.create_task(self._execute_pipeline(job_id, config))
    return job_id
```
Error Handling Pattern
```python
try:
    result = await self.ai_service.generate_summary(request)
except AIServiceError as e:
    raise HTTPException(status_code=500, detail={
        "error": "AI service error",
        "message": e.message,
        "code": e.error_code
    })
```
Configuration and Environment
Required Environment Variables
```bash
# Core Services
ANTHROPIC_API_KEY=sk-ant-...                  # Required for AI summarization
YOUTUBE_API_KEY=AIza...                       # YouTube Data API v3 key
GOOGLE_API_KEY=AIza...                        # Google/Gemini API key

# Feature Flags
USE_MOCK_SERVICES=false                       # Disable mock services
ENABLE_REAL_TRANSCRIPT_EXTRACTION=true        # Enable real transcript extraction

# Video Download & Storage Configuration
VIDEO_DOWNLOAD_STORAGE_PATH=./video_storage   # Base storage directory
VIDEO_DOWNLOAD_KEEP_AUDIO_FILES=true          # Save audio for re-transcription
VIDEO_DOWNLOAD_AUDIO_CLEANUP_DAYS=30          # Audio retention period
VIDEO_DOWNLOAD_MAX_STORAGE_GB=10              # Storage limit

# Faster-Whisper Configuration (20-32x Speed Improvement)
VIDEO_DOWNLOAD_WHISPER_MODEL=large-v3-turbo   # Model: 'large-v3-turbo', 'large-v3', 'medium', 'small', 'base'
VIDEO_DOWNLOAD_WHISPER_DEVICE=auto            # Device: 'auto', 'cpu', 'cuda'
VIDEO_DOWNLOAD_WHISPER_COMPUTE_TYPE=auto      # Compute: 'auto', 'int8', 'float16', 'float32'
VIDEO_DOWNLOAD_WHISPER_BEAM_SIZE=5            # Beam search size (1-10, higher = better quality)
VIDEO_DOWNLOAD_WHISPER_VAD_FILTER=true        # Voice Activity Detection (efficiency)
VIDEO_DOWNLOAD_WHISPER_WORD_TIMESTAMPS=true   # Word-level timestamps
VIDEO_DOWNLOAD_WHISPER_TEMPERATURE=0.0        # Sampling temperature (0 = deterministic)
VIDEO_DOWNLOAD_WHISPER_BEST_OF=5              # Number of candidates when sampling

# Dependencies: faster-whisper automatically handles dependencies
# pip install faster-whisper torch pydub yt-dlp pytubefix
# GPU acceleration: CUDA automatically detected and used when available

# Optional Configuration
DATABASE_URL=sqlite:///./data/app.db          # Database connection
REDIS_URL=redis://localhost:6379/0            # Cache backend (optional)
LOG_LEVEL=INFO                                # Logging level
CORS_ORIGINS=http://localhost:3000            # Frontend origins
```
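The Whisper-related variables above might be parsed into service configuration like this. The exact mapping inside VideoDownloadConfig is an assumption, but the variable names and defaults match the table above:

```python
import os

def whisper_config_from_env() -> dict:
    """Parse VIDEO_DOWNLOAD_WHISPER_* variables with the documented defaults."""
    env = os.environ.get
    return {
        "model": env("VIDEO_DOWNLOAD_WHISPER_MODEL", "large-v3-turbo"),
        "device": env("VIDEO_DOWNLOAD_WHISPER_DEVICE", "auto"),
        "compute_type": env("VIDEO_DOWNLOAD_WHISPER_COMPUTE_TYPE", "auto"),
        "beam_size": int(env("VIDEO_DOWNLOAD_WHISPER_BEAM_SIZE", "5")),
        "vad_filter": env("VIDEO_DOWNLOAD_WHISPER_VAD_FILTER", "true").lower() == "true",
        "word_timestamps": env("VIDEO_DOWNLOAD_WHISPER_WORD_TIMESTAMPS", "true").lower() == "true",
        "temperature": float(env("VIDEO_DOWNLOAD_WHISPER_TEMPERATURE", "0.0")),
        "best_of": int(env("VIDEO_DOWNLOAD_WHISPER_BEST_OF", "5")),
    }

config = whisper_config_from_env()
```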
Service Configuration
Services are configured through dependency injection with sensible defaults:
```python
# Cost-optimized AI model
ai_service = AnthropicSummarizer(
    api_key=api_key,
    model="claude-3-5-haiku-20241022"  # Cost-effective choice
)

# Cache with TTL
cache_manager = CacheManager(default_ttl=3600)  # 1 hour default

# Pipeline with retry logic
config = PipelineConfig(
    summary_length="standard",
    quality_threshold=0.7,
    max_retries=2,
    enable_notifications=True
)
```
Testing Strategy
Unit Tests
- Location: tests/unit/
- Coverage: 17+ tests for pipeline orchestration
- Mocking: All external services mocked
- Patterns: Async test patterns with proper fixtures
Integration Tests
- Location: tests/integration/
- Coverage: 20+ API endpoint scenarios
- Testing: Full FastAPI integration with TestClient
- Validation: Request/response validation and error handling
Running Tests
```bash
# From backend directory
PYTHONPATH=/path/to/youtube-summarizer python3 -m pytest tests/unit/ -v
PYTHONPATH=/path/to/youtube-summarizer python3 -m pytest tests/integration/ -v

# With coverage
python3 -m pytest tests/ --cov=backend --cov-report=html
```
Common Development Tasks
Adding New API Endpoints
- Create endpoint in the appropriate api/ module
- Add business logic to the services/ layer
- Update main.py to include the router
- Add unit and integration tests
- Update API documentation
Adding New Services
- Create service class in services/
- Implement proper async patterns
- Add error handling with custom exceptions
- Create dependency injection function
- Add comprehensive unit tests
Debugging Pipeline Issues
```python
# Enable detailed logging
import logging
logging.getLogger("backend").setLevel(logging.DEBUG)

# Check pipeline status
pipeline = get_summary_pipeline()
result = await pipeline.get_pipeline_result(job_id)
print(f"Status: {result.status}, Error: {result.error}")

# Monitor active jobs
active_jobs = pipeline.get_active_jobs()
print(f"Active jobs: {len(active_jobs)}")
```
Performance Optimization
Faster-Whisper Performance (✅ MAJOR UPGRADE)
- 20-32x Speed Improvement: CTranslate2 optimization engine provides massive speed gains
- Large-v3-Turbo Model: Combines best accuracy with 5-8x additional speed over large-v3
- Intelligent Processing: Voice Activity Detection reduces processing time by filtering silence
- CPU Optimization: int8 quantization provides excellent performance even without GPU
- GPU Acceleration: Automatic CUDA detection and utilization when available
- Native MP3: Direct processing without audio conversion overhead
- Real-time Performance: Typical 2-3x faster than realtime processing speeds
Benchmark Results (3.6 minute video):
- Processing Time: 94 seconds (vs ~30+ minutes with OpenAI Whisper)
- Quality Score: 1.000 (perfect transcription accuracy)
- Confidence Score: 0.962 (very high confidence)
- Speed Ratio: 2.3x faster than realtime
Async Patterns
- All I/O operations use async/await
- Background tasks for long-running operations
- Connection pooling for external services
- Proper exception handling to prevent blocking
Caching Strategy
- Pipeline results cached for 1 hour
- Transcript and metadata cached separately
- Cache invalidation on video updates
- Redis-ready for distributed caching
Cost Optimization
- Claude 3.5 Haiku for 80% cost savings vs GPT-4
- Intelligent chunking prevents token waste
- Cost estimation and limits
- Quality scoring to avoid unnecessary retries
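A rough pre-flight cost estimate can be computed from transcript length. The ~4 characters/token heuristic and the per-million-token prices below are illustrative assumptions; check current Anthropic pricing before relying on the numbers:

```python
def estimate_summary_cost(
    transcript: str,
    input_per_mtok: float = 0.80,   # assumed input price, USD per million tokens
    output_per_mtok: float = 4.00,  # assumed output price, USD per million tokens
    expected_output_tokens: int = 1500,
) -> float:
    """Rough pre-flight cost estimate using a ~4 chars/token heuristic."""
    input_tokens = len(transcript) / 4
    input_cost = (input_tokens / 1_000_000) * input_per_mtok
    output_cost = (expected_output_tokens / 1_000_000) * output_per_mtok
    return round(input_cost + output_cost, 6)

cost = estimate_summary_cost("word " * 2000)  # ~10,000 characters
```

An estimate like this can back a cost limit: reject or down-scope jobs whose projected cost exceeds a configured ceiling before any API call is made.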
Security Considerations
API Security
- Environment variable for API keys
- Input validation on all endpoints
- Rate limiting (implement with Redis)
- CORS configuration for frontend origins
Error Sanitization
```python
# Never expose internal errors to clients
try:
    result = await handler()  # the operation being guarded (illustrative)
except Exception as e:
    logger.error(f"Internal error: {e}")
    raise HTTPException(status_code=500, detail="Internal server error")
```
Content Validation
```python
# Validate transcript length
if len(request.transcript.strip()) < 50:
    raise HTTPException(status_code=400, detail="Transcript too short")
```
Monitoring and Observability
Health Checks
- /api/health - Service health status
- /api/stats - Pipeline processing statistics
- WebSocket connection monitoring
- Background job tracking
Logging
- Structured logging with JSON format
- Error tracking with context
- Performance metrics logging
- Request/response logging (without sensitive data)
Metrics
```python
# Built-in metrics
stats = {
    "active_jobs": len(pipeline.get_active_jobs()),
    "cache_stats": await cache_manager.get_cache_stats(),
    "notification_stats": notification_service.get_notification_stats(),
    "websocket_connections": websocket_manager.get_stats()
}
```
Deployment Considerations
Production Configuration
- Use Redis for caching and session storage
- Configure proper logging (structured JSON)
- Set up health checks and monitoring
- Use environment-specific configuration
- Enable HTTPS and security headers
Scaling Patterns
- Stateless design enables horizontal scaling
- Background job processing via task queue
- Database connection pooling
- Load balancer health checks
Database Migrations & Epic 4 Features
Current Status: ✅ Epic 4 migration complete (add_epic_4_features)
Database Schema: 21 tables including Epic 4 features:
- Multi-Agent Tables: agent_summaries, prompt_templates
- Enhanced Export Tables: export_metadata, summary_sections
- RAG Chat Tables: chat_sessions, chat_messages, video_chunks
- Analytics Tables: playlist_analysis, rag_analytics, prompt_experiments
Migration Commands:
```bash
# Check migration status
python3 ../../scripts/utilities/migration_manager.py status

# Apply migrations (from backend directory)
PYTHONPATH=/Users/enias/projects/my-ai-projects/apps/youtube-summarizer \
    ../venv/bin/python3 -m alembic upgrade head

# Create new migration
python3 -m alembic revision --autogenerate -m "Add new feature"
```
Python 3.11 Requirement: Epic 4 requires Python 3.11+ for:
- chromadb: Vector database for RAG functionality
- sentence-transformers: Embedding generation for semantic search
- aiohttp: Async HTTP client for DeepSeek API integration
Environment Setup:
```bash
# Remove old environment if needed
rm -rf venv

# Create Python 3.11 virtual environment
/opt/homebrew/bin/python3.11 -m venv venv
source venv/bin/activate
pip install -r requirements.txt

# Install Epic 4 dependencies
pip install chromadb sentence-transformers aiohttp

# Verify installation
python --version  # Should show Python 3.11.x
```
Troubleshooting
Common Issues
"Pydantic validation error: Extra inputs are not permitted"
- Issue: Environment variables not defined in Settings model
- Solution: Add `extra = "ignore"` to the Config class in core/config.py
"Table already exists" during migration
- Issue: Database already has tables that migration tries to create
- Solution: Run `alembic stamp existing_revision`, then `alembic upgrade head`
"Multiple head revisions present"
- Issue: Multiple migration branches need merging
- Solution: Run `alembic merge head1 head2 -m "Merge branches"`
"Python 3.9 compatibility issues with Epic 4"
- Issue: ChromaDB and modern AI libraries require Python 3.11+
- Solution: Recreate virtual environment with Python 3.11 (see Environment Setup above)
"Anthropic API key not configured"
- Solution: Set the `ANTHROPIC_API_KEY` environment variable
"Mock data returned instead of real transcripts"
- Check: `USE_MOCK_SERVICES=false` in .env
- Solution: Set `ENABLE_REAL_TRANSCRIPT_EXTRACTION=true`
"404 Not Found for /api/transcripts/extract"
- Check: Import statements in main.py
- Solution: Use `from backend.api.transcripts import router` (not `transcripts_stub`)
"Radio button selection not working"
- Issue: Circular state updates in React
- Solution: Use ref tracking in useTranscriptSelector hook
"VAD filter removes all audio / 0 segments generated"
- Issue: Voice Activity Detection too aggressive for music/instrumental content
- Solution: Set `VIDEO_DOWNLOAD_WHISPER_VAD_FILTER=false` for music videos
- Alternative: Pass `whisper_vad_filter=False` in the service configuration
"Faster-whisper model download fails"
- Issue: Network issues downloading large-v3-turbo model from HuggingFace
- Solution: The service automatically falls back to the standard large-v3 model
- Check: Ensure internet connection for initial model download
"CPU transcription too slow"
- Issue: CPU-only processing on large models
- Solution: Use a smaller model (`base` or `small`) or enable GPU acceleration
- Config: Set `VIDEO_DOWNLOAD_WHISPER_MODEL=base` for faster CPU processing
Pipeline jobs stuck in "processing" state
- Check: `pipeline.get_active_jobs()` for zombie jobs
- Solution: Restart the service or call the cleanup endpoint
WebSocket connections not receiving updates
- Check: WebSocket connection in browser dev tools
- Solution: Verify WebSocket manager singleton initialization
High AI costs
- Check: Summary length configuration and transcript sizes
- Solution: Implement cost limits and brief summary defaults
Transcript extraction failures
- Check: IntelligentVideoDownloader fallback chain logs
- Solution: Review which tier failed and check API keys/dependencies
Debug Commands
```python
# Pipeline debugging
from backend.services.summary_pipeline import SummaryPipeline
pipeline = SummaryPipeline(...)
result = await pipeline.get_pipeline_result("job_id")

# Cache debugging
from backend.services.cache_manager import CacheManager
cache = CacheManager()
stats = await cache.get_cache_stats()

# WebSocket debugging
from backend.core.websocket_manager import websocket_manager
connections = websocket_manager.get_stats()
```
This backend is designed for production use with comprehensive error handling, monitoring, and scalability patterns. All services follow async patterns and clean architecture principles.