AGENTS.md - YouTube Summarizer Development Standards
This document defines development workflows, standards, and best practices for the YouTube Summarizer project. It serves as a guide for both human developers and AI agents working on this codebase.
🚀 Quick Start for Developers
All stories are created and ready for implementation!
- Start Here: Developer Handoff Guide
- Sprint Plan: Sprint Planning Document
- First Story: Story 1.2 - URL Validation
Total Implementation Time: ~6 weeks (3 sprints)
- Sprint 1: Epic 1 (Foundation) - Stories 1.2-1.4
- Sprint 2: Epic 2 Core - Stories 2.1-2.3
- Sprint 3: Epic 2 Advanced - Stories 2.4-2.5
Table of Contents
- Development Workflow
- Code Standards
- Testing Requirements
- Documentation Standards
- Git Workflow
- API Design Standards
- Database Operations
- Performance Guidelines
- Security Protocols
- Deployment Process
1. Development Workflow
Story-Driven Development (BMad Method)
All development follows the BMad Method epic and story workflow:
Current Development Status: READY FOR IMPLEMENTATION
- Epic 1: Foundation & Core YouTube Integration (Story 1.1 ✅ Complete, Stories 1.2-1.4 📋 Ready)
- Epic 2: AI Summarization Engine (Stories 2.1-2.5 📋 All Created and Ready)
- Epic 3: Enhanced User Experience (Future - Ready for story creation)
Developer Handoff Complete: All Epic 1 & 2 stories created with comprehensive Dev Notes.
- See Developer Handoff Guide for implementation start
- See Sprint Planning for 6-week development schedule
Story-Based Implementation Process
# 1. Start with Developer Handoff
cat docs/DEVELOPER_HANDOFF.md # Complete implementation guide
cat docs/SPRINT_PLANNING.md # Sprint breakdown
# 2. Get Your Next Story (All stories ready!)
# Sprint 1: Stories 1.2, 1.3, 1.4
# Sprint 2: Stories 2.1, 2.2, 2.3
# Sprint 3: Stories 2.4, 2.5
# 3. Review Story Implementation Requirements
# Read: docs/stories/{story-number}.{name}.md
# Example: docs/stories/1.2.youtube-url-validation-parsing.md
# Study: Dev Notes section with complete code examples
# Check: All tasks and subtasks with time estimates
# 4. Implement Story
# Option A: Use Development Agent
/BMad:agents:dev
# Follow story specifications exactly
# Option B: Direct implementation
# Use code examples from Dev Notes
# Follow file structure specified in story
# Implement tasks in order
# 5. Test Implementation
pytest backend/tests/unit/test_{module}.py
pytest backend/tests/integration/
cd frontend && npm test
# 6. Update Story Progress
# In story file, mark tasks complete:
# - [x] **Task 1: Completed task**
# Update story status: Draft → In Progress → Review → Done
# 7. Move to Next Story
# Check Sprint Planning for next priority
# Repeat process with next story file
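Step 6 above tracks progress through checkboxes in the story file. As a minimal sketch, assuming tasks are written as markdown checkboxes (`- [ ]` / `- [x]`), a small helper can report how many tasks are done. The name `story_progress` is hypothetical and not part of the project.

```python
import re

# Matches markdown task checkboxes such as "- [ ] Task" or "- [x] Task",
# including indented subtasks.
CHECKBOX = re.compile(r"^\s*-\s*\[( |x|X)\]\s+", re.MULTILINE)


def story_progress(markdown: str) -> tuple[int, int]:
    """Return (completed, total) task counts for a story file's text."""
    marks = CHECKBOX.findall(markdown)
    done = sum(1 for mark in marks if mark.lower() == "x")
    return done, len(marks)


sample = """\
- [x] **Task 1: Completed task**
- [ ] **Task 2: Pending task**
  - [x] Subtask 2.1
"""
completed, total = story_progress(sample)
print(f"{completed}/{total} tasks complete")
```

A story is ready to move to Review only when both counts match.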
Alternative: Direct Development (Without BMad Agents)
# 1. Read current story specification
cat docs/stories/1.2.youtube-url-validation-parsing.md
# 2. Follow Dev Notes and architecture references
cat docs/architecture.md # Technical specifications
cat docs/front-end-spec.md # UI requirements
# 3. Implement systematically
# Follow tasks/subtasks exactly as specified
# Use provided code examples and patterns
# 4. Test and validate
pytest backend/tests/ -v
cd frontend && npm test
Story Implementation Checklist (BMad Method)
- Review Story Requirements
  - Read complete story file (docs/stories/{epic}.{story}.{name}.md)
  - Study Dev Notes section with architecture references
  - Understand all acceptance criteria
  - Review all tasks and subtasks
- Follow Architecture Specifications
  - Reference docs/architecture.md for technical patterns
  - Use exact file locations specified in story
  - Follow error handling patterns from architecture
  - Implement according to database schema specifications
- Write Tests First (TDD)
  - Create unit tests based on story testing requirements
  - Write integration tests for API endpoints
  - Add frontend component tests where specified
  - Ensure test coverage meets story requirements
- Implement Features Systematically
  - Complete tasks in order specified in story
  - Follow code examples and patterns from Dev Notes
  - Use exact imports and dependencies specified
  - Implement error handling as architecturally defined
- Validate Implementation
  - All acceptance criteria met
  - All tasks/subtasks completed
  - Full test suite passes
  - Integration testing successful
- Update Story Progress
  - Mark tasks complete in story markdown file
  - Update story status from "Draft" to "Done"
  - Add completion notes to Dev Agent Record section
  - Update epic progress in docs/prd/index.md
- Commit Changes
  - Use story-based commit message format
  - Reference story number in commit
  - Include brief implementation summary
2. Code Standards
Python Style Guide
"""
Module docstring describing purpose and usage
"""
from typing import List, Optional, Dict, Any
import asyncio
from datetime import datetime

# Constants in UPPER_CASE
DEFAULT_TIMEOUT = 30
MAX_RETRIES = 3


class YouTubeSummarizer:
    """
    Class for summarizing YouTube videos.

    Attributes:
        model: AI model to use for summarization
        cache: Cache service instance
    """

    def __init__(self, model: str = "openai"):
        """Initialize summarizer with specified model."""
        self.model = model
        self.cache = CacheService()  # CacheService is defined under "Caching Strategy"

    async def summarize(
        self,
        video_url: str,
        options: Optional[Dict[str, Any]] = None
    ) -> Dict[str, Any]:
        """
        Summarize a YouTube video.

        Args:
            video_url: YouTube video URL
            options: Optional summarization parameters

        Returns:
            Dictionary containing summary and metadata

        Raises:
            YouTubeError: If video cannot be accessed
            AIServiceError: If summarization fails
        """
        # Implementation here
        pass
Type Hints
Always use type hints for better code quality:
from typing import Union, List, Optional, Dict, Any, Tuple
from pydantic import BaseModel, HttpUrl


async def process_video(
    url: HttpUrl,
    models: List[str],
    max_length: Optional[int] = None
) -> Tuple[str, Dict[str, Any]]:
    """Process video with type safety."""
    pass
Async/Await Pattern
Use async for all I/O operations:
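import asyncio
import aiohttp

# Placeholder endpoint for illustration only; substitute the real transcript source.
TRANSCRIPT_URL_TEMPLATE = "https://example.com/transcripts/{video_id}"


async def fetch_transcript(video_id: str) -> str:
    """Fetch transcript asynchronously."""
    url = TRANSCRIPT_URL_TEMPLATE.format(video_id=video_id)
    async with aiohttp.ClientSession() as session:
        async with session.get(url) as response:
            response.raise_for_status()  # surface HTTP errors instead of returning an error page
            return await response.text()

# Use asyncio.gather for parallel operations (must run inside a coroutine)
results = await asyncio.gather(
    fetch_transcript(id1),
    fetch_transcript(id2),
    fetch_transcript(id3)
)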
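When many transcripts are fetched at once, it can help to cap how many requests are in flight and to collect failures instead of losing the whole batch. The sketch below shows one way to do that with `asyncio.Semaphore` and `return_exceptions=True`. `fetch_transcript_stub` and `fetch_all` are hypothetical stand-ins, not project code.

```python
import asyncio


async def fetch_transcript_stub(video_id: str) -> str:
    """Stand-in for a real fetch; fails for one ID to show error handling."""
    await asyncio.sleep(0)
    if video_id == "bad":
        raise ValueError(f"no transcript for {video_id}")
    return f"transcript:{video_id}"


async def fetch_all(video_ids: list[str], max_concurrency: int = 2) -> list:
    """Fetch transcripts in parallel with at most max_concurrency in flight."""
    semaphore = asyncio.Semaphore(max_concurrency)

    async def bounded(video_id: str) -> str:
        async with semaphore:
            return await fetch_transcript_stub(video_id)

    # return_exceptions=True returns failures as exception objects in the
    # result list, so one failed fetch does not discard the other results.
    return await asyncio.gather(
        *(bounded(video_id) for video_id in video_ids),
        return_exceptions=True,
    )


results = asyncio.run(fetch_all(["a", "bad", "c"]))
for item in results:
    print("failed:" if isinstance(item, Exception) else "ok:", item)
```

Results come back in the same order as the input IDs, so each failure can be matched to its video.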
3. Testing Requirements
Test Structure
tests/
├── unit/
│ ├── test_youtube_service.py
│ ├── test_summarizer_service.py
│ └── test_cache_service.py
├── integration/
│ ├── test_api_endpoints.py
│ └── test_database.py
├── fixtures/
│ ├── sample_transcripts.json
│ └── mock_responses.py
└── conftest.py
Unit Test Example
# tests/unit/test_youtube_service.py
import pytest
from unittest.mock import Mock, patch, AsyncMock
from src.services.youtube import YouTubeService


class TestYouTubeService:
    @pytest.fixture
    def youtube_service(self):
        return YouTubeService()

    @pytest.fixture
    def mock_transcript(self):
        return [
            {"text": "Hello world", "start": 0.0, "duration": 2.0},
            {"text": "This is a test", "start": 2.0, "duration": 3.0}
        ]

    @pytest.mark.asyncio
    async def test_extract_transcript_success(
        self,
        youtube_service,
        mock_transcript
    ):
        with patch('youtube_transcript_api.YouTubeTranscriptApi.get_transcript') as mock_get:
            mock_get.return_value = mock_transcript
            result = await youtube_service.extract_transcript("test_id")
            assert result == mock_transcript
            mock_get.assert_called_once_with("test_id")

    def test_extract_video_id_various_formats(self, youtube_service):
        test_cases = [
            ("https://www.youtube.com/watch?v=abc123", "abc123"),
            ("https://youtu.be/xyz789", "xyz789"),
            ("https://youtube.com/embed/qwe456", "qwe456"),
            ("https://www.youtube.com/watch?v=test&t=123", "test")
        ]
        for url, expected_id in test_cases:
            assert youtube_service.extract_video_id(url) == expected_id
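The URL formats in `test_extract_video_id_various_formats` can be handled by parsing the URL rather than pattern-matching the whole string. The following is a minimal sketch of a function that satisfies those four cases, written as a standalone function for illustration. The real `YouTubeService.extract_video_id` may be implemented differently.

```python
from typing import Optional
from urllib.parse import parse_qs, urlparse


def extract_video_id(url: str) -> Optional[str]:
    """Return the video ID for watch, short-link, and embed URLs, else None."""
    parsed = urlparse(url)
    host = (parsed.hostname or "").lower()
    if host.startswith("www."):
        host = host[4:]

    if host == "youtu.be":
        # Short links carry the ID as the first path segment.
        return parsed.path.lstrip("/").split("/")[0] or None

    if host == "youtube.com":
        if parsed.path == "/watch":
            # Watch URLs carry the ID in the "v" query parameter.
            values = parse_qs(parsed.query).get("v")
            return values[0] if values else None
        if parsed.path.startswith("/embed/"):
            return parsed.path[len("/embed/"):].split("/")[0] or None

    return None


for candidate in (
    "https://www.youtube.com/watch?v=abc123",
    "https://youtu.be/xyz789",
    "https://youtube.com/embed/qwe456",
    "https://www.youtube.com/watch?v=test&t=123",
):
    print(candidate, "->", extract_video_id(candidate))
```

Returning `None` for unrecognized hosts keeps the decision about how to report the error with the caller.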
Integration Test Example
# tests/integration/test_api_endpoints.py
import pytest
from fastapi.testclient import TestClient
from src.main import app


@pytest.fixture
def client():
    return TestClient(app)


class TestSummarizationAPI:
    # TestClient is synchronous, so these tests are plain functions
    # and do not need the asyncio marker.
    def test_summarize_endpoint(self, client):
        response = client.post("/api/summarize", json={
            "url": "https://youtube.com/watch?v=test123",
            "model": "openai",
            "options": {"max_length": 500}
        })
        assert response.status_code == 200
        data = response.json()
        assert "job_id" in data
        assert data["status"] == "processing"

    def test_get_summary(self, client):
        # First create a summary
        create_response = client.post("/api/summarize", json={
            "url": "https://youtube.com/watch?v=test123"
        })
        job_id = create_response.json()["job_id"]
        # Then retrieve it
        get_response = client.get(f"/api/summary/{job_id}")
        assert get_response.status_code in [200, 202]  # 202 if still processing
Test Coverage Requirements
- Minimum 80% code coverage
- 100% coverage for critical paths
- All edge cases tested
- Error conditions covered
# Run tests with coverage
pytest tests/ --cov=src --cov-report=html --cov-report=term
# Coverage report should show:
# src/services/youtube.py 95%
# src/services/summarizer.py 88%
# src/api/routes.py 92%
4. Documentation Standards
Code Documentation
Every module, class, and function must have docstrings:
"""
Module: YouTube Transcript Extractor

This module provides functionality to extract transcripts from YouTube videos
using multiple fallback methods.

Example:
    >>> extractor = TranscriptExtractor()
    >>> transcript = await extractor.extract("video_id")
"""
from typing import Any, Dict, List


def extract_transcript(
    video_id: str,
    language: str = "en",
    include_auto_generated: bool = True
) -> List[Dict[str, Any]]:
    """
    Extract transcript from YouTube video.

    This function attempts to extract transcripts using the following priority:
    1. Manual captions in specified language
    2. Auto-generated captions if allowed
    3. Translated captions as fallback

    Args:
        video_id: YouTube video identifier
        language: ISO 639-1 language code (default: "en")
        include_auto_generated: Whether to use auto-generated captions

    Returns:
        List of transcript segments with text, start time, and duration

    Raises:
        TranscriptNotAvailable: If no transcript can be extracted

    Example:
        >>> transcript = extract_transcript("dQw4w9WgXcQ", "en")
        >>> print(transcript[0])
        {"text": "Never gonna give you up", "start": 0.0, "duration": 3.5}
    """
    pass
API Documentation
Use FastAPI's automatic documentation features:
from typing import Optional

from fastapi import APIRouter, HTTPException, status
from pydantic import BaseModel, Field

router = APIRouter()


class SummarizeRequest(BaseModel):
    """Request model for video summarization."""
    url: str = Field(
        ...,
        description="YouTube video URL",
        example="https://youtube.com/watch?v=dQw4w9WgXcQ"
    )
    model: str = Field(
        "auto",
        description="AI model to use (openai, anthropic, deepseek, auto)",
        example="openai"
    )
    max_length: Optional[int] = Field(
        None,
        description="Maximum summary length in words",
        ge=50,
        le=5000
    )


# SummarizeResponse is the matching response model (definition not shown here)
@router.post(
    "/summarize",
    response_model=SummarizeResponse,
    status_code=status.HTTP_200_OK,
    summary="Summarize YouTube Video",
    description="Submit a YouTube video URL for AI-powered summarization"
)
async def summarize_video(request: SummarizeRequest):
    """
    Summarize a YouTube video using AI.

    This endpoint accepts a YouTube URL and returns a job ID for tracking
    the summarization progress. Use the /summary/{job_id} endpoint to
    retrieve the completed summary.
    """
    pass
5. Git Workflow
Branch Naming
# Feature branches
feature/task-2-youtube-extraction
feature/task-3-ai-summarization
# Bugfix branches
bugfix/transcript-encoding-error
bugfix/rate-limit-handling
# Hotfix branches
hotfix/critical-api-error
Commit Messages
Follow conventional commits:
# Format: <type>(<scope>): <subject>
# Examples:
feat(youtube): add transcript extraction service
fix(api): handle rate limiting correctly
docs(readme): update installation instructions
test(youtube): add edge case tests
refactor(cache): optimize cache key generation
perf(summarizer): implement parallel processing
chore(deps): update requirements.txt
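The `<type>(<scope>): <subject>` format above is regular enough to check mechanically, for example in a commit-msg hook. The sketch below validates a subject line against the types used in the examples. The set of allowed types and the scope character set are assumptions drawn from those examples, and `is_valid_commit_subject` is a hypothetical name.

```python
import re

# Types taken from the examples above; extend as the project requires.
COMMIT_TYPES = ("feat", "fix", "docs", "test", "refactor", "perf", "chore")

COMMIT_PATTERN = re.compile(
    r"^(?P<type>" + "|".join(COMMIT_TYPES) + r")"
    r"\((?P<scope>[a-z0-9_-]+)\): "
    r"(?P<subject>\S.*)$"
)


def is_valid_commit_subject(line: str) -> bool:
    """Check a commit subject line against <type>(<scope>): <subject>."""
    return COMMIT_PATTERN.match(line) is not None


for line in (
    "feat(youtube): add transcript extraction service",
    "Added some stuff",
    "fix(api):missing space after colon",
):
    print(f"{line!r}: {is_valid_commit_subject(line)}")
```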
Pull Request Template
## Task Reference
- Task ID: #3
- Task Title: Develop AI Summary Generation Service
## Description
Brief description of changes made
## Changes Made
- [ ] Implemented YouTube transcript extraction
- [ ] Added multi-model AI support
- [ ] Created caching layer
- [ ] Added comprehensive tests
## Testing
- [ ] Unit tests pass
- [ ] Integration tests pass
- [ ] Manual testing completed
- [ ] Coverage > 80%
## Documentation
- [ ] Code documented
- [ ] API docs updated
- [ ] README updated if needed
## Screenshots (if applicable)
[Add screenshots here]
6. API Design Standards
RESTful Principles
# Good API design
GET /api/summaries # List all summaries
GET /api/summaries/{id} # Get specific summary
POST /api/summaries # Create new summary
PUT /api/summaries/{id} # Update summary
DELETE /api/summaries/{id} # Delete summary
# Status codes
200 OK # Successful GET/PUT
201 Created # Successful POST
202 Accepted # Processing async request
204 No Content # Successful DELETE
400 Bad Request # Invalid input
401 Unauthorized # Missing/invalid auth
403 Forbidden # No permission
404 Not Found # Resource doesn't exist
429 Too Many Requests # Rate limited
500 Internal Error # Server error
Response Format
# Success response
{
"success": true,
"data": {
"id": "uuid",
"video_id": "abc123",
"summary": "...",
"metadata": {}
},
"timestamp": "2025-01-25T10:00:00Z"
}
# Error response
{
"success": false,
"error": {
"code": "TRANSCRIPT_NOT_AVAILABLE",
"message": "Could not extract transcript from video",
"details": "No captions available in requested language"
},
"timestamp": "2025-01-25T10:00:00Z"
}
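Building both envelopes through shared helpers keeps the `success`, `data`/`error`, and `timestamp` fields consistent across endpoints. The following is a minimal sketch of such helpers. The function names are hypothetical and not part of the project.

```python
from datetime import datetime, timezone
from typing import Any, Dict, Optional


def _timestamp() -> str:
    """Current UTC time in the ISO 8601 'Z' form used in the examples above."""
    return datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


def success_response(data: Dict[str, Any]) -> Dict[str, Any]:
    """Wrap a payload in the success envelope."""
    return {"success": True, "data": data, "timestamp": _timestamp()}


def error_response(
    code: str, message: str, details: Optional[str] = None
) -> Dict[str, Any]:
    """Wrap an error in the error envelope."""
    return {
        "success": False,
        "error": {"code": code, "message": message, "details": details},
        "timestamp": _timestamp(),
    }


ok = success_response({"id": "uuid", "video_id": "abc123"})
err = error_response(
    "TRANSCRIPT_NOT_AVAILABLE",
    "Could not extract transcript from video",
)
print(ok["success"], err["error"]["code"])
```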
Pagination
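import math

from fastapi import APIRouter, Query

router = APIRouter()


@router.get("/summaries")
async def list_summaries(
    page: int = Query(1, ge=1),
    limit: int = Query(20, ge=1, le=100),
    # Newer FastAPI releases rename the `regex` argument to `pattern`
    sort: str = Query("created_at", regex="^(created_at|updated_at|title)$"),
    order: str = Query("desc", regex="^(asc|desc)$")
):
    """List summaries with pagination."""
    # `summaries` and `total_count` come from the data layer (query not shown)
    return {
        "data": summaries,
        "pagination": {
            "page": page,
            "limit": limit,
            "total": total_count,
            "pages": math.ceil(total_count / limit)
        }
    }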
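The pagination block depends on two small calculations: the number of pages and the row offset for the requested page. The sketch below works through both with concrete numbers. `paginate` is a hypothetical helper, and the `offset` field is an addition for illustration that the response above does not include.

```python
import math


def paginate(total_count: int, page: int, limit: int) -> dict:
    """Compute pagination metadata plus the row offset for a 1-based page."""
    pages = math.ceil(total_count / limit)  # 0 when there are no rows
    offset = (page - 1) * limit             # rows to skip before this page
    return {
        "page": page,
        "limit": limit,
        "total": total_count,
        "pages": pages,
        "offset": offset,
    }


# 45 rows at 20 per page gives 3 pages; page 3 starts after 40 rows.
meta = paginate(total_count=45, page=3, limit=20)
print(meta)
```

The offset is what a database query would pass to `.offset()` alongside `.limit(limit)`.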
7. Database Operations
SQLAlchemy Models
from datetime import datetime
import uuid

from sqlalchemy import Column, String, Text, DateTime, Float, JSON
from sqlalchemy.orm import declarative_base  # moved here from sqlalchemy.ext.declarative in 1.4
from sqlalchemy.dialects.postgresql import UUID

Base = declarative_base()


class Summary(Base):
    __tablename__ = "summaries"

    id = Column(UUID(as_uuid=True), primary_key=True, default=uuid.uuid4)
    video_id = Column(String(20), nullable=False, index=True)
    video_url = Column(Text, nullable=False)
    video_title = Column(Text)
    transcript = Column(Text)
    summary = Column(Text)
    key_points = Column(JSON)
    chapters = Column(JSON)
    model_used = Column(String(50))
    processing_time = Column(Float)
    created_at = Column(DateTime, default=datetime.utcnow)
    updated_at = Column(DateTime, default=datetime.utcnow, onupdate=datetime.utcnow)

    def to_dict(self):
        """Convert to dictionary for API responses."""
        return {
            "id": str(self.id),
            "video_id": self.video_id,
            "video_title": self.video_title,
            "summary": self.summary,
            "key_points": self.key_points,
            "chapters": self.chapters,
            "model_used": self.model_used,
            "created_at": self.created_at.isoformat()
        }
Database Migrations
Use Alembic for migrations:
# Create new migration
alembic revision --autogenerate -m "Add chapters column"
# Apply migrations
alembic upgrade head
# Rollback
alembic downgrade -1
Query Optimization
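from sqlalchemy import select
from sqlalchemy.orm import selectinload

# Eager-load related rows with selectinload (one extra SELECT ... IN query,
# which avoids the N+1 pattern). Assumes Summary also defines a `user_id`
# column and a relationship. The relationship cannot be named `metadata`,
# which SQLAlchemy's declarative base reserves, so `extra_metadata` is a
# hypothetical name used here.
async def get_summaries_with_metadata(session, user_id: str):
    stmt = (
        select(Summary)
        .options(selectinload(Summary.extra_metadata))
        .where(Summary.user_id == user_id)
        .order_by(Summary.created_at.desc())
        .limit(10)
    )
    result = await session.execute(stmt)
    return result.scalars().all()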
8. Performance Guidelines
Caching Strategy
import hashlib
import json
from typing import Any, Optional

import redis.asyncio as redis


class CacheService:
    def __init__(self):
        self.redis = redis.Redis(decode_responses=True)
        self.ttl = 3600  # 1 hour default

    def get_key(self, prefix: str, **kwargs) -> str:
        """Generate cache key from parameters."""
        data = json.dumps(kwargs, sort_keys=True)
        # MD5 only shortens the key here; it is not used for security
        hash_digest = hashlib.md5(data.encode()).hexdigest()
        return f"{prefix}:{hash_digest}"

    async def get_or_set(self, key: str, func, ttl: Optional[int] = None) -> Any:
        """Get from cache or compute and set."""
        # Try cache first (async client, so the event loop is not blocked)
        cached = await self.redis.get(key)
        if cached is not None:
            return json.loads(cached)
        # Compute result
        result = await func()
        # Cache result
        await self.redis.setex(
            key,
            ttl or self.ttl,
            json.dumps(result)
        )
        return result
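The key scheme relies on `json.dumps(..., sort_keys=True)` so that the same parameters always hash to the same key, whatever order they are passed in. The sketch below demonstrates that property with a standalone copy of the key function and no Redis connection.

```python
import hashlib
import json


def get_key(prefix: str, **kwargs) -> str:
    """Standalone copy of the key scheme: prefix plus MD5 of sorted JSON."""
    data = json.dumps(kwargs, sort_keys=True)
    return f"{prefix}:{hashlib.md5(data.encode()).hexdigest()}"


# Same parameters in a different order produce the same key.
key_a = get_key("summary", video_id="abc123", model="openai")
key_b = get_key("summary", model="openai", video_id="abc123")

# Changing any parameter produces a different key.
key_c = get_key("summary", video_id="abc123", model="anthropic")

print(key_a == key_b, key_a == key_c)
```

Including the model in the key parameters, as here, prevents a summary generated by one model from being served for a request that names another.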
Async Processing
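import asyncio
from typing import Dict, Any

from celery import Celery

celery_app = Celery('youtube_summarizer')


async def _process_video(video_url: str, options: Dict[str, Any]) -> Dict[str, Any]:
    """Async pipeline: extract, summarize, persist."""
    # Extract transcript
    transcript = await extract_transcript(video_url)
    # Generate summary
    summary = await generate_summary(transcript, options)
    # Save to database
    await save_summary(video_url, summary)
    return {"status": "completed", "summary_id": summary.id}


@celery_app.task
def process_video_task(video_url: str, options: Dict[str, Any]):
    """Background task for video processing."""
    # Celery does not await coroutine functions, so the task stays synchronous
    # and drives the async pipeline with asyncio.run().
    try:
        return asyncio.run(_process_video(video_url, options))
    except Exception as e:
        return {"status": "failed", "error": str(e)}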
Performance Monitoring
import time
from functools import wraps
import logging

logger = logging.getLogger(__name__)


def measure_performance(func):
    """Decorator to measure function performance."""
    @wraps(func)
    async def wrapper(*args, **kwargs):
        start = time.perf_counter()
        try:
            result = await func(*args, **kwargs)
            elapsed = time.perf_counter() - start
            logger.info(f"{func.__name__} took {elapsed:.3f}s")
            return result
        except Exception as e:
            elapsed = time.perf_counter() - start
            logger.error(f"{func.__name__} failed after {elapsed:.3f}s: {e}")
            raise
    return wrapper
9. Security Protocols
Input Validation
from pydantic import BaseModel, validator, HttpUrl
import re


class VideoURLValidator(BaseModel):
    url: HttpUrl

    # `validator` is the Pydantic v1 API; Pydantic v2 uses `field_validator`
    @validator('url')
    def validate_youtube_url(cls, v):
        youtube_regex = re.compile(
            r'(https?://)?(www\.)?(youtube\.com|youtu\.be)/.+'
        )
        if not youtube_regex.match(str(v)):
            raise ValueError('Invalid YouTube URL')
        return v
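An alternative to matching the whole URL with a regular expression is to parse it and compare the hostname against an explicit allowlist, which makes it harder for a lookalike domain to slip through. The sketch below illustrates that approach. The host list, including `m.youtube.com`, is an assumption, and `is_youtube_url` is a hypothetical name.

```python
from urllib.parse import urlparse

# Assumed set of accepted hosts; adjust to the hosts the project supports.
ALLOWED_HOSTS = {"youtube.com", "www.youtube.com", "m.youtube.com", "youtu.be"}


def is_youtube_url(url: str) -> bool:
    """Accept only http(s) URLs whose hostname is exactly an allowed host."""
    parsed = urlparse(url)
    if parsed.scheme not in ("http", "https"):
        return False
    return (parsed.hostname or "").lower() in ALLOWED_HOSTS


for candidate in (
    "https://www.youtube.com/watch?v=abc123",
    "https://youtube.com.evil.example/watch?v=abc123",
    "ftp://youtube.com/watch?v=abc123",
):
    print(candidate, "->", is_youtube_url(candidate))
```

The check compares the full hostname rather than a prefix, so `youtube.com.evil.example` is rejected.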
API Key Management
from typing import List, Optional

# Pydantic v1 import; in Pydantic v2, BaseSettings lives in the
# separate pydantic-settings package.
from pydantic import BaseSettings


class Settings(BaseSettings):
    """Application settings with validation."""
    # API Keys (never hardcode!)
    openai_api_key: str
    anthropic_api_key: str
    youtube_api_key: Optional[str] = None
    # Security
    secret_key: str
    allowed_origins: List[str] = ["http://localhost:3000"]

    class Config:
        env_file = ".env"
        env_file_encoding = "utf-8"
        case_sensitive = False


settings = Settings()
Rate Limiting
from fastapi import Depends, Request, HTTPException
from fastapi_limiter import FastAPILimiter
from fastapi_limiter.depends import RateLimiter
import redis.asyncio as redis


# Initialize rate limiter (call once at application startup)
async def init_rate_limiter():
    redis_client = redis.from_url("redis://localhost:6379", encoding="utf-8", decode_responses=True)
    await FastAPILimiter.init(redis_client)


# Apply rate limiting
@router.post("/summarize", dependencies=[Depends(RateLimiter(times=10, seconds=60))])
async def summarize_video(request: SummarizeRequest):
    """Rate limited to 10 requests per minute."""
    pass
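To make the meaning of `times=10, seconds=60` concrete, here is a small in-memory sliding-window limiter. It is an illustration of the idea only: it is not how `fastapi-limiter` is implemented (that library keeps its counters in Redis), and `SlidingWindowLimiter` is a hypothetical class.

```python
from collections import deque
from typing import Deque, Dict


class SlidingWindowLimiter:
    """Allow at most `times` hits per client within any `seconds` window."""

    def __init__(self, times: int, seconds: float):
        self.times = times
        self.seconds = seconds
        self._hits: Dict[str, Deque[float]] = {}

    def allow(self, key: str, now: float) -> bool:
        """Record a hit at time `now` and report whether it is permitted."""
        hits = self._hits.setdefault(key, deque())
        # Drop hits that have aged out of the window.
        while hits and now - hits[0] >= self.seconds:
            hits.popleft()
        if len(hits) >= self.times:
            return False
        hits.append(now)
        return True


limiter = SlidingWindowLimiter(times=2, seconds=60)
decisions = [
    limiter.allow("client-a", now=0),   # first hit in the window
    limiter.allow("client-a", now=1),   # second hit in the window
    limiter.allow("client-a", now=2),   # third hit: over the limit
    limiter.allow("client-a", now=60),  # the hit at t=0 has expired
]
print(decisions)
```

Passing the clock in as `now` keeps the example deterministic. A production limiter would read the time itself and share state across processes.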
10. Deployment Process
Docker Configuration
# Dockerfile
FROM python:3.11-slim
WORKDIR /app
# Install dependencies
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
# Copy application
COPY . .
# Run application
CMD ["uvicorn", "src.main:app", "--host", "0.0.0.0", "--port", "8082"]
Environment Management
# .env.development
DEBUG=true
DATABASE_URL=sqlite:///./dev.db
LOG_LEVEL=DEBUG
# .env.production
DEBUG=false
DATABASE_URL=postgresql://user:pass@db:5432/youtube_summarizer
LOG_LEVEL=INFO
Health Checks
from datetime import datetime


@router.get("/health")
async def health_check():
    """Health check endpoint for monitoring."""
    checks = {
        "api": "healthy",
        "database": await check_database(),
        "cache": await check_cache(),
        "ai_service": await check_ai_service()
    }
    all_healthy = all(v == "healthy" for v in checks.values())
    return {
        "status": "healthy" if all_healthy else "degraded",
        "checks": checks,
        "timestamp": datetime.utcnow().isoformat()
    }
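A health endpoint should stay responsive even when a dependency hangs or raises. One possible approach, sketched below, runs the checks concurrently with a per-check timeout and maps any failure to "unhealthy". `run_checks` and the three stub checks are hypothetical, and the "unhealthy" label is an assumption, since the source only names "healthy" and "degraded".

```python
import asyncio
from typing import Awaitable, Callable, Dict

Check = Callable[[], Awaitable[str]]


async def run_checks(checks: Dict[str, Check], timeout: float = 2.0) -> Dict[str, str]:
    """Run all checks concurrently; a timeout or error counts as unhealthy."""

    async def guarded(check: Check) -> str:
        try:
            return await asyncio.wait_for(check(), timeout)
        except Exception:  # includes the timeout raised by wait_for
            return "unhealthy"

    names = list(checks)
    results = await asyncio.gather(*(guarded(checks[name]) for name in names))
    return dict(zip(names, results))


async def ok_check() -> str:
    return "healthy"


async def failing_check() -> str:
    raise ConnectionError("database unreachable")


async def slow_check() -> str:
    await asyncio.sleep(1)
    return "healthy"


report = asyncio.run(
    run_checks(
        {"cache": ok_check, "database": failing_check, "ai_service": slow_check},
        timeout=0.05,
    )
)
overall = "healthy" if all(v == "healthy" for v in report.values()) else "degraded"
print(overall, report)
```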
Monitoring
from fastapi import Response
from prometheus_client import CONTENT_TYPE_LATEST, Counter, Histogram, generate_latest

# Metrics
request_count = Counter('youtube_requests_total', 'Total requests')
request_duration = Histogram('youtube_request_duration_seconds', 'Request duration')
summary_generation_time = Histogram('summary_generation_seconds', 'Summary generation time')


@router.get("/metrics")
async def metrics():
    """Prometheus metrics endpoint."""
    return Response(generate_latest(), media_type=CONTENT_TYPE_LATEST)
Agent-Specific Instructions
For AI Agents
When working on this codebase:
- Always check Task Master first: task-master next
- Follow TDD: Write tests before implementation
- Use type hints: All functions must have type annotations
- Document changes: Update docstrings and comments
- Test thoroughly: Run full test suite before marking complete
- Update task status: Keep Task Master updated with progress
Quality Checklist
Before marking any task as complete:
- All tests pass (pytest tests/)
- Code coverage > 80% (pytest --cov=src)
- No linting errors (ruff check src/)
- Type checking passes (mypy src/)
- Documentation updated
- Task Master updated
- Changes committed with proper message
Conclusion
This guide ensures consistent, high-quality development across all contributors to the YouTube Summarizer project. Follow these standards to maintain code quality, performance, and security.
Last Updated: 2025-01-25 Version: 1.0.0