# AGENTS.md - YouTube Summarizer Backend

This file provides guidance for AI agents working with the YouTube Summarizer backend implementation.

## Agent Development Context

The backend has been implemented following Story-Driven Development, with comprehensive testing and production-ready patterns throughout. Agents should understand the existing architecture and extend it following the established conventions.

## Current Implementation Status

### ✅ Completed Stories

- **Story 1.1**: Project Setup and Infrastructure - DONE
- **Story 2.1**: Single AI Model Integration (Anthropic) - DONE
- **Story 2.2**: Summary Generation Pipeline - DONE ⬅️ Just completed with full QA

### 🔄 Ready for Implementation

- **Story 1.2**: YouTube URL Validation and Parsing
- **Story 1.3**: Transcript Extraction Service
- **Story 1.4**: Basic Web Interface
- **Story 2.3**: Caching System Implementation
- **Story 2.4**: Multi-Model Support
- **Story 2.5**: Export Functionality

## Architecture Principles for Agents

### 1. Service Layer Pattern

All business logic lives in the `services/` directory with clear interfaces:

```python
from typing import Any, Dict


# Follow this pattern for new services
class VideoService:
    async def extract_video_id(self, url: str) -> str: ...
    async def get_video_metadata(self, video_id: str) -> Dict[str, Any]: ...
    async def validate_url(self, url: str) -> bool: ...
```

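For Story 1.2, an implementation of this interface might extract the video ID with a small set of URL patterns. The following is only a sketch under that assumption; the accepted URL shapes are illustrative, and it assumes `ValidationError` accepts a message string:

```python
import re

from backend.core.exceptions import ValidationError  # existing custom exception


class VideoService:
    """Sketch of a possible VideoService implementation (hypothetical, for Story 1.2)."""

    # Common YouTube URL shapes; extend as needed.
    _ID_PATTERNS = [
        re.compile(r"youtube\.com/watch\?v=([\w-]+)"),
        re.compile(r"youtu\.be/([\w-]+)"),
    ]

    async def extract_video_id(self, url: str) -> str:
        for pattern in self._ID_PATTERNS:
            match = pattern.search(url)
            if match:
                return match.group(1)
        raise ValidationError("Could not extract a video ID from the URL")

    async def validate_url(self, url: str) -> bool:
        try:
            await self.extract_video_id(url)
            return True
        except ValidationError:
            return False
```
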
### 2. Dependency Injection Pattern

Use FastAPI's dependency injection for loose coupling:

```python
from fastapi import APIRouter, Depends

router = APIRouter()


def get_video_service() -> VideoService:
    return VideoService()


@router.post("/api/endpoint")
async def endpoint(service: VideoService = Depends(get_video_service)):
    return await service.process()
```

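One benefit of this pattern is testability: FastAPI's `dependency_overrides` lets tests swap the real service for a fake without patching. A minimal sketch (the import paths for `app` and the provider are assumptions):

```python
from unittest.mock import AsyncMock

from fastapi.testclient import TestClient

from backend.main import app  # assumed application import path
from backend.api.videos import get_video_service  # assumed provider location


def test_endpoint_with_fake_service():
    fake_service = AsyncMock()
    fake_service.process.return_value = {"ok": True}

    # Replace the real provider for the duration of the test.
    app.dependency_overrides[get_video_service] = lambda: fake_service
    try:
        client = TestClient(app)
        response = client.post("/api/endpoint")
        assert response.status_code == 200
    finally:
        app.dependency_overrides.clear()
```
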
### 3. Async-First Development

All I/O operations must be async to prevent blocking:

```python
# Correct async pattern
async def process_video(self, url: str) -> Result:
    metadata = await self.video_service.get_metadata(url)
    transcript = await self.transcript_service.extract(url)
    summary = await self.ai_service.summarize(transcript)
    return Result(metadata=metadata, summary=summary)
```

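When awaited calls do not depend on each other, they can also run concurrently rather than one after another. A sketch of the same method using `asyncio.gather`, assuming metadata lookup and transcript extraction are independent:

```python
import asyncio


async def process_video(self, url: str) -> Result:
    # Metadata and transcript do not depend on each other, so fetch them concurrently.
    metadata, transcript = await asyncio.gather(
        self.video_service.get_metadata(url),
        self.transcript_service.extract(url),
    )
    summary = await self.ai_service.summarize(transcript)
    return Result(metadata=metadata, summary=summary)
```
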
### 4. Error Handling Standards

Use custom exceptions with proper HTTP status codes:

```python
from fastapi import HTTPException

from backend.core.exceptions import ValidationError, AIServiceError

try:
    result = await service.process(data)
except ValidationError as e:
    raise HTTPException(status_code=400, detail=e.message)
except AIServiceError as e:
    raise HTTPException(status_code=500, detail=e.message)
```

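Per-endpoint try/except blocks can be complemented by an application-wide handler, so custom exceptions that escape a route still produce consistent JSON responses. A sketch, assuming the custom exceptions expose `message` and `status_code` attributes as the `ProcessingError` example later in this document does:

```python
from fastapi import FastAPI, Request
from fastapi.responses import JSONResponse

from backend.core.exceptions import BaseAPIException

app = FastAPI()  # in practice, register this on the existing app in main.py


@app.exception_handler(BaseAPIException)
async def handle_api_exception(request: Request, exc: BaseAPIException) -> JSONResponse:
    # Assumes custom exceptions carry `message` and `status_code`,
    # as in the ProcessingError example below.
    return JSONResponse(status_code=exc.status_code, content={"detail": exc.message})
```
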
## Implementation Patterns for Agents

### Adding New API Endpoints

1. **Create the endpoint in the appropriate API module**:

```python
# backend/api/videos.py
from fastapi import APIRouter, HTTPException, Depends
from pydantic import BaseModel

from ..core.exceptions import ValidationError
from ..services.video_service import VideoService

router = APIRouter(prefix="/api/videos", tags=["videos"])


class ValidateVideoRequest(BaseModel):
    """Minimal request model for this example."""
    url: str


def get_video_service() -> VideoService:
    return VideoService()


@router.post("/validate")
async def validate_video_url(
    request: ValidateVideoRequest,
    service: VideoService = Depends(get_video_service)
):
    try:
        is_valid = await service.validate_url(request.url)
        return {"valid": is_valid}
    except ValidationError as e:
        raise HTTPException(status_code=400, detail=e.message)
```

2. **Register the router in `main.py`**:

```python
from backend.api.videos import router as videos_router

app.include_router(videos_router)
```

3. **Add comprehensive tests**:

```python
# tests/unit/test_video_service.py
import pytest

from backend.services.video_service import VideoService


@pytest.mark.asyncio
async def test_validate_url_success():
    service = VideoService()
    result = await service.validate_url("https://youtube.com/watch?v=abc123")
    assert result is True


# tests/integration/test_videos_api.py
def test_validate_video_endpoint(client):
    response = client.post("/api/videos/validate", json={"url": "https://youtube.com/watch?v=test"})
    assert response.status_code == 200
    assert response.json()["valid"] is True
```

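The integration test above relies on a `client` fixture. If one does not already exist, a minimal `conftest.py` could provide it; the `backend.main` import path is an assumption:

```python
# tests/conftest.py
import pytest
from fastapi.testclient import TestClient

from backend.main import app  # assumed location of the FastAPI app


@pytest.fixture
def client() -> TestClient:
    """Synchronous test client shared by integration tests."""
    return TestClient(app)
```
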
### Extending the Pipeline

When adding new pipeline stages, follow the established pattern:

```python
# Add new stage to PipelineStage enum
class PipelineStage(Enum):
    # ... existing stages ...
    NEW_STAGE = "new_stage"


# Add stage processing to SummaryPipeline
async def _execute_pipeline(self, job_id: str, config: PipelineConfig):
    # ... existing stages ...

    # New stage
    await self._update_progress(job_id, PipelineStage.NEW_STAGE, 85, "Processing new stage...")
    new_result = await self._process_new_stage(result, config)
    result.new_field = new_result


# Add progress percentage mapping
stage_percentages = {
    # ... existing mappings ...
    PipelineStage.NEW_STAGE: 85,
}
```

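The stage handler itself follows the same shape as the other private stage methods: it takes the accumulated result plus the config, does its work asynchronously, and returns the value stored on the result. A hypothetical handler (the method name and fields are placeholders taken from the snippet above):

```python
async def _process_new_stage(self, result, config: PipelineConfig):
    """Hypothetical handler for NEW_STAGE; replace the body with real stage work."""
    # Example: derive something from the output of earlier stages.
    processed = await some_async_operation(result, config)  # placeholder call
    return processed
```
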
### Database Integration Pattern

When adding database models, follow the repository pattern:

```python
# backend/models/video.py
from datetime import datetime

from sqlalchemy import Column, String, DateTime, Text

from .base import Base


class Video(Base):
    __tablename__ = "videos"

    id = Column(String, primary_key=True)
    url = Column(String, nullable=False)
    title = Column(String)
    # "metadata" is a reserved attribute name on declarative models, so the
    # attribute is named video_metadata while the column stays "metadata".
    video_metadata = Column("metadata", Text)  # JSON field
    created_at = Column(DateTime, default=datetime.utcnow)


# backend/repositories/video_repository.py
from typing import Optional

from sqlalchemy import select
from sqlalchemy.ext.asyncio import AsyncSession

from ..models.video import Video


class VideoRepository:
    def __init__(self, session: AsyncSession):
        self.session = session

    async def create_video(self, video: Video) -> Video:
        self.session.add(video)
        await self.session.commit()
        return video

    async def get_by_id(self, video_id: str) -> Optional[Video]:
        result = await self.session.execute(
            select(Video).where(Video.id == video_id)
        )
        return result.scalar_one_or_none()
```

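To wire the repository into endpoints, the session can itself be provided through dependency injection. A sketch under stated assumptions: the engine URL and session factory below are illustrative, not existing code, and an async driver (e.g. `sqlite+aiosqlite`) is required for SQLite:

```python
from typing import AsyncIterator

from fastapi import Depends
from sqlalchemy.ext.asyncio import AsyncSession, async_sessionmaker, create_async_engine

from backend.repositories.video_repository import VideoRepository

# Illustrative engine/session factory; point this at settings.database_url in practice.
engine = create_async_engine("sqlite+aiosqlite:///./data/app.db")
SessionLocal = async_sessionmaker(engine, expire_on_commit=False)


async def get_session() -> AsyncIterator[AsyncSession]:
    async with SessionLocal() as session:
        yield session


def get_video_repository(session: AsyncSession = Depends(get_session)) -> VideoRepository:
    return VideoRepository(session)
```
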
## Testing Guidelines for Agents

### Unit Test Structure

```python
# tests/unit/test_new_service.py
import pytest
from unittest.mock import Mock, AsyncMock

from backend.services.new_service import NewService


class TestNewService:
    @pytest.fixture
    def service(self):
        return NewService()

    @pytest.mark.asyncio
    async def test_process_success(self, service):
        # Arrange
        input_data = "test_input"
        expected_output = "expected_result"

        # Act
        result = await service.process(input_data)

        # Assert
        assert result == expected_output

    @pytest.mark.asyncio
    async def test_process_error_handling(self, service):
        # ServiceError is the service's own exception type (import as appropriate).
        with pytest.raises(ServiceError):
            await service.process("invalid_input")
```

### Integration Test Structure

```python
# tests/integration/test_new_api.py
from fastapi.testclient import TestClient
from unittest.mock import Mock, patch, AsyncMock


class TestNewAPI:
    def test_endpoint_success(self, client):
        with patch('backend.api.new.get_new_service') as mock_get_service:
            mock_service = Mock()
            mock_service.process = AsyncMock(return_value="result")
            mock_get_service.return_value = mock_service

            response = client.post("/api/new/process", json={"input": "test"})

            assert response.status_code == 200
            assert response.json() == {"result": "result"}
```

## Code Quality Standards

### Documentation Requirements

```python
from typing import Any, Dict


class NewService:
    """Service for handling new functionality.

    This service integrates with external APIs and provides
    processed results for the application.
    """

    async def process(self, input_data: str) -> Dict[str, Any]:
        """Process input data and return structured results.

        Args:
            input_data: Raw input string to process

        Returns:
            Processed results dictionary

        Raises:
            ValidationError: If input_data is invalid
            ProcessingError: If processing fails
        """
        ...
```

### Type Hints and Validation

```python
from typing import Any, Dict, List, Optional, Union

from pydantic import BaseModel, Field


class ProcessRequest(BaseModel):
    """Request model for processing endpoint."""

    input_data: str = Field(..., description="Data to process")
    options: Optional[Dict[str, Any]] = Field(None, description="Processing options")

    class Config:
        schema_extra = {
            "example": {
                "input_data": "sample input",
                "options": {"format": "json"}
            }
        }
```

### Error Handling Patterns

```python
from typing import Dict, Optional

from backend.core.exceptions import BaseAPIException, ErrorCode


class ProcessingError(BaseAPIException):
    """Raised when processing fails."""

    def __init__(self, message: str, details: Optional[Dict] = None):
        super().__init__(
            message=message,
            error_code=ErrorCode.PROCESSING_ERROR,
            status_code=500,
            details=details,
            recoverable=True
        )
```

## Integration with Existing Services

### Using the Pipeline Service

```python
# Get pipeline instance
pipeline = get_summary_pipeline()

# Start processing
job_id = await pipeline.process_video(
    video_url="https://youtube.com/watch?v=abc123",
    config=PipelineConfig(summary_length="detailed")
)

# Monitor progress
result = await pipeline.get_pipeline_result(job_id)
print(f"Status: {result.status}")
```

### Using the AI Service

```python
from backend.services.anthropic_summarizer import AnthropicSummarizer
from backend.services.ai_service import SummaryRequest, SummaryLength

ai_service = AnthropicSummarizer(api_key=api_key)

summary_result = await ai_service.generate_summary(
    SummaryRequest(
        transcript="Video transcript text...",
        length=SummaryLength.STANDARD,
        focus_areas=["key insights", "actionable items"]
    )
)

print(f"Summary: {summary_result.summary}")
print(f"Key Points: {summary_result.key_points}")
```

### Using WebSocket Updates

```python
from backend.core.websocket_manager import websocket_manager

# Send progress update
await websocket_manager.send_progress_update(job_id, {
    "stage": "processing",
    "percentage": 50,
    "message": "Halfway complete"
})

# Send completion notification
await websocket_manager.send_completion_notification(job_id, {
    "status": "completed",
    "result": result_data
})
```

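For these updates to reach a browser, an endpoint has to accept the WebSocket connection and hand it to the manager. The sketch below uses FastAPI's WebSocket primitives; the `connect`/`disconnect` methods on `websocket_manager` are assumed names, so adapt them to the manager's actual interface:

```python
from fastapi import APIRouter, WebSocket, WebSocketDisconnect

from backend.core.websocket_manager import websocket_manager

router = APIRouter()


@router.websocket("/ws/jobs/{job_id}")
async def job_updates(websocket: WebSocket, job_id: str):
    await websocket.accept()
    # Hypothetical registration API; the manager tracks sockets per job.
    await websocket_manager.connect(job_id, websocket)
    try:
        while True:
            # Keep the connection open; updates are pushed by the manager.
            await websocket.receive_text()
    except WebSocketDisconnect:
        await websocket_manager.disconnect(job_id, websocket)
```
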
## Performance Patterns

### Caching Integration

```python
from backend.services.cache_manager import CacheManager

cache = CacheManager()

# Cache expensive operations
cache_key = f"expensive_operation:{input_hash}"
cached_result = await cache.get_cached_result(cache_key)

if cached_result:
    result = cached_result
else:
    result = await expensive_operation(input_data)
    await cache.cache_result(cache_key, result, ttl=3600)
```

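The `input_hash` above is just a stable key for the inputs. One way to derive it is to hash a canonical form of the request using the standard library; the exact fields included in the key are an assumption:

```python
import hashlib
import json


def make_cache_key(video_url: str, summary_length: str) -> str:
    """Build a deterministic cache key from the request parameters."""
    payload = json.dumps({"url": video_url, "length": summary_length}, sort_keys=True)
    input_hash = hashlib.sha256(payload.encode("utf-8")).hexdigest()
    return f"expensive_operation:{input_hash}"
```
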
### Background Processing

```python
import uuid
from typing import Dict

from fastapi import BackgroundTasks


async def long_running_task(task_id: str, data: Dict):
    """Background task for processing."""
    try:
        result = await process_data(data)
        await store_result(task_id, result)
        await notify_completion(task_id)
    except Exception as e:
        await store_error(task_id, str(e))


@router.post("/api/process-async")
async def start_processing(
    request: ProcessRequest,
    background_tasks: BackgroundTasks
):
    task_id = str(uuid.uuid4())
    background_tasks.add_task(long_running_task, task_id, request.dict())
    return {"task_id": task_id, "status": "processing"}
```

## Security Guidelines

### Input Validation

```python
import re

from pydantic import BaseModel, validator


class VideoUrlRequest(BaseModel):
    url: str

    @validator('url')
    def validate_youtube_url(cls, v):
        youtube_pattern = r'^https?://(www\.)?(youtube\.com|youtu\.be)/.+'
        if not re.match(youtube_pattern, v):
            raise ValueError('Must be a valid YouTube URL')
        return v
```

### API Key Management

```python
import os

from fastapi import HTTPException


def get_api_key() -> str:
    api_key = os.getenv("ANTHROPIC_API_KEY")
    if not api_key:
        raise HTTPException(
            status_code=500,
            detail="API key not configured"
        )
    return api_key
```

## Deployment Considerations

### Environment Configuration

```python
from typing import List, Optional

from pydantic import BaseSettings


class Settings(BaseSettings):
    anthropic_api_key: str
    database_url: str = "sqlite:///./data/app.db"
    redis_url: Optional[str] = None
    log_level: str = "INFO"
    cors_origins: List[str] = ["http://localhost:3000"]

    class Config:
        env_file = ".env"


settings = Settings()
```

### Health Checks

```python
from fastapi.responses import JSONResponse


@router.get("/health")
async def health_check():
    """Health check endpoint for load balancers."""
    checks = {
        "database": await check_database_connection(),
        "cache": await check_cache_connection(),
        "ai_service": await check_ai_service(),
    }

    all_healthy = all(checks.values())
    status_code = 200 if all_healthy else 503

    return JSONResponse(
        status_code=status_code,
        content={"status": "healthy" if all_healthy else "unhealthy", "checks": checks},
    )
```

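The individual `check_*` helpers should exercise their dependency cheaply and return a boolean. A sketch of the database check; the helper name comes from the snippet above, while the session factory module is hypothetical:

```python
from sqlalchemy import text

from backend.db.session import SessionLocal  # hypothetical async session factory module


async def check_database_connection() -> bool:
    """Return True if a trivial query succeeds against the database."""
    try:
        async with SessionLocal() as session:
            await session.execute(text("SELECT 1"))
        return True
    except Exception:
        return False
```
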
## Migration Patterns

When extending existing functionality, maintain backward compatibility:

```python
# Version 1 API
@router.post("/api/summarize")
async def summarize_v1(request: SummarizeRequest):
    # Legacy implementation
    pass


# Version 2 API (new functionality)
@router.post("/api/v2/summarize")
async def summarize_v2(request: SummarizeRequestV2):
    # Enhanced implementation
    pass
```

This backend follows production-ready patterns and is designed for extensibility. Agents should maintain these standards when adding new functionality.