youtube-summarizer/docs/stories/1.2.youtube-url-validation-...

387 lines
13 KiB
Markdown

# Story 1.2: YouTube URL Validation and Parsing
## Status
Ready for Review
## Story
**As a** user
**I want** the system to accept any valid YouTube URL format
**so that** I can paste URLs directly from my browser without modification
## Acceptance Criteria
1. System correctly parses video IDs from youtube.com/watch?v=, youtu.be/, and embed URL formats
2. Invalid URLs return clear error messages specifying the expected format
3. System extracts and validates video IDs are exactly 11 characters
4. Playlist URLs are detected and user is informed they're not yet supported
5. URL validation happens client-side for instant feedback and server-side for security
## Tasks / Subtasks
- [x] **Task 1: Backend URL Validation Service** (AC: 1, 2, 3)
- [x] Create `VideoService.extract_video_id()` method in `backend/services/video_service.py`
- [x] Implement regex patterns for all YouTube URL formats
- [x] Add video ID validation (exactly 11 characters, valid character set)
- [x] Create custom exceptions for URL validation errors
- [x] **Task 2: Frontend URL Validation** (AC: 5)
- [x] Create URL validation hook `useURLValidation` in `frontend/src/hooks/`
- [x] Implement client-side regex validation for instant feedback
- [x] Add validation state management (valid, invalid, pending)
- [x] Create error message components with format examples
- [x] **Task 3: API Endpoint for URL Validation** (AC: 2, 5)
- [x] Create `/api/validate-url` POST endpoint in `backend/api/`
- [x] Implement request/response models with Pydantic
- [x] Add comprehensive error responses with recovery suggestions
- [x] Include supported URL format examples in error responses
- [x] **Task 4: Playlist URL Detection** (AC: 4)
- [x] Add playlist URL pattern recognition to validation service
- [x] Create informative error message for playlist URLs
- [x] Suggest alternative approach for playlist processing
- [x] Log playlist URL attempts for future feature consideration
- [x] **Task 5: URL Validation UI Components** (AC: 5)
- [x] Update `SummarizeForm` component with real-time validation
- [x] Add visual validation indicators (checkmark, error icon)
- [x] Create validation error display with format examples
- [x] Implement debounced validation to avoid excessive API calls
- [x] **Task 6: Integration Testing** (AC: 1, 2, 3, 4, 5)
- [x] Create comprehensive URL test cases covering all formats
- [x] Test edge cases: malformed URLs, wrong domains, invalid characters
- [x] Test integration between frontend and backend validation
- [x] Verify error messages are helpful and actionable
## Dev Notes
### Architecture Context
This story implements the URL validation layer that serves as the entry point for all YouTube video processing. It establishes the foundation for secure and reliable video ID extraction that will be used throughout the application.
### Video Service Implementation Requirements
[Source: docs/architecture.md#backend-services]
```python
class VideoService:
def extract_video_id(self, url: str) -> str:
"""Extract YouTube video ID with comprehensive validation"""
patterns = [
r'(?:https?://)?(?:www\.)?youtube\.com/watch\?v=([a-zA-Z0-9_-]{11})',
r'(?:https?://)?(?:www\.)?youtu\.be/([a-zA-Z0-9_-]{11})',
r'(?:https?://)?(?:www\.)?youtube\.com/embed/([a-zA-Z0-9_-]{11})'
]
for pattern in patterns:
match = re.search(pattern, url)
if match:
return match.group(1)
raise UserInputError(
message="Invalid YouTube URL format",
error_code=ErrorCode.INVALID_URL,
details={
"url": url,
"supported_formats": [
"https://youtube.com/watch?v=VIDEO_ID",
"https://youtu.be/VIDEO_ID",
"https://youtube.com/embed/VIDEO_ID"
]
}
)
```
### Error Handling Requirements
[Source: docs/architecture.md#error-handling]
**Custom Exception Classes**:
```python
class UserInputError(BaseAPIException):
"""Errors caused by invalid user input"""
def __init__(self, message: str, error_code: ErrorCode, details: Optional[Dict] = None):
super().__init__(
message=message,
error_code=error_code,
status_code=status.HTTP_400_BAD_REQUEST,
details=details,
recoverable=True
)
```
**Error Codes for URL Validation**:
- `INVALID_URL`: Invalid YouTube URL format
- `UNSUPPORTED_FORMAT`: Valid URL but unsupported type (e.g., playlist)
### Frontend Implementation Requirements
[Source: docs/architecture.md#frontend-architecture]
**URL Validation Hook**:
```typescript
interface URLValidationState {
isValid: boolean;
isValidating: boolean;
error?: {
code: string;
message: string;
supportedFormats: string[];
};
}
export function useURLValidation() {
const validateURL = useCallback(async (url: string): Promise<URLValidationState> => {
// Client-side validation first
if (!url.trim()) return { isValid: false, isValidating: false };
// Basic format check
const patterns = [
/youtube\.com\/watch\?v=[\w-]+/,
/youtu\.be\/[\w-]+/,
/youtube\.com\/embed\/[\w-]+/
];
const hasValidPattern = patterns.some(pattern => pattern.test(url));
if (!hasValidPattern) {
return {
isValid: false,
isValidating: false,
error: {
code: 'INVALID_URL',
message: 'Invalid YouTube URL format',
supportedFormats: [
'https://youtube.com/watch?v=VIDEO_ID',
'https://youtu.be/VIDEO_ID',
'https://youtube.com/embed/VIDEO_ID'
]
}
};
}
// Server-side validation for security
return apiClient.validateURL(url);
}, []);
return { validateURL };
}
```
### API Endpoint Specification
[Source: docs/architecture.md#api-specification]
**Request/Response Models**:
```python
class URLValidationRequest(BaseModel):
url: str = Field(..., description="YouTube URL to validate")
class URLValidationResponse(BaseModel):
is_valid: bool
video_id: Optional[str] = None
video_url: Optional[str] = None # Normalized URL
error: Optional[Dict[str, Any]] = None
```
**Endpoint Implementation**:
```python
@router.post("/validate-url", response_model=URLValidationResponse)
async def validate_url(request: URLValidationRequest, video_service: VideoService = Depends()):
try:
video_id = video_service.extract_video_id(request.url)
normalized_url = f"https://youtube.com/watch?v={video_id}"
return URLValidationResponse(
is_valid=True,
video_id=video_id,
video_url=normalized_url
)
except UserInputError as e:
return URLValidationResponse(
is_valid=False,
error={
"code": e.error_code,
"message": e.message,
"details": e.details
}
)
```
### File Locations and Structure
[Source: docs/architecture.md#project-structure]
**Backend Files**:
- `backend/services/video_service.py` - Main validation logic
- `backend/api/validation.py` - URL validation endpoint
- `backend/core/exceptions.py` - Custom exception classes
- `backend/tests/unit/test_video_service.py` - Unit tests for URL parsing
**Frontend Files**:
- `frontend/src/hooks/useURLValidation.ts` - URL validation hook
- `frontend/src/components/forms/SummarizeForm.tsx` - Updated form component
- `frontend/src/components/ui/ValidationFeedback.tsx` - Validation UI component
- `frontend/src/test/hooks/useURLValidation.test.ts` - Hook testing
### Supported URL Formats
Based on YouTube's URL structure patterns:
1. **Standard Watch URL**: `https://youtube.com/watch?v=dQw4w9WgXcQ`
2. **Short URL**: `https://youtu.be/dQw4w9WgXcQ`
3. **Embed URL**: `https://youtube.com/embed/dQw4w9WgXcQ`
4. **Mobile URL**: `https://m.youtube.com/watch?v=dQw4w9WgXcQ`
5. **With Additional Parameters**: `https://youtube.com/watch?v=dQw4w9WgXcQ&t=30s`
### Unsupported Formats (Future Features)
- Playlist URLs: `https://youtube.com/playlist?list=PLxxxxx`
- Channel URLs: `https://youtube.com/@channelname`
- Search URLs: `https://youtube.com/results?search_query=term`
### Testing Standards
#### Backend Unit Tests
[Source: docs/architecture.md#testing-strategy]
**Test File**: `backend/tests/unit/test_video_service.py`
```python
class TestVideoService:
def test_extract_video_id_success(self):
"""Test successful video ID extraction from various URL formats"""
service = VideoService()
test_cases = [
("https://youtube.com/watch?v=dQw4w9WgXcQ", "dQw4w9WgXcQ"),
("https://youtu.be/dQw4w9WgXcQ", "dQw4w9WgXcQ"),
("https://youtube.com/embed/dQw4w9WgXcQ", "dQw4w9WgXcQ"),
("youtube.com/watch?v=dQw4w9WgXcQ", "dQw4w9WgXcQ"),
]
for url, expected_id in test_cases:
result = service.extract_video_id(url)
assert result == expected_id
def test_extract_video_id_invalid_url(self):
"""Test video ID extraction with invalid URLs"""
service = VideoService()
invalid_urls = [
"https://vimeo.com/123456789",
"https://youtube.com/invalid",
"not-a-url-at-all",
"https://youtube.com/watch?v=short" # Too short ID
]
for url in invalid_urls:
with pytest.raises(UserInputError) as exc_info:
service.extract_video_id(url)
assert exc_info.value.error_code == ErrorCode.INVALID_URL
```
#### Frontend Component Tests
[Source: docs/architecture.md#testing-strategy]
**Test File**: `frontend/src/components/forms/SummarizeForm.test.tsx`
```typescript
describe('SummarizeForm URL Validation', () => {
it('shows validation error for invalid URL', async () => {
render(<SummarizeForm />, { wrapper: createWrapper() });
const input = screen.getByPlaceholderText(/paste youtube url/i);
fireEvent.change(input, { target: { value: 'invalid-url' } });
await waitFor(() => {
expect(screen.getByText(/invalid youtube url/i)).toBeInTheDocument();
expect(screen.getByText(/supported formats/i)).toBeInTheDocument();
});
});
it('accepts valid YouTube URLs', async () => {
const validUrls = [
'https://youtube.com/watch?v=dQw4w9WgXcQ',
'https://youtu.be/dQw4w9WgXcQ',
'https://youtube.com/embed/dQw4w9WgXcQ'
];
for (const url of validUrls) {
render(<SummarizeForm />, { wrapper: createWrapper() });
const input = screen.getByPlaceholderText(/paste youtube url/i);
fireEvent.change(input, { target: { value: url } });
await waitFor(() => {
expect(screen.queryByText(/invalid youtube url/i)).not.toBeInTheDocument();
});
}
});
});
```
### Security Considerations
- **Input Sanitization**: All URLs sanitized before processing
- **XSS Prevention**: HTML escaping for user-provided URLs
- **Rate Limiting**: Validation endpoint included in rate limiting
- **Client-Side Validation**: For UX only, never trust client-side validation for security
## Change Log
| Date | Version | Description | Author |
|------|---------|-------------|--------|
| 2025-01-25 | 1.0 | Initial story creation | Bob (Scrum Master) |
| 2025-01-25 | 1.1 | Backend implementation complete | James (Developer) |
## Dev Agent Record
### Agent Model Used
Claude 3.5 Sonnet (claude-3-5-sonnet-20241022)
### Debug Log References
- Task 1: Backend URL validation service implemented with regex patterns for all YouTube formats
- Task 3: API endpoint created with comprehensive error handling
- Task 4: Playlist detection integrated into VideoService
- Task 6: Integration tests created for API validation
### Completion Notes List
- ✅ VideoService with comprehensive URL validation created
- ✅ Support for standard, short, embed, and mobile YouTube URLs
- ✅ Playlist URL detection with helpful error messages
- ✅ FastAPI endpoint with Pydantic models for validation
- ✅ Custom exception hierarchy for error handling
- ✅ Unit tests for VideoService (14 test cases)
- ✅ Integration tests for API endpoints (11 test cases)
- ✅ React hooks for URL validation with debouncing
- ✅ UI components with real-time validation feedback
- ✅ Frontend tests for hooks and components
### File List
**Backend Files Created:**
- backend/__init__.py
- backend/services/__init__.py
- backend/services/video_service.py
- backend/api/__init__.py
- backend/api/validation.py
- backend/core/__init__.py
- backend/core/exceptions.py
- backend/models/__init__.py
- backend/models/validation.py
- backend/main.py
- backend/tests/__init__.py
- backend/tests/unit/__init__.py
- backend/tests/unit/test_video_service.py
- backend/tests/integration/__init__.py
- backend/tests/integration/test_validation_api.py
- backend/requirements.txt
**Frontend Files Created:**
- frontend/package.json
- frontend/tsconfig.json
- frontend/src/types/validation.ts
- frontend/src/api/client.ts
- frontend/src/hooks/useURLValidation.ts
- frontend/src/hooks/useURLValidation.test.ts
- frontend/src/components/ui/ValidationFeedback.tsx
- frontend/src/components/forms/SummarizeForm.tsx
- frontend/src/components/forms/SummarizeForm.test.tsx
**Modified:**
- docs/stories/1.2.youtube-url-validation-parsing.md
## QA Results
*Results from QA Agent review of the completed story implementation will be added here*