Story 4.1: Dual Transcript Options - Unified Development Checklist
Overview
This checklist consolidates both the story document and implementation plan into a single, actionable development workflow for implementing dual transcript options (YouTube + Whisper) in the YouTube Summarizer project.
Story: 4.1 - Dual Transcript Options (YouTube + Whisper)
Estimated Effort: 22 hours
Priority: High 🔥
Target Completion: 3-4 development days
Phase 1: Backend Foundation (8 hours)
Task 1.1: Copy and Adapt TranscriptionService (3 hours)
- Copy source code (30 min) - COMPLETED in story analysis
    cp archived_projects/personal-ai-assistant-v1.1.0/src/services/transcription_service.py \
       apps/youtube-summarizer/backend/services/whisper_transcript_service.py
- Remove podcast dependencies (45 min)
  - Remove PodcastEpisode, PodcastTranscript, and Repository imports
  - Remove repository dependency injection
  - Simplify constructor to not require database repository
  - Update logging to remove podcast-specific context
- Adapt for YouTube context (60 min)
  - Rename transcribe_episode() → transcribe_audio_file()
  - Modify segment storage to return data instead of database writes
  - Update error handling for YouTube-specific scenarios
  - Add YouTube video ID context to logging
- Add async compatibility (45 min)
  - Wrap synchronous Whisper calls in loop.run_in_executor() (or asyncio.to_thread() on Python 3.9+)
  - Update method signatures to async/await pattern
  - Test async integration with existing services
  - Verify thread safety for concurrent requests
Deliverable: Working WhisperTranscriptService class ready for integration
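The async compatibility step above can be sketched as follows. This is a minimal illustration, not the project's actual service: blocking_transcribe is a hypothetical stand-in for the synchronous Whisper call, and the single-worker executor is an assumption chosen so that concurrent requests do not enter the model at the same time.

```python
import asyncio
import time
from concurrent.futures import ThreadPoolExecutor
from functools import partial


def blocking_transcribe(audio_path: str, language: str = "en") -> dict:
    """Hypothetical stand-in for a synchronous Whisper transcription call."""
    time.sleep(0.05)  # simulate blocking CPU/GPU work
    return {"audio_path": audio_path, "language": language, "text": "..."}


class WhisperTranscriptService:
    """Sketch of an async facade over a blocking transcription function."""

    def __init__(self, max_workers: int = 1) -> None:
        # Assumption: one worker serializes access to the model. Raise this
        # only if the underlying model call is known to be thread-safe.
        self._executor = ThreadPoolExecutor(max_workers=max_workers)

    async def transcribe_audio_file(self, audio_path: str, language: str = "en") -> dict:
        loop = asyncio.get_running_loop()
        call = partial(blocking_transcribe, audio_path, language=language)
        # The event loop stays responsive while the worker thread blocks.
        return await loop.run_in_executor(self._executor, call)


async def demo() -> list:
    service = WhisperTranscriptService()
    # Both requests are accepted concurrently; the executor runs them in turn.
    return await asyncio.gather(
        service.transcribe_audio_file("a.wav"),
        service.transcribe_audio_file("b.wav"),
    )
```

Running asyncio.run(demo()) returns one result per input file, in the order the calls were passed to gather.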
Task 1.2: Update Dependencies and Environment (2 hours)
- Update requirements.txt (15 min)
    # Add to backend/requirements.txt
    openai-whisper==20231117
    torch>=2.0.0
    librosa==0.10.1
    pydub==0.25.1
    soundfile==0.12.1
- Update Docker configuration (45 min)
  - Add ffmpeg system dependency to Dockerfile
  - Add Whisper dependencies to Docker build
  - Test Docker build with new dependencies
  - Optimize Docker layer caching for dependencies
- Test Whisper model download (30 min)
  - Test "small" model download (244M parameters; the checkpoint file is roughly 460 MB)
  - Verify CUDA detection works (if available)
  - Add model caching directory configuration
  - Test model loading performance
- Environment configuration (30 min)
    # Add to .env
    WHISPER_MODEL_SIZE=small
    WHISPER_DEVICE=auto
    WHISPER_MODEL_CACHE=/tmp/whisper_models
  - Update environment variable documentation
  - Add configuration validation on startup
Deliverable: Environment ready for Whisper integration with proper dependency management
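Configuration validation on startup might look like the sketch below. The variable names and defaults come from the .env snippet above, and the model sizes mirror the options this checklist lists elsewhere. The accepted device values other than "auto" (cpu, cuda) and the WhisperConfig and load_whisper_config names are assumptions for illustration.

```python
import os
from dataclasses import dataclass

# Model sizes offered by this checklist; device values beyond "auto" are assumed.
VALID_MODEL_SIZES = {"tiny", "base", "small", "medium"}
VALID_DEVICES = {"auto", "cpu", "cuda"}


@dataclass(frozen=True)
class WhisperConfig:
    model_size: str
    device: str
    model_cache: str


def load_whisper_config(env=None) -> WhisperConfig:
    """Read Whisper settings from the environment and fail fast on bad values."""
    env = os.environ if env is None else env
    model_size = env.get("WHISPER_MODEL_SIZE", "small")
    device = env.get("WHISPER_DEVICE", "auto")
    model_cache = env.get("WHISPER_MODEL_CACHE", "/tmp/whisper_models")

    if model_size not in VALID_MODEL_SIZES:
        raise ValueError(f"WHISPER_MODEL_SIZE must be one of {sorted(VALID_MODEL_SIZES)}")
    if device not in VALID_DEVICES:
        raise ValueError(f"WHISPER_DEVICE must be one of {sorted(VALID_DEVICES)}")
    if not model_cache:
        raise ValueError("WHISPER_MODEL_CACHE must not be empty")

    return WhisperConfig(model_size=model_size, device=device, model_cache=model_cache)
```

Calling load_whisper_config() once during application startup surfaces a misconfigured deployment before the first transcription request arrives.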
Task 1.3: Replace MockWhisperService (3 hours)
- Update EnhancedTranscriptService (90 min)
  - Replace MockWhisperService with real WhisperTranscriptService
  - Update constructor to instantiate real Whisper service
  - Remove all mock-related code and comments
  - Update service method calls to use real Whisper API
- Update dependency injection (30 min)
  - Modify main.py service initialization
  - Update FastAPI dependency functions
  - Ensure proper service lifecycle management
  - Add error handling for service initialization
- Test integration (60 min)
  - Unit test with sample audio file
  - Integration test with VideoDownloadService
  - Verify transcript quality and timing accuracy
  - Test error handling scenarios
Deliverable: Working Whisper integration in existing EnhancedTranscriptService architecture
Phase 2: API Enhancement (4 hours)
Task 2.1: Create DualTranscriptService (2 hours)
- Create DualTranscriptService class (60 min)
  - Extend EnhancedTranscriptService
  - Implement extract_dual_transcripts() with parallel processing
  - Add proper error handling for both transcript sources
  - Implement timeout and cancellation support
- Implement quality comparison (45 min)
  - Word-by-word accuracy comparison algorithm
  - Confidence score calculation
  - Timing precision analysis
  - Generate quality metrics and difference highlights
- Add caching for dual results (15 min)
  - Cache YouTube and Whisper results separately
  - Extended TTL for Whisper (more expensive to regenerate)
  - Implement cache key strategy for dual transcripts
Deliverable: DualTranscriptService with parallel processing and quality comparison
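One way to approach the parallel extraction with per-source error handling and timeouts is sketched below. The two fetch functions are hypothetical placeholders for the real YouTube and Whisper calls, and the result shape is an assumption, not the project's schema.

```python
import asyncio


async def fetch_youtube_transcript(video_id: str) -> str:
    """Hypothetical placeholder for the YouTube captions call."""
    await asyncio.sleep(0.01)
    return f"youtube transcript for {video_id}"


async def fetch_whisper_transcript(video_id: str) -> str:
    """Hypothetical placeholder for the Whisper transcription call."""
    await asyncio.sleep(0.02)
    return f"whisper transcript for {video_id}"


async def extract_dual_transcripts(video_id: str, timeout_s: float = 300.0) -> dict:
    """Run both sources concurrently; a failure in one does not sink the other."""

    async def guarded(coro):
        # Returns (text, error); exactly one of the two is None.
        try:
            return await asyncio.wait_for(coro, timeout=timeout_s), None
        except asyncio.TimeoutError:
            return None, "timed out"
        except Exception as exc:  # report the failure instead of raising
            return None, str(exc)

    youtube, whisper = await asyncio.gather(
        guarded(fetch_youtube_transcript(video_id)),
        guarded(fetch_whisper_transcript(video_id)),
    )
    return {
        "youtube": {"text": youtube[0], "error": youtube[1]},
        "whisper": {"text": whisper[0], "error": whisper[1]},
    }
```

Because each source is wrapped separately, a Whisper timeout still leaves the YouTube transcript usable, which matches the requirement to handle errors for both sources independently. Cancelling the outer task cancels both inner calls.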
Task 2.2: Add New API Endpoints (2 hours)
- Create transcript selection models (30 min)
    from typing import Literal

    from pydantic import BaseModel

    class TranscriptOptionsRequest(BaseModel):
        source: Literal['youtube', 'whisper', 'both'] = 'youtube'
        whisper_model: Literal['tiny', 'base', 'small', 'medium'] = 'small'
        language: str = 'en'
        include_timestamps: bool = True
- Add dual transcript endpoint (60 min)
  - Create /api/transcripts/dual/{video_id} endpoint
  - Handle all three source options (youtube, whisper, both)
  - Add proper request validation and error responses
  - Implement authentication and authorization
- Update existing pipeline (30 min)
  - Modify SummaryPipeline to accept transcript source preference
  - Update processing status to show transcript method
  - Add transcript quality metrics to summary result
  - Update WebSocket notifications for transcript selection
Deliverable: Complete API interface for transcript source selection
Phase 3: Database Schema Updates (2 hours)
Task 3.1: Extend Summary Model (1 hour)
- Create database migration (30 min)
    ALTER TABLE summaries
      ADD COLUMN transcript_source VARCHAR(20),
      ADD COLUMN transcript_quality_score FLOAT,
      ADD COLUMN youtube_transcript TEXT,
      ADD COLUMN whisper_transcript TEXT,
      ADD COLUMN whisper_processing_time FLOAT,
      ADD COLUMN transcript_comparison_data JSON;
- Update Summary model (20 min)
  - Add new fields to SQLAlchemy model
  - Update model relationships and constraints
  - Add field validation and defaults
- Update repository methods (10 min)
  - Add methods for storing dual transcript data
  - Add queries for transcript source filtering
  - Update existing queries to include new fields
Deliverable: Database schema supporting dual transcript metadata
Task 3.2: Add Performance Indexes (1 hour)
- Create performance indexes (30 min)
    CREATE INDEX idx_summaries_transcript_source ON summaries(transcript_source);
    CREATE INDEX idx_summaries_quality_score ON summaries(transcript_quality_score);
    CREATE INDEX idx_summaries_processing_time ON summaries(whisper_processing_time);
- Test query performance (20 min)
  - Verify index usage with EXPLAIN queries
  - Test filtering by transcript source
  - Benchmark query times with sample data
- Run migration and validate (10 min)
  - Apply migration to development database
  - Verify all fields accessible and properly typed
  - Test with sample data insertion and retrieval
Deliverable: Optimized database schema with proper indexing
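The "verify index usage with EXPLAIN" step can be rehearsed locally with a sketch like the one below. It uses an in-memory SQLite database purely as a stand-in: the project's actual database is not stated here, and its EXPLAIN syntax and output format will differ (SQLite uses EXPLAIN QUERY PLAN). The table is a cut-down version of summaries holding only the columns needed for the check.

```python
import sqlite3

# Assumption: SQLite stands in for the real database for a quick local check.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE summaries ("
    "id INTEGER PRIMARY KEY, "
    "transcript_source VARCHAR(20), "
    "transcript_quality_score FLOAT)"
)
conn.execute(
    "CREATE INDEX idx_summaries_transcript_source ON summaries(transcript_source)"
)
conn.executemany(
    "INSERT INTO summaries (transcript_source, transcript_quality_score) VALUES (?, ?)",
    [("youtube", 0.70), ("whisper", 0.90), ("whisper", 0.85)],
)

# Ask the planner how it would run the source filter.
plan_rows = conn.execute(
    "EXPLAIN QUERY PLAN SELECT id FROM summaries WHERE transcript_source = ?",
    ("whisper",),
).fetchall()

# The human-readable plan text is the fourth column of each row.
plan_details = [row[3] for row in plan_rows]
uses_index = any("idx_summaries_transcript_source" in detail for detail in plan_details)

whisper_count = conn.execute(
    "SELECT COUNT(*) FROM summaries WHERE transcript_source = ?", ("whisper",)
).fetchone()[0]
```

If uses_index is false on the real database, the planner has chosen a full scan, which is worth investigating before benchmarking with larger sample data.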
Phase 4: Frontend Implementation (6 hours)
Task 4.1: Create TranscriptSelector Component (2 hours)
- Create base component (45 min)
    interface TranscriptSelectorProps {
      value: TranscriptSource
      onChange: (source: TranscriptSource) => void
      estimatedDuration?: number
      disabled?: boolean
    }
- Add processing time estimation (30 min)
  - Calculate Whisper processing time based on video duration
  - Show cost/time comparison for each option
  - Display clear indicators (Fast/Free vs Accurate/Slower)
- Style and accessibility (45 min)
  - Implement with Radix UI RadioGroup
  - Add proper ARIA labels and descriptions
  - Visual icons and quality indicators (📺 🎯 🔄)
  - Responsive design for mobile/desktop
Deliverable: TranscriptSelector component with full UI/UX implementation
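The processing time estimate could be derived with arithmetic like the sketch below. It is written in Python for consistency with the backend, and the same logic ports directly to the TypeScript component. The constants are assumptions taken from figures elsewhere in this checklist: the 2-5 second YouTube range, the 30 second lower bound for Whisper, and the benchmark of under 2 minutes for a 10-minute video (12 seconds of processing per minute of video). Treating "both" as the slower of the two assumes the sources run in parallel.

```python
# Assumed constants, derived from this checklist's stated targets.
YOUTUBE_ESTIMATE_S = 5.0                  # upper end of the 2-5s range
WHISPER_MIN_ESTIMATE_S = 30.0             # lower end of the 30-120s range
WHISPER_SECONDS_PER_VIDEO_MINUTE = 12.0   # 2 min of work per 10 min of video


def estimate_processing_seconds(source: str, video_duration_s: float) -> float:
    """Rough processing-time estimate for a transcript source."""
    if video_duration_s < 0:
        raise ValueError("video_duration_s must be non-negative")

    whisper = max(
        WHISPER_MIN_ESTIMATE_S,
        video_duration_s / 60.0 * WHISPER_SECONDS_PER_VIDEO_MINUTE,
    )
    if source == "youtube":
        return YOUTUBE_ESTIMATE_S
    if source == "whisper":
        return whisper
    if source == "both":
        # Parallel execution: the slower source dominates.
        return max(YOUTUBE_ESTIMATE_S, whisper)
    raise ValueError(f"unknown transcript source: {source!r}")
```

These numbers should be replaced with measured values once the Whisper benchmarks in the integration testing phase are available.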
Task 4.2: Add to SummarizeForm (1 hour)
- Update SummarizeForm component (30 min)
  - Add transcript source state management
  - Integrate TranscriptSelector into form layout
  - Update form submission to include transcript options
- Update form validation (15 min)
  - Add transcript options to form schema validation
  - Validate transcript source selection
  - Handle validation errors appropriately
- Test integration (15 min)
  - Verify form works with new component
  - Test all transcript source options
  - Ensure admin page compatibility (no auth required)
Deliverable: Updated SummarizeForm with transcript selection integration
Task 4.3: Create TranscriptComparison Component (2 hours)
- Create comparison UI (75 min)
    interface TranscriptComparisonProps {
      youtubeTranscript: TranscriptResult
      whisperTranscript: TranscriptResult
      onSelectTranscript: (source: TranscriptSource) => void
    }
- Implement difference highlighting (30 min)
  - Word-level diff algorithm
  - Visual indicators for additions/changes/deletions
  - Quality metric displays and comparison badges
- Add selection controls (15 min)
  - Buttons to choose which transcript to use for summary
  - Quality score badges and processing time display
  - Clear visual feedback for selection
Deliverable: TranscriptComparison component with side-by-side display
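The word-level diff that feeds the highlighting can be prototyped with Python's standard difflib, as in the sketch below. This assumes the diff is computed on the backend and shipped to the component as data (for example via transcript_comparison_data). The word_diff name and the result shape are illustrative, and whitespace tokenization is a simplification that treats punctuation as part of the word.

```python
import difflib


def word_diff(youtube_text: str, whisper_text: str) -> dict:
    """Compare two transcripts word by word.

    Returns a similarity ratio in [0, 1] and the non-matching spans,
    tagged "replace", "delete", or "insert" relative to the YouTube text.
    """
    youtube_words = youtube_text.split()
    whisper_words = whisper_text.split()
    matcher = difflib.SequenceMatcher(a=youtube_words, b=whisper_words, autojunk=False)

    changes = [
        {"op": tag, "youtube": youtube_words[i1:i2], "whisper": whisper_words[j1:j2]}
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"
    ]
    return {"similarity": matcher.ratio(), "changes": changes}
```

For example, word_diff("the quick brown fox", "the quick red fox") reports a similarity of 0.75 and a single "replace" change from "brown" to "red". The three op tags map onto the additions, changes, and deletions the UI needs to distinguish.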
Task 4.4: Update Processing UI (1 hour)
- Update ProgressTracker (30 min)
  - Add transcript source indicator to progress display
  - Show different messages for Whisper vs YouTube processing
  - Add estimated time remaining for Whisper transcription
- Update result display (20 min)
  - Show which transcript source was used in final result
  - Display quality metrics and confidence scores
  - Add transcript comparison link if both available
- Error handling (10 min)
  - Handle Whisper processing failures gracefully
  - Show fallback notifications clearly
  - Provide retry options for failed transcriptions
Deliverable: Updated processing UI with dual transcript awareness
Phase 5: Testing and Integration (2 hours)
Task 5.1: Unit Tests (1 hour)
- Backend unit tests (30 min)
    # backend/tests/unit/test_whisper_transcript_service.py
    def test_whisper_transcription_accuracy(): ...
    def test_dual_transcript_comparison(): ...
    def test_automatic_fallback(): ...
    def test_processing_time_estimation(): ...
- Frontend unit tests (20 min)
    // frontend/src/components/__tests__/TranscriptSelector.test.tsx
    describe('TranscriptSelector', () => {
      test.todo('shows processing time estimates')
      test.todo('handles source selection')
      test.todo('displays quality indicators')
    })
- API endpoint tests (10 min)
  - Test dual transcript endpoint with all source options
  - Test transcript option validation
  - Test error handling scenarios
Deliverable: >80% test coverage for new dual transcript functionality
Task 5.2: Integration Testing (1 hour)
- YouTube vs Whisper comparison test (20 min)
  - Process same video with both methods
  - Verify quality differences are meaningful
  - Confirm timing accuracy and word differences
- Admin page testing (15 min)
  - Test transcript selector in admin interface
  - Verify no authentication required for admin access
  - Test all transcript source options work without login
- Error scenario testing (15 min)
  - Test unavailable YouTube captions (automatic fallback to Whisper)
  - Test Whisper processing failure (graceful error handling)
  - Test long video processing (chunking and timeouts)
- Performance testing (10 min)
  - Benchmark Whisper processing times for different video lengths
  - Test parallel processing performance
  - Verify cache effectiveness and hit rates
Deliverable: All integration scenarios passing with documented test results
Acceptance Criteria Validation
AC 1: Transcript Source Selection UI ✅
- Three clear choices visible: YouTube Captions, AI Whisper, Compare Both
- Processing time estimates shown for each option
- Quality level indicators clearly displayed
- Icons and badges provide clear visual differentiation
AC 2: YouTube Transcript Processing (Default) ✅
- YouTube Captions option processes in under 5 seconds
- Transcript source marked as "youtube" in database
- Quality score calculated based on caption availability
- Existing functionality maintained for backward compatibility
AC 3: Whisper Transcript Processing ✅
- AI Whisper option uses integrated TranscriptionService
- Audio downloaded using existing VideoDownloadService
- High-quality transcript returned with timestamps
- Transcript source marked as "whisper" in database
- Processing time communicated clearly to user
AC 4: Dual Transcript Comparison ✅
- Side-by-side transcript comparison interface
- Differences highlighted (word accuracy, punctuation, technical terms)
- Quality metrics shown for each transcript
- User can switch between transcripts for summary generation
AC 5: Automatic Fallback ✅
- System automatically falls back to Whisper when YouTube captions unavailable
- User notified of fallback with processing time estimate
- Final result shows "whisper" as source method
- No manual intervention required
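The fallback behaviour in AC 5 could be structured along the lines of the sketch below. All names here (CaptionsUnavailable, the two fetch functions, the notify callback, the result fields) are hypothetical, and the "nocap" prefix only simulates a video without captions.

```python
import asyncio
from typing import Callable, Optional


class CaptionsUnavailable(Exception):
    """Raised when a video has no YouTube captions (hypothetical)."""


async def get_youtube_captions(video_id: str) -> str:
    """Hypothetical captions call; ids starting with 'nocap' simulate a miss."""
    if video_id.startswith("nocap"):
        raise CaptionsUnavailable(video_id)
    return f"captions for {video_id}"


async def transcribe_with_whisper(video_id: str) -> str:
    """Hypothetical Whisper call."""
    return f"whisper transcript for {video_id}"


async def get_transcript_with_fallback(
    video_id: str, notify: Optional[Callable[[str], None]] = None
) -> dict:
    """Try YouTube captions first; fall back to Whisper without user action."""
    try:
        text = await get_youtube_captions(video_id)
        return {"source": "youtube", "text": text, "fallback": False}
    except CaptionsUnavailable:
        if notify is not None:
            # Tell the user why processing will take longer.
            notify("YouTube captions unavailable; using Whisper (est. 30-120s)")
        text = await transcribe_with_whisper(video_id)
        return {"source": "whisper", "text": text, "fallback": True}
```

Only the captions-unavailable case triggers the fallback in this sketch. Other YouTube errors propagate to the caller, so that transient failures are not silently converted into slower Whisper runs.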
AC 6: Quality and Cost Transparency ✅
- Clear processing time estimates (YouTube: 2-5s, Whisper: 30-120s)
- Quality indicators (YouTube: "Standard", Whisper: "High Accuracy")
- Availability status clearly communicated
- Cost implications transparently displayed
Success Metrics Tracking
- User Understanding: 80%+ users understand transcript option differences
- Feature Adoption: 30%+ of users try Whisper transcription option
- Quality Improvement: 25%+ improvement in transcript accuracy using Whisper
- User Satisfaction: <5% user complaints about transcript quality
- Reliability: Zero failed transcriptions due to unavailable YouTube captions
Risk Mitigation Checklist
High Risk Items
- Processing Time Management: Clear time estimates and progress indicators implemented
- Resource Consumption: Processing queue and throttling mechanisms in place
- Model Download: Pre-downloaded models in Docker image or graceful download handling
- Audio Quality: Preprocessing and quality checks implemented
Quality Assurance
- All error scenarios tested and handled gracefully
- Performance benchmarks met (Whisper <2 minutes for 10-minute video)
- Memory usage stays within acceptable limits (<2GB peak)
- Cache effectiveness verified and optimized
Definition of Done
Story 4.1 is complete when:
- All tasks and subtasks marked complete above
- All acceptance criteria validated ✅
- Unit tests passing with >80% coverage
- Integration tests passing for all scenarios
- Performance benchmarks met
- Documentation updated
- Code review completed
- Admin page supports dual transcript options
- Production deployment checklist complete
Post-Implementation Tasks
Monitoring Setup
- Add metrics for transcript source usage patterns
- Monitor Whisper processing times and success rates
- Track user satisfaction with transcript quality
- Log resource usage patterns for optimization
Documentation Updates
- Update API documentation with new endpoints
- Add user guide for transcript options
- Document deployment requirements (FFmpeg, model caching)
- Update troubleshooting guide
Implementation Owner: Development Team
Reviewers: Technical Lead, Product Owner
Epic: Epic 4 - Advanced Intelligence & Developer Platform
Status: Ready for Implementation
Last Updated: 2025-08-27
This unified checklist provides a comprehensive roadmap combining both the story requirements and detailed implementation plan into a single, actionable development workflow.