Story 4.1: Dual Transcript Options - Unified Development Checklist

Overview

This checklist consolidates the story document and the implementation plan into a single, actionable development workflow for implementing dual transcript options (YouTube + Whisper) in the YouTube Summarizer project.

Story: 4.1 - Dual Transcript Options (YouTube + Whisper)
Estimated Effort: 22 hours
Priority: High 🔥
Target Completion: 3-4 development days

Phase 1: Backend Foundation (8 hours)

Task 1.1: Copy and Adapt TranscriptionService (3 hours)

Copy source code (30 min) - COMPLETED in story analysis

cp archived_projects/personal-ai-assistant-v1.1.0/src/services/transcription_service.py \
   apps/youtube-summarizer/backend/services/whisper_transcript_service.py

Remove podcast dependencies (45 min)
- Remove PodcastEpisode, PodcastTranscript, Repository imports
- Remove repository dependency injection
- Simplify constructor to not require database repository
- Update logging to remove podcast-specific context
Adapt for YouTube context (60 min)
- Update transcribe_episode() → transcribe_audio_file()
- Modify segment storage to return data instead of database writes
- Update error handling for YouTube-specific scenarios
- Add YouTube video ID context to logging
Add async compatibility (45 min)
- Wrap synchronous Whisper calls in loop.run_in_executor() (asyncio has no module-level run_in_executor(); asyncio.to_thread() is an alternative on Python 3.9+)
- Update method signatures to async/await pattern
- Test async integration with existing services
- Verify thread safety for concurrent requests
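
A minimal sketch of the executor wrapping described above, under stated assumptions: `blocking_transcribe` is a hypothetical stand-in for the synchronous Whisper call (it only sleeps, so the example runs without the model installed), and the single-worker pool is one possible way to serialize access to a shared model, not a requirement from the story.

```python
import asyncio
import time
from concurrent.futures import ThreadPoolExecutor
from functools import partial


def blocking_transcribe(audio_path: str, language: str = "en") -> dict:
    """Hypothetical stand-in for the blocking Whisper call."""
    time.sleep(0.05)  # simulates CPU/GPU-bound work
    return {"text": f"transcript of {audio_path}", "language": language}


# Assumption: one worker, so only one transcription touches the model at a time.
_executor = ThreadPoolExecutor(max_workers=1)


async def transcribe_audio_file(audio_path: str, video_id: str) -> dict:
    loop = asyncio.get_running_loop()
    # run_in_executor forwards positional arguments only, so keyword
    # arguments are bound with functools.partial.
    call = partial(blocking_transcribe, audio_path, language="en")
    result = await loop.run_in_executor(_executor, call)
    return {"video_id": video_id, **result}


async def main() -> list:
    # Two concurrent requests; the event loop stays free while they run.
    return await asyncio.gather(
        transcribe_audio_file("a.mp3", "vid_a"),
        transcribe_audio_file("b.mp3", "vid_b"),
    )


if __name__ == "__main__":
    for item in asyncio.run(main()):
        print(item["video_id"], item["text"])
```

With a single worker the two requests are processed one after the other in the background thread, which trades throughput for not having to reason about concurrent access to one model instance.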

Deliverable: Working WhisperTranscriptService class ready for integration

Task 1.2: Update Dependencies and Environment (2 hours)

Update requirements.txt (15 min)

# Add to backend/requirements.txt
openai-whisper==20231117
torch>=2.0.0
librosa==0.10.1
pydub==0.25.1
soundfile==0.12.1

Update Docker configuration (45 min)
- Add ffmpeg system dependency to Dockerfile
- Add Whisper dependencies to Docker build
- Test Docker build with new dependencies
- Optimize Docker layer caching for dependencies
Test Whisper model download (30 min)
- Test "small" model download (244M parameters; the checkpoint file is larger, roughly 460 MB)
- Verify CUDA detection works (if available)
- Add model caching directory configuration
- Test model loading performance
Environment configuration (30 min)
```
# Add to .env
WHISPER_MODEL_SIZE=small
WHISPER_DEVICE=auto
WHISPER_MODEL_CACHE=/tmp/whisper_models
```
- Update environment variable documentation
- Add configuration validation on startup
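
One way the startup validation could look, as a sketch only. The variable names and defaults come from the `.env` block above; the allowed device values (`auto`, `cpu`, `cuda`) and the `WhisperSettings` / `load_whisper_settings` names are assumptions made for illustration.

```python
import os
from dataclasses import dataclass

# Model sizes mirror the ones offered in TranscriptOptionsRequest.
ALLOWED_MODEL_SIZES = {"tiny", "base", "small", "medium"}
# Assumed set of device values; adjust to whatever the service supports.
ALLOWED_DEVICES = {"auto", "cpu", "cuda"}


@dataclass(frozen=True)
class WhisperSettings:
    model_size: str
    device: str
    model_cache: str


def load_whisper_settings(env=None) -> WhisperSettings:
    """Read Whisper settings from a mapping (os.environ by default) and validate them."""
    env = os.environ if env is None else env
    settings = WhisperSettings(
        model_size=env.get("WHISPER_MODEL_SIZE", "small"),
        device=env.get("WHISPER_DEVICE", "auto"),
        model_cache=env.get("WHISPER_MODEL_CACHE", "/tmp/whisper_models"),
    )
    if settings.model_size not in ALLOWED_MODEL_SIZES:
        raise ValueError(f"Unsupported WHISPER_MODEL_SIZE: {settings.model_size!r}")
    if settings.device not in ALLOWED_DEVICES:
        raise ValueError(f"Unsupported WHISPER_DEVICE: {settings.device!r}")
    return settings


if __name__ == "__main__":
    print(load_whisper_settings())
```

Calling this once during application startup makes a misconfigured environment fail fast instead of surfacing on the first transcription request.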

Deliverable: Environment ready for Whisper integration with proper dependency management

Task 1.3: Replace MockWhisperService (3 hours)

Update EnhancedTranscriptService (90 min)
- Replace MockWhisperService with real WhisperTranscriptService
- Update constructor to instantiate real Whisper service
- Remove all mock-related code and comments
- Update service method calls to use real Whisper API
Update dependency injection (30 min)
- Modify main.py service initialization
- Update FastAPI dependency functions
- Ensure proper service lifecycle management
- Add error handling for service initialization
Test integration (60 min)
- Unit test with sample audio file
- Integration test with VideoDownloadService
- Verify transcript quality and timing accuracy
- Test error handling scenarios

Deliverable: Working Whisper integration in existing EnhancedTranscriptService architecture

Phase 2: API Enhancement (4 hours)

Task 2.1: Create DualTranscriptService (2 hours)

Create DualTranscriptService class (60 min)
- Extend EnhancedTranscriptService
- Implement extract_dual_transcripts() with parallel processing
- Add proper error handling for both transcript sources
- Implement timeout and cancellation support
Implement quality comparison (45 min)
- Word-by-word accuracy comparison algorithm
- Confidence score calculation
- Timing precision analysis
- Generate quality metrics and difference highlights
Add caching for dual results (15 min)
- Cache YouTube and Whisper results separately
- Extended TTL for Whisper (more expensive to regenerate)
- Implement cache key strategy for dual transcripts
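
The parallel extraction with per-source timeouts could be sketched roughly as follows. The two fetch coroutines are hypothetical placeholders for the real YouTube and Whisper calls, and the default timeout is an arbitrary illustration value.

```python
import asyncio


async def fetch_youtube_transcript(video_id: str) -> str:
    """Hypothetical placeholder for the YouTube captions lookup."""
    await asyncio.sleep(0.01)
    return f"youtube captions for {video_id}"


async def run_whisper_transcript(video_id: str) -> str:
    """Hypothetical placeholder for download plus Whisper transcription."""
    await asyncio.sleep(0.05)
    return f"whisper transcript for {video_id}"


async def extract_dual_transcripts(video_id: str, timeout_s: float = 300.0) -> dict:
    jobs = {
        "youtube": asyncio.wait_for(fetch_youtube_transcript(video_id), timeout_s),
        "whisper": asyncio.wait_for(run_whisper_transcript(video_id), timeout_s),
    }
    # return_exceptions=True keeps a failure in one source from
    # discarding the result of the other.
    outcomes = await asyncio.gather(*jobs.values(), return_exceptions=True)
    result = {}
    for source, outcome in zip(jobs, outcomes):
        if isinstance(outcome, BaseException):
            result[source] = {"ok": False, "error": repr(outcome)}
        else:
            result[source] = {"ok": True, "text": outcome}
    return result


if __name__ == "__main__":
    print(asyncio.run(extract_dual_transcripts("abc123")))
```

Reporting each source separately lets the caller still return the YouTube transcript when Whisper times out, and the reverse.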

Deliverable: DualTranscriptService with parallel processing and quality comparison
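
For the quality comparison, a word-level diff can be prototyped with the standard library's `difflib`. This is a sketch under assumptions: `compare_transcripts` is a hypothetical name, and the similarity ratio only measures how much the two transcripts agree with each other. It is not an accuracy score, since neither transcript is a ground-truth reference.

```python
import difflib


def compare_transcripts(youtube_text: str, whisper_text: str) -> dict:
    """Word-level comparison of two transcripts (case-insensitive)."""
    yt_words = youtube_text.lower().split()
    wh_words = whisper_text.lower().split()
    # autojunk=False avoids the popularity heuristic on long transcripts.
    matcher = difflib.SequenceMatcher(a=yt_words, b=wh_words, autojunk=False)
    differences = [
        {
            "op": op,  # 'replace', 'delete' or 'insert'
            "youtube": " ".join(yt_words[i1:i2]),
            "whisper": " ".join(wh_words[j1:j2]),
        }
        for op, i1, i2, j1, j2 in matcher.get_opcodes()
        if op != "equal"
    ]
    return {"similarity": matcher.ratio(), "differences": differences}


if __name__ == "__main__":
    report = compare_transcripts("we use pie torch daily", "we use pytorch daily")
    for diff in report["differences"]:
        print(diff)
```

The `differences` list maps directly onto highlight spans: each entry says which words to mark on the YouTube side and which on the Whisper side.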

Task 2.2: Add New API Endpoints (2 hours)

Create transcript selection models (30 min)

from typing import Literal

from pydantic import BaseModel


class TranscriptOptionsRequest(BaseModel):
    source: Literal['youtube', 'whisper', 'both'] = 'youtube'
    whisper_model: Literal['tiny', 'base', 'small', 'medium'] = 'small'
    language: str = 'en'
    include_timestamps: bool = True

Add dual transcript endpoint (60 min)
- Create /api/transcripts/dual/{video_id} endpoint
- Handle all three source options (youtube, whisper, both)
- Add proper request validation and error responses
- Implement authentication and authorization
Update existing pipeline (30 min)
- Modify SummaryPipeline to accept transcript source preference
- Update processing status to show transcript method
- Add transcript quality metrics to summary result
- Update WebSocket notifications for transcript selection

Deliverable: Complete API interface for transcript source selection

Phase 3: Database Schema Updates (2 hours)

Task 3.1: Extend Summary Model (1 hour)

Create database migration (30 min)

ALTER TABLE summaries 
ADD COLUMN transcript_source VARCHAR(20),
ADD COLUMN transcript_quality_score FLOAT,
ADD COLUMN youtube_transcript TEXT,
ADD COLUMN whisper_transcript TEXT,
ADD COLUMN whisper_processing_time FLOAT,
ADD COLUMN transcript_comparison_data JSON;

Update Summary model (20 min)
- Add new fields to SQLAlchemy model
- Update model relationships and constraints
- Add field validation and defaults
Update repository methods (10 min)
- Add methods for storing dual transcript data
- Add queries for transcript source filtering
- Update existing queries to include new fields

Deliverable: Database schema supporting dual transcript metadata

Task 3.2: Add Performance Indexes (1 hour)

Create performance indexes (30 min)

CREATE INDEX idx_summaries_transcript_source ON summaries(transcript_source);
CREATE INDEX idx_summaries_quality_score ON summaries(transcript_quality_score);
CREATE INDEX idx_summaries_processing_time ON summaries(whisper_processing_time);

Test query performance (20 min)
- Verify index usage with EXPLAIN queries
- Test filtering by transcript source
- Benchmark query times with sample data
Run migration and validate (10 min)
- Apply migration to development database
- Verify all fields accessible and properly typed
- Test with sample data insertion and retrieval
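
The index check can be rehearsed locally before touching the development database. The sketch below uses an in-memory SQLite database purely as a stand-in (the checklist does not name the production database, and plan output differs between engines); the table is reduced to the columns needed for the query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE summaries ("
    " id INTEGER PRIMARY KEY,"
    " transcript_source VARCHAR(20),"
    " transcript_quality_score FLOAT)"
)
conn.execute(
    "CREATE INDEX idx_summaries_transcript_source ON summaries(transcript_source)"
)
conn.executemany(
    "INSERT INTO summaries (transcript_source, transcript_quality_score) VALUES (?, ?)",
    [("youtube", 0.7), ("whisper", 0.9), ("whisper", 0.95)],
)


def query_plan(sql: str, params: tuple = ()) -> str:
    """Return SQLite's query plan for a statement as one string."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    # The human-readable detail is the last column of each plan row.
    return " | ".join(str(row[-1]) for row in rows)


plan = query_plan(
    "SELECT id, transcript_quality_score FROM summaries WHERE transcript_source = ?",
    ("whisper",),
)
uses_index = "idx_summaries_transcript_source" in plan

if __name__ == "__main__":
    print(plan)
    print("index used:", uses_index)
```

On PostgreSQL or MySQL the equivalent check is an `EXPLAIN` on the same query, looking for an index scan on `idx_summaries_transcript_source` rather than a sequential scan.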

Deliverable: Optimized database schema with proper indexing

Phase 4: Frontend Implementation (6 hours)

Task 4.1: Create TranscriptSelector Component (2 hours)

Create base component (45 min)

interface TranscriptSelectorProps {
  value: TranscriptSource
  onChange: (source: TranscriptSource) => void
  estimatedDuration?: number
  disabled?: boolean
}

Add processing time estimation (30 min)
- Calculate Whisper processing time based on video duration
- Show cost/time comparison for each option
- Display clear indicators (Fast/Free vs Accurate/Slower)
Style and accessibility (45 min)
- Implement with Radix UI RadioGroup
- Add proper ARIA labels and descriptions
- Visual icons and quality indicators (📺 🎯 🔄)
- Responsive design for mobile/desktop
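
The time estimate shown next to each option could be derived from figures that appear elsewhere in this checklist: 2-5 seconds for YouTube captions, a 30 second floor for Whisper, and the benchmark target of under 2 minutes for a 10-minute video (a factor of 0.2). The function below is a sketch in Python for consistency with the backend; the name and the linear model are assumptions, and the component itself would apply the same arithmetic in TypeScript.

```python
YOUTUBE_MAX_S = 5.0            # upper end of the 2-5 s range
WHISPER_MIN_S = 30.0           # lower end of the 30-120 s range
WHISPER_REALTIME_FACTOR = 0.2  # 120 s of processing per 600 s of video


def estimate_processing_seconds(source: str, video_duration_s: float) -> float:
    """Rough upper-bound estimate of transcript processing time."""
    whisper = max(WHISPER_MIN_S, video_duration_s * WHISPER_REALTIME_FACTOR)
    if source == "youtube":
        return YOUTUBE_MAX_S
    if source == "whisper":
        return whisper
    if source == "both":
        # Both sources run in parallel, so the slower one dominates.
        return max(YOUTUBE_MAX_S, whisper)
    raise ValueError(f"Unknown transcript source: {source!r}")


if __name__ == "__main__":
    for src in ("youtube", "whisper", "both"):
        print(src, estimate_processing_seconds(src, video_duration_s=600))
```

The factor should be replaced with measured values once the Whisper benchmarks from the performance testing task are available.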

Deliverable: TranscriptSelector component with full UI/UX implementation

Task 4.2: Add to SummarizeForm (1 hour)

Update SummarizeForm component (30 min)
- Add transcript source state management
- Integrate TranscriptSelector into form layout
- Update form submission to include transcript options
Update form validation (15 min)
- Add transcript options to form schema validation
- Validate transcript source selection
- Handle validation errors appropriately
Test integration (15 min)
- Verify form works with new component
- Test all transcript source options
- Ensure admin page compatibility (no auth required)

Deliverable: Updated SummarizeForm with transcript selection integration

Task 4.3: Create TranscriptComparison Component (2 hours)

Create comparison UI (75 min)

interface TranscriptComparisonProps {
  youtubeTranscript: TranscriptResult
  whisperTranscript: TranscriptResult
  onSelectTranscript: (source: TranscriptSource) => void
}

Implement difference highlighting (30 min)
- Word-level diff algorithm
- Visual indicators for additions/changes/deletions
- Quality metric displays and comparison badges
Add selection controls (15 min)
- Buttons to choose which transcript to use for summary
- Quality score badges and processing time display
- Clear visual feedback for selection

Deliverable: TranscriptComparison component with side-by-side display

Task 4.4: Update Processing UI (1 hour)

Update ProgressTracker (30 min)
- Add transcript source indicator to progress display
- Show different messages for Whisper vs YouTube processing
- Add estimated time remaining for Whisper transcription
Update result display (20 min)
- Show which transcript source was used in final result
- Display quality metrics and confidence scores
- Add transcript comparison link if both available
Error handling (10 min)
- Handle Whisper processing failures gracefully
- Show fallback notifications clearly
- Provide retry options for failed transcriptions

Deliverable: Updated processing UI with dual transcript awareness

Phase 5: Testing and Integration (2 hours)

Task 5.1: Unit Tests (1 hour)

Backend unit tests (30 min)

# backend/tests/unit/test_whisper_transcript_service.py
# Stubs only: each body is a placeholder so the module imports cleanly.
def test_whisper_transcription_accuracy(): ...
def test_dual_transcript_comparison(): ...
def test_automatic_fallback(): ...
def test_processing_time_estimation(): ...

Frontend unit tests (20 min)

// frontend/src/components/__tests__/TranscriptSelector.test.tsx
describe('TranscriptSelector', () => {
  // Placeholders until the component exists (test.todo is available in Jest and Vitest)
  test.todo('shows processing time estimates')
  test.todo('handles source selection')
  test.todo('displays quality indicators')
})

API endpoint tests (10 min)
- Test dual transcript endpoint with all source options
- Test transcript option validation
- Test error handling scenarios

Deliverable: >80% test coverage for new dual transcript functionality

Task 5.2: Integration Testing (1 hour)

YouTube vs Whisper comparison test (20 min)
- Process same video with both methods
- Verify quality differences are meaningful
- Confirm timing accuracy and word differences
Admin page testing (15 min)
- Test transcript selector in admin interface
- Verify no authentication required for admin access
- Test all transcript source options work without login
Error scenario testing (15 min)
- Test unavailable YouTube captions (automatic fallback to Whisper)
- Test Whisper processing failure (graceful error handling)
- Test long video processing (chunking and timeouts)
Performance testing (10 min)
- Benchmark Whisper processing times for different video lengths
- Test parallel processing performance
- Verify cache effectiveness and hit rates

Deliverable: All integration scenarios passing with documented test results

Acceptance Criteria Validation

AC 1: Transcript Source Selection UI ✅

Three clear choices visible: YouTube Captions, AI Whisper, Compare Both
Processing time estimates shown for each option
Quality level indicators clearly displayed
Icons and badges provide clear visual differentiation

AC 2: YouTube Transcript Processing (Default) ✅

YouTube Captions option processes in under 5 seconds
Transcript source marked as "youtube" in database
Quality score calculated based on caption availability
Existing functionality maintained for backward compatibility

AC 3: Whisper Transcript Processing ✅

AI Whisper option uses the integrated WhisperTranscriptService (adapted from TranscriptionService)
Audio downloaded using existing VideoDownloadService
High-quality transcript returned with timestamps
Transcript source marked as "whisper" in database
Processing time communicated clearly to user

AC 4: Dual Transcript Comparison ✅

Side-by-side transcript comparison interface
Differences highlighted (word accuracy, punctuation, technical terms)
Quality metrics shown for each transcript
User can switch between transcripts for summary generation

AC 5: Automatic Fallback ✅

System automatically falls back to Whisper when YouTube captions unavailable
User notified of fallback with processing time estimate
Final result shows "whisper" as source method
No manual intervention required
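
The fallback behaviour in AC 5 can be sketched as a small wrapper. Everything named here is hypothetical: `CaptionsUnavailable` stands for whatever error the YouTube transcript layer raises when captions are missing, and the fetchers and `notify` callback are injected so the logic can be tested without network access.

```python
import asyncio
from typing import Awaitable, Callable


class CaptionsUnavailable(Exception):
    """Hypothetical error for videos without YouTube captions."""


Fetcher = Callable[[str], Awaitable[str]]


async def get_transcript_with_fallback(
    video_id: str,
    fetch_youtube: Fetcher,
    run_whisper: Fetcher,
    notify: Callable[[str], None],
) -> dict:
    try:
        text = await fetch_youtube(video_id)
        return {"source": "youtube", "text": text, "fallback": False}
    except CaptionsUnavailable:
        # Tell the user before starting the slower path.
        notify(f"No YouTube captions for {video_id}; falling back to Whisper")
        text = await run_whisper(video_id)
        return {"source": "whisper", "text": text, "fallback": True}


if __name__ == "__main__":
    async def no_captions(video_id: str) -> str:
        raise CaptionsUnavailable(video_id)

    async def fake_whisper(video_id: str) -> str:
        return "transcribed audio"

    print(asyncio.run(
        get_transcript_with_fallback("abc123", no_captions, fake_whisper, print)
    ))
```

Only the captions-missing error triggers the fallback; other failures propagate, so a network outage is not silently turned into a long Whisper job.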

AC 6: Quality and Cost Transparency ✅

Clear processing time estimates (YouTube: 2-5s, Whisper: 30-120s)
Quality indicators (YouTube: "Standard", Whisper: "High Accuracy")
Availability status clearly communicated
Cost implications transparently displayed

Success Metrics Tracking

User Understanding: 80%+ users understand transcript option differences
Feature Adoption: 30%+ of users try Whisper transcription option
Quality Improvement: 25%+ improvement in transcript accuracy using Whisper
User Satisfaction: <5% user complaints about transcript quality
Reliability: Zero failed transcriptions due to unavailable YouTube captions

Risk Mitigation Checklist

High Risk Items

Processing Time Management: Clear time estimates and progress indicators implemented
Resource Consumption: Processing queue and throttling mechanisms in place
Model Download: Pre-downloaded models in Docker image or graceful download handling
Audio Quality: Preprocessing and quality checks implemented

Quality Assurance

All error scenarios tested and handled gracefully
Performance benchmarks met (Whisper <2 minutes for 10-minute video)
Memory usage stays within acceptable limits (<2GB peak)
Cache effectiveness verified and optimized

Definition of Done

Story 4.1 is complete when:

All tasks and subtasks marked complete above
All acceptance criteria validated ✅
Unit tests passing with >80% coverage
Integration tests passing for all scenarios
Performance benchmarks met
Documentation updated
Code review completed
Admin page supports dual transcript options
Production deployment checklist complete

Post-Implementation Tasks

Monitoring Setup

Add metrics for transcript source usage patterns
Monitor Whisper processing times and success rates
Track user satisfaction with transcript quality
Log resource usage patterns for optimization

Documentation Updates

Update API documentation with new endpoints
Add user guide for transcript options
Document deployment requirements (FFmpeg, model caching)
Update troubleshooting guide

Implementation Owner: Development Team
Reviewers: Technical Lead, Product Owner
Epic: Epic 4 - Advanced Intelligence & Developer Platform
Status: Ready for Implementation
Last Updated: 2025-08-27

This unified checklist provides a comprehensive roadmap combining both the story requirements and detailed implementation plan into a single, actionable development workflow.
