# Class Library Analysis: Dual Transcript Implementation

## Overview

This analysis examines the classes needed for dual transcript functionality and determines what should be created, updated, or moved to the shared class library (`/lib/`) versus remaining project-specific.

## Current Architecture Analysis

### Existing Transcript Classes (YouTube Summarizer)

**Location**: `apps/youtube-summarizer/backend/models/transcript.py`

```python
class ExtractionMethod(str, Enum):
    YOUTUBE_API = "youtube_api"
    AUTO_CAPTIONS = "auto_captions"
    WHISPER_AUDIO = "whisper_audio"  # ✅ Already supports Whisper
    MOCK = "mock"
    FAILED = "failed"

class TranscriptSegment(BaseModel):  # ✅ Basic segment model
    text: str
    start: float
    duration: float

class TranscriptMetadata(BaseModel):  # ✅ Comprehensive metadata
    word_count: int
    language: str
    has_timestamps: bool
    extraction_method: ExtractionMethod
    processing_time_seconds: float

class TranscriptResult(BaseModel):  # ✅ Main result structure
    video_id: str
    transcript: Optional[str]
    segments: Optional[List[TranscriptSegment]]
    metadata: Optional[TranscriptMetadata]
    method: ExtractionMethod
    success: bool
    from_cache: bool
    error: Optional[Dict[str, Any]]
```

### Archived Production-Ready Classes

**Location**: `archived_projects/personal-ai-assistant-v1.1.0/src/services/transcription_service.py`

```python
class TranscriptionSegment:  # 🔄 More detailed than the current model
    """Enhanced segment with confidence and speaker detection"""
    start_time: float
    end_time: float
    text: str
    confidence: float
    speaker: Optional[str]

class TranscriptionService:  # 🏆 Production-ready Whisper service
    """Full Whisper implementation with chunking and device detection"""
    # - Model management (tiny, base, small, medium, large)
    # - GPU/CPU auto-detection
    # - Audio preprocessing and chunking
    # - Speaker identification
    # - Confidence scoring
    # - Progress callbacks
```

### Current Service Classes

**Location**: `apps/youtube-summarizer/backend/services/`

```python
class MockWhisperService:  # ❌ Placeholder to be replaced
    """Mock implementation returning fake transcripts"""

class EnhancedTranscriptService:  # ✅ Good architecture
    """Orchestrates YouTube API + local files + Whisper fallback"""
    # - Priority-based extraction
    # - Caching integration
    # - Error handling and fallbacks
```

## Required New Classes for Dual Transcript Implementation

### 1. Backend Service Classes (Project-Specific)

#### WhisperTranscriptService

```python
# Location: apps/youtube-summarizer/backend/services/whisper_transcript_service.py

class WhisperTranscriptService:
    """Real Whisper service adapted from the archived TranscriptionService"""
    # - Copy from the archived project
    # - Remove podcast-specific dependencies
    # - Add async compatibility
    # - Integrate YouTube video context

# Decision: project-specific (YouTube video context)
```

#### DualTranscriptService

```python
# Location: apps/youtube-summarizer/backend/services/dual_transcript_service.py

class DualTranscriptService(EnhancedTranscriptService):
    """Manages parallel YouTube + Whisper transcript extraction"""
    # - Parallel processing with asyncio.gather()
    # - Quality comparison algorithms
    # - Dual caching strategies
    # - Fallback management

# Decision: project-specific (YouTube pipeline integration)
```

#### TranscriptQualityAnalyzer

```python
# Location: lib/services/ai/transcript_quality_analyzer.py

class TranscriptQualityAnalyzer:
    """Generic transcript quality comparison utilities"""
    # - Word-level difference analysis
    # - Confidence scoring algorithms
    # - Timing precision analysis
    # - Quality metrics calculation

# Decision: move to the class library (reusable across projects)
```
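The parallel extraction at the core of `DualTranscriptService` can be sketched with `asyncio.gather`. Everything below is a hypothetical stand-in — the fetcher names and result dictionaries are illustrative, not the real service methods:

```python
import asyncio
from typing import Any, Dict

# Hypothetical stand-ins for the real extractors; the actual services
# would call the YouTube transcript API and run Whisper on downloaded audio.
async def fetch_youtube_transcript(video_id: str) -> Dict[str, Any]:
    await asyncio.sleep(0)  # placeholder for network I/O
    return {"source": "youtube", "text": "caption text", "success": True}

async def fetch_whisper_transcript(video_id: str) -> Dict[str, Any]:
    await asyncio.sleep(0)  # placeholder for audio transcription
    return {"source": "whisper", "text": "whisper text", "success": True}

async def extract_both(video_id: str) -> Dict[str, Dict[str, Any]]:
    """Run both extractions concurrently; one failure does not sink the other."""
    youtube, whisper = await asyncio.gather(
        fetch_youtube_transcript(video_id),
        fetch_whisper_transcript(video_id),
        return_exceptions=True,  # keep the surviving transcript on partial failure
    )
    return {
        "youtube": youtube if not isinstance(youtube, Exception) else {"success": False},
        "whisper": whisper if not isinstance(whisper, Exception) else {"success": False},
    }

results = asyncio.run(extract_both("dQw4w9WgXcQ"))
```

Using `return_exceptions=True` means one source failing (e.g. Whisper timing out on a long video) still yields the other transcript, matching the fallback behavior described above.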
### 2. API Model Classes (Project-Specific)

#### TranscriptOptionsRequest

```python
# Location: apps/youtube-summarizer/backend/models/transcript.py (extend existing)

class TranscriptOptionsRequest(BaseModel):
    source: Literal['youtube', 'whisper', 'both'] = 'youtube'
    whisper_model: Literal['tiny', 'base', 'small', 'medium'] = 'small'
    language: str = 'en'
    include_timestamps: bool = True
    quality_threshold: float = 0.7

# Decision: project-specific (YouTube Summarizer API contract)
```

#### TranscriptComparisonResponse

```python
# Location: apps/youtube-summarizer/backend/models/transcript.py (extend existing)

class TranscriptComparisonResponse(BaseModel):
    video_id: str
    transcripts: Dict[str, TranscriptResult]  # {'youtube': result, 'whisper': result}
    quality_comparison: Dict[str, Any]
    recommended_source: str
    processing_time_comparison: Dict[str, float]

# Decision: project-specific (specific to the dual transcript feature)
```

### 3. Frontend Components (Project-Specific)

```typescript
// Location: frontend/src/components/TranscriptSelector.tsx
interface TranscriptSelectorProps {
  value: TranscriptSource
  onChange: (source: TranscriptSource) => void
  estimatedDuration?: number
  disabled?: boolean
}

// Location: frontend/src/components/TranscriptComparison.tsx
interface TranscriptComparisonProps {
  youtubeTranscript: TranscriptResult
  whisperTranscript: TranscriptResult
  onSelectTranscript: (source: TranscriptSource) => void
}

// Location: frontend/src/types/transcript.ts
type TranscriptSource = 'youtube' | 'whisper' | 'both'

// Decision: all project-specific (YouTube Summarizer UI)
```
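As a rough illustration of the validation `TranscriptOptionsRequest` would enforce, here is a standard-library sketch; Pydantic would express the same constraints declaratively via `Literal` and field validators. The field names come from the model above; the rest is assumed:

```python
from dataclasses import dataclass

VALID_SOURCES = {"youtube", "whisper", "both"}
VALID_MODELS = {"tiny", "base", "small", "medium"}

@dataclass
class TranscriptOptions:
    """Plain-Python stand-in for TranscriptOptionsRequest (illustration only)."""
    source: str = "youtube"
    whisper_model: str = "small"
    language: str = "en"
    include_timestamps: bool = True
    quality_threshold: float = 0.7

    def __post_init__(self) -> None:
        # Reject values outside the API contract, as the Literal types would.
        if self.source not in VALID_SOURCES:
            raise ValueError(f"source must be one of {sorted(VALID_SOURCES)}")
        if self.whisper_model not in VALID_MODELS:
            raise ValueError(f"whisper_model must be one of {sorted(VALID_MODELS)}")
        if not 0.0 <= self.quality_threshold <= 1.0:
            raise ValueError("quality_threshold must be between 0.0 and 1.0")

opts = TranscriptOptions(source="both")
```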
## Class Library Recommendations

### Move to Shared Library (`/lib/`)

#### 1. Base Transcript Interfaces

```python
# Location: lib/core/interfaces/transcript_interfaces.py
from abc import ABC, abstractmethod
from pathlib import Path  # required by the signatures below
from typing import Any, Dict, List, Optional

class ITranscriptService(ABC):
    """Base interface for transcript services"""

    @abstractmethod
    async def transcribe_audio(self, audio_path: Path) -> Dict[str, Any]:
        pass

class IQualityAnalyzer(ABC):
    """Base interface for transcript quality analysis"""

    @abstractmethod
    def compare_transcripts(self, transcript1: str, transcript2: str) -> Dict[str, Any]:
        pass

class ITranscriptCache(ABC):
    """Base interface for transcript caching"""

    @abstractmethod
    async def get_transcript(self, key: str) -> Optional[Dict[str, Any]]:
        pass
```

#### 2. Quality Analysis Utilities

```python
# Location: lib/utils/helpers/transcript_analyzer.py

class TranscriptQualityAnalyzer:
    """Reusable transcript quality comparison utilities"""

    def calculate_word_accuracy(self, reference: str, candidate: str) -> float:
        """Calculate word-level accuracy between transcripts"""

    def analyze_timing_precision(self, segments1: List, segments2: List) -> Dict:
        """Compare timestamp accuracy between segment lists"""

    def generate_diff_highlights(self, text1: str, text2: str) -> Dict:
        """Generate visual diff highlighting for UI display"""

    def calculate_quality_score(self, transcript: str, segments: List) -> float:
        """Calculate an overall quality score based on multiple metrics"""
```
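One plausible standard-library implementation of `calculate_word_accuracy`, using `difflib.SequenceMatcher` over word lists — a sketch of the idea, not the analyzer's actual algorithm:

```python
from difflib import SequenceMatcher

def calculate_word_accuracy(reference: str, candidate: str) -> float:
    """Fraction of words in agreement, via longest-matching-block alignment."""
    ref_words = reference.lower().split()
    cand_words = candidate.lower().split()
    if not ref_words and not cand_words:
        return 1.0  # two empty transcripts agree trivially
    matcher = SequenceMatcher(a=ref_words, b=cand_words, autojunk=False)
    # get_matching_blocks() yields aligned runs; sum their sizes for total matches.
    matched = sum(block.size for block in matcher.get_matching_blocks())
    return matched / max(len(ref_words), len(cand_words))

score = calculate_word_accuracy("the quick brown fox", "the quick brown dog")
# three of the four words align, so the score is 0.75
```

Normalizing to the longer transcript penalizes both dropped and hallucinated words, which is the behavior a quality comparison between YouTube captions and Whisper output needs.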
#### 3. Base Whisper Service Interface

```python
# Location: lib/services/ai/base_whisper_service.py
from pathlib import Path
from typing import Any, Dict

from lib.core.interfaces.transcript_interfaces import ITranscriptService

class BaseWhisperService(ITranscriptService):
    """Base Whisper service implementation that can be extended"""

    def __init__(self, model_size: str = "small", device: str = "auto"):
        self.model_size = model_size
        self.device = self._detect_device(device)
        self.model = None

    def _detect_device(self, preferred: str) -> str:
        """Generic device detection logic"""

    def _load_model(self):
        """Generic model loading with caching"""

    async def transcribe_audio(self, audio_path: Path) -> Dict[str, Any]:
        """Base transcription implementation"""
```

#### 4. Enhanced Segment Model

```python
# Location: lib/data/models/transcript_models.py
from typing import Optional

from lib.data.models.base_model import BaseModel

class EnhancedTranscriptSegment(BaseModel):
    """Enhanced segment model combining both architectures"""
    text: str
    start_time: float
    end_time: float
    confidence: Optional[float] = None
    speaker: Optional[str] = None
    word_count: Optional[int] = None

    @property
    def duration(self) -> float:
        """Calculated from start/end times, not stored as a field"""
        return self.end_time - self.start_time

    def to_basic_segment(self) -> 'TranscriptSegment':
        """Convert to the project-specific basic segment"""
        return TranscriptSegment(
            text=self.text,
            start=self.start_time,
            duration=self.duration,
        )
```

### Keep Project-Specific

#### 1. YouTube-Specific Models

All current models in `apps/youtube-summarizer/backend/models/transcript.py` remain project-specific, since they are tied to the YouTube Summarizer's API contracts and domain.

#### 2. Service Implementations

- `WhisperTranscriptService` - YouTube video context
- `DualTranscriptService` - YouTube pipeline integration
- `EnhancedTranscriptService` - YouTube-specific orchestration

#### 3. API Endpoints and Controllers

All API-related classes stay project-specific.
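Returning to the `BaseWhisperService` sketch above: its `_detect_device` hook could be implemented as follows, assuming PyTorch as the inference backend (a standalone sketch; the archived service's real detection logic may differ):

```python
def detect_device(preferred: str = "auto") -> str:
    """Resolve 'auto' to the best available backend, else honor the request."""
    if preferred != "auto":
        return preferred
    try:
        import torch  # optional dependency; fall back to CPU without it
        if torch.cuda.is_available():
            return "cuda"
        mps = getattr(torch.backends, "mps", None)  # Apple Silicon support
        if mps is not None and mps.is_available():
            return "mps"
    except ImportError:
        pass
    return "cpu"

device = detect_device()
```

Falling back to `"cpu"` when `torch` is absent keeps the shared class importable in environments without GPU support.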
#### 4. Frontend Components

All React components and TypeScript types stay project-specific.

## Implementation Strategy

### Phase 1: Create Class Library Base Components

```bash
# Create shared interfaces and utilities
mkdir -p lib/core/interfaces
mkdir -p lib/services/ai
mkdir -p lib/utils/helpers
mkdir -p lib/data/models

# Copy and adapt the quality analysis utilities
# Create base interfaces for transcript services
# Create the enhanced segment model
```

### Phase 2: Implement Project-Specific Classes

```bash
# Create WhisperTranscriptService (adapted from the archived service)
# Create DualTranscriptService (extends EnhancedTranscriptService)
# Extend the existing transcript models with the new API models
# Create the frontend components
```

### Phase 3: Integration and Testing

```bash
# Update dependency injection to use the new services
# Update existing services to use the class library utilities
# Add comprehensive tests for both shared and project-specific code
```

## Benefits of This Approach

### Shared Class Library Benefits

1. **Reusability** - Quality analysis and the base Whisper service can be reused across projects
2. **Consistency** - Standard interfaces ensure consistent behavior
3. **Maintainability** - Core algorithms are centralized for easier updates
4. **Testing** - Shared utilities can carry comprehensive test coverage

### Project-Specific Benefits

1. **Flexibility** - YouTube-specific models can evolve independently
2. **Performance** - No unnecessary abstraction overhead
3. **Domain Clarity** - Clear separation between generic and YouTube-specific logic
4. **Deployment** - No shared-library versioning concerns

## Migration Path

### Immediate (Story 4.1)

- Implement all classes as project-specific first
- Get the dual transcript feature working end-to-end
- Validate the architecture and patterns

### Future Refactoring (Post-Story 4.1)

- Extract the proven quality analysis utilities to the class library
- Move the base Whisper interfaces to the shared location
- Update other projects to use the shared transcript utilities

## Conclusion

For the Story 4.1 implementation, focus on **project-specific classes** to maintain velocity and ensure the feature works correctly. The class library extraction can follow as a refactoring step once the implementation is proven and stable.

**Key Classes to Create:**

1. ✅ `WhisperTranscriptService` (project-specific)
2. ✅ `DualTranscriptService` (project-specific)
3. ✅ `TranscriptOptionsRequest/Response` (extend existing models)
4. ✅ `TranscriptSelector/Comparison` (React components)

**Future Class Library Candidates:**

1. 🔄 `TranscriptQualityAnalyzer` utilities
2. 🔄 `BaseWhisperService` interface
3. 🔄 `EnhancedTranscriptSegment` model

This approach balances immediate implementation needs with future reusability and maintainability.