Class Library Analysis: Dual Transcript Implementation

Overview

This analysis examines the classes needed for dual transcript functionality and determines which classes should be created, updated, or moved to the shared class library (/lib/), and which should remain project-specific.

Current Architecture Analysis

Existing Transcript Classes (YouTube Summarizer)

Location: apps/youtube-summarizer/backend/models/transcript.py

class ExtractionMethod(str, Enum):
    YOUTUBE_API = "youtube_api"
    AUTO_CAPTIONS = "auto_captions" 
    WHISPER_AUDIO = "whisper_audio"    # ✅ Already supports Whisper
    MOCK = "mock"
    FAILED = "failed"

class TranscriptSegment(BaseModel):        # ✅ Basic segment model
    text: str
    start: float
    duration: float

class TranscriptMetadata(BaseModel):       # ✅ Comprehensive metadata
    word_count: int
    language: str
    has_timestamps: bool
    extraction_method: ExtractionMethod
    processing_time_seconds: float

class TranscriptResult(BaseModel):         # ✅ Main result structure
    video_id: str
    transcript: Optional[str]
    segments: Optional[List[TranscriptSegment]]
    metadata: Optional[TranscriptMetadata]
    method: ExtractionMethod
    success: bool
    from_cache: bool
    error: Optional[Dict[str, Any]]

Archived Production-Ready Classes

Location: archived_projects/personal-ai-assistant-v1.1.0/src/services/transcription_service.py

class TranscriptionSegment:                # 🔄 More detailed than current
    """Enhanced segment with confidence and speaker detection"""
    start_time: float
    end_time: float
    text: str
    confidence: float
    speaker: Optional[str]

class TranscriptionService:                # 🏆 Production-ready Whisper service
    """Full Whisper implementation with chunking, device detection"""
    - Model management (tiny, base, small, medium, large)
    - GPU/CPU auto-detection
    - Audio preprocessing and chunking
    - Speaker identification
    - Confidence scoring
    - Progress callbacks

Current Service Classes

Location: apps/youtube-summarizer/backend/services/

class MockWhisperService:                  # ❌ Placeholder to be replaced
    """Mock implementation returning fake transcripts"""

class EnhancedTranscriptService:           # ✅ Good architecture
    """Orchestrates YouTube API + local files + Whisper fallback"""
    - Priority-based extraction
    - Caching integration
    - Error handling and fallbacks

Required New Classes for Dual Transcript Implementation

1. Backend Service Classes (Project-Specific)

WhisperTranscriptService

# Location: apps/youtube-summarizer/backend/services/whisper_transcript_service.py
class WhisperTranscriptService:
    """Real Whisper service adapted from archived TranscriptionService"""
    - Copy from archived project
    - Remove podcast-specific dependencies
    - Add async compatibility
    - YouTube video context integration
    
# Decision: Project-specific (YouTube video context)

DualTranscriptService

# Location: apps/youtube-summarizer/backend/services/dual_transcript_service.py
class DualTranscriptService(EnhancedTranscriptService):
    """Manages parallel YouTube + Whisper transcript extraction"""
    - Parallel processing with asyncio.gather()
    - Quality comparison algorithms
    - Dual caching strategies
    - Fallback management
    
# Decision: Project-specific (YouTube pipeline integration)
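
The parallel extraction step can be sketched as follows (a minimal sketch: the extractor coroutines and result dictionaries are placeholders, not the real service methods):

```python
import asyncio
from typing import Any, Dict

async def extract_youtube(video_id: str) -> Dict[str, Any]:
    # Placeholder for the YouTube caption path
    return {"source": "youtube", "video_id": video_id, "success": True}

async def extract_whisper(video_id: str) -> Dict[str, Any]:
    # Placeholder for the Whisper audio path
    return {"source": "whisper", "video_id": video_id, "success": True}

async def extract_both(video_id: str) -> Dict[str, Dict[str, Any]]:
    # return_exceptions=True lets one source fail without cancelling the other
    youtube, whisper = await asyncio.gather(
        extract_youtube(video_id),
        extract_whisper(video_id),
        return_exceptions=True,
    )
    results: Dict[str, Dict[str, Any]] = {}
    for name, outcome in (("youtube", youtube), ("whisper", whisper)):
        if isinstance(outcome, Exception):
            results[name] = {"source": name, "success": False, "error": str(outcome)}
        else:
            results[name] = outcome
    return results

results = asyncio.run(extract_both("abc123"))
```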

TranscriptQualityAnalyzer

# Location: lib/services/ai/transcript_quality_analyzer.py  
class TranscriptQualityAnalyzer:
    """Generic transcript quality comparison utilities"""
    - Word-level difference analysis
    - Confidence scoring algorithms
    - Timing precision analysis
    - Quality metrics calculation
    
# Decision: Move to class library (reusable across projects)

2. API Model Classes (Project-Specific)

TranscriptOptionsRequest

# Location: apps/youtube-summarizer/backend/models/transcript.py (extend existing)
class TranscriptOptionsRequest(BaseModel):
    source: Literal['youtube', 'whisper', 'both'] = 'youtube'
    whisper_model: Literal['tiny', 'base', 'small', 'medium'] = 'small'
    language: str = 'en'
    include_timestamps: bool = True
    quality_threshold: float = 0.7

# Decision: Project-specific (YouTube Summarizer API contract)
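
An example request body this model would accept (values are illustrative):

```json
{
  "source": "both",
  "whisper_model": "small",
  "language": "en",
  "include_timestamps": true,
  "quality_threshold": 0.7
}
```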

TranscriptComparisonResponse

# Location: apps/youtube-summarizer/backend/models/transcript.py (extend existing)
class TranscriptComparisonResponse(BaseModel):
    video_id: str
    transcripts: Dict[str, TranscriptResult]  # {'youtube': result, 'whisper': result}
    quality_comparison: Dict[str, Any]
    recommended_source: str
    processing_time_comparison: Dict[str, float]

# Decision: Project-specific (specific to dual transcript feature)
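
For illustration, a response of this shape might serialize as follows (all values invented; the `transcripts` entries are abbreviated):

```json
{
  "video_id": "abc123",
  "transcripts": {
    "youtube": { "success": true, "method": "youtube_api" },
    "whisper": { "success": true, "method": "whisper_audio" }
  },
  "quality_comparison": { "word_accuracy": 0.94, "timing_precision": 0.88 },
  "recommended_source": "whisper",
  "processing_time_comparison": { "youtube": 1.2, "whisper": 41.5 }
}
```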

3. Frontend Components (Project-Specific)

// Location: frontend/src/components/TranscriptSelector.tsx
interface TranscriptSelectorProps {
    value: TranscriptSource
    onChange: (source: TranscriptSource) => void
    estimatedDuration?: number
    disabled?: boolean
}

// Location: frontend/src/components/TranscriptComparison.tsx  
interface TranscriptComparisonProps {
    youtubeTranscript: TranscriptResult
    whisperTranscript: TranscriptResult
    onSelectTranscript: (source: TranscriptSource) => void
}

// Location: frontend/src/types/transcript.ts
type TranscriptSource = 'youtube' | 'whisper' | 'both'

// Decision: All project-specific (YouTube Summarizer UI)

Class Library Recommendations

Move to Shared Library (/lib/)

1. Base Transcript Interfaces

# Location: lib/core/interfaces/transcript_interfaces.py
from abc import ABC, abstractmethod
from pathlib import Path
from typing import Any, Dict, Optional

class ITranscriptService(ABC):
    """Base interface for transcript services"""
    @abstractmethod
    async def transcribe_audio(self, audio_path: Path) -> Dict[str, Any]:
        pass

class IQualityAnalyzer(ABC):
    """Base interface for transcript quality analysis"""
    @abstractmethod
    def compare_transcripts(self, transcript1: str, transcript2: str) -> Dict[str, Any]:
        pass

class ITranscriptCache(ABC):
    """Base interface for transcript caching"""
    @abstractmethod
    async def get_transcript(self, key: str) -> Optional[Dict[str, Any]]:
        pass
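
As an illustration of how a concrete class would satisfy one of these interfaces, here is a minimal in-memory cache (the `set_transcript` method and the key format are assumptions beyond the interface shown):

```python
import asyncio
from abc import ABC, abstractmethod
from typing import Any, Dict, Optional

class ITranscriptCache(ABC):
    """Base interface for transcript caching"""
    @abstractmethod
    async def get_transcript(self, key: str) -> Optional[Dict[str, Any]]: ...

class InMemoryTranscriptCache(ITranscriptCache):
    """Dict-backed cache; a real implementation would persist to disk/Redis."""
    def __init__(self) -> None:
        self._store: Dict[str, Dict[str, Any]] = {}

    async def get_transcript(self, key: str) -> Optional[Dict[str, Any]]:
        return self._store.get(key)

    async def set_transcript(self, key: str, value: Dict[str, Any]) -> None:
        self._store[key] = value

async def demo() -> Optional[Dict[str, Any]]:
    cache = InMemoryTranscriptCache()
    await cache.set_transcript("abc123:youtube", {"text": "hi"})
    return await cache.get_transcript("abc123:youtube")

result = asyncio.run(demo())
```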

2. Quality Analysis Utilities

# Location: lib/utils/helpers/transcript_analyzer.py
class TranscriptQualityAnalyzer:
    """Reusable transcript quality comparison utilities"""
    
    def calculate_word_accuracy(self, reference: str, candidate: str) -> float:
        """Calculate word-level accuracy between transcripts"""
        
    def analyze_timing_precision(self, segments1: List, segments2: List) -> Dict:
        """Compare timestamp accuracy between segment lists"""
        
    def generate_diff_highlights(self, text1: str, text2: str) -> Dict:
        """Generate visual diff highlighting for UI display"""
        
    def calculate_quality_score(self, transcript: str, segments: List) -> float:
        """Calculate overall quality score based on multiple metrics"""

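
`calculate_word_accuracy` could be built on the standard library's `difflib`; the sketch below uses one possible metric (matching-blocks ratio), not necessarily the one the shared library will adopt:

```python
from difflib import SequenceMatcher
from typing import List

def calculate_word_accuracy(reference: str, candidate: str) -> float:
    """Ratio of matching words between two transcripts (0.0 - 1.0)."""
    ref_words: List[str] = reference.lower().split()
    cand_words: List[str] = candidate.lower().split()
    if not ref_words and not cand_words:
        return 1.0
    matcher = SequenceMatcher(None, ref_words, cand_words)
    # Sum the sizes of all matching runs of words
    matched = sum(block.size for block in matcher.get_matching_blocks())
    return matched / max(len(ref_words), len(cand_words))

score = calculate_word_accuracy(
    "the quick brown fox jumps",
    "the quick brown fox jumped",
)
```

Here four of the five words match, so the score is 0.8.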
3. Base Whisper Service Interface

# Location: lib/services/ai/base_whisper_service.py
from pathlib import Path
from typing import Any, Dict

from lib.core.interfaces.transcript_interfaces import ITranscriptService

class BaseWhisperService(ITranscriptService):
    """Base Whisper service implementation that can be extended"""
    
    def __init__(self, model_size: str = "small", device: str = "auto"):
        self.model_size = model_size
        self.device = self._detect_device(device)
        self.model = None
    
    def _detect_device(self, preferred: str) -> str:
        """Generic device detection logic"""
        
    def _load_model(self):
        """Generic model loading with caching"""
        
    async def transcribe_audio(self, audio_path: Path) -> Dict[str, Any]:
        """Base transcription implementation"""

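
`_detect_device` might look like the following stand-alone sketch; treating `torch` as an optional dependency (and checking only CUDA) is an assumption:

```python
def detect_device(preferred: str = "auto") -> str:
    """Resolve 'auto' to a concrete device; honor explicit choices as-is."""
    if preferred != "auto":
        return preferred
    try:
        import torch  # optional dependency; absent on caption-only deployments
        if torch.cuda.is_available():
            return "cuda"
    except ImportError:
        pass
    return "cpu"  # safe fallback when no GPU stack is available
```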
4. Enhanced Segment Model

# Location: lib/data/models/transcript_models.py
from typing import Optional

from lib.data.models.base_model import BaseModel

class EnhancedTranscriptSegment(BaseModel):
    """Enhanced segment model combining both architectures"""
    text: str
    start_time: float
    end_time: float
    confidence: Optional[float] = None
    speaker: Optional[str] = None
    word_count: Optional[int] = None
    
    @property
    def duration(self) -> float:
        return self.end_time - self.start_time
        
    def to_basic_segment(self) -> 'TranscriptSegment':
        """Convert to project-specific basic segment"""
        return TranscriptSegment(
            text=self.text,
            start=self.start_time, 
            duration=self.duration
        )
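
The derived-duration behaviour can be demonstrated with a stdlib dataclass stand-in (the real model would subclass the shared `BaseModel`):

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class EnhancedSegment:
    """Minimal stand-in showing duration derived from start/end times."""
    text: str
    start_time: float
    end_time: float
    confidence: Optional[float] = None

    @property
    def duration(self) -> float:
        # Computed on access, so it can never drift out of sync with the bounds
        return self.end_time - self.start_time

seg = EnhancedSegment(text="hello world", start_time=1.5, end_time=4.0)
```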

Keep Project-Specific

1. YouTube-Specific Models

All current models in apps/youtube-summarizer/backend/models/transcript.py remain project-specific as they're tied to YouTube Summarizer's specific API contracts and domain.

2. Service Implementations

  • WhisperTranscriptService - YouTube video context
  • DualTranscriptService - YouTube pipeline integration
  • EnhancedTranscriptService - YouTube-specific orchestration

3. API Endpoints and Controllers

All API-related classes stay project-specific.

4. Frontend Components

All React components and TypeScript types stay project-specific.

Implementation Strategy

Phase 1: Create Class Library Base Components

# Create shared interfaces and utilities
mkdir -p lib/core/interfaces
mkdir -p lib/services/ai  
mkdir -p lib/utils/helpers
mkdir -p lib/data/models

# Copy and adapt quality analysis utilities
# Create base interfaces for transcript services
# Create enhanced segment model

Phase 2: Implement Project-Specific Classes

# Create WhisperTranscriptService (adapt from archived)
# Create DualTranscriptService (extend EnhancedTranscriptService)
# Extend existing transcript models with new API models
# Create frontend components

Phase 3: Integration and Testing

# Update dependency injection to use new services
# Update existing services to use class library utilities
# Add comprehensive tests for both shared and project-specific code

Benefits of This Approach

Shared Class Library Benefits

  1. Reusability - Quality analysis and base Whisper service can be used across projects
  2. Consistency - Standard interfaces ensure consistent behavior
  3. Maintainability - Core algorithms centralized for easier updates
  4. Testing - Shared utilities can have comprehensive test coverage

Project-Specific Benefits

  1. Flexibility - YouTube-specific models can evolve independently
  2. Performance - No unnecessary abstraction overhead
  3. Domain Clarity - Clear separation between generic and YouTube-specific logic
  4. Deployment - No shared library versioning concerns

Migration Path

Immediate (Story 4.1)

  • Implement all classes as project-specific first
  • Get dual transcript feature working end-to-end
  • Validate architecture and patterns

Future Refactoring (Post-Story 4.1)

  • Extract proven quality analysis utilities to class library
  • Move base Whisper interfaces to shared location
  • Update other projects to use shared transcript utilities

Conclusion

For Story 4.1 implementation, focus on project-specific classes to maintain velocity and ensure the feature works correctly. The class library extraction can be done as a follow-up refactoring once the implementation is proven and stable.

Key Classes to Create:

  1. WhisperTranscriptService (project-specific)
  2. DualTranscriptService (project-specific)
  3. TranscriptOptionsRequest/Response (extend existing models)
  4. TranscriptSelector/Comparison (React components)

Future Class Library Candidates:

  1. 🔄 TranscriptQualityAnalyzer utilities
  2. 🔄 BaseWhisperService interface
  3. 🔄 EnhancedTranscriptSegment model

This approach balances immediate implementation needs with future reusability and maintainability.