Class Library Analysis: Dual Transcript Implementation

Overview

This analysis examines the classes needed for dual transcript functionality and determines which classes should be created, updated, or moved to the shared class library (/lib/), and which should remain project-specific.

Current Architecture Analysis

Existing Transcript Classes (YouTube Summarizer)

Location: apps/youtube-summarizer/backend/models/transcript.py

class ExtractionMethod(str, Enum):
    YOUTUBE_API = "youtube_api"
    AUTO_CAPTIONS = "auto_captions" 
    WHISPER_AUDIO = "whisper_audio"    # ✅ Already supports Whisper
    MOCK = "mock"
    FAILED = "failed"

class TranscriptSegment(BaseModel):        # ✅ Basic segment model
    text: str
    start: float
    duration: float

class TranscriptMetadata(BaseModel):       # ✅ Comprehensive metadata
    word_count: int
    language: str
    has_timestamps: bool
    extraction_method: ExtractionMethod
    processing_time_seconds: float

class TranscriptResult(BaseModel):         # ✅ Main result structure
    video_id: str
    transcript: Optional[str]
    segments: Optional[List[TranscriptSegment]]
    metadata: Optional[TranscriptMetadata]
    method: ExtractionMethod
    success: bool
    from_cache: bool
    error: Optional[Dict[str, Any]]

Archived Production-Ready Classes

Location: archived_projects/personal-ai-assistant-v1.1.0/src/services/transcription_service.py

class TranscriptionSegment:                # 🔄 More detailed than current
    """Enhanced segment with confidence and speaker detection"""
    start_time: float
    end_time: float
    text: str
    confidence: float
    speaker: Optional[str]

class TranscriptionService:                # 🏆 Production-ready Whisper service
    """Full Whisper implementation with chunking, device detection"""
    - Model management (tiny, base, small, medium, large)
    - GPU/CPU auto-detection
    - Audio preprocessing and chunking
    - Speaker identification
    - Confidence scoring
    - Progress callbacks

Current Service Classes

Location: apps/youtube-summarizer/backend/services/

class MockWhisperService:                  # ❌ Placeholder to be replaced
    """Mock implementation returning fake transcripts"""

class EnhancedTranscriptService:           # ✅ Good architecture
    """Orchestrates YouTube API + local files + Whisper fallback"""
    - Priority-based extraction
    - Caching integration
    - Error handling and fallbacks

Required New Classes for Dual Transcript Implementation

1. Backend Service Classes (Project-Specific)

WhisperTranscriptService

# Location: apps/youtube-summarizer/backend/services/whisper_transcript_service.py
class WhisperTranscriptService:
    """Real Whisper service adapted from archived TranscriptionService"""
    - Copy from archived project
    - Remove podcast-specific dependencies
    - Add async compatibility
    - YouTube video context integration
    
# Decision: Project-specific (YouTube video context)

DualTranscriptService

# Location: apps/youtube-summarizer/backend/services/dual_transcript_service.py
class DualTranscriptService(EnhancedTranscriptService):
    """Manages parallel YouTube + Whisper transcript extraction"""
    - Parallel processing with asyncio.gather()
    - Quality comparison algorithms
    - Dual caching strategies
    - Fallback management
    
# Decision: Project-specific (YouTube pipeline integration)
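
The parallel extraction step can be sketched as follows (a minimal sketch: the extractor coroutines and result dictionaries are placeholders, not the real service methods):

```python
import asyncio
from typing import Any, Dict

async def extract_youtube(video_id: str) -> Dict[str, Any]:
    # Placeholder for the YouTube caption path
    return {"source": "youtube", "video_id": video_id, "success": True}

async def extract_whisper(video_id: str) -> Dict[str, Any]:
    # Placeholder for the Whisper audio path
    return {"source": "whisper", "video_id": video_id, "success": True}

async def extract_both(video_id: str) -> Dict[str, Dict[str, Any]]:
    # return_exceptions=True lets one source fail without cancelling the other
    youtube, whisper = await asyncio.gather(
        extract_youtube(video_id),
        extract_whisper(video_id),
        return_exceptions=True,
    )
    results: Dict[str, Dict[str, Any]] = {}
    for name, outcome in (("youtube", youtube), ("whisper", whisper)):
        if isinstance(outcome, Exception):
            results[name] = {"source": name, "success": False, "error": str(outcome)}
        else:
            results[name] = outcome
    return results

results = asyncio.run(extract_both("abc123"))
```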

TranscriptQualityAnalyzer

# Location: lib/services/ai/transcript_quality_analyzer.py  
class TranscriptQualityAnalyzer:
    """Generic transcript quality comparison utilities"""
    - Word-level difference analysis
    - Confidence scoring algorithms
    - Timing precision analysis
    - Quality metrics calculation
    
# Decision: Move to class library (reusable across projects)

2. API Model Classes (Project-Specific)

TranscriptOptionsRequest

# Location: apps/youtube-summarizer/backend/models/transcript.py (extend existing)
class TranscriptOptionsRequest(BaseModel):
    source: Literal['youtube', 'whisper', 'both'] = 'youtube'
    whisper_model: Literal['tiny', 'base', 'small', 'medium'] = 'small'
    language: str = 'en'
    include_timestamps: bool = True
    quality_threshold: float = 0.7

# Decision: Project-specific (YouTube Summarizer API contract)
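
An example request body this model would accept (values are illustrative):

```json
{
  "source": "both",
  "whisper_model": "small",
  "language": "en",
  "include_timestamps": true,
  "quality_threshold": 0.7
}
```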

TranscriptComparisonResponse

# Location: apps/youtube-summarizer/backend/models/transcript.py (extend existing)
class TranscriptComparisonResponse(BaseModel):
    video_id: str
    transcripts: Dict[str, TranscriptResult]  # {'youtube': result, 'whisper': result}
    quality_comparison: Dict[str, Any]
    recommended_source: str
    processing_time_comparison: Dict[str, float]

# Decision: Project-specific (specific to dual transcript feature)
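
For illustration, a response of this shape might serialize as follows (all values invented; the `transcripts` entries are abbreviated):

```json
{
  "video_id": "abc123",
  "transcripts": {
    "youtube": { "success": true, "method": "youtube_api" },
    "whisper": { "success": true, "method": "whisper_audio" }
  },
  "quality_comparison": { "word_accuracy": 0.94, "timing_precision": 0.88 },
  "recommended_source": "whisper",
  "processing_time_comparison": { "youtube": 1.2, "whisper": 41.5 }
}
```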

3. Frontend Components (Project-Specific)

// Location: frontend/src/components/TranscriptSelector.tsx
interface TranscriptSelectorProps {
    value: TranscriptSource
    onChange: (source: TranscriptSource) => void
    estimatedDuration?: number
    disabled?: boolean
}

// Location: frontend/src/components/TranscriptComparison.tsx  
interface TranscriptComparisonProps {
    youtubeTranscript: TranscriptResult
    whisperTranscript: TranscriptResult
    onSelectTranscript: (source: TranscriptSource) => void
}

// Location: frontend/src/types/transcript.ts
type TranscriptSource = 'youtube' | 'whisper' | 'both'

// Decision: All project-specific (YouTube Summarizer UI)

Class Library Recommendations

Move to Shared Library (/lib/)

1. Base Transcript Interfaces

# Location: lib/core/interfaces/transcript_interfaces.py
from abc import ABC, abstractmethod
from pathlib import Path
from typing import Any, Dict, Optional

class ITranscriptService(ABC):
    """Base interface for transcript services"""
    @abstractmethod
    async def transcribe_audio(self, audio_path: Path) -> Dict[str, Any]:
        pass

class IQualityAnalyzer(ABC):
    """Base interface for transcript quality analysis"""
    @abstractmethod
    def compare_transcripts(self, transcript1: str, transcript2: str) -> Dict[str, Any]:
        pass

class ITranscriptCache(ABC):
    """Base interface for transcript caching"""
    @abstractmethod
    async def get_transcript(self, key: str) -> Optional[Dict[str, Any]]:
        pass
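
As an illustration of how a concrete class would satisfy one of these interfaces, here is a minimal in-memory cache (the `set_transcript` method and the key format are assumptions beyond the interface shown):

```python
import asyncio
from abc import ABC, abstractmethod
from typing import Any, Dict, Optional

class ITranscriptCache(ABC):
    """Base interface for transcript caching"""
    @abstractmethod
    async def get_transcript(self, key: str) -> Optional[Dict[str, Any]]: ...

class InMemoryTranscriptCache(ITranscriptCache):
    """Dict-backed cache; a real implementation would persist to disk/Redis."""
    def __init__(self) -> None:
        self._store: Dict[str, Dict[str, Any]] = {}

    async def get_transcript(self, key: str) -> Optional[Dict[str, Any]]:
        return self._store.get(key)

    async def set_transcript(self, key: str, value: Dict[str, Any]) -> None:
        self._store[key] = value

async def demo() -> Optional[Dict[str, Any]]:
    cache = InMemoryTranscriptCache()
    await cache.set_transcript("abc123:youtube", {"text": "hi"})
    return await cache.get_transcript("abc123:youtube")

result = asyncio.run(demo())
```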

2. Quality Analysis Utilities

# Location: lib/utils/helpers/transcript_analyzer.py
class TranscriptQualityAnalyzer:
    """Reusable transcript quality comparison utilities"""
    
    def calculate_word_accuracy(self, reference: str, candidate: str) -> float:
        """Calculate word-level accuracy between transcripts"""
        
    def analyze_timing_precision(self, segments1: List, segments2: List) -> Dict:
        """Compare timestamp accuracy between segment lists"""
        
    def generate_diff_highlights(self, text1: str, text2: str) -> Dict:
        """Generate visual diff highlighting for UI display"""
        
    def calculate_quality_score(self, transcript: str, segments: List) -> float:
        """Calculate overall quality score based on multiple metrics"""

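
`calculate_word_accuracy` could be built on the standard library's `difflib`; the sketch below uses one possible metric (matching-blocks ratio), not necessarily the one the shared library will adopt:

```python
from difflib import SequenceMatcher
from typing import List

def calculate_word_accuracy(reference: str, candidate: str) -> float:
    """Ratio of matching words between two transcripts (0.0 - 1.0)."""
    ref_words: List[str] = reference.lower().split()
    cand_words: List[str] = candidate.lower().split()
    if not ref_words and not cand_words:
        return 1.0
    matcher = SequenceMatcher(None, ref_words, cand_words)
    # Sum the sizes of all matching runs of words
    matched = sum(block.size for block in matcher.get_matching_blocks())
    return matched / max(len(ref_words), len(cand_words))

score = calculate_word_accuracy(
    "the quick brown fox jumps",
    "the quick brown fox jumped",
)
```

Here four of the five words match, so the score is 0.8.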
3. Base Whisper Service Interface

# Location: lib/services/ai/base_whisper_service.py
from pathlib import Path
from typing import Any, Dict

from lib.core.interfaces.transcript_interfaces import ITranscriptService

class BaseWhisperService(ITranscriptService):
    """Base Whisper service implementation that can be extended"""
    
    def __init__(self, model_size: str = "small", device: str = "auto"):
        self.model_size = model_size
        self.device = self._detect_device(device)
        self.model = None
    
    def _detect_device(self, preferred: str) -> str:
        """Generic device detection logic"""
        
    def _load_model(self):
        """Generic model loading with caching"""
        
    async def transcribe_audio(self, audio_path: Path) -> Dict[str, Any]:
        """Base transcription implementation"""

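
`_detect_device` might look like the following stand-alone sketch; treating `torch` as an optional dependency (and checking only CUDA) is an assumption:

```python
def detect_device(preferred: str = "auto") -> str:
    """Resolve 'auto' to a concrete device; honor explicit choices as-is."""
    if preferred != "auto":
        return preferred
    try:
        import torch  # optional dependency; absent on caption-only deployments
        if torch.cuda.is_available():
            return "cuda"
    except ImportError:
        pass
    return "cpu"  # safe fallback when no GPU stack is available
```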
4. Enhanced Segment Model

# Location: lib/data/models/transcript_models.py
from typing import Optional

from lib.data.models.base_model import BaseModel

class EnhancedTranscriptSegment(BaseModel):
    """Enhanced segment model combining both architectures"""
    text: str
    start_time: float
    end_time: float
    confidence: Optional[float] = None
    speaker: Optional[str] = None
    word_count: Optional[int] = None
    
    @property
    def duration(self) -> float:
        return self.end_time - self.start_time
        
    def to_basic_segment(self) -> 'TranscriptSegment':
        """Convert to project-specific basic segment"""
        return TranscriptSegment(
            text=self.text,
            start=self.start_time, 
            duration=self.duration
        )
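
The derived-duration behaviour can be demonstrated with a stdlib dataclass stand-in (the real model would subclass the shared `BaseModel`):

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class EnhancedSegment:
    """Minimal stand-in showing duration derived from start/end times."""
    text: str
    start_time: float
    end_time: float
    confidence: Optional[float] = None

    @property
    def duration(self) -> float:
        # Computed on access, so it can never drift out of sync with the bounds
        return self.end_time - self.start_time

seg = EnhancedSegment(text="hello world", start_time=1.5, end_time=4.0)
```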

Keep Project-Specific

1. YouTube-Specific Models

All current models in apps/youtube-summarizer/backend/models/transcript.py remain project-specific as they're tied to YouTube Summarizer's specific API contracts and domain.

2. Service Implementations

  • WhisperTranscriptService - YouTube video context
  • DualTranscriptService - YouTube pipeline integration
  • EnhancedTranscriptService - YouTube-specific orchestration

3. API Endpoints and Controllers

All API-related classes stay project-specific.

4. Frontend Components

All React components and TypeScript types stay project-specific.

Implementation Strategy

Phase 1: Create Class Library Base Components

# Create shared interfaces and utilities
mkdir -p lib/core/interfaces
mkdir -p lib/services/ai  
mkdir -p lib/utils/helpers
mkdir -p lib/data/models

# Copy and adapt quality analysis utilities
# Create base interfaces for transcript services
# Create enhanced segment model

Phase 2: Implement Project-Specific Classes

# Create WhisperTranscriptService (adapt from archived)
# Create DualTranscriptService (extend EnhancedTranscriptService)
# Extend existing transcript models with new API models
# Create frontend components

Phase 3: Integration and Testing

# Update dependency injection to use new services
# Update existing services to use class library utilities
# Add comprehensive tests for both shared and project-specific code

Benefits of This Approach

Shared Class Library Benefits

  1. Reusability - Quality analysis and base Whisper service can be used across projects
  2. Consistency - Standard interfaces ensure consistent behavior
  3. Maintainability - Core algorithms centralized for easier updates
  4. Testing - Shared utilities can have comprehensive test coverage

Project-Specific Benefits

  1. Flexibility - YouTube-specific models can evolve independently
  2. Performance - No unnecessary abstraction overhead
  3. Domain Clarity - Clear separation between generic and YouTube-specific logic
  4. Deployment - No shared library versioning concerns

Migration Path

Immediate (Story 4.1)

  • Implement all classes as project-specific first
  • Get dual transcript feature working end-to-end
  • Validate architecture and patterns

Future Refactoring (Post-Story 4.1)

  • Extract proven quality analysis utilities to class library
  • Move base Whisper interfaces to shared location
  • Update other projects to use shared transcript utilities

Conclusion

For Story 4.1 implementation, focus on project-specific classes to maintain velocity and ensure the feature works correctly. The class library extraction can be done as a follow-up refactoring once the implementation is proven and stable.

Key Classes to Create:

  1. WhisperTranscriptService (project-specific)
  2. DualTranscriptService (project-specific)
  3. TranscriptOptionsRequest/Response (extend existing models)
  4. TranscriptSelector/Comparison (React components)

Future Class Library Candidates:

  1. 🔄 TranscriptQualityAnalyzer utilities
  2. 🔄 BaseWhisperService interface
  3. 🔄 EnhancedTranscriptSegment model

This approach balances immediate implementation needs with future reusability and maintainability.