Class Library Analysis: Dual Transcript Implementation
Overview
This analysis examines the classes needed for dual transcript functionality and determines what should be created, updated, or moved to the shared class library (/lib/) versus remaining project-specific.
Current Architecture Analysis
Existing Transcript Classes (YouTube Summarizer)
Location: apps/youtube-summarizer/backend/models/transcript.py
class ExtractionMethod(str, Enum):
    YOUTUBE_API = "youtube_api"
    AUTO_CAPTIONS = "auto_captions"
    WHISPER_AUDIO = "whisper_audio"  # ✅ Already supports Whisper
    MOCK = "mock"
    FAILED = "failed"

class TranscriptSegment(BaseModel):  # ✅ Basic segment model
    text: str
    start: float
    duration: float

class TranscriptMetadata(BaseModel):  # ✅ Comprehensive metadata
    word_count: int
    language: str
    has_timestamps: bool
    extraction_method: ExtractionMethod
    processing_time_seconds: float

class TranscriptResult(BaseModel):  # ✅ Main result structure
    video_id: str
    transcript: Optional[str]
    segments: Optional[List[TranscriptSegment]]
    metadata: Optional[TranscriptMetadata]
    method: ExtractionMethod
    success: bool
    from_cache: bool
    error: Optional[Dict[str, Any]]
Archived Production-Ready Classes
Location: archived_projects/personal-ai-assistant-v1.1.0/src/services/transcription_service.py
class TranscriptionSegment:  # 🔄 More detailed than current
    """Enhanced segment with confidence and speaker detection"""
    start_time: float
    end_time: float
    text: str
    confidence: float
    speaker: Optional[str]

class TranscriptionService:  # 🏆 Production-ready Whisper service
    """Full Whisper implementation with chunking, device detection"""
    # Capabilities:
    # - Model management (tiny, base, small, medium, large)
    # - GPU/CPU auto-detection
    # - Audio preprocessing and chunking
    # - Speaker identification
    # - Confidence scoring
    # - Progress callbacks
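The GPU/CPU auto-detection listed above can be illustrated with a minimal sketch. The function name `detect_device` and the `"auto"` sentinel are assumptions for illustration; the archived service's actual logic is not shown in this analysis. The sketch assumes PyTorch may or may not be installed and falls back to CPU when it is missing.

```python
def detect_device(preferred: str = "auto") -> str:
    """Return the device string to load the model on.

    An explicit preference (e.g. "cpu" or "cuda") is returned unchanged;
    "auto" probes for CUDA and falls back to CPU.
    """
    if preferred != "auto":
        return preferred
    try:
        # Imported lazily so the helper still works on machines without PyTorch.
        import torch
    except ImportError:
        return "cpu"
    return "cuda" if torch.cuda.is_available() else "cpu"


explicit_device = detect_device("cpu")
auto_device = detect_device("auto")
```

Keeping the probe in one small function makes it easy to override the device in tests or on machines where GPU memory is too small for the chosen model size.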
Current Service Classes
Location: apps/youtube-summarizer/backend/services/
class MockWhisperService:  # ❌ Placeholder to be replaced
    """Mock implementation returning fake transcripts"""

class EnhancedTranscriptService:  # ✅ Good architecture
    """Orchestrates YouTube API + local files + Whisper fallback"""
    # Responsibilities:
    # - Priority-based extraction
    # - Caching integration
    # - Error handling and fallbacks
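Priority-based extraction with fallbacks can be sketched as a loop over an ordered list of extractors. This is an illustration only: `extract_with_fallback`, the `Extractor` type, and the two fake extractors are hypothetical names, not the actual `EnhancedTranscriptService` API.

```python
import asyncio
from typing import Awaitable, Callable, List, Optional, Tuple

# Hypothetical extractor signature: takes a video ID, returns transcript text or None.
Extractor = Callable[[str], Awaitable[Optional[str]]]


async def extract_with_fallback(
    video_id: str, extractors: List[Tuple[str, Extractor]]
) -> Tuple[str, Optional[str]]:
    """Try each extractor in priority order and return (method, transcript)."""
    for method, extractor in extractors:
        try:
            transcript = await extractor(video_id)
        except Exception:
            # A failing source must not abort the chain; try the next one.
            continue
        if transcript:
            return method, transcript
    return "failed", None


async def _fake_youtube_api(video_id: str) -> Optional[str]:
    raise RuntimeError("captions disabled")  # simulated failure


async def _fake_whisper_audio(video_id: str) -> Optional[str]:
    return f"whisper transcript for {video_id}"


fallback_result = asyncio.run(
    extract_with_fallback(
        "abc123",
        [("youtube_api", _fake_youtube_api), ("whisper_audio", _fake_whisper_audio)],
    )
)
```

The method labels reuse the `ExtractionMethod` values shown earlier, so the winning source can be recorded in the result's `method` field.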
Required New Classes for Dual Transcript Implementation
1. Backend Service Classes (Project-Specific)
WhisperTranscriptService
# Location: apps/youtube-summarizer/backend/services/whisper_transcript_service.py
class WhisperTranscriptService:
    """Real Whisper service adapted from archived TranscriptionService"""
    # Adaptation steps:
    # - Copy from archived project
    # - Remove podcast-specific dependencies
    # - Add async compatibility
    # - YouTube video context integration
# Decision: Project-specific (YouTube video context)
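One way to add async compatibility to a synchronous transcription call is to run it in a worker thread. The sketch below assumes Python 3.9+ (for `asyncio.to_thread`); `blocking_transcribe` is a hypothetical stand-in for the archived service's blocking method, whose real signature is not shown here.

```python
import asyncio
import time
from typing import Any, Dict, Tuple


def blocking_transcribe(audio_path: str) -> Dict[str, Any]:
    """Hypothetical stand-in for a synchronous, compute-bound Whisper call."""
    time.sleep(0.05)
    return {"text": f"transcript of {audio_path}"}


async def transcribe_async(audio_path: str) -> Dict[str, Any]:
    # Run the blocking call in a worker thread so the event loop stays responsive.
    return await asyncio.to_thread(blocking_transcribe, audio_path)


async def _demo() -> Tuple[Dict[str, Any], int]:
    ticks = 0

    async def heartbeat() -> None:
        # Counts loop iterations to show the loop keeps running during transcription.
        nonlocal ticks
        while True:
            ticks += 1
            await asyncio.sleep(0.01)

    beat = asyncio.create_task(heartbeat())
    result = await transcribe_async("audio.wav")
    beat.cancel()
    return result, ticks


async_result, heartbeat_ticks = asyncio.run(_demo())
```

The heartbeat task keeps ticking while the transcription runs, which is the property a FastAPI-style backend needs in order to keep serving other requests.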
DualTranscriptService
# Location: apps/youtube-summarizer/backend/services/dual_transcript_service.py
class DualTranscriptService(EnhancedTranscriptService):
    """Manages parallel YouTube + Whisper transcript extraction"""
    # Responsibilities:
    # - Parallel processing with asyncio.gather()
    # - Quality comparison algorithms
    # - Dual caching strategies
    # - Fallback management
# Decision: Project-specific (YouTube pipeline integration)
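The parallel extraction step can be sketched with `asyncio.gather(..., return_exceptions=True)`, so that a failure in one source does not discard the other source's result. The two fetch coroutines and the result dictionary shape are hypothetical placeholders, not the real service methods or the `TranscriptResult` model.

```python
import asyncio
from typing import Any, Dict


async def fetch_youtube(video_id: str) -> str:
    await asyncio.sleep(0.01)  # simulated network latency
    return f"youtube captions for {video_id}"


async def fetch_whisper(video_id: str) -> str:
    await asyncio.sleep(0.01)
    raise RuntimeError("audio download failed")  # simulated failure


async def extract_both(video_id: str) -> Dict[str, Dict[str, Any]]:
    """Run both extractions concurrently and report each outcome separately."""
    youtube, whisper = await asyncio.gather(
        fetch_youtube(video_id),
        fetch_whisper(video_id),
        return_exceptions=True,  # exceptions are returned as values, not raised
    )
    results: Dict[str, Dict[str, Any]] = {}
    for source, outcome in (("youtube", youtube), ("whisper", whisper)):
        if isinstance(outcome, BaseException):
            results[source] = {"success": False, "error": str(outcome)}
        else:
            results[source] = {"success": True, "transcript": outcome}
    return results


dual_results = asyncio.run(extract_both("abc123"))
```

With `return_exceptions=True` the caller decides per source whether to fall back, which matches the fallback management responsibility listed for this class.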
TranscriptQualityAnalyzer
# Location: lib/services/ai/transcript_quality_analyzer.py
class TranscriptQualityAnalyzer:
    """Generic transcript quality comparison utilities"""
    # Capabilities:
    # - Word-level difference analysis
    # - Confidence scoring algorithms
    # - Timing precision analysis
    # - Quality metrics calculation
# Decision: Move to class library (reusable across projects), extracted after Story 4.1 per the Migration Path
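Word-level difference analysis could be approximated with the standard library's `difflib.SequenceMatcher`. This is a sketch under assumptions: the function name is illustrative, lowercasing and whitespace splitting stand in for real text normalization, and the matched-word ratio is a simple proxy rather than a full word error rate.

```python
from difflib import SequenceMatcher


def calculate_word_accuracy(reference: str, candidate: str) -> float:
    """Fraction of reference words that also appear, in order, in the candidate."""
    ref_words = reference.lower().split()
    cand_words = candidate.lower().split()
    if not ref_words:
        # Two empty transcripts agree; anything else against an empty reference does not.
        return 1.0 if not cand_words else 0.0
    matcher = SequenceMatcher(a=ref_words, b=cand_words, autojunk=False)
    matched = sum(block.size for block in matcher.get_matching_blocks())
    return matched / len(ref_words)


identical_score = calculate_word_accuracy("the quick brown fox", "the quick brown fox")
one_substitution_score = calculate_word_accuracy("the quick brown fox", "the quick red fox")
```

Here one substituted word out of four gives 0.75. Because the ratio is computed against the reference, swapping the arguments can change the score, so callers should fix which transcript is treated as the reference.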
2. API Model Classes (Project-Specific)
TranscriptOptionsRequest
# Location: apps/youtube-summarizer/backend/models/transcript.py (extend existing)
class TranscriptOptionsRequest(BaseModel):
    source: Literal['youtube', 'whisper', 'both'] = 'youtube'
    whisper_model: Literal['tiny', 'base', 'small', 'medium'] = 'small'
    language: str = 'en'
    include_timestamps: bool = True
    quality_threshold: float = 0.7
# Decision: Project-specific (YouTube Summarizer API contract)
TranscriptComparisonResponse
# Location: apps/youtube-summarizer/backend/models/transcript.py (extend existing)
class TranscriptComparisonResponse(BaseModel):
    video_id: str
    transcripts: Dict[str, TranscriptResult]  # {'youtube': result, 'whisper': result}
    quality_comparison: Dict[str, Any]
    recommended_source: str
    processing_time_comparison: Dict[str, float]
# Decision: Project-specific (specific to dual transcript feature)
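How `recommended_source` might be derived from per-source quality scores and the request's `quality_threshold` can be sketched as follows. The selection policy (keep the preferred source if it clears the threshold, otherwise take the highest score) is an assumption made for illustration; the analysis does not specify the actual rule.

```python
from typing import Dict


def recommend_source(
    quality_scores: Dict[str, float],
    quality_threshold: float = 0.7,
    preferred: str = "youtube",
) -> str:
    """Pick a transcript source from per-source quality scores in [0, 1]."""
    if not quality_scores:
        raise ValueError("quality_scores must contain at least one source")
    # Assumed policy: the preferred source wins whenever it is good enough,
    # since YouTube captions need no extra transcription work.
    if quality_scores.get(preferred, 0.0) >= quality_threshold:
        return preferred
    return max(quality_scores, key=quality_scores.get)


good_captions = recommend_source({"youtube": 0.82, "whisper": 0.91})
poor_captions = recommend_source({"youtube": 0.55, "whisper": 0.91})
```

The default threshold of 0.7 mirrors the `quality_threshold` default in `TranscriptOptionsRequest`.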
3. Frontend Components (Project-Specific)
// Location: frontend/src/components/TranscriptSelector.tsx
interface TranscriptSelectorProps {
  value: TranscriptSource
  onChange: (source: TranscriptSource) => void
  estimatedDuration?: number
  disabled?: boolean
}
// Location: frontend/src/components/TranscriptComparison.tsx
interface TranscriptComparisonProps {
  youtubeTranscript: TranscriptResult
  whisperTranscript: TranscriptResult
  onSelectTranscript: (source: TranscriptSource) => void
}
// Location: frontend/src/types/transcript.ts
type TranscriptSource = 'youtube' | 'whisper' | 'both'
# Decision: All project-specific (YouTube Summarizer UI)
Class Library Recommendations
Move to Shared Library (/lib/)
1. Base Transcript Interfaces
# Location: lib/core/interfaces/transcript_interfaces.py
from abc import ABC, abstractmethod
from pathlib import Path  # needed for the audio_path annotation
from typing import List, Dict, Any, Optional

class ITranscriptService(ABC):
    """Base interface for transcript services"""

    @abstractmethod
    async def transcribe_audio(self, audio_path: Path) -> Dict[str, Any]:
        pass

class IQualityAnalyzer(ABC):
    """Base interface for transcript quality analysis"""

    @abstractmethod
    def compare_transcripts(self, transcript1: str, transcript2: str) -> Dict[str, Any]:
        pass

class ITranscriptCache(ABC):
    """Base interface for transcript caching"""

    @abstractmethod
    async def get_transcript(self, key: str) -> Optional[Dict[str, Any]]:
        pass
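To show how a concrete class would satisfy one of these interfaces, here is a minimal in-memory cache. `InMemoryTranscriptCache` and its `set_transcript` method are hypothetical additions for illustration; the interface above only declares `get_transcript`, and it is repeated here so the sketch runs on its own.

```python
import asyncio
from abc import ABC, abstractmethod
from typing import Any, Dict, Optional, Tuple


class ITranscriptCache(ABC):
    """Interface repeated from the listing above to keep the sketch self-contained."""

    @abstractmethod
    async def get_transcript(self, key: str) -> Optional[Dict[str, Any]]:
        ...


class InMemoryTranscriptCache(ITranscriptCache):
    """Dictionary-backed cache, suitable for tests rather than production."""

    def __init__(self) -> None:
        self._store: Dict[str, Dict[str, Any]] = {}

    async def get_transcript(self, key: str) -> Optional[Dict[str, Any]]:
        return self._store.get(key)

    async def set_transcript(self, key: str, value: Dict[str, Any]) -> None:
        # Hypothetical write method; not part of the interface as declared.
        self._store[key] = value


async def _demo() -> Tuple[Optional[Dict[str, Any]], Optional[Dict[str, Any]]]:
    cache = InMemoryTranscriptCache()
    miss = await cache.get_transcript("abc123:whisper")
    await cache.set_transcript("abc123:whisper", {"transcript": "hello"})
    hit = await cache.get_transcript("abc123:whisper")
    return miss, hit


cache_miss, cache_hit = asyncio.run(_demo())
```

Keying entries by video ID plus source, as in `"abc123:whisper"`, is one possible way to keep YouTube and Whisper transcripts for the same video apart in a dual caching setup.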
2. Quality Analysis Utilities
# Location: lib/utils/helpers/transcript_analyzer.py
from typing import Dict, List

class TranscriptQualityAnalyzer:
    """Reusable transcript quality comparison utilities"""

    def calculate_word_accuracy(self, reference: str, candidate: str) -> float:
        """Calculate word-level accuracy between transcripts"""

    def analyze_timing_precision(self, segments1: List, segments2: List) -> Dict:
        """Compare timestamp accuracy between segment lists"""

    def generate_diff_highlights(self, text1: str, text2: str) -> Dict:
        """Generate visual diff highlighting for UI display"""

    def calculate_quality_score(self, transcript: str, segments: List) -> float:
        """Calculate overall quality score based on multiple metrics"""
3. Base Whisper Service Interface
# Location: lib/services/ai/base_whisper_service.py
from pathlib import Path
from typing import Any, Dict

from lib.core.interfaces.transcript_interfaces import ITranscriptService

class BaseWhisperService(ITranscriptService):
    """Base Whisper service implementation that can be extended"""

    def __init__(self, model_size: str = "small", device: str = "auto"):
        self.model_size = model_size
        self.device = self._detect_device(device)
        self.model = None  # loaded lazily by _load_model()

    def _detect_device(self, preferred: str) -> str:
        """Generic device detection logic"""

    def _load_model(self):
        """Generic model loading with caching"""

    async def transcribe_audio(self, audio_path: Path) -> Dict[str, Any]:
        """Base transcription implementation"""
4. Enhanced Segment Model
# Location: lib/data/models/transcript_models.py
from typing import Optional

from lib.data.models.base_model import BaseModel

class EnhancedTranscriptSegment(BaseModel):
    """Enhanced segment model combining both architectures"""
    text: str
    start_time: float
    end_time: float
    # duration is not a stored field; it is derived by the property below
    confidence: Optional[float] = None
    speaker: Optional[str] = None
    word_count: Optional[int] = None

    @property
    def duration(self) -> float:
        return self.end_time - self.start_time

    def to_basic_segment(self) -> 'TranscriptSegment':
        """Convert to project-specific basic segment"""
        # TranscriptSegment is the project model from backend/models/transcript.py
        return TranscriptSegment(
            text=self.text,
            start=self.start_time,
            duration=self.duration
        )
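The relationship between the enhanced and basic segment shapes can be tried out with a standard-library sketch. It uses `dataclasses` instead of the library's `BaseModel`, and `BasicSegment` and `EnhancedSegmentSketch` are hypothetical stand-ins for `TranscriptSegment` and `EnhancedTranscriptSegment`, so this shows the conversion idea rather than the real models.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class BasicSegment:
    """Stand-in for the project's TranscriptSegment (text, start, duration)."""
    text: str
    start: float
    duration: float


@dataclass
class EnhancedSegmentSketch:
    """Stores start and end times; duration is always derived from them."""
    text: str
    start_time: float
    end_time: float
    confidence: Optional[float] = None
    speaker: Optional[str] = None

    @property
    def duration(self) -> float:
        return self.end_time - self.start_time

    def to_basic_segment(self) -> BasicSegment:
        # Confidence and speaker are dropped: the basic shape has no place for them.
        return BasicSegment(text=self.text, start=self.start_time, duration=self.duration)


segment = EnhancedSegmentSketch(text="hello world", start_time=1.5, end_time=4.0, confidence=0.93)
basic = segment.to_basic_segment()
```

Deriving `duration` from the two timestamps means it cannot drift out of sync with `start_time` and `end_time`, which is the reason to expose it as a property rather than as a stored field.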
Keep Project-Specific
1. YouTube-Specific Models
All current models in apps/youtube-summarizer/backend/models/transcript.py remain project-specific as they're tied to YouTube Summarizer's specific API contracts and domain.
2. Service Implementations
- WhisperTranscriptService - YouTube video context
- DualTranscriptService - YouTube pipeline integration
- EnhancedTranscriptService - YouTube-specific orchestration
3. API Endpoints and Controllers
All API-related classes stay project-specific.
4. Frontend Components
All React components and TypeScript types stay project-specific.
Implementation Strategy
Phase 1: Create Class Library Base Components
# Create shared interfaces and utilities
mkdir -p lib/core/interfaces
mkdir -p lib/services/ai
mkdir -p lib/utils/helpers
mkdir -p lib/data/models
# Copy and adapt quality analysis utilities
# Create base interfaces for transcript services
# Create enhanced segment model
Phase 2: Implement Project-Specific Classes
# Create WhisperTranscriptService (adapt from archived)
# Create DualTranscriptService (extend EnhancedTranscriptService)
# Extend existing transcript models with new API models
# Create frontend components
Phase 3: Integration and Testing
# Update dependency injection to use new services
# Update existing services to use class library utilities
# Add comprehensive tests for both shared and project-specific code
Benefits of This Approach
Shared Class Library Benefits
- Reusability - Quality analysis and base Whisper service can be used across projects
- Consistency - Standard interfaces ensure consistent behavior
- Maintainability - Core algorithms centralized for easier updates
- Testing - Shared utilities can have comprehensive test coverage
Project-Specific Benefits
- Flexibility - YouTube-specific models can evolve independently
- Performance - No unnecessary abstraction overhead
- Domain Clarity - Clear separation between generic and YouTube-specific logic
- Deployment - No shared library versioning concerns
Migration Path
Immediate (Story 4.1)
- Implement all classes as project-specific first
- Get dual transcript feature working end-to-end
- Validate architecture and patterns
Future Refactoring (Post-Story 4.1)
- Extract proven quality analysis utilities to class library
- Move base Whisper interfaces to shared location
- Update other projects to use shared transcript utilities
Conclusion
For Story 4.1 implementation, focus on project-specific classes to maintain velocity and ensure the feature works correctly. The class library extraction can be done as a follow-up refactoring once the implementation is proven and stable.
Key Classes to Create:
- ✅ WhisperTranscriptService (project-specific)
- ✅ DualTranscriptService (project-specific)
- ✅ TranscriptOptionsRequest/Response (extend existing models)
- ✅ TranscriptSelector/Comparison (React components)
Future Class Library Candidates:
- 🔄 TranscriptQualityAnalyzer utilities
- 🔄 BaseWhisperService interface
- 🔄 EnhancedTranscriptSegment model
This approach balances immediate implementation needs with future reusability and maintainability.