# Class Library Analysis: Dual Transcript Implementation

## Overview

This analysis examines the classes needed for dual transcript functionality and determines what should be created, updated, or moved to the shared class library (`/lib/`) versus remaining project-specific.

## Current Architecture Analysis

### Existing Transcript Classes (YouTube Summarizer)

**Location**: `apps/youtube-summarizer/backend/models/transcript.py`

```python
class ExtractionMethod(str, Enum):
    YOUTUBE_API = "youtube_api"
    AUTO_CAPTIONS = "auto_captions"
    WHISPER_AUDIO = "whisper_audio"  # ✅ Already supports Whisper
    MOCK = "mock"
    FAILED = "failed"

class TranscriptSegment(BaseModel):  # ✅ Basic segment model
    text: str
    start: float
    duration: float

class TranscriptMetadata(BaseModel):  # ✅ Comprehensive metadata
    word_count: int
    language: str
    has_timestamps: bool
    extraction_method: ExtractionMethod
    processing_time_seconds: float

class TranscriptResult(BaseModel):  # ✅ Main result structure
    video_id: str
    transcript: Optional[str]
    segments: Optional[List[TranscriptSegment]]
    metadata: Optional[TranscriptMetadata]
    method: ExtractionMethod
    success: bool
    from_cache: bool
    error: Optional[Dict[str, Any]]
```

### Archived Production-Ready Classes

**Location**: `archived_projects/personal-ai-assistant-v1.1.0/src/services/transcription_service.py`

```python
class TranscriptionSegment:  # 🔄 More detailed than current
    """Enhanced segment with confidence and speaker detection"""
    start_time: float
    end_time: float
    text: str
    confidence: float
    speaker: Optional[str]

class TranscriptionService:  # 🏆 Production-ready Whisper service
    """Full Whisper implementation with chunking and device detection"""
    # - Model management (tiny, base, small, medium, large)
    # - GPU/CPU auto-detection
    # - Audio preprocessing and chunking
    # - Speaker identification
    # - Confidence scoring
    # - Progress callbacks
```

### Current Service Classes

**Location**: `apps/youtube-summarizer/backend/services/`

```python
class MockWhisperService:  # ❌ Placeholder to be replaced
    """Mock implementation returning fake transcripts"""

class EnhancedTranscriptService:  # ✅ Good architecture
    """Orchestrates YouTube API + local files + Whisper fallback"""
    # - Priority-based extraction
    # - Caching integration
    # - Error handling and fallbacks
```

## Required New Classes for Dual Transcript Implementation

### 1. Backend Service Classes (Project-Specific)

#### WhisperTranscriptService

```python
# Location: apps/youtube-summarizer/backend/services/whisper_transcript_service.py
class WhisperTranscriptService:
    """Real Whisper service adapted from the archived TranscriptionService"""
    # - Copy from archived project
    # - Remove podcast-specific dependencies
    # - Add async compatibility
    # - YouTube video context integration

# Decision: Project-specific (YouTube video context)
```

#### DualTranscriptService

```python
# Location: apps/youtube-summarizer/backend/services/dual_transcript_service.py
class DualTranscriptService(EnhancedTranscriptService):
    """Manages parallel YouTube + Whisper transcript extraction"""
    # - Parallel processing with asyncio.gather()
    # - Quality comparison algorithms
    # - Dual caching strategies
    # - Fallback management

# Decision: Project-specific (YouTube pipeline integration)
```
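
The parallel extraction described above can be sketched with `asyncio.gather()`. This is a minimal illustration, not the real service: the two fetch coroutines stand in for the actual YouTube and Whisper extraction paths, and their names are assumptions.

```python
import asyncio
from typing import Any, Dict

async def fetch_youtube(video_id: str) -> Dict[str, Any]:
    await asyncio.sleep(0.01)  # stands in for the network call
    return {"source": "youtube", "video_id": video_id, "success": True}

async def fetch_whisper(video_id: str) -> Dict[str, Any]:
    await asyncio.sleep(0.02)  # stands in for audio download + transcription
    return {"source": "whisper", "video_id": video_id, "success": True}

async def extract_both(video_id: str) -> Dict[str, Dict[str, Any]]:
    # return_exceptions=True keeps one failed source from cancelling the other,
    # which is the fallback behavior the dual service needs
    youtube, whisper = await asyncio.gather(
        fetch_youtube(video_id),
        fetch_whisper(video_id),
        return_exceptions=True,
    )
    return {
        "youtube": youtube if not isinstance(youtube, Exception) else {"success": False},
        "whisper": whisper if not isinstance(whisper, Exception) else {"success": False},
    }

results = asyncio.run(extract_both("dQw4w9WgXcQ"))
```

Total wall-clock time is roughly the slower of the two sources rather than their sum, which is the point of running them in parallel.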
#### TranscriptQualityAnalyzer

```python
# Location: lib/services/ai/transcript_quality_analyzer.py
class TranscriptQualityAnalyzer:
    """Generic transcript quality comparison utilities"""
    # - Word-level difference analysis
    # - Confidence scoring algorithms
    # - Timing precision analysis
    # - Quality metrics calculation

# Decision: Move to class library (reusable across projects)
```

### 2. API Model Classes (Project-Specific)

#### TranscriptOptionsRequest

```python
# Location: apps/youtube-summarizer/backend/models/transcript.py (extend existing)
class TranscriptOptionsRequest(BaseModel):
    source: Literal['youtube', 'whisper', 'both'] = 'youtube'
    whisper_model: Literal['tiny', 'base', 'small', 'medium'] = 'small'
    language: str = 'en'
    include_timestamps: bool = True
    quality_threshold: float = 0.7

# Decision: Project-specific (YouTube Summarizer API contract)
```

#### TranscriptComparisonResponse

```python
# Location: apps/youtube-summarizer/backend/models/transcript.py (extend existing)
class TranscriptComparisonResponse(BaseModel):
    video_id: str
    transcripts: Dict[str, TranscriptResult]  # {'youtube': result, 'whisper': result}
    quality_comparison: Dict[str, Any]
    recommended_source: str
    processing_time_comparison: Dict[str, float]

# Decision: Project-specific (specific to dual transcript feature)
```

### 3. Frontend Components (Project-Specific)

```typescript
// Location: frontend/src/components/TranscriptSelector.tsx
interface TranscriptSelectorProps {
  value: TranscriptSource
  onChange: (source: TranscriptSource) => void
  estimatedDuration?: number
  disabled?: boolean
}

// Location: frontend/src/components/TranscriptComparison.tsx
interface TranscriptComparisonProps {
  youtubeTranscript: TranscriptResult
  whisperTranscript: TranscriptResult
  onSelectTranscript: (source: TranscriptSource) => void
}

// Location: frontend/src/types/transcript.ts
type TranscriptSource = 'youtube' | 'whisper' | 'both'

// Decision: All project-specific (YouTube Summarizer UI)
```

## Class Library Recommendations

### Move to Shared Library (`/lib/`)

#### 1. Base Transcript Interfaces

```python
# Location: lib/core/interfaces/transcript_interfaces.py
from abc import ABC, abstractmethod
from pathlib import Path
from typing import List, Dict, Any, Optional

class ITranscriptService(ABC):
    """Base interface for transcript services"""

    @abstractmethod
    async def transcribe_audio(self, audio_path: Path) -> Dict[str, Any]:
        pass

class IQualityAnalyzer(ABC):
    """Base interface for transcript quality analysis"""

    @abstractmethod
    def compare_transcripts(self, transcript1: str, transcript2: str) -> Dict[str, Any]:
        pass

class ITranscriptCache(ABC):
    """Base interface for transcript caching"""

    @abstractmethod
    async def get_transcript(self, key: str) -> Optional[Dict[str, Any]]:
        pass
```
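
As a sanity check on the interface shape, here is a minimal in-memory implementation of `ITranscriptCache`. The `set_transcript` method and the TTL handling are illustrative assumptions for the sketch, not part of the interface above.

```python
import asyncio
import time
from abc import ABC, abstractmethod
from typing import Any, Dict, Optional, Tuple

class ITranscriptCache(ABC):
    """Base interface for transcript caching (as defined above)"""

    @abstractmethod
    async def get_transcript(self, key: str) -> Optional[Dict[str, Any]]:
        pass

class InMemoryTranscriptCache(ITranscriptCache):
    """Dict-backed cache with a simple TTL; good enough for tests and dev."""

    def __init__(self, ttl_seconds: float = 3600.0):
        self._ttl = ttl_seconds
        self._store: Dict[str, Tuple[float, Dict[str, Any]]] = {}

    async def set_transcript(self, key: str, value: Dict[str, Any]) -> None:
        self._store[key] = (time.monotonic() + self._ttl, value)

    async def get_transcript(self, key: str) -> Optional[Dict[str, Any]]:
        entry = self._store.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if time.monotonic() > expires_at:
            del self._store[key]  # evict stale entries lazily on read
            return None
        return value

async def _demo() -> Optional[Dict[str, Any]]:
    cache = InMemoryTranscriptCache(ttl_seconds=60)
    await cache.set_transcript("abc123:youtube", {"transcript": "hello world"})
    return await cache.get_transcript("abc123:youtube")

cached = asyncio.run(_demo())
```

A production implementation would swap the dict for Redis or disk storage behind the same interface, which is exactly what the abstraction buys.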
#### 2. Quality Analysis Utilities

```python
# Location: lib/utils/helpers/transcript_analyzer.py
class TranscriptQualityAnalyzer:
    """Reusable transcript quality comparison utilities"""

    def calculate_word_accuracy(self, reference: str, candidate: str) -> float:
        """Calculate word-level accuracy between transcripts"""

    def analyze_timing_precision(self, segments1: List, segments2: List) -> Dict:
        """Compare timestamp accuracy between segment lists"""

    def generate_diff_highlights(self, text1: str, text2: str) -> Dict:
        """Generate visual diff highlighting for UI display"""

    def calculate_quality_score(self, transcript: str, segments: List) -> float:
        """Calculate overall quality score based on multiple metrics"""
```
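
The `calculate_word_accuracy` method could be backed by `difflib.SequenceMatcher` over word tokens. This is one plausible implementation under that assumption, not the shipped algorithm:

```python
from difflib import SequenceMatcher

def calculate_word_accuracy(reference: str, candidate: str) -> float:
    """Word-level similarity ratio in [0, 1], case-insensitive."""
    ref_words = reference.lower().split()
    cand_words = candidate.lower().split()
    if not ref_words and not cand_words:
        return 1.0  # two empty transcripts are trivially identical
    # ratio() = 2*M / T, where M = matched words and T = total words in both
    return SequenceMatcher(None, ref_words, cand_words).ratio()

# One substituted word out of five in each transcript: 2*4 / 10 = 0.8
score = calculate_word_accuracy(
    "the quick brown fox jumps",
    "the quick brown box jumps",
)
```

Matching on word tokens rather than characters keeps the score from being inflated by near-miss spellings, which matters when comparing auto-captions against Whisper output.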
#### 3. Base Whisper Service Interface

```python
# Location: lib/services/ai/base_whisper_service.py
from pathlib import Path
from typing import Any, Dict

from lib.core.interfaces.transcript_interfaces import ITranscriptService

class BaseWhisperService(ITranscriptService):
    """Base Whisper service implementation that can be extended"""

    def __init__(self, model_size: str = "small", device: str = "auto"):
        self.model_size = model_size
        self.device = self._detect_device(device)
        self.model = None

    def _detect_device(self, preferred: str) -> str:
        """Generic device detection logic"""

    def _load_model(self):
        """Generic model loading with caching"""

    async def transcribe_audio(self, audio_path: Path) -> Dict[str, Any]:
        """Base transcription implementation"""
```
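
The `_detect_device` hook might look like the following. Treating `torch` as an optional dependency is an assumption of this sketch, since the shared library should not hard-require it:

```python
def detect_device(preferred: str = "auto") -> str:
    """Resolve 'auto' to 'cuda' when a GPU is usable, otherwise 'cpu'."""
    if preferred != "auto":
        return preferred  # caller pinned a device explicitly
    try:
        import torch  # optional dependency; absence simply means CPU
        if torch.cuda.is_available():
            return "cuda"
    except ImportError:
        pass
    return "cpu"

pinned = detect_device("cpu")    # explicit choice is always honored
resolved = detect_device("auto")  # 'cuda' only if torch sees a GPU
```

Keeping the import inside the function means environments without PyTorch (e.g. a frontend-only CI job) can still import the module.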
#### 4. Enhanced Segment Model

```python
# Location: lib/data/models/transcript_models.py
from typing import Optional

from lib.data.models.base_model import BaseModel

class EnhancedTranscriptSegment(BaseModel):
    """Enhanced segment model combining both architectures"""
    text: str
    start_time: float
    end_time: float
    confidence: Optional[float] = None
    speaker: Optional[str] = None
    word_count: Optional[int] = None

    @property
    def duration(self) -> float:
        # Derived from the timestamps rather than stored as a field,
        # so it can never drift out of sync with them
        return self.end_time - self.start_time

    def to_basic_segment(self) -> 'TranscriptSegment':
        """Convert to project-specific basic segment"""
        return TranscriptSegment(
            text=self.text,
            start=self.start_time,
            duration=self.duration
        )
```
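
The derived-duration design and the conversion path can be exercised without the shared library; this sketch uses plain dataclasses as stand-ins for the shared `BaseModel` and the project's `TranscriptSegment`:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class BasicSegment:
    """Stand-in for the project-specific TranscriptSegment"""
    text: str
    start: float
    duration: float

@dataclass
class EnhancedSegment:
    """Stand-in for EnhancedTranscriptSegment; duration stays derived"""
    text: str
    start_time: float
    end_time: float
    confidence: Optional[float] = None
    speaker: Optional[str] = None

    @property
    def duration(self) -> float:
        # Computed on access, so editing end_time automatically updates it
        return self.end_time - self.start_time

    def to_basic_segment(self) -> BasicSegment:
        return BasicSegment(text=self.text, start=self.start_time,
                            duration=self.duration)

seg = EnhancedSegment(text="hello", start_time=1.5, end_time=4.0, confidence=0.93)
basic = seg.to_basic_segment()
```

The conversion deliberately drops `confidence` and `speaker`, since the basic segment schema has no fields for them.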
### Keep Project-Specific

#### 1. YouTube-Specific Models

All current models in `apps/youtube-summarizer/backend/models/transcript.py` remain project-specific, as they are tied to YouTube Summarizer's API contracts and domain.

#### 2. Service Implementations

- `WhisperTranscriptService` - YouTube video context
- `DualTranscriptService` - YouTube pipeline integration
- `EnhancedTranscriptService` - YouTube-specific orchestration

#### 3. API Endpoints and Controllers

All API-related classes stay project-specific.

#### 4. Frontend Components

All React components and TypeScript types stay project-specific.

## Implementation Strategy

### Phase 1: Create Class Library Base Components

```bash
# Create shared interfaces and utilities
mkdir -p lib/core/interfaces
mkdir -p lib/services/ai
mkdir -p lib/utils/helpers
mkdir -p lib/data/models

# Copy and adapt quality analysis utilities
# Create base interfaces for transcript services
# Create enhanced segment model
```

### Phase 2: Implement Project-Specific Classes

```bash
# Create WhisperTranscriptService (adapt from archived)
# Create DualTranscriptService (extend EnhancedTranscriptService)
# Extend existing transcript models with new API models
# Create frontend components
```

### Phase 3: Integration and Testing

```bash
# Update dependency injection to use new services
# Update existing services to use class library utilities
# Add comprehensive tests for both shared and project-specific code
```

## Benefits of This Approach

### Shared Class Library Benefits

1. **Reusability** - Quality analysis and the base Whisper service can be reused across projects
2. **Consistency** - Standard interfaces ensure consistent behavior
3. **Maintainability** - Core algorithms are centralized for easier updates
4. **Testing** - Shared utilities can have comprehensive test coverage

### Project-Specific Benefits

1. **Flexibility** - YouTube-specific models can evolve independently
2. **Performance** - No unnecessary abstraction overhead
3. **Domain Clarity** - Clear separation between generic and YouTube-specific logic
4. **Deployment** - No shared-library versioning concerns

## Migration Path

### Immediate (Story 4.1)

- Implement all classes as project-specific first
- Get the dual transcript feature working end-to-end
- Validate the architecture and patterns

### Future Refactoring (Post-Story 4.1)

- Extract proven quality analysis utilities to the class library
- Move base Whisper interfaces to the shared location
- Update other projects to use the shared transcript utilities

## Conclusion

For Story 4.1, focus on **project-specific classes** to maintain velocity and ensure the feature works correctly. Class library extraction can follow as a refactoring pass once the implementation is proven and stable.

**Key Classes to Create:**

1. ✅ `WhisperTranscriptService` (project-specific)
2. ✅ `DualTranscriptService` (project-specific)
3. ✅ `TranscriptOptionsRequest/Response` (extend existing models)
4. ✅ `TranscriptSelector/Comparison` (React components)

**Future Class Library Candidates:**

1. 🔄 `TranscriptQualityAnalyzer` utilities
2. 🔄 `BaseWhisperService` interface
3. 🔄 `EnhancedTranscriptSegment` model

This approach balances immediate implementation needs with future reusability and maintainability.