# Epic 4 Database Architecture Solution
## Problem Summary
The YouTube Summarizer has critical database and architecture issues that prevent Epic 4 features from working:
1. **Table Definition Conflicts**: `rag_chunks` and other tables are defined in both models and migrations
2. **Missing Foreign Keys**: `enhanced_exports.template_id` references the nonexistent `prompt_templates` table
3. **Circular Dependencies**: Models import each other, causing initialization loops
4. **Disabled Features**: The multi-agent and analysis template routers are disabled because of these issues
5. **Migration State Mismatch**: Models expect tables that do not exist yet
## Root Cause Analysis
### Current Architecture Issues
1. **Model-First vs Migration-First Conflict**
   - Models use the `Model` base class, which auto-registers with DatabaseRegistry
   - Models create their tables before migrations run
   - Migrations then try to create tables that already exist from the model definitions
2. **Import Order Problems**
   - `models/__init__.py` imports all models at once
   - Models reference foreign keys to tables that have not been created yet
   - Related models import each other, producing circular imports
3. **DatabaseRegistry Singleton Limitations**
   - The registry prevents duplicate table definitions (good)
   - It does not handle migration/model synchronization (bad)
   - It has no deferred foreign key resolution
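
The first conflict can be reproduced in isolation. The sketch below is only an illustration: it uses Python's built-in `sqlite3` module instead of the project's SQLAlchemy/Alembic stack, and the column layout is a placeholder (only the table name `rag_chunks` comes from this document). It shows why an unconditional `CREATE TABLE` in a migration fails once model initialization has already created the table.

```python
import sqlite3

# Placeholder schema: only the table name is taken from this document.
DDL = "CREATE TABLE rag_chunks (id TEXT PRIMARY KEY, video_id TEXT NOT NULL)"


def create_twice() -> str:
    """Create the table as the models would, then again as a migration would."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(DDL)  # model initialization creates the table first
        try:
            conn.execute(DDL)  # the migration then attempts the same CREATE
        except sqlite3.OperationalError as exc:
            return str(exc)
        return "no error"
    finally:
        conn.close()


error_message = create_twice()
print(error_message)
```

The second `CREATE TABLE` raises an error reporting that the table already exists, which is the failure mode the migration strategy below is meant to avoid.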
## Permanent Architecture Solution
### 1. Database Migration Strategy
#### Phase 1: Clean Migration Path
```bash
# Step 1: Apply all pending migrations in correct order
cd /Users/enias/projects/my-ai-projects/apps/youtube-summarizer
source ../venv/bin/activate
# Check current migration status
PYTHONPATH=. ../venv/bin/python3 -m alembic current
# Apply Epic 4 migrations
PYTHONPATH=. ../venv/bin/python3 -m alembic upgrade add_epic_4_features
```
#### Phase 2: Create Comprehensive Epic 4 Migration
```python
# backend/alembic/versions/epic_4_complete_integration.py
"""Complete Epic 4 integration with all features

Revision ID: epic_4_complete
Revises: add_epic_4_features
"""
import sqlalchemy as sa
from alembic import op

# Revision identifiers, used by Alembic
revision = 'epic_4_complete'
down_revision = 'add_epic_4_features'


def table_exists(name):
    """Return True if the table already exists (needs SQLAlchemy 1.4+)."""
    return sa.inspect(op.get_bind()).has_table(name)


def upgrade():
    # Ensure all Epic 4 tables exist; column definitions are elided (...)

    # 1. Multi-Agent Analysis Tables (Story 4.3)
    if not table_exists('agent_summaries'):
        op.create_table('agent_summaries', ...)

    # 2. Custom Prompt Templates (Story 4.4)
    if not table_exists('prompt_templates'):
        op.create_table('prompt_templates', ...)

    # 3. Enhanced Export Metadata (Story 4.4)
    if not table_exists('export_metadata'):
        op.create_table('export_metadata', ...)

    # 4. Summary Sections (Story 4.4)
    if not table_exists('summary_sections'):
        op.create_table('summary_sections', ...)

    # 5. RAG Tables (Story 4.6)
    if not table_exists('rag_chunks'):
        op.create_table('rag_chunks', ...)
    if not table_exists('vector_embeddings'):
        op.create_table('vector_embeddings', ...)
```
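
After the migration runs, it can help to confirm that every expected table is actually present. The following is a minimal sketch, assuming the database is SQLite (the `data/app.db` path used elsewhere in this document suggests so, but that is an assumption). The table names are the ones listed in the migration above; the helper `missing_tables` is hypothetical and not part of the project.

```python
import sqlite3

# Table names taken from the Epic 4 migration in this document.
EPIC4_TABLES = {
    "agent_summaries",
    "prompt_templates",
    "export_metadata",
    "summary_sections",
    "rag_chunks",
    "vector_embeddings",
}


def missing_tables(conn: sqlite3.Connection, expected: set) -> set:
    """Return the expected table names that are absent from the database."""
    rows = conn.execute("SELECT name FROM sqlite_master WHERE type = 'table'")
    present = {name for (name,) in rows}
    return expected - present


# Demonstration against an in-memory database with only two tables created.
demo = sqlite3.connect(":memory:")
demo.execute("CREATE TABLE prompt_templates (id TEXT PRIMARY KEY)")
demo.execute("CREATE TABLE rag_chunks (id TEXT PRIMARY KEY)")
still_missing = missing_tables(demo, EPIC4_TABLES)
print(sorted(still_missing))
```

Pointed at the real database file, an empty result would indicate that all six tables exist.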
### 2. Model Architecture Refactoring
#### Lazy Model Loading Pattern
```python
# backend/models/lazy_models.py
"""Lazy loading wrapper for all Epic 4 models"""
from typing import TYPE_CHECKING, Optional

from sqlalchemy.orm import relationship

if TYPE_CHECKING:
    from .prompt_templates import PromptTemplate
    from .agent_summaries import AgentSummary
    from .rag_models import RAGChunk


class LazyModelMixin:
    """Mixin for lazy relationship loading"""

    @property
    def prompt_template(self) -> Optional['PromptTemplate']:
        """Lazy load prompt template relationship"""
        if hasattr(self, '_prompt_template'):
            return self._prompt_template
        return None
```
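
The `TYPE_CHECKING` guard is what keeps this pattern from importing sibling model modules at runtime. As a small standalone sketch (the module name `hypothetical_epic4_models` and the class `ExportStub` are invented for illustration and do not exist in the project), the guarded import below is never executed when the code runs, so it cannot take part in an import cycle:

```python
from typing import TYPE_CHECKING, Optional

if TYPE_CHECKING:
    # Only static type checkers evaluate this import. At runtime the block is
    # skipped, so the (hypothetical, nonexistent) module is never loaded.
    from hypothetical_epic4_models import PromptTemplate


class ExportStub:
    """Stand-in for a model that refers to PromptTemplate only by name."""

    def __init__(self) -> None:
        self._prompt_template = None

    @property
    def prompt_template(self) -> Optional["PromptTemplate"]:
        # The quoted annotation is stored as a forward reference and is not
        # resolved when the class is defined.
        return self._prompt_template


export = ExportStub()
print(TYPE_CHECKING, export.prompt_template)
```

The script runs even though the imported module does not exist, which shows that the import is visible to type checkers only.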
#### Proper Model Inheritance
```python
# backend/models/base.py
from sqlalchemy import Column, DateTime, func
from sqlalchemy.ext.declarative import declared_attr

from backend.core.database_registry import registry

# Note: the Epic 4 models also import GUID from this module; its definition
# is not shown in this excerpt.


class TimestampedModel:
    """Mixin for created_at/updated_at fields"""

    @declared_attr
    def created_at(cls):
        return Column(DateTime, default=func.now())

    @declared_attr
    def updated_at(cls):
        return Column(DateTime, onupdate=func.now())


class Model(registry.Base, TimestampedModel):
    """Base model with registry integration"""
    __abstract__ = True
    # Prevent duplicate registration
    __table_args__ = {'extend_existing': True}
```
### 3. Epic 4 Unified Model Registry
#### Create Central Epic 4 Models
```python
# backend/models/epic4/__init__.py
"""Epic 4 model package with proper initialization order"""
# Import order matters - base tables first, then dependent tables

# 1. Base tables (no foreign keys to Epic 4 tables)
from .prompt_templates import PromptTemplate
from .agent_summaries import AgentSummary

# 2. Dependent tables (have foreign keys to above)
from .enhanced_exports import EnhancedExport
from .export_sections import ExportSection
from .prompt_experiments import PromptExperiment

# 3. RAG tables (can reference any above)
from .rag_chunks import RAGChunk
from .vector_embeddings import VectorEmbedding
from .semantic_search import SemanticSearchResult

# 4. Multi-agent tables
from .multi_agent_analysis import MultiAgentAnalysis
from .playlist_analysis import PlaylistAnalysis

__all__ = [
    'PromptTemplate',
    'AgentSummary',
    'EnhancedExport',
    'ExportSection',
    'PromptExperiment',
    'RAGChunk',
    'VectorEmbedding',
    'SemanticSearchResult',
    'MultiAgentAnalysis',
    'PlaylistAnalysis',
]
```
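
The "base tables first, then dependent tables" ordering can be derived rather than maintained by hand. The sketch below is one possible approach, not the project's mechanism: it feeds the foreign-key relationships mentioned in this document (`enhanced_exports` to `prompt_templates`, and `agent_summaries` and `rag_chunks` to `summaries`) into the standard library's `graphlib.TopologicalSorter`. The dictionary itself is an illustrative assumption and is incomplete.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Maps each table to the tables it holds foreign keys to.
# Illustrative subset; only these edges are stated in this document.
FK_DEPENDENCIES = {
    "summaries": set(),
    "prompt_templates": set(),
    "agent_summaries": {"summaries"},
    "rag_chunks": {"summaries"},
    "enhanced_exports": {"prompt_templates"},
}

# static_order() yields every table after the tables it depends on.
creation_order = list(TopologicalSorter(FK_DEPENDENCIES).static_order())
print(creation_order)
```

If the dependency map ever contains a cycle, `static_order()` raises `graphlib.CycleError`, which makes circular foreign-key references visible early.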
### 4. Fix Individual Model Issues
#### RAG Models Fix
```python
# backend/models/epic4/rag_chunks.py
import uuid

from sqlalchemy import Column, String, Integer, Text, Float, ForeignKey
from sqlalchemy.orm import relationship

from backend.models.base import Model, GUID


class RAGChunk(Model):
    """Text chunks for RAG processing"""
    __tablename__ = "rag_chunks"
    __table_args__ = {'extend_existing': True}  # Prevent duplicate definition

    id = Column(GUID, primary_key=True, default=uuid.uuid4)
    summary_id = Column(GUID, ForeignKey("summaries.id", ondelete='CASCADE'), nullable=True)
    video_id = Column(String(20), nullable=False, index=True)

    # Use string references for relationships to avoid circular imports
    summary = relationship("Summary", back_populates="rag_chunks", lazy='select')
```
#### Agent Summary Model
```python
# backend/models/epic4/agent_summaries.py
import uuid

from sqlalchemy import JSON, Column, ForeignKey, String
from sqlalchemy.orm import relationship

from backend.models.base import Model, GUID


class AgentSummary(Model):
    """Multi-agent analysis results"""
    __tablename__ = "agent_summaries"
    __table_args__ = {'extend_existing': True}

    id = Column(GUID, primary_key=True, default=uuid.uuid4)
    summary_id = Column(GUID, ForeignKey("summaries.id", ondelete='CASCADE'))
    agent_type = Column(String(20), nullable=False)  # technical, business, user, synthesis

    # JSON fields for flexible schema
    analysis_result = Column(JSON, nullable=False)

    # Relationships (string reference avoids importing Summary here)
    summary = relationship("Summary", back_populates="agent_analyses")
```
### 5. Multi-Agent Integration with Database
#### Update Multi-Agent Orchestrator
```python
# backend/services/multi_agent_orchestrator.py
from typing import Any, Dict, List

from sqlalchemy.orm import Session

from backend.models.epic4 import AgentSummary
from backend.core.database import get_db


class MultiAgentVideoOrchestrator:
    """Enhanced orchestrator with database persistence"""

    async def save_analysis_to_database(
        self,
        summary_id: str,
        analysis_result: Dict[str, Any],
        db: Session
    ) -> List[AgentSummary]:
        """Save multi-agent analysis to database"""
        agent_summaries = []
        # One AgentSummary row per perspective (technical, business, ...)
        for perspective_type, analysis in analysis_result['perspectives'].items():
            agent_summary = AgentSummary(
                summary_id=summary_id,
                agent_type=perspective_type,
                analysis_result=analysis
            )
            db.add(agent_summary)
            agent_summaries.append(agent_summary)
        # Commit once so all perspectives are persisted together
        db.commit()
        return agent_summaries
```
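
Because the orchestrator commits once after the loop, the perspectives of one analysis are written together. The same all-or-nothing behavior can be illustrated without the project's ORM. The sketch below uses the standard library's `sqlite3` and `json` modules; the function `save_perspectives` and the simplified table layout are assumptions made for the example, not the project's code.

```python
import json
import sqlite3
import uuid


def save_perspectives(conn: sqlite3.Connection, summary_id: str, perspectives: dict) -> int:
    """Insert one row per perspective; commit all rows or none."""
    # Used as a context manager, the connection commits on success and
    # rolls back if the block raises.
    with conn:
        for agent_type, analysis in perspectives.items():
            conn.execute(
                "INSERT INTO agent_summaries (id, summary_id, agent_type, analysis_result) "
                "VALUES (?, ?, ?, ?)",
                (str(uuid.uuid4()), summary_id, agent_type, json.dumps(analysis)),
            )
    return len(perspectives)


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE agent_summaries ("
    "id TEXT PRIMARY KEY, summary_id TEXT, "
    "agent_type TEXT NOT NULL, analysis_result TEXT NOT NULL)"
)

# The second entry is not JSON-serializable, so the whole batch is rolled back.
bad_batch = {"technical": {"score": 1}, "business": {"oops": object()}}
try:
    save_perspectives(conn, "summary-1", bad_batch)
except TypeError:
    pass
rows_after_failure = conn.execute("SELECT COUNT(*) FROM agent_summaries").fetchone()[0]

save_perspectives(conn, "summary-1", {"technical": {"score": 1}, "business": {"score": 2}})
rows_after_success = conn.execute("SELECT COUNT(*) FROM agent_summaries").fetchone()[0]
print(rows_after_failure, rows_after_success)
```

The failed batch leaves no partial rows behind, so a summary never ends up with only some of its agent perspectives stored.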
### 6. API Router Re-enablement
#### Update Main Application
```python
# backend/main.py
# Import Epic 4 models in correct order
from backend.models.epic4 import (
    PromptTemplate, AgentSummary,
    EnhancedExport, ExportSection,
    RAGChunk, VectorEmbedding
)

# Re-enable routers
from backend.api.multi_agent import router as multi_agent_router
from backend.api.enhanced_export import router as enhanced_export_router
from backend.api.prompt_templates import router as templates_router

# Include all routers
app.include_router(multi_agent_router)
app.include_router(enhanced_export_router)
app.include_router(templates_router)
```
### 7. Implementation Steps
#### Step 1: Database Reset and Migration
```bash
# Backup current database
cp data/app.db data/app.db.backup
# Reset migrations to clean state
PYTHONPATH=. ../venv/bin/python3 -m alembic downgrade base
PYTHONPATH=. ../venv/bin/python3 -m alembic upgrade head
```
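
A plain `cp` is fine when nothing is writing to the database. If the application might still hold the file open, a safer option (assuming the database is SQLite, which the `data/app.db` path suggests but this document does not state) is SQLite's online backup API, exposed in Python as `sqlite3.Connection.backup`. The helper `backup_sqlite` below is a hypothetical sketch, demonstrated on throwaway files rather than on `data/app.db`.

```python
import os
import sqlite3
import tempfile


def backup_sqlite(source_path: str, backup_path: str) -> None:
    """Copy a SQLite database using SQLite's online backup API."""
    source = sqlite3.connect(source_path)
    target = sqlite3.connect(backup_path)
    try:
        source.backup(target)  # copies a consistent snapshot of the source
    finally:
        target.close()
        source.close()


# Demonstration with temporary files instead of the real database.
workdir = tempfile.mkdtemp()
db_path = os.path.join(workdir, "app.db")
backup_path = os.path.join(workdir, "app.db.backup")

conn = sqlite3.connect(db_path)
conn.execute("CREATE TABLE summaries (id TEXT PRIMARY KEY)")
conn.execute("INSERT INTO summaries VALUES ('s1')")
conn.commit()
conn.close()

backup_sqlite(db_path, backup_path)

restored = sqlite3.connect(backup_path)
copied_rows = restored.execute("SELECT COUNT(*) FROM summaries").fetchone()[0]
restored.close()
print(copied_rows)
```

Checking the row count of the copy, as done at the end, is a cheap way to confirm the backup is readable before running `alembic downgrade base`.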
#### Step 2: Model Refactoring
1. Create `backend/models/epic4/` directory
2. Move all Epic 4 models to new directory
3. Add `__table_args__ = {'extend_existing': True}` to all models
4. Update imports to use new structure
#### Step 3: Update Service Layer
1. Update multi-agent orchestrator to save to database
2. Add database persistence to playlist analyzer
3. Create enhanced export service with database integration
#### Step 4: Re-enable and Test
1. Re-enable disabled routers in main.py
2. Run comprehensive tests
3. Verify all Epic 4 features work together
## Testing Strategy
### Integration Tests
```python
# tests/integration/test_epic4_integration.py
async def test_multi_agent_with_database():
    """Test multi-agent analysis saves to database"""
    # Create summary
    # Run multi-agent analysis
    # Verify agent_summaries table populated


async def test_enhanced_export_with_templates():
    """Test enhanced export uses prompt templates"""
    # Create prompt template
    # Generate enhanced export
    # Verify export uses template


async def test_rag_chat_with_chunks():
    """Test RAG chat creates and uses chunks"""
    # Create summary
    # Generate RAG chunks
    # Test chat interface
```
## Benefits of This Architecture
1. **Clean Separation**: Models, migrations, and services are properly separated
2. **No Circular Dependencies**: Lazy loading and string references prevent cycles
3. **Database Integrity**: Foreign keys properly enforced with cascading deletes
4. **Extensibility**: Easy to add new Epic 4 features without breaking existing ones
5. **Performance**: Indexed lookup columns (for example `rag_chunks.video_id`) and explicit relationships keep common queries fast
6. **Maintainability**: Clear structure makes debugging and updates easier
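
One caveat on the database integrity point: if the database is SQLite (an assumption based on the `data/app.db` path), `ON DELETE CASCADE` only takes effect when foreign key enforcement is switched on for each connection with `PRAGMA foreign_keys = ON`. The sketch below uses the standard library's `sqlite3` module with a simplified two-table schema to show the difference. The function name is hypothetical.

```python
import sqlite3


def children_left_after_parent_delete(enforce_foreign_keys: bool) -> int:
    """Delete a summary and count the agent_summaries rows that remain."""
    conn = sqlite3.connect(":memory:")
    # Set the pragma explicitly either way; OFF is SQLite's usual default.
    setting = "ON" if enforce_foreign_keys else "OFF"
    conn.execute(f"PRAGMA foreign_keys = {setting}")
    conn.execute("CREATE TABLE summaries (id TEXT PRIMARY KEY)")
    conn.execute(
        "CREATE TABLE agent_summaries ("
        "id TEXT PRIMARY KEY, "
        "summary_id TEXT REFERENCES summaries(id) ON DELETE CASCADE)"
    )
    conn.execute("INSERT INTO summaries VALUES ('s1')")
    conn.execute("INSERT INTO agent_summaries VALUES ('a1', 's1')")
    conn.execute("DELETE FROM summaries WHERE id = 's1'")
    remaining = conn.execute("SELECT COUNT(*) FROM agent_summaries").fetchone()[0]
    conn.close()
    return remaining


with_enforcement = children_left_after_parent_delete(True)
without_enforcement = children_left_after_parent_delete(False)
print(with_enforcement, without_enforcement)
```

With enforcement on, the child row is removed along with its parent. With it off, the orphaned row stays behind, so the cascading deletes declared in the models would silently not apply.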
## Rollback Plan
If issues occur:
1. Restore database backup: `cp data/app.db.backup data/app.db`
2. Revert code changes: `git checkout -- backend/models`
3. Disable Epic 4 routers temporarily
4. Debug specific issues before re-attempting
## Success Criteria
- ✅ All migrations apply without errors
- ✅ No "table already exists" errors
- ✅ Multi-agent analysis saves to database
- ✅ Enhanced exports work with templates
- ✅ RAG chat functions with vector embeddings
- ✅ All Epic 4 API endpoints return 200 status
- ✅ No circular import errors
- ✅ Frontend can access all Epic 4 features
## Timeline
- **Hour 1**: Database migration and reset
- **Hour 2**: Model refactoring and epic4 package creation
- **Hour 3**: Service layer updates
- **Hour 4**: API router re-enablement and testing
- **Hour 5**: Integration testing and bug fixes
- **Hour 6**: Documentation and deployment
This solution addresses the database issues listed above while keeping the DatabaseRegistry pattern, so that all Epic 4 features can work together.