Epic 4 Database Architecture Solution
Problem Summary
The YouTube Summarizer is experiencing critical database and architecture issues preventing Epic 4 features from working:
- Table Definition Conflicts: rag_chunks and other tables being defined both in models and migrations
- Missing Foreign Keys: enhanced_exports.template_id references non-existent prompt_templates table
- Circular Dependencies: Models importing each other causing initialization loops
- Disabled Features: Multi-agent and analysis template routers disabled due to these issues
- Migration State Mismatch: Models expect tables that don't exist yet
Root Cause Analysis
Current Architecture Issues
- Model-First vs Migration-First Conflict
  - Models are using Model base class that auto-registers with DatabaseRegistry
  - Tables are being created by models before migrations run
  - Migrations try to create tables that already exist from model definitions
- Import Order Problems
  - models/__init__.py imports all models at once
  - Models reference foreign keys to tables not yet created
  - Circular imports between related models
- DatabaseRegistry Singleton Limitations
  - Registry prevents duplicate table definitions (good)
  - But doesn't handle migration/model synchronization (bad)
  - No deferred foreign key resolution
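The last two limitations can be illustrated with a minimal sketch. TableRegistry below is a hypothetical stand-in, not the project's DatabaseRegistry; it assumes foreign-key targets are recorded as plain strings and only checked once every table has been registered.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class TableRegistry:
    """Hypothetical stand-in for DatabaseRegistry; not the project's real class."""

    tables: Dict[str, List[str]] = field(default_factory=dict)

    def register(self, name: str, foreign_keys: Tuple[str, ...] = ()) -> bool:
        """Record a table once; return False if it was already registered."""
        if name in self.tables:
            return False  # duplicate definition is ignored instead of raising
        # Targets are stored as strings and deliberately not validated yet.
        self.tables[name] = list(foreign_keys)
        return True

    def unresolved(self) -> Dict[str, List[str]]:
        """Map each table to the foreign-key targets that were never registered."""
        missing = {}
        for name, targets in self.tables.items():
            absent = [t for t in targets if t not in self.tables]
            if absent:
                missing[name] = absent
        return missing


registry = TableRegistry()
# The dependent table arrives first; an eager check would reject it here.
registry.register("enhanced_exports", foreign_keys=("prompt_templates",))
print(registry.unresolved())  # {'enhanced_exports': ['prompt_templates']}

registry.register("prompt_templates")
print(registry.unresolved())  # {}
print(registry.register("prompt_templates"))  # False
```

Calling unresolved() once at startup, after all models are imported, would surface a case like enhanced_exports.template_id pointing at a missing prompt_templates table before any query runs.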
Permanent Architecture Solution
1. Database Migration Strategy
Phase 1: Clean Migration Path
# Step 1: Apply all pending migrations in correct order
cd /Users/enias/projects/my-ai-projects/apps/youtube-summarizer
source ../venv/bin/activate
# Check current migration status
PYTHONPATH=. ../venv/bin/python3 -m alembic current
# Apply Epic 4 migrations
PYTHONPATH=. ../venv/bin/python3 -m alembic upgrade add_epic_4_features
Phase 2: Create Comprehensive Epic 4 Migration
# backend/alembic/versions/epic_4_complete_integration.py
"""Complete Epic 4 integration with all features

Revision ID: epic_4_complete
Revises: add_epic_4_features
"""
import sqlalchemy as sa
from alembic import op

# Revision identifiers read by Alembic (same values as the docstring above)
revision = 'epic_4_complete'
down_revision = 'add_epic_4_features'


def table_exists(name):
    """Return True if the table is already present in the connected database."""
    return sa.inspect(op.get_bind()).has_table(name)


def upgrade():
    # Ensure all Epic 4 tables exist; column definitions are elided ("...")
    # 1. Multi-Agent Analysis Tables (Story 4.3)
    if not table_exists('agent_summaries'):
        op.create_table('agent_summaries', ...)
    # 2. Custom Prompt Templates (Story 4.4)
    if not table_exists('prompt_templates'):
        op.create_table('prompt_templates', ...)
    # 3. Enhanced Export Metadata (Story 4.4)
    if not table_exists('export_metadata'):
        op.create_table('export_metadata', ...)
    # 4. Summary Sections (Story 4.4)
    if not table_exists('summary_sections'):
        op.create_table('summary_sections', ...)
    # 5. RAG Tables (Story 4.6)
    if not table_exists('rag_chunks'):
        op.create_table('rag_chunks', ...)
    if not table_exists('vector_embeddings'):
        op.create_table('vector_embeddings', ...)
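The guard used in the migration can be tried outside Alembic. The sketch below uses the standard-library sqlite3 module in place of Alembic's op object, and the two-column rag_chunks schema is an illustrative assumption, not the project's real table.

```python
import sqlite3


def table_exists(conn: sqlite3.Connection, name: str) -> bool:
    """Return True if a table with this name is present in the SQLite schema."""
    row = conn.execute(
        "SELECT 1 FROM sqlite_master WHERE type = 'table' AND name = ?",
        (name,),
    ).fetchone()
    return row is not None


def create_if_missing(conn: sqlite3.Connection, name: str, ddl: str) -> bool:
    """Run the DDL only when the table is absent; report whether it ran."""
    if table_exists(conn, name):
        return False
    conn.execute(ddl)
    return True


conn = sqlite3.connect(":memory:")
# Illustrative columns only, not the real rag_chunks definition.
ddl = "CREATE TABLE rag_chunks (id TEXT PRIMARY KEY, video_id TEXT NOT NULL)"

first = create_if_missing(conn, "rag_chunks", ddl)   # table is created
second = create_if_missing(conn, "rag_chunks", ddl)  # skipped, no error
print(first, second)  # True False

# Without the guard, repeating CREATE TABLE raises an error.
try:
    conn.execute(ddl)
    unguarded_failed = False
except sqlite3.OperationalError:
    unguarded_failed = True
print(unguarded_failed)  # True
```

The same check makes a migration safe to run whether or not a model definition has already created the table.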
2. Model Architecture Refactoring
Lazy Model Loading Pattern
# backend/models/lazy_models.py
"""Lazy loading wrapper for all Epic 4 models"""
from typing import TYPE_CHECKING, Optional
from sqlalchemy.orm import relationship

if TYPE_CHECKING:
    from .prompt_templates import PromptTemplate
    from .agent_summaries import AgentSummary
    from .rag_models import RAGChunk


class LazyModelMixin:
    """Mixin for lazy relationship loading"""

    @property
    def prompt_template(self) -> Optional['PromptTemplate']:
        """Lazy load prompt template relationship"""
        if hasattr(self, '_prompt_template'):
            return self._prompt_template
        return None
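The TYPE_CHECKING guard is what keeps these imports from creating a cycle. A small sketch of the mechanism, where prompt_templates_module and ExportRecord are hypothetical names chosen for illustration:

```python
from typing import TYPE_CHECKING, Optional

if TYPE_CHECKING:
    # Only static type checkers follow this branch. At runtime the line never
    # executes, so the (hypothetical) module is not imported at all.
    from prompt_templates_module import PromptTemplate


class ExportRecord:
    """Hypothetical model-like class used only for this illustration."""

    def __init__(self) -> None:
        self._prompt_template = None

    @property
    def prompt_template(self) -> Optional["PromptTemplate"]:
        # The quoted annotation stays a forward reference and is not resolved here.
        return self._prompt_template


record = ExportRecord()
print(TYPE_CHECKING)           # False
print(record.prompt_template)  # None
```

The module loads even though prompt_templates_module does not exist, which shows that the guarded import has no runtime effect.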
Proper Model Inheritance
# backend/models/base.py
from sqlalchemy import Column, DateTime, func
from sqlalchemy.ext.declarative import declared_attr

from backend.core.database_registry import registry


class TimestampedModel:
    """Mixin for created_at/updated_at fields"""

    @declared_attr
    def created_at(cls):
        return Column(DateTime, default=func.now())

    @declared_attr
    def updated_at(cls):
        return Column(DateTime, onupdate=func.now())


class Model(registry.Base, TimestampedModel):
    """Base model with registry integration"""
    __abstract__ = True
    # Prevent duplicate registration
    __table_args__ = {'extend_existing': True}
3. Epic 4 Unified Model Registry
Create Central Epic 4 Models
# backend/models/epic4/__init__.py
"""Epic 4 model package with proper initialization order"""
# Import order matters - base tables first, then dependent tables
# 1. Base tables (no foreign keys to Epic 4 tables)
from .prompt_templates import PromptTemplate
from .agent_summaries import AgentSummary
# 2. Dependent tables (have foreign keys to above)
from .enhanced_exports import EnhancedExport
from .export_sections import ExportSection
from .prompt_experiments import PromptExperiment
# 3. RAG tables (can reference any above)
from .rag_chunks import RAGChunk
from .vector_embeddings import VectorEmbedding
from .semantic_search import SemanticSearchResult
# 4. Multi-agent tables
from .multi_agent_analysis import MultiAgentAnalysis
from .playlist_analysis import PlaylistAnalysis
__all__ = [
    'PromptTemplate',
    'AgentSummary',
    'EnhancedExport',
    'ExportSection',
    'PromptExperiment',
    'RAGChunk',
    'VectorEmbedding',
    'SemanticSearchResult',
    'MultiAgentAnalysis',
    'PlaylistAnalysis',
]
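The import order above is maintained by hand. As a rough sketch, the same ordering can be derived from foreign-key dependencies with the standard-library graphlib module (Python 3.9+). The dependency edges below are assumptions made for illustration; the source only states that dependent tables come after base tables.

```python
from graphlib import CycleError, TopologicalSorter

# Table -> set of tables it holds foreign keys to (assumed edges).
dependencies = {
    "prompt_templates": set(),
    "agent_summaries": set(),
    "enhanced_exports": {"prompt_templates"},
    "export_sections": {"enhanced_exports"},
    "rag_chunks": set(),
    "vector_embeddings": {"rag_chunks"},
}

# static_order() yields every table after all of the tables it depends on.
order = list(TopologicalSorter(dependencies).static_order())
print(order)

# A circular dependency is reported instead of looping forever.
try:
    list(TopologicalSorter({"a": {"b"}, "b": {"a"}}).static_order())
    cycle_found = False
except CycleError:
    cycle_found = True
print(cycle_found)  # True
```

A check like this could run in a test to confirm that the order in __init__.py still matches the foreign keys as models are added.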
4. Fix Individual Model Issues
RAG Models Fix
# backend/models/epic4/rag_chunks.py
import uuid

from sqlalchemy import Column, String, Integer, Text, Float, ForeignKey
from sqlalchemy.orm import relationship

from backend.models.base import Model, GUID


class RAGChunk(Model):
    """Text chunks for RAG processing"""
    __tablename__ = "rag_chunks"
    __table_args__ = {'extend_existing': True}  # Prevent duplicate definition

    id = Column(GUID, primary_key=True, default=uuid.uuid4)
    summary_id = Column(GUID, ForeignKey("summaries.id", ondelete='CASCADE'), nullable=True)
    video_id = Column(String(20), nullable=False, index=True)

    # Use string references for relationships to avoid circular imports
    summary = relationship("Summary", back_populates="rag_chunks", lazy='select')
Agent Summary Model
# backend/models/epic4/agent_summaries.py
import uuid

from sqlalchemy import Column, String, JSON, ForeignKey
from sqlalchemy.orm import relationship

from backend.models.base import Model, GUID


class AgentSummary(Model):
    """Multi-agent analysis results"""
    __tablename__ = "agent_summaries"
    __table_args__ = {'extend_existing': True}

    id = Column(GUID, primary_key=True, default=uuid.uuid4)
    summary_id = Column(GUID, ForeignKey("summaries.id", ondelete='CASCADE'))
    agent_type = Column(String(20), nullable=False)  # technical, business, user, synthesis

    # JSON fields for flexible schema
    analysis_result = Column(JSON, nullable=False)

    # Relationships
    summary = relationship("Summary", back_populates="agent_analyses")
5. Multi-Agent Integration with Database
Update Multi-Agent Orchestrator
# backend/services/multi_agent_orchestrator.py
from typing import Any, Dict, List

from sqlalchemy.orm import Session

from backend.models.epic4 import AgentSummary
from backend.core.database import get_db


class MultiAgentVideoOrchestrator:
    """Enhanced orchestrator with database persistence"""

    async def save_analysis_to_database(
        self,
        summary_id: str,
        analysis_result: Dict[str, Any],
        db: Session
    ) -> List[AgentSummary]:
        """Save multi-agent analysis to database"""
        agent_summaries = []
        # One AgentSummary row per perspective in the analysis result
        for perspective_type, analysis in analysis_result['perspectives'].items():
            agent_summary = AgentSummary(
                summary_id=summary_id,
                agent_type=perspective_type,
                analysis_result=analysis
            )
            db.add(agent_summary)
            agent_summaries.append(agent_summary)
        # Single commit so all perspectives are saved together
        db.commit()
        return agent_summaries
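The one-row-per-perspective, single-commit pattern can be exercised without the ORM. This sketch uses the standard-library sqlite3 module with an assumed, simplified agent_summaries schema; save_perspectives is a hypothetical helper, not part of the project.

```python
import json
import sqlite3
import uuid
from typing import Any, Dict, List


def save_perspectives(
    conn: sqlite3.Connection, summary_id: str, analysis_result: Dict[str, Any]
) -> List[str]:
    """Insert one row per perspective in a single transaction; return row ids."""
    ids = []
    with conn:  # commits on success, rolls back if any insert raises
        for agent_type, analysis in analysis_result["perspectives"].items():
            row_id = str(uuid.uuid4())
            conn.execute(
                "INSERT INTO agent_summaries "
                "(id, summary_id, agent_type, analysis_result) VALUES (?, ?, ?, ?)",
                (row_id, summary_id, agent_type, json.dumps(analysis)),
            )
            ids.append(row_id)
    return ids


conn = sqlite3.connect(":memory:")
# Simplified schema assumed for the example.
conn.execute(
    "CREATE TABLE agent_summaries ("
    "id TEXT PRIMARY KEY, summary_id TEXT, "
    "agent_type TEXT NOT NULL, analysis_result TEXT NOT NULL)"
)

result = {"perspectives": {"technical": {"score": 1}, "business": {"score": 2}}}
ids = save_perspectives(conn, "summary-1", result)
rows = conn.execute(
    "SELECT agent_type FROM agent_summaries ORDER BY agent_type"
).fetchall()
print(rows)  # [('business',), ('technical',)]
```

Wrapping the loop in one transaction means a failure on any perspective leaves no partial analysis behind.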
6. API Router Re-enablement
Update Main Application
# backend/main.py
# Import Epic 4 models in correct order
from backend.models.epic4 import (
    PromptTemplate, AgentSummary,
    EnhancedExport, ExportSection,
    RAGChunk, VectorEmbedding
)
# Re-enable routers
from backend.api.multi_agent import router as multi_agent_router
from backend.api.enhanced_export import router as enhanced_export_router
from backend.api.prompt_templates import router as templates_router
# Include all routers
app.include_router(multi_agent_router)
app.include_router(enhanced_export_router)
app.include_router(templates_router)
7. Implementation Steps
Step 1: Database Reset and Migration
# Backup current database
cp data/app.db data/app.db.backup
# Reset migrations to clean state
PYTHONPATH=. ../venv/bin/python3 -m alembic downgrade base
PYTHONPATH=. ../venv/bin/python3 -m alembic upgrade head
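If data/app.db is a SQLite file, which the file name suggests but the source does not state, the backup step could also be scripted with SQLite's online backup API instead of cp. This is a sketch under that assumption; backup_sqlite is a hypothetical helper and the temporary paths stand in for the real ones.

```python
import os
import sqlite3
import tempfile


def backup_sqlite(source_path: str, backup_path: str) -> None:
    """Copy a SQLite database using the online backup API."""
    source = sqlite3.connect(source_path)
    target = sqlite3.connect(backup_path)
    try:
        source.backup(target)  # takes a consistent snapshot of the source
    finally:
        target.close()
        source.close()


# Temporary stand-ins for data/app.db and data/app.db.backup.
workdir = tempfile.mkdtemp()
db_path = os.path.join(workdir, "app.db")
backup_path = os.path.join(workdir, "app.db.backup")

conn = sqlite3.connect(db_path)
conn.execute("CREATE TABLE summaries (id TEXT PRIMARY KEY)")
conn.execute("INSERT INTO summaries VALUES ('s1')")
conn.commit()
conn.close()

backup_sqlite(db_path, backup_path)

restored = sqlite3.connect(backup_path)
restored_rows = restored.execute("SELECT id FROM summaries").fetchall()
restored.close()
print(restored_rows)  # [('s1',)]
```

Unlike a plain file copy, the backup API does not capture a half-written file when the application still has the database open.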
Step 2: Model Refactoring
- Create backend/models/epic4/ directory
- Move all Epic 4 models to new directory
- Add __table_args__ = {'extend_existing': True} to all models
- Update imports to use new structure
Step 3: Update Service Layer
- Update multi-agent orchestrator to save to database
- Add database persistence to playlist analyzer
- Create enhanced export service with database integration
Step 4: Re-enable and Test
- Re-enable disabled routers in main.py
- Run comprehensive tests
- Verify all Epic 4 features work together
Testing Strategy
Integration Tests
# tests/integration/test_epic4_integration.py
async def test_multi_agent_with_database():
    """Test multi-agent analysis saves to database"""
    # Create summary
    # Run multi-agent analysis
    # Verify agent_summaries table populated


async def test_enhanced_export_with_templates():
    """Test enhanced export uses prompt templates"""
    # Create prompt template
    # Generate enhanced export
    # Verify export uses template


async def test_rag_chat_with_chunks():
    """Test RAG chat creates and uses chunks"""
    # Create summary
    # Generate RAG chunks
    # Test chat interface
Benefits of This Architecture
- Clean Separation: Models, migrations, and services are properly separated
- No Circular Dependencies: Lazy loading and string references prevent cycles
- Database Integrity: Foreign keys properly enforced with cascading deletes
- Extensibility: Easy to add new Epic 4 features without breaking existing ones
- Performance: Optimized indexes and relationships for fast queries
- Maintainability: Clear structure makes debugging and updates easier
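The database integrity point depends on the database actually enforcing foreign keys. SQLite, for example, normally leaves enforcement off until PRAGMA foreign_keys = ON is issued on each connection. The sketch below shows the cascading delete with the standard-library sqlite3 module; the schemas are simplified assumptions, and whether the project uses SQLite is itself an assumption.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Enforcement is per connection and is usually off by default in SQLite.
conn.execute("PRAGMA foreign_keys = ON")

conn.execute("CREATE TABLE summaries (id TEXT PRIMARY KEY)")
conn.execute(
    "CREATE TABLE rag_chunks ("
    "id TEXT PRIMARY KEY, "
    "summary_id TEXT REFERENCES summaries(id) ON DELETE CASCADE, "
    "video_id TEXT NOT NULL)"
)

conn.execute("INSERT INTO summaries VALUES ('s1')")
conn.execute("INSERT INTO rag_chunks VALUES ('c1', 's1', 'vid')")

# Deleting the parent row removes its chunks as well.
conn.execute("DELETE FROM summaries WHERE id = 's1'")
remaining = conn.execute("SELECT COUNT(*) FROM rag_chunks").fetchone()[0]
print(remaining)  # 0

# A chunk that points at a missing summary is rejected.
try:
    conn.execute("INSERT INTO rag_chunks VALUES ('c2', 'missing', 'vid')")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True
print(orphan_rejected)  # True
```

Without the PRAGMA, both the cascade and the orphan check are silently skipped, so ondelete='CASCADE' in the models would have no effect.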
Rollback Plan
If issues occur:
- Restore database backup: cp data/app.db.backup data/app.db
- Revert code changes: git checkout -- backend/models
- Disable Epic 4 routers temporarily
- Debug specific issues before re-attempting
Success Criteria
✅ All migrations apply without errors
✅ No "table already exists" errors
✅ Multi-agent analysis saves to database
✅ Enhanced exports work with templates
✅ RAG chat functions with vector embeddings
✅ All Epic 4 API endpoints return 200 status
✅ No circular import errors
✅ Frontend can access all Epic 4 features
Timeline
- Hour 1: Database migration and reset
- Hour 2: Model refactoring and epic4 package creation
- Hour 3: Service layer updates
- Hour 4: API router re-enablement and testing
- Hour 5: Integration testing and bug fixes
- Hour 6: Documentation and deployment
This solution resolves the table definition conflicts, missing foreign keys, and circular imports described above while keeping the DatabaseRegistry pattern, so that all Epic 4 features can be enabled together.