Epic 4 Database Architecture Solution

Problem Summary

Critical database and architecture issues in the YouTube Summarizer are preventing Epic 4 features from working:

  1. Table Definition Conflicts: rag_chunks and other tables are defined in both models and migrations
  2. Missing Foreign Keys: enhanced_exports.template_id references the non-existent prompt_templates table
  3. Circular Dependencies: Models import each other, causing initialization loops
  4. Disabled Features: The multi-agent and analysis template routers are disabled because of these issues
  5. Migration State Mismatch: Models expect tables that do not exist yet
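
Issue 2 can be detected mechanically. The sketch below assumes the application database is SQLite (suggested by the data/app.db path used later in this document) and relies only on the standard-library sqlite3 module. The function name find_dangling_foreign_keys and the two-column schema are illustrative, not taken from the project.

```python
import sqlite3


def find_dangling_foreign_keys(conn: sqlite3.Connection) -> list[tuple[str, str]]:
    """Return (table, referenced_table) pairs whose referenced table is missing."""
    tables = {
        row[0]
        for row in conn.execute("SELECT name FROM sqlite_master WHERE type = 'table'")
    }
    dangling = []
    for table in sorted(tables):
        # PRAGMA foreign_key_list yields one row per FK column; index 2 is the parent table.
        for fk in conn.execute(f'PRAGMA foreign_key_list("{table}")'):
            if fk[2] not in tables:
                dangling.append((table, fk[2]))
    return dangling


conn = sqlite3.connect(":memory:")
# Hypothetical minimal schema: SQLite accepts a FK to a table that does not exist yet.
conn.execute(
    "CREATE TABLE enhanced_exports ("
    "id TEXT PRIMARY KEY, "
    "template_id TEXT REFERENCES prompt_templates(id))"
)
print(find_dangling_foreign_keys(conn))
```

Running a check like this before and after the migrations gives a quick signal on whether prompt_templates has actually been created.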

Root Cause Analysis

Current Architecture Issues

  1. Model-First vs Migration-First Conflict

    • Models use the Model base class, which auto-registers each table with DatabaseRegistry
    • Tables are created from the model definitions before migrations run
    • Migrations then try to create tables that already exist
  2. Import Order Problems

    • models/__init__.py imports all models at once
    • Models reference foreign keys to tables not yet created
    • Circular imports between related models
  3. DatabaseRegistry Singleton Limitations

    • Registry prevents duplicate table definitions (good)
    • But doesn't handle migration/model synchronization (bad)
    • No deferred foreign key resolution
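
The model-first versus migration-first conflict can be reproduced in isolation. This is a minimal sketch, assuming a SQLite database and using plain SQL as a stand-in for both the model layer and the migration; the column list is invented for illustration.

```python
import sqlite3

CREATE_RAG_CHUNKS = (
    "CREATE TABLE rag_chunks (id TEXT PRIMARY KEY, video_id TEXT NOT NULL)"
)

conn = sqlite3.connect(":memory:")

# Stand-in for the model layer creating the table at import or startup time.
conn.execute(CREATE_RAG_CHUNKS)

# Stand-in for a migration that later tries to create the same table.
try:
    conn.execute(CREATE_RAG_CHUNKS)
    migration_error = None
except sqlite3.OperationalError as exc:
    migration_error = str(exc)

print(migration_error)
```

Whichever side runs second fails, which is why the solution below makes migrations the single owner of table creation and guards each create with an existence check.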

Permanent Architecture Solution

1. Database Migration Strategy

Phase 1: Clean Migration Path

# Step 1: Apply all pending migrations in correct order
cd /Users/enias/projects/my-ai-projects/apps/youtube-summarizer
source ../venv/bin/activate

# Check current migration status
PYTHONPATH=. ../venv/bin/python3 -m alembic current

# Apply Epic 4 migrations
PYTHONPATH=. ../venv/bin/python3 -m alembic upgrade add_epic_4_features

Phase 2: Create Comprehensive Epic 4 Migration

# backend/alembic/versions/epic_4_complete_integration.py
"""Complete Epic 4 integration with all features

Revision ID: epic_4_complete
Revises: add_epic_4_features
"""

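import sqlalchemy as sa
from alembic import op

# Alembic reads these module-level identifiers; the values match the docstring above.
revision = 'epic_4_complete'
down_revision = 'add_epic_4_features'


def table_exists(name):
    """Return True if the table is already present in the target database."""
    return name in sa.inspect(op.get_bind()).get_table_names()


def upgrade():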
    # Ensure all Epic 4 tables exist
    
    # 1. Multi-Agent Analysis Tables (Story 4.3)
    if not table_exists('agent_summaries'):
        op.create_table('agent_summaries', ...)
    
    # 2. Custom Prompt Templates (Story 4.4)
    if not table_exists('prompt_templates'):
        op.create_table('prompt_templates', ...)
    
    # 3. Enhanced Export Metadata (Story 4.4)
    if not table_exists('export_metadata'):
        op.create_table('export_metadata', ...)
    
    # 4. Summary Sections (Story 4.4)
    if not table_exists('summary_sections'):
        op.create_table('summary_sections', ...)
    
    # 5. RAG Tables (Story 4.6)
    if not table_exists('rag_chunks'):
        op.create_table('rag_chunks', ...)
    if not table_exists('vector_embeddings'):
        op.create_table('vector_embeddings', ...)

2. Model Architecture Refactoring

Lazy Model Loading Pattern

# backend/models/lazy_models.py
"""Lazy loading wrapper for all Epic 4 models"""

from typing import TYPE_CHECKING, Optional
from sqlalchemy.orm import relationship

if TYPE_CHECKING:
    from .prompt_templates import PromptTemplate
    from .agent_summaries import AgentSummary
    from .rag_models import RAGChunk

class LazyModelMixin:
    """Mixin for lazy relationship loading"""
    
    @property
    def prompt_template(self) -> Optional['PromptTemplate']:
        """Lazy load prompt template relationship"""
        if hasattr(self, '_prompt_template'):
            return self._prompt_template
        return None

Proper Model Inheritance

# backend/models/base.py
from sqlalchemy import Column, DateTime, func
from sqlalchemy.ext.declarative import declared_attr

from backend.core.database_registry import registry

class TimestampedModel:
    """Mixin for created_at/updated_at fields"""
    
    @declared_attr
    def created_at(cls):
        return Column(DateTime, default=func.now())
    
    @declared_attr
    def updated_at(cls):
        return Column(DateTime, onupdate=func.now())

class Model(registry.Base, TimestampedModel):
    """Base model with registry integration"""
    __abstract__ = True
    
    # Reuse the Table already registered under this name instead of raising on redefinition
    __table_args__ = {'extend_existing': True}

3. Epic 4 Unified Model Registry

Create Central Epic 4 Models

# backend/models/epic4/__init__.py
"""Epic 4 model package with proper initialization order"""

# Import order matters - base tables first, then dependent tables

# 1. Base tables (no foreign keys to Epic 4 tables)
from .prompt_templates import PromptTemplate
from .agent_summaries import AgentSummary

# 2. Dependent tables (have foreign keys to above)
from .enhanced_exports import EnhancedExport
from .export_sections import ExportSection
from .prompt_experiments import PromptExperiment

# 3. RAG tables (can reference any above)
from .rag_chunks import RAGChunk
from .vector_embeddings import VectorEmbedding
from .semantic_search import SemanticSearchResult

# 4. Multi-agent tables
from .multi_agent_analysis import MultiAgentAnalysis
from .playlist_analysis import PlaylistAnalysis

__all__ = [
    'PromptTemplate',
    'AgentSummary',
    'EnhancedExport',
    'ExportSection',
    'PromptExperiment',
    'RAGChunk',
    'VectorEmbedding',
    'SemanticSearchResult',
    'MultiAgentAnalysis',
    'PlaylistAnalysis',
]
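
The table creation order can also be derived from the foreign keys rather than maintained by hand. The sketch below uses graphlib from the standard library (Python 3.9+). The dependency map lists only the foreign keys mentioned in this document; any other edges would need to be added.

```python
from graphlib import TopologicalSorter

# table -> set of tables it holds foreign keys to
dependencies = {
    "summaries": set(),
    "prompt_templates": set(),
    "agent_summaries": {"summaries"},
    "rag_chunks": {"summaries"},
    "enhanced_exports": {"prompt_templates"},
}

# static_order() yields every table after all of the tables it depends on.
creation_order = list(TopologicalSorter(dependencies).static_order())
print(creation_order)
```

If two tables ever reference each other, static_order() raises graphlib.CycleError, which surfaces a circular dependency early instead of at application start.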

4. Fix Individual Model Issues

RAG Models Fix

# backend/models/epic4/rag_chunks.py
import uuid

from sqlalchemy import Column, String, Integer, Text, Float, ForeignKey
from sqlalchemy.orm import relationship

from backend.models.base import Model, GUID

class RAGChunk(Model):
    """Text chunks for RAG processing"""
    __tablename__ = "rag_chunks"
    __table_args__ = {'extend_existing': True}  # Reuse the already-registered Table instead of raising
    
    id = Column(GUID, primary_key=True, default=uuid.uuid4)
    summary_id = Column(GUID, ForeignKey("summaries.id", ondelete='CASCADE'), nullable=True)
    video_id = Column(String(20), nullable=False, index=True)
    
    # Use string references for relationships to avoid circular imports
    summary = relationship("Summary", back_populates="rag_chunks", lazy='select')
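
String references break import cycles because the target is looked up by name only when it is first needed. The following is a simplified illustration of that idea in plain Python, not SQLAlchemy's actual mechanism; register, LazyReference, and _MODEL_REGISTRY are hypothetical names.

```python
from typing import Dict

# Hypothetical name -> class registry, filled in as classes are defined.
_MODEL_REGISTRY: Dict[str, type] = {}


def register(cls: type) -> type:
    """Class decorator that records a model under its class name."""
    _MODEL_REGISTRY[cls.__name__] = cls
    return cls


class LazyReference:
    """Holds a model name and resolves it to the class only when asked."""

    def __init__(self, target_name: str) -> None:
        self.target_name = target_name

    def resolve(self) -> type:
        return _MODEL_REGISTRY[self.target_name]


@register
class RAGChunk:
    # "Summary" is not defined yet; only its name is stored here.
    summary = LazyReference("Summary")


@register
class Summary:
    rag_chunks = LazyReference("RAGChunk")


print(RAGChunk.summary.resolve() is Summary)
```

Neither class needs the other to exist at definition time, so the two modules can be imported in either order.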

Agent Summary Model

# backend/models/epic4/agent_summaries.py
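import uuid

from sqlalchemy import Column, String, JSON, ForeignKey
from sqlalchemy.orm import relationship

from backend.models.base import Model, GUID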

class AgentSummary(Model):
    """Multi-agent analysis results"""
    __tablename__ = "agent_summaries"
    __table_args__ = {'extend_existing': True}
    
    id = Column(GUID, primary_key=True, default=uuid.uuid4)
    summary_id = Column(GUID, ForeignKey("summaries.id", ondelete='CASCADE'))
    agent_type = Column(String(20), nullable=False)  # technical, business, user, synthesis
    
    # JSON fields for flexible schema
    analysis_result = Column(JSON, nullable=False)
    
    # Relationships
    summary = relationship("Summary", back_populates="agent_analyses")

5. Multi-Agent Integration with Database

Update Multi-Agent Orchestrator

# backend/services/multi_agent_orchestrator.py
from typing import Any, Dict, List

from sqlalchemy.orm import Session

from backend.models.epic4 import AgentSummary
from backend.core.database import get_db

class MultiAgentVideoOrchestrator:
    """Enhanced orchestrator with database persistence"""
    
    async def save_analysis_to_database(
        self, 
        summary_id: str,
        analysis_result: Dict[str, Any],
        db: Session
    ) -> List[AgentSummary]:
        """Save multi-agent analysis to database"""
        
        agent_summaries = []
        
        for perspective_type, analysis in analysis_result['perspectives'].items():
            agent_summary = AgentSummary(
                summary_id=summary_id,
                agent_type=perspective_type,
                analysis_result=analysis
            )
            db.add(agent_summary)
            agent_summaries.append(agent_summary)
        
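        # Commit all perspectives together; undo the partial batch if the commit fails
        try:
            db.commit()
        except Exception:
            db.rollback()
            raise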
        return agent_summaries
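
The all-or-nothing behaviour of this save can be checked without the ORM. The sketch below is a stand-in, assuming SQLite and a simplified agent_summaries table; save_perspectives and the column types are illustrative and do not mirror the project's real schema.

```python
import json
import sqlite3
import uuid
from typing import Any, Dict


def save_perspectives(
    conn: sqlite3.Connection, summary_id: str, analysis_result: Dict[str, Any]
) -> int:
    """Insert one row per perspective; roll back every insert if any fails."""
    rows = [
        (str(uuid.uuid4()), summary_id, agent_type, json.dumps(analysis))
        for agent_type, analysis in analysis_result["perspectives"].items()
    ]
    # The connection context manager commits on success and rolls back on error.
    with conn:
        conn.executemany(
            "INSERT INTO agent_summaries (id, summary_id, agent_type, analysis_result) "
            "VALUES (?, ?, ?, ?)",
            rows,
        )
    return len(rows)


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE agent_summaries ("
    "id TEXT PRIMARY KEY, summary_id TEXT, "
    "agent_type TEXT NOT NULL, analysis_result TEXT NOT NULL)"
)

saved = save_perspectives(
    conn,
    "summary-1",
    {"perspectives": {"technical": {"score": 1}, "business": {"score": 2}}},
)
print(saved)
```

Writing every perspective inside one transaction means a summary never ends up with only some of its agent analyses stored.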

6. API Router Re-enablement

Update Main Application

# backend/main.py

# Import Epic 4 models in correct order
from backend.models.epic4 import (
    PromptTemplate, AgentSummary,
    EnhancedExport, ExportSection,
    RAGChunk, VectorEmbedding
)

# Re-enable routers
from backend.api.multi_agent import router as multi_agent_router
from backend.api.enhanced_export import router as enhanced_export_router
from backend.api.prompt_templates import router as templates_router

# Include all routers
app.include_router(multi_agent_router)
app.include_router(enhanced_export_router) 
app.include_router(templates_router)

7. Implementation Steps

Step 1: Database Reset and Migration

# Backup current database
cp data/app.db data/app.db.backup

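# Reset migrations to a clean state.
# WARNING: downgrading to base runs every downgrade step, which typically drops
# the migrated tables and their data. Confirm the backup above before continuing.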
PYTHONPATH=. ../venv/bin/python3 -m alembic downgrade base
PYTHONPATH=. ../venv/bin/python3 -m alembic upgrade head
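
A plain file copy can miss recent writes if the application still has the database open. As an alternative, and assuming data/app.db is a SQLite file, the standard-library backup API produces a consistent snapshot. The helper name backup_sqlite is illustrative, and the demonstration runs against a throwaway database rather than the real one.

```python
import sqlite3
import tempfile
from pathlib import Path


def backup_sqlite(source_path: Path, backup_path: Path) -> None:
    """Copy a SQLite database with the online backup API instead of a file copy."""
    source = sqlite3.connect(str(source_path))
    target = sqlite3.connect(str(backup_path))
    try:
        # Reads through SQLite itself, so the copy is a consistent snapshot.
        source.backup(target)
    finally:
        target.close()
        source.close()


# Demonstration on a temporary database; the real paths would be
# data/app.db and data/app.db.backup.
with tempfile.TemporaryDirectory() as tmp:
    db_path = Path(tmp) / "app.db"
    backup_path = Path(tmp) / "app.db.backup"

    conn = sqlite3.connect(str(db_path))
    conn.execute("CREATE TABLE summaries (id TEXT PRIMARY KEY)")
    conn.execute("INSERT INTO summaries VALUES ('s1')")
    conn.commit()
    conn.close()

    backup_sqlite(db_path, backup_path)

    restored = sqlite3.connect(str(backup_path))
    restored_rows = restored.execute("SELECT COUNT(*) FROM summaries").fetchone()[0]
    restored.close()

print(restored_rows)
```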

Step 2: Model Refactoring

  1. Create backend/models/epic4/ directory
  2. Move all Epic 4 models to new directory
  3. Add __table_args__ = {'extend_existing': True} to all models
  4. Update imports to use new structure

Step 3: Update Service Layer

  1. Update multi-agent orchestrator to save to database
  2. Add database persistence to playlist analyzer
  3. Create enhanced export service with database integration

Step 4: Re-enable and Test

  1. Re-enable disabled routers in main.py
  2. Run comprehensive tests
  3. Verify all Epic 4 features work together

Testing Strategy

Integration Tests

# tests/integration/test_epic4_integration.py

async def test_multi_agent_with_database():
    """Test multi-agent analysis saves to database"""
    # Create summary
    # Run multi-agent analysis
    # Verify agent_summaries table populated
    
async def test_enhanced_export_with_templates():
    """Test enhanced export uses prompt templates"""
    # Create prompt template
    # Generate enhanced export
    # Verify export uses template

async def test_rag_chat_with_chunks():
    """Test RAG chat creates and uses chunks"""
    # Create summary
    # Generate RAG chunks
    # Test chat interface

Benefits of This Architecture

  1. Clean Separation: Models, migrations, and services are properly separated
  2. No Circular Dependencies: Lazy loading and string references prevent cycles
  3. Database Integrity: Foreign keys properly enforced with cascading deletes
  4. Extensibility: Easy to add new Epic 4 features without breaking existing ones
  5. Performance: Indexed lookup columns (for example rag_chunks.video_id) and explicit relationships keep common queries fast
  6. Maintainability: Clear structure makes debugging and updates easier
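
One caveat on point 3: if the database is SQLite, as the data/app.db path suggests, foreign keys and ON DELETE CASCADE are not enforced by default and must be switched on for each connection. A minimal sketch with an invented two-table schema:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# By default SQLite does not enforce foreign keys; this pragma enables them per connection.
conn.execute("PRAGMA foreign_keys = ON")
conn.execute("CREATE TABLE summaries (id TEXT PRIMARY KEY)")
conn.execute(
    "CREATE TABLE agent_summaries ("
    "id TEXT PRIMARY KEY, "
    "summary_id TEXT REFERENCES summaries(id) ON DELETE CASCADE)"
)
conn.execute("INSERT INTO summaries VALUES ('s1')")
conn.execute("INSERT INTO agent_summaries VALUES ('a1', 's1')")

# Deleting the parent row should remove the dependent agent summary as well.
conn.execute("DELETE FROM summaries WHERE id = 's1'")
remaining = conn.execute("SELECT COUNT(*) FROM agent_summaries").fetchone()[0]
print(remaining)
```

Without the pragma, the same delete leaves the agent_summaries row behind as an orphan, so the application's connection setup is worth checking as part of this work.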

Rollback Plan

If issues occur:

  1. Restore database backup: cp data/app.db.backup data/app.db
  2. Revert code changes: git checkout -- backend/models
  3. Disable Epic 4 routers temporarily
  4. Debug specific issues before re-attempting

Success Criteria

  • All migrations apply without errors
  • No "table already exists" errors
  • Multi-agent analysis saves to database
  • Enhanced exports work with templates
  • RAG chat functions with vector embeddings
  • All Epic 4 API endpoints return 200 status
  • No circular import errors
  • Frontend can access all Epic 4 features
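
The migration-related criteria can be spot-checked with a small script. This sketch assumes a SQLite database and takes its table names from the Phase 2 migration in this document; missing_tables is a hypothetical helper, and the in-memory database stands in for data/app.db.

```python
import sqlite3
from typing import Iterable, List

# Tables the Phase 2 migration is expected to create.
EXPECTED_TABLES = [
    "agent_summaries",
    "prompt_templates",
    "export_metadata",
    "summary_sections",
    "rag_chunks",
    "vector_embeddings",
]


def missing_tables(conn: sqlite3.Connection, expected: Iterable[str]) -> List[str]:
    """Return the expected table names that are absent from the database."""
    existing = {
        row[0]
        for row in conn.execute("SELECT name FROM sqlite_master WHERE type = 'table'")
    }
    return [name for name in expected if name not in existing]


conn = sqlite3.connect(":memory:")
# Simulate a partially migrated database with only one Epic 4 table present.
conn.execute("CREATE TABLE rag_chunks (id TEXT PRIMARY KEY)")
print(missing_tables(conn, EXPECTED_TABLES))
```

An empty list after alembic upgrade head indicates that every expected table is present.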

Timeline

  • Hour 1: Database migration and reset
  • Hour 2: Model refactoring and epic4 package creation
  • Hour 3: Service layer updates
  • Hour 4: API router re-enablement and testing
  • Hour 5: Integration testing and bug fixes
  • Hour 6: Documentation and deployment

This solution addresses the database issues described above while keeping the DatabaseRegistry pattern, so that all Epic 4 features can work together.