# Epic 4 Database Architecture Solution

## Problem Summary

The YouTube Summarizer has critical database and architecture issues that prevent Epic 4 features from working:

1. **Table Definition Conflicts**: `rag_chunks` and other tables are defined in both models and migrations
2. **Missing Foreign Keys**: `enhanced_exports.template_id` references the non-existent `prompt_templates` table
3. **Circular Dependencies**: Models import each other, causing initialization loops
4. **Disabled Features**: Multi-agent and analysis template routers are disabled because of these issues
5. **Migration State Mismatch**: Models expect tables that don't exist yet

## Root Cause Analysis

### Current Architecture Issues

1. **Model-First vs Migration-First Conflict**
   - Models use the `Model` base class, which auto-registers with `DatabaseRegistry`
   - Tables are created by models before migrations run
   - Migrations then try to create tables that already exist from the model definitions

2. **Import Order Problems**
   - `models/__init__.py` imports all models at once
   - Models reference foreign keys to tables not yet created
   - Related models import each other circularly

3. **DatabaseRegistry Singleton Limitations**
   - Registry prevents duplicate table definitions (good)
   - But doesn't handle migration/model synchronization (bad)
   - No deferred foreign key resolution

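The model-first versus migration-first conflict can be reproduced in a few lines. The sketch below uses Python's standard-library `sqlite3` instead of the project's SQLAlchemy/Alembic stack, so it only illustrates the failure mode. The simplified table layout and the helper names `create_from_models`, `create_from_migration`, and `table_exists` are assumptions made for this example.

```python
import sqlite3

# Hypothetical, simplified DDL; the real rag_chunks schema has more columns.
DDL = "CREATE TABLE rag_chunks (id TEXT PRIMARY KEY, video_id TEXT NOT NULL)"


def create_from_models(conn: sqlite3.Connection) -> None:
    """Stand-in for model registration creating the table before migrations run."""
    conn.execute(DDL)


def create_from_migration(conn: sqlite3.Connection) -> str:
    """Stand-in for a migration that creates the same table unconditionally."""
    try:
        conn.execute(DDL)
        return "created"
    except sqlite3.OperationalError as exc:
        # The second CREATE TABLE fails because the table is already there
        return f"failed: {exc}"


def table_exists(conn: sqlite3.Connection, name: str) -> bool:
    """Check the SQLite catalog for a table with the given name."""
    row = conn.execute(
        "SELECT 1 FROM sqlite_master WHERE type = 'table' AND name = ?", (name,)
    ).fetchone()
    return row is not None


conn = sqlite3.connect(":memory:")
create_from_models(conn)
result = create_from_migration(conn)
print(result)
print(table_exists(conn, "rag_chunks"))
```

A migration that consults a check like `table_exists` before creating a table avoids this error, which is the guard the Phase 2 migration in this document relies on.
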
## Permanent Architecture Solution

### 1. Database Migration Strategy

#### Phase 1: Clean Migration Path
```bash
# Step 1: Apply all pending migrations in correct order
cd /Users/enias/projects/my-ai-projects/apps/youtube-summarizer
source ../venv/bin/activate

# Check current migration status
PYTHONPATH=. ../venv/bin/python3 -m alembic current

# Apply Epic 4 migrations
PYTHONPATH=. ../venv/bin/python3 -m alembic upgrade add_epic_4_features
```

#### Phase 2: Create Comprehensive Epic 4 Migration
```python
# backend/alembic/versions/epic_4_complete_integration.py
"""Complete Epic 4 integration with all features

Revision ID: epic_4_complete
Revises: add_epic_4_features
"""
import sqlalchemy as sa
from alembic import op

# Revision identifiers, taken from the docstring above
revision = 'epic_4_complete'
down_revision = 'add_epic_4_features'


def table_exists(name):
    """Return True if the table is already present in the connected database."""
    inspector = sa.inspect(op.get_bind())
    return name in inspector.get_table_names()


def upgrade():
    # Ensure all Epic 4 tables exist (column definitions are elided as `...`)

    # 1. Multi-Agent Analysis Tables (Story 4.3)
    if not table_exists('agent_summaries'):
        op.create_table('agent_summaries', ...)

    # 2. Custom Prompt Templates (Story 4.4)
    if not table_exists('prompt_templates'):
        op.create_table('prompt_templates', ...)

    # 3. Enhanced Export Metadata (Story 4.4)
    if not table_exists('export_metadata'):
        op.create_table('export_metadata', ...)

    # 4. Summary Sections (Story 4.4)
    if not table_exists('summary_sections'):
        op.create_table('summary_sections', ...)

    # 5. RAG Tables (Story 4.6)
    if not table_exists('rag_chunks'):
        op.create_table('rag_chunks', ...)
    if not table_exists('vector_embeddings'):
        op.create_table('vector_embeddings', ...)
```

### 2. Model Architecture Refactoring

#### Lazy Model Loading Pattern
```python
# backend/models/lazy_models.py
"""Lazy loading wrapper for all Epic 4 models"""

from typing import TYPE_CHECKING, Optional
from sqlalchemy.orm import relationship

if TYPE_CHECKING:
    from .prompt_templates import PromptTemplate
    from .agent_summaries import AgentSummary
    from .rag_models import RAGChunk

class LazyModelMixin:
    """Mixin for lazy relationship loading"""

    @property
    def prompt_template(self) -> Optional['PromptTemplate']:
        """Lazy load prompt template relationship"""
        if hasattr(self, '_prompt_template'):
            return self._prompt_template
        return None
```

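To show how a lazy accessor of this kind behaves at runtime, here is a small sketch that uses plain Python objects in place of ORM models. `FakeTemplate` and `FakeExport` are hypothetical stand-ins and are not part of the codebase. Imports placed under `TYPE_CHECKING` are only evaluated by type checkers, so they cannot trigger a circular import when the module is loaded.

```python
from typing import Optional


class FakeTemplate:
    """Hypothetical stand-in for the PromptTemplate model."""

    def __init__(self, name: str) -> None:
        self.name = name


class LazyModelMixin:
    """Mixin exposing an optional, lazily attached prompt template."""

    @property
    def prompt_template(self) -> Optional[FakeTemplate]:
        # getattr with a default returns None until something attaches the value
        return getattr(self, "_prompt_template", None)


class FakeExport(LazyModelMixin):
    """Hypothetical stand-in for the EnhancedExport model."""


export = FakeExport()
before = export.prompt_template            # nothing attached yet
export._prompt_template = FakeTemplate("executive-summary")
after = export.prompt_template             # now returns the attached object
```

The property never imports the related model itself, so the mixin can live in a module that every Epic 4 model imports without creating a cycle.
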
#### Proper Model Inheritance
```python
# backend/models/base.py
from sqlalchemy import Column, DateTime, func
from sqlalchemy.ext.declarative import declared_attr

from backend.core.database_registry import registry

class TimestampedModel:
    """Mixin for created_at/updated_at fields"""

    @declared_attr
    def created_at(cls):
        return Column(DateTime, default=func.now())

    @declared_attr
    def updated_at(cls):
        return Column(DateTime, onupdate=func.now())

class Model(registry.Base, TimestampedModel):
    """Base model with registry integration"""
    __abstract__ = True

    # Prevent duplicate registration
    __table_args__ = {'extend_existing': True}
```

### 3. Epic 4 Unified Model Registry

#### Create Central Epic 4 Models
```python
# backend/models/epic4/__init__.py
"""Epic 4 model package with proper initialization order"""

# Import order matters - base tables first, then dependent tables

# 1. Base tables (no foreign keys to Epic 4 tables)
from .prompt_templates import PromptTemplate
from .agent_summaries import AgentSummary

# 2. Dependent tables (have foreign keys to above)
from .enhanced_exports import EnhancedExport
from .export_sections import ExportSection
from .prompt_experiments import PromptExperiment

# 3. RAG tables (can reference any above)
from .rag_chunks import RAGChunk
from .vector_embeddings import VectorEmbedding
from .semantic_search import SemanticSearchResult

# 4. Multi-agent tables
from .multi_agent_analysis import MultiAgentAnalysis
from .playlist_analysis import PlaylistAnalysis

__all__ = [
    'PromptTemplate',
    'AgentSummary',
    'EnhancedExport',
    'ExportSection',
    'PromptExperiment',
    'RAGChunk',
    'VectorEmbedding',
    'SemanticSearchResult',
    'MultiAgentAnalysis',
    'PlaylistAnalysis',
]
```

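The "base tables first, then dependent tables" ordering is a topological sort over foreign-key dependencies. As a rough sketch, the standard-library `graphlib` module can derive such an order and reject cycles. Only the `enhanced_exports` to `prompt_templates` edge is stated in this document; the other edges in `DEPENDENCIES` are assumptions for illustration, and `import_order` is a hypothetical helper.

```python
from graphlib import CycleError, TopologicalSorter
from typing import Dict, List, Set

# Maps each table to the tables its foreign keys point at.
# Only enhanced_exports -> prompt_templates comes from this document;
# the remaining edges are illustrative assumptions.
DEPENDENCIES: Dict[str, Set[str]] = {
    "prompt_templates": set(),
    "agent_summaries": set(),
    "enhanced_exports": {"prompt_templates"},
    "export_sections": {"enhanced_exports"},
    "rag_chunks": set(),
    "vector_embeddings": {"rag_chunks"},
}


def import_order(deps: Dict[str, Set[str]]) -> List[str]:
    """Return table names so that each table comes after the tables it references."""
    try:
        return list(TopologicalSorter(deps).static_order())
    except CycleError as exc:
        # args[1] holds the list of nodes that form the detected cycle
        raise ValueError(f"circular table dependency: {exc.args[1]}") from exc


order = import_order(DEPENDENCIES)
print(order)
```

A check like this could run in a unit test, so that a newly added model introducing a dependency cycle fails fast instead of surfacing as an import-time error.
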
### 4. Fix Individual Model Issues

#### RAG Models Fix
```python
# backend/models/epic4/rag_chunks.py
import uuid

from sqlalchemy import Column, String, Integer, Text, Float, ForeignKey
from sqlalchemy.orm import relationship

from backend.models.base import Model, GUID

class RAGChunk(Model):
    """Text chunks for RAG processing"""
    __tablename__ = "rag_chunks"
    __table_args__ = {'extend_existing': True}  # Prevent duplicate definition

    id = Column(GUID, primary_key=True, default=uuid.uuid4)
    summary_id = Column(GUID, ForeignKey("summaries.id", ondelete='CASCADE'), nullable=True)
    video_id = Column(String(20), nullable=False, index=True)

    # Use string references for relationships to avoid circular imports
    summary = relationship("Summary", back_populates="rag_chunks", lazy='select')
```

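One detail worth checking when relying on `ondelete='CASCADE'`: SQLite only enforces foreign keys when `PRAGMA foreign_keys = ON` has been issued on the connection, and enforcement is off by default. The sketch below assumes the application database (`data/app.db`) is SQLite and uses a simplified, hypothetical two-table schema with the standard-library `sqlite3` module to show the difference.

```python
import sqlite3


def count_chunks_after_delete(enforce_foreign_keys: bool) -> int:
    """Delete a summary and report how many of its chunks are left behind."""
    conn = sqlite3.connect(":memory:")
    # Set the pragma explicitly so both cases are deterministic
    conn.execute(f"PRAGMA foreign_keys = {'ON' if enforce_foreign_keys else 'OFF'}")
    conn.execute("CREATE TABLE summaries (id TEXT PRIMARY KEY)")
    conn.execute(
        "CREATE TABLE rag_chunks ("
        " id TEXT PRIMARY KEY,"
        " summary_id TEXT REFERENCES summaries(id) ON DELETE CASCADE,"
        " video_id TEXT NOT NULL)"
    )
    conn.execute("INSERT INTO summaries VALUES ('s1')")
    conn.execute("INSERT INTO rag_chunks VALUES ('c1', 's1', 'abc123')")
    conn.execute("DELETE FROM summaries WHERE id = 's1'")
    (remaining,) = conn.execute("SELECT COUNT(*) FROM rag_chunks").fetchone()
    conn.close()
    return remaining


print(count_chunks_after_delete(enforce_foreign_keys=True))
print(count_chunks_after_delete(enforce_foreign_keys=False))
```

With enforcement on, deleting the summary removes its chunk; with enforcement off, the chunk is orphaned. If the project does use SQLite, the engine setup would need to issue this pragma on every new connection for the cascade declared in the model to take effect.
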
#### Agent Summary Model
```python
# backend/models/epic4/agent_summaries.py
import uuid

from sqlalchemy import Column, String, JSON, ForeignKey
from sqlalchemy.orm import relationship

from backend.models.base import Model, GUID

class AgentSummary(Model):
    """Multi-agent analysis results"""
    __tablename__ = "agent_summaries"
    __table_args__ = {'extend_existing': True}

    id = Column(GUID, primary_key=True, default=uuid.uuid4)
    summary_id = Column(GUID, ForeignKey("summaries.id", ondelete='CASCADE'))
    agent_type = Column(String(20), nullable=False)  # technical, business, user, synthesis

    # JSON fields for flexible schema
    analysis_result = Column(JSON, nullable=False)

    # Relationships
    summary = relationship("Summary", back_populates="agent_analyses")
```

### 5. Multi-Agent Integration with Database

#### Update Multi-Agent Orchestrator
```python
# backend/services/multi_agent_orchestrator.py
from typing import Any, Dict, List

from sqlalchemy.orm import Session

from backend.models.epic4 import AgentSummary
from backend.core.database import get_db

class MultiAgentVideoOrchestrator:
    """Enhanced orchestrator with database persistence"""

    async def save_analysis_to_database(
        self,
        summary_id: str,
        analysis_result: Dict[str, Any],
        db: Session
    ) -> List[AgentSummary]:
        """Save multi-agent analysis to database"""

        agent_summaries = []

        # One AgentSummary row per perspective, committed together
        for perspective_type, analysis in analysis_result['perspectives'].items():
            agent_summary = AgentSummary(
                summary_id=summary_id,
                agent_type=perspective_type,
                analysis_result=analysis
            )
            db.add(agent_summary)
            agent_summaries.append(agent_summary)

        db.commit()
        return agent_summaries
```

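Because all perspectives are committed in one step, a failure partway through should leave no partial rows behind. The following sketch illustrates that all-or-nothing behaviour with the standard-library `sqlite3` module instead of a SQLAlchemy `Session`. The function `save_perspectives` and the simplified table layout are hypothetical and only mirror the shape of `save_analysis_to_database`.

```python
import json
import sqlite3
import uuid
from typing import Any, Dict, List


def save_perspectives(
    conn: sqlite3.Connection, summary_id: str, analysis_result: Dict[str, Any]
) -> List[str]:
    """Insert one agent_summaries row per perspective in a single transaction."""
    row_ids: List[str] = []
    # "with conn" commits if the block succeeds and rolls back if it raises
    with conn:
        for agent_type, analysis in analysis_result["perspectives"].items():
            row_id = str(uuid.uuid4())
            conn.execute(
                "INSERT INTO agent_summaries"
                " (id, summary_id, agent_type, analysis_result)"
                " VALUES (?, ?, ?, ?)",
                (row_id, summary_id, agent_type, json.dumps(analysis)),
            )
            row_ids.append(row_id)
    return row_ids


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE agent_summaries ("
    " id TEXT PRIMARY KEY, summary_id TEXT, agent_type TEXT NOT NULL,"
    " analysis_result TEXT NOT NULL)"
)
sample = {"perspectives": {"technical": {"score": 1}, "business": {"score": 2}}}
saved_ids = save_perspectives(conn, "s1", sample)
print(len(saved_ids))
```

In the SQLAlchemy version, a comparable safeguard would be to wrap the loop and `db.commit()` in a `try` block and call `db.rollback()` on failure.
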
### 6. API Router Re-enablement

#### Update Main Application
```python
# backend/main.py

# Import Epic 4 models in correct order
from backend.models.epic4 import (
    PromptTemplate, AgentSummary,
    EnhancedExport, ExportSection,
    RAGChunk, VectorEmbedding
)

# Re-enable routers
from backend.api.multi_agent import router as multi_agent_router
from backend.api.enhanced_export import router as enhanced_export_router
from backend.api.prompt_templates import router as templates_router

# Include all routers
app.include_router(multi_agent_router)
app.include_router(enhanced_export_router)
app.include_router(templates_router)
```

### 7. Implementation Steps

#### Step 1: Database Reset and Migration
```bash
# Backup current database
cp data/app.db data/app.db.backup

# Reset migrations to clean state
PYTHONPATH=. ../venv/bin/python3 -m alembic downgrade base
PYTHONPATH=. ../venv/bin/python3 -m alembic upgrade head
```

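If the application might still be writing to the database while the backup is taken, a plain file copy can capture an inconsistent state. As an alternative sketch, assuming `data/app.db` is an SQLite file, Python's `sqlite3.Connection.backup` copies the database through SQLite's online backup API. The function `backup_database` and the demo paths are hypothetical.

```python
import sqlite3
import tempfile
from pathlib import Path


def backup_database(source: Path, destination: Path) -> None:
    """Copy an SQLite database using the online backup API."""
    src = sqlite3.connect(str(source))
    dst = sqlite3.connect(str(destination))
    try:
        src.backup(dst)
    finally:
        dst.close()
        src.close()


def demo() -> int:
    """Create a throwaway database, back it up, and count rows in the copy."""
    with tempfile.TemporaryDirectory() as tmp:
        source = Path(tmp) / "app.db"
        destination = Path(tmp) / "app.db.backup"

        conn = sqlite3.connect(str(source))
        conn.execute("CREATE TABLE summaries (id TEXT PRIMARY KEY)")
        conn.execute("INSERT INTO summaries VALUES ('s1')")
        conn.commit()
        conn.close()

        backup_database(source, destination)

        check = sqlite3.connect(str(destination))
        (rows,) = check.execute("SELECT COUNT(*) FROM summaries").fetchone()
        check.close()
        return rows


copied_rows = demo()
print(copied_rows)
```

For the real database, the call would be `backup_database(Path("data/app.db"), Path("data/app.db.backup"))`, run before the `alembic downgrade base` step.
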
#### Step 2: Model Refactoring
1. Create `backend/models/epic4/` directory
2. Move all Epic 4 models to new directory
3. Add `__table_args__ = {'extend_existing': True}` to all models
4. Update imports to use new structure

#### Step 3: Update Service Layer
1. Update multi-agent orchestrator to save to database
2. Add database persistence to playlist analyzer
3. Create enhanced export service with database integration

#### Step 4: Re-enable and Test
1. Re-enable disabled routers in main.py
2. Run comprehensive tests
3. Verify all Epic 4 features work together

## Testing Strategy

### Integration Tests
```python
# tests/integration/test_epic4_integration.py

async def test_multi_agent_with_database():
    """Test multi-agent analysis saves to database"""
    # Create summary
    # Run multi-agent analysis
    # Verify agent_summaries table populated

async def test_enhanced_export_with_templates():
    """Test enhanced export uses prompt templates"""
    # Create prompt template
    # Generate enhanced export
    # Verify export uses template

async def test_rag_chat_with_chunks():
    """Test RAG chat creates and uses chunks"""
    # Create summary
    # Generate RAG chunks
    # Test chat interface
```

## Benefits of This Architecture

1. **Clean Separation**: Models, migrations, and services are properly separated
2. **No Circular Dependencies**: Lazy loading and string references prevent cycles
3. **Database Integrity**: Foreign keys properly enforced with cascading deletes
4. **Extensibility**: Easy to add new Epic 4 features without breaking existing ones
5. **Performance**: Optimized indexes and relationships for fast queries
6. **Maintainability**: Clear structure makes debugging and updates easier

## Rollback Plan

If issues occur:
1. Restore database backup: `cp data/app.db.backup data/app.db`
2. Revert code changes: `git checkout -- backend/models`
3. Disable Epic 4 routers temporarily
4. Debug specific issues before re-attempting

## Success Criteria

- ✅ All migrations apply without errors
- ✅ No "table already exists" errors
- ✅ Multi-agent analysis saves to database
- ✅ Enhanced exports work with templates
- ✅ RAG chat functions with vector embeddings
- ✅ All Epic 4 API endpoints return 200 status
- ✅ No circular import errors
- ✅ Frontend can access all Epic 4 features

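The first two criteria can be checked mechanically once migrations have run. Below is a minimal sketch, again assuming an SQLite database, that lists which expected tables are still missing. The table names are the ones created by the Phase 2 migration in this document; `missing_tables` is a hypothetical helper.

```python
import sqlite3
from typing import Iterable, List

# Table names taken from the Phase 2 migration in this document
EXPECTED_TABLES = (
    "agent_summaries",
    "prompt_templates",
    "export_metadata",
    "summary_sections",
    "rag_chunks",
    "vector_embeddings",
)


def missing_tables(
    conn: sqlite3.Connection, expected: Iterable[str] = EXPECTED_TABLES
) -> List[str]:
    """Return the expected table names that are absent from the database."""
    rows = conn.execute("SELECT name FROM sqlite_master WHERE type = 'table'")
    present = {name for (name,) in rows}
    return [table for table in expected if table not in present]


# Demo against an in-memory database with only two of the tables created
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE agent_summaries (id TEXT PRIMARY KEY)")
conn.execute("CREATE TABLE rag_chunks (id TEXT PRIMARY KEY)")
still_missing = missing_tables(conn)
print(still_missing)
```

Pointing the same function at a connection to `data/app.db` after `alembic upgrade head` should return an empty list if every migration applied.
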
## Timeline

- **Hour 1**: Database migration and reset
- **Hour 2**: Model refactoring and epic4 package creation
- **Hour 3**: Service layer updates
- **Hour 4**: API router re-enablement and testing
- **Hour 5**: Integration testing and bug fixes
- **Hour 6**: Documentation and deployment

This comprehensive solution addresses all database issues while maintaining the benefits of the DatabaseRegistry pattern and enabling all Epic 4 features to work together seamlessly.