
Frontend-CLI Integration Plan for YouTube Summarizer

Current State Analysis

Problem

The YouTube Summarizer has two separate storage systems:

  1. Frontend/API: Uses file-based storage (SummaryStorageService) in video_storage/summaries/
  2. Interactive CLI: Uses SQLite database via SummaryManager with SQLAlchemy models

Result: Summaries created via the frontend don't appear in the CLI, and vice versa.

Current Architecture

graph TB
    subgraph "Frontend Flow"
        FE[React Frontend] --> API[FastAPI /api/process]
        API --> SP[SummaryPipeline]
        SP --> SSS[SummaryStorageService]
        SSS --> FS[File System: video_storage/]
    end
    
    subgraph "CLI Flow"
        CLI[Interactive CLI] --> SM[SummaryManager]
        SM --> DB[SQLite Database]
    end
    
    style FS fill:#ffcccc
    style DB fill:#ccccff

Proposed Solution: Unified Database Storage

Architecture Changes

graph TB
    subgraph "Unified Flow"
        FE[React Frontend] --> API[FastAPI /api/process]
        CLI[Interactive CLI] --> SM[SummaryManager]
        API --> SP[SummaryPipeline]
        SP --> DSS[DatabaseStorageService]
        SM --> DSS
        DSS --> DB[(SQLite Database)]
        DSS --> FS[File System - Optional Cache]
    end
    
    style DB fill:#90EE90

Implementation Plan

Phase 1: Add Database Storage to Pipeline (2-3 hours)

1.1 Create DatabaseStorageService

# backend/services/database_storage_service.py
from backend.models import Summary
from backend.core.database_registry import registry
# PipelineResult lives in the pipeline module; adjust the path if needed
from backend.services.summary_pipeline import PipelineResult

class DatabaseStorageService:
    """Unified storage service for summaries."""

    def get_session(self):
        """Session factory; delegate to however the shared registry exposes sessions."""
        return registry.get_session()

    def save_summary_to_db(self, pipeline_result: PipelineResult) -> Summary:
        """Save a pipeline result to the database and return the persisted row."""
        with self.get_session() as session:
            summary = Summary(
                video_id=pipeline_result.video_id,
                video_url=pipeline_result.video_url,
                video_title=pipeline_result.metadata.get('title'),
                transcript=pipeline_result.transcript,
                summary=pipeline_result.summary.get('content'),
                key_points=pipeline_result.summary.get('key_points'),
                main_themes=pipeline_result.summary.get('main_themes'),
                model_used=pipeline_result.model_used,
                processing_time=pipeline_result.processing_time,
                quality_score=pipeline_result.quality_metrics.overall_score
            )
            session.add(summary)
            session.commit()
            session.refresh(summary)  # load the autogenerated id before returning
            return summary

1.2 Modify SummaryPipeline

# backend/services/summary_pipeline.py
async def _execute_pipeline(self, job_id: str, config: PipelineConfig):
    # ... existing pipeline code ...
    
    # After successful completion, persist to the database
    if result.status == "completed":
        db_service = DatabaseStorageService()
        saved_summary = db_service.save_summary_to_db(result)
        result.summary_id = saved_summary.id
        
        # Optional: keep file storage for backward compatibility
        if self.enable_file_storage:
            self.storage_service.save_summary(...)
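The hook above can be exercised without running the real pipeline. This sketch stands in hypothetical dicts for PipelineResult and a stub for DatabaseStorageService, purely to pin down the intended ordering: database write first, optional file write second.

```python
class StubDbService:
    """Stand-in for DatabaseStorageService; records what it saves."""
    def __init__(self):
        self.saved = []

    def save_summary_to_db(self, result):
        result["id"] = len(self.saved) + 1  # mimic an autoincrement id
        self.saved.append(result)
        return result

def persist_completed(result, db_service, file_store=None):
    """Persist a completed pipeline result: database first, files second."""
    if result["status"] != "completed":
        return result
    saved = db_service.save_summary_to_db(result)
    result["summary_id"] = saved["id"]
    if file_store is not None:  # optional backward-compatible file write
        file_store.append(dict(result))
    return result
```

Failed results pass through untouched, matching the `if result.status == "completed"` guard above.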

Phase 2: Update API Endpoints (1-2 hours)

2.1 Add Summary Retrieval Endpoints

# backend/api/summaries.py
from fastapi import APIRouter, Depends, HTTPException
from sqlalchemy.orm import Session

router = APIRouter()

@router.get("/summaries")
async def list_summaries(
    limit: int = 10,
    skip: int = 0,
    db: Session = Depends(get_db)  # get_db: session dependency defined elsewhere
):
    """List summaries from the database."""
    return db.query(Summary).offset(skip).limit(limit).all()

@router.get("/summaries/{summary_id}")
async def get_summary(
    summary_id: str,
    db: Session = Depends(get_db)
):
    """Get a specific summary by ID, or 404 if it does not exist."""
    summary = db.query(Summary).filter_by(id=summary_id).first()
    if summary is None:
        raise HTTPException(status_code=404, detail="Summary not found")
    return summary

2.2 Update Frontend API Client

// frontend/src/api/summaryClient.ts
export const getSummaryHistory = async (): Promise<Summary[]> => {
    const response = await fetch('/api/summaries');
    if (!response.ok) {
        throw new Error(`Failed to load summaries: ${response.status}`);
    }
    return response.json();
};

Phase 3: Migrate Existing Data (1 hour)

3.1 Create Migration Script

# scripts/migrate_file_summaries_to_db.py
def migrate_summaries():
    """Migrate existing file-based summaries to the database."""
    storage = SummaryStorageService()
    db_service = DatabaseStorageService()
    
    for video_id in storage.get_videos_with_summaries():
        for summary_data in storage.list_summaries(video_id):
            # save_summary_to_db expects a PipelineResult, so the file
            # format must be converted first (to_pipeline_result is a
            # helper this script would need to define)
            db_service.save_summary_to_db(to_pipeline_result(summary_data))
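One property worth building into the script is idempotency, so re-running it after a partial failure never duplicates rows. A minimal stdlib sketch of the dedup check, using raw sqlite3 and an assumed two-column summaries table rather than the project's SQLAlchemy models:

```python
import sqlite3

def migrate_if_absent(conn, summaries):
    """Insert file-based summaries, skipping video_ids already in the DB.

    `summaries` is a list of dicts with 'video_id' and 'summary' keys
    (hypothetical file format). Returns the number of rows inserted.
    """
    inserted = 0
    for item in summaries:
        row = conn.execute(
            "SELECT 1 FROM summaries WHERE video_id = ?", (item["video_id"],)
        ).fetchone()
        if row is None:  # only migrate records the database has not seen
            conn.execute(
                "INSERT INTO summaries (video_id, summary) VALUES (?, ?)",
                (item["video_id"], item["summary"]),
            )
            inserted += 1
    conn.commit()
    return inserted
```

A second run over the same files reports zero insertions, which makes the migration safe to schedule repeatedly during the transition period.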

Phase 4: Enhanced Features (Optional - 2-3 hours)

4.1 Add User Association

# Track which interface created the summary
summary.source = "frontend"  # or "cli", "api"
summary.user_id = current_user.id if authenticated else None
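The source field only stays useful if its values are constrained. A small sketch of the validation the new columns would need, assuming the three interface names above ("frontend", "cli", "api") are the full set:

```python
from dataclasses import dataclass
from typing import Optional

# The interfaces the plan mentions; extend as new ones appear
VALID_SOURCES = {"frontend", "cli", "api"}

@dataclass
class SummaryOrigin:
    """Origin metadata to store alongside a summary."""
    source: str
    user_id: Optional[int] = None  # None when the request is unauthenticated

    def __post_init__(self):
        if self.source not in VALID_SOURCES:
            raise ValueError(f"unknown summary source: {self.source!r}")
```

In the real model these would become two columns (with a CHECK constraint or enum for source); the dataclass just pins down the invariant.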

4.2 Add Search and Filtering

# Note: register this route before /summaries/{summary_id}, otherwise
# FastAPI matches "search" as a summary_id
@router.get("/summaries/search")
async def search_summaries(
    query: str = None,
    model: str = None,
    date_from: datetime = None,
    date_to: datetime = None
):
    """Advanced search across summaries."""
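The filter semantics can be pinned down independently of the ORM. This sketch applies the same optional filters to plain dicts (the field names 'summary', 'model_used', and 'created_at' are assumptions about the model); the endpoint would translate each non-None filter into a query clause the same way:

```python
from datetime import datetime

def filter_summaries(rows, query=None, model=None, date_from=None, date_to=None):
    """Apply the optional search filters; each given filter narrows the result."""
    out = []
    for r in rows:
        if query and query.lower() not in r["summary"].lower():
            continue  # case-insensitive substring match on the summary text
        if model and r["model_used"] != model:
            continue
        if date_from and r["created_at"] < date_from:
            continue
        if date_to and r["created_at"] > date_to:
            continue
        out.append(r)
    return out
```

Omitting every parameter returns all rows, matching the endpoint's all-optional signature.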

4.3 Real-time Updates

# Use WebSocket to notify CLI when frontend creates summary
async def broadcast_new_summary(summary: Summary):
    await websocket_manager.broadcast({
        "type": "new_summary",
        "summary": summary.dict()
    })

Implementation Steps

Step 1: Database Setup

  • Ensure database tables are created
  • Fix any SQLAlchemy relationship issues
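For a quick check that the tables exist, here is a raw-sqlite3 sketch of the schema (column set inferred from the DatabaseStorageService fields above; the real project would call `create_all` on its SQLAlchemy metadata instead):

```python
import sqlite3

SCHEMA = """
CREATE TABLE IF NOT EXISTS summaries (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    video_id TEXT NOT NULL,
    video_title TEXT,
    summary TEXT,
    model_used TEXT,
    created_at TEXT DEFAULT CURRENT_TIMESTAMP
)
"""

def ensure_schema(db_path=":memory:"):
    """Open the database and create the summaries table if it is missing."""
    conn = sqlite3.connect(db_path)
    conn.execute(SCHEMA)  # IF NOT EXISTS makes repeated calls safe
    conn.commit()
    return conn
```

Because of IF NOT EXISTS, this is safe to run on every startup of either interface.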

Step 2: Create Unified Storage Service

# Create the service
touch backend/services/database_storage_service.py

# Test the service
python3 -c "from backend.services.database_storage_service import DatabaseStorageService; ..."

Step 3: Update Pipeline

  1. Import DatabaseStorageService in summary_pipeline.py
  2. Add database save after successful completion
  3. Test with a real video URL

Step 4: Update API

  1. Create new router for summary management
  2. Add endpoints for listing and retrieving
  3. Update main.py to include new router

Step 5: Test Integration

# Test frontend flow
curl -X POST http://localhost:8000/api/process \
  -H "Content-Type: application/json" \
  -d '{"video_url": "https://youtube.com/watch?v=test"}'

# Check CLI sees it
python3 backend/interactive_cli.py
# Choose option 2 (List Summaries)

Step 6: Frontend Updates

  1. Add history page using new API endpoints
  2. Update dashboard to show recent summaries
  3. Add search/filter capabilities

Benefits of This Integration

  1. Single Source of Truth: All summaries in one database
  2. Cross-Interface Visibility: Frontend and CLI see same data
  3. Better Analytics: Query across all summaries easily
  4. User Tracking: Associate summaries with users
  5. Search Capabilities: Full-text search across all summaries
  6. Audit Trail: Track creation source and timestamps
  7. Scalability: Easy to migrate to PostgreSQL later

Backward Compatibility

Option 1: Dual Write

  • Keep file storage for existing integrations
  • Write to both database and files
  • Gradually deprecate file storage
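The dual-write approach can be isolated in one small wrapper so the pipeline talks to a single storage object. A sketch, assuming both backends expose a common save() method (a hypothetical interface, not the current classes): the database write is authoritative, while legacy file-write failures are collected rather than raised.

```python
class DualWriteStorage:
    """Write every summary to the database first, then to file storage."""

    def __init__(self, db_store, file_store):
        self.db_store = db_store
        self.file_store = file_store
        self.file_errors = []  # legacy-path failures, kept for inspection

    def save(self, summary):
        result = self.db_store.save(summary)  # source of truth; may raise
        try:
            self.file_store.save(summary)
        except Exception as exc:  # file path is best-effort only
            self.file_errors.append(exc)
        return result
```

Deprecating file storage later then means deleting the wrapper, not touching the pipeline.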

Option 2: File Storage as Cache

  • Database as primary storage
  • Files for export/backup only
  • Periodic sync from DB to files

Testing Plan

Unit Tests

def test_database_storage_service():
    service = DatabaseStorageService()
    result = create_mock_pipeline_result()
    saved = service.save_summary_to_db(result)
    assert saved.id is not None
    assert saved.video_id == result.video_id

Integration Tests

def test_frontend_to_cli_flow():
    # Create via API
    response = client.post("/api/process", json={...})
    job_id = response.json()["job_id"]
    
    # Wait for completion
    summary = wait_for_summary(job_id)
    
    # Verify in database
    db_summary = session.query(Summary).filter_by(
        video_id=summary["video_id"]
    ).first()
    assert db_summary is not None

Migration Timeline

| Phase | Task | Duration | Priority |
|-------|------|----------|----------|
| 1 | Create DatabaseStorageService | 2 hours | High |
| 2 | Update SummaryPipeline | 1 hour | High |
| 3 | Add API endpoints | 2 hours | High |
| 4 | Test integration | 1 hour | High |
| 5 | Migrate existing data | 1 hour | Medium |
| 6 | Update frontend | 2 hours | Medium |
| 7 | Add search features | 2 hours | Low |
| 8 | Documentation | 1 hour | Medium |

Total Estimated Time: 8-12 hours for full implementation

Quick Win Implementation (2 hours)

For immediate results with minimal changes:

  1. Add one save call to SummaryPipeline to write results to the database
  2. CLI automatically sees new summaries
  3. No frontend changes needed initially

# Quick fix in summary_pipeline.py
if result.status == "completed":
    # Add this line (assumes SummaryManager.save_summary accepts a plain dict)
    SummaryManager().save_summary(result.__dict__)
    # Existing code continues...

This provides immediate integration while planning the full implementation.

Next Steps

  1. Review and approve this plan
  2. Create feature branch: git checkout -b feature/unified-storage
  3. Implement Phase 1 (DatabaseStorageService)
  4. Test with both frontend and CLI
  5. Gradually implement remaining phases

Questions to Consider

  1. Should we keep file storage for backward compatibility?
  2. Do we need user authentication before associating summaries?
  3. Should we add an admin interface for managing all summaries?
  4. What retention policy for old summaries?
  5. Should we add export/import capabilities?

This plan ensures that summaries created through any interface (Frontend, CLI, or API) are visible everywhere, providing a seamless user experience.