# Frontend-CLI Integration Plan for YouTube Summarizer

## Current State Analysis

### Problem

The YouTube Summarizer has **two separate storage systems**:

1. **Frontend/API**: Uses file-based storage (`SummaryStorageService`) in `video_storage/summaries/`
2. **Interactive CLI**: Uses a SQLite database via `SummaryManager` with SQLAlchemy models

**Result**: Summaries created via the frontend don't appear in the CLI, and vice versa.

### Current Architecture

```mermaid
graph TB
    subgraph "Frontend Flow"
        FE[React Frontend] --> API[FastAPI /api/process]
        API --> SP[SummaryPipeline]
        SP --> SSS[SummaryStorageService]
        SSS --> FS[File System: video_storage/]
    end

    subgraph "CLI Flow"
        CLI[Interactive CLI] --> SM[SummaryManager]
        SM --> DB[SQLite Database]
    end

    style FS fill:#ffcccc
    style DB fill:#ccccff
```

## Proposed Solution: Unified Database Storage

### Architecture Changes

```mermaid
graph TB
    subgraph "Unified Flow"
        FE[React Frontend] --> API[FastAPI /api/process]
        CLI[Interactive CLI] --> SM[SummaryManager]
        API --> SP[SummaryPipeline]
        SP --> DSS[DatabaseStorageService]
        SM --> DSS
        DSS --> DB[(SQLite Database)]
        DSS --> FS[File System - Optional Cache]
    end

    style DB fill:#90EE90
```

## Implementation Plan

### Phase 1: Add Database Storage to Pipeline (2-3 hours)

#### 1.1 Create DatabaseStorageService

```python
# backend/services/database_storage_service.py
from backend.models import Summary
from backend.core.database_registry import registry


class DatabaseStorageService:
    """Unified storage service for summaries."""

    def save_summary_to_db(self, pipeline_result: PipelineResult) -> Summary:
        """Save a pipeline result to the database."""
        # get_session() is the session helper sketched after 1.2,
        # ideally backed by the shared `registry` imported above.
        with self.get_session() as session:
            summary = Summary(
                video_id=pipeline_result.video_id,
                video_url=pipeline_result.video_url,
                video_title=pipeline_result.metadata.get('title'),
                transcript=pipeline_result.transcript,
                summary=pipeline_result.summary.get('content'),
                key_points=pipeline_result.summary.get('key_points'),
                main_themes=pipeline_result.summary.get('main_themes'),
                model_used=pipeline_result.model_used,
                processing_time=pipeline_result.processing_time,
                quality_score=pipeline_result.quality_metrics.overall_score
            )
            session.add(summary)
            session.commit()
            return summary
```

#### 1.2 Modify SummaryPipeline

```python
# backend/services/summary_pipeline.py
async def _execute_pipeline(self, job_id: str, config: PipelineConfig):
    # ... existing pipeline code ...

    # After successful completion
    if result.status == "completed":
        # Save to database
        db_service = DatabaseStorageService()
        saved_summary = db_service.save_summary_to_db(result)
        result.summary_id = saved_summary.id

        # Optional: keep file storage for backward compatibility
        if self.enable_file_storage:
            storage_service.save_summary(...)
```
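The `save_summary_to_db` method in 1.1 calls a `get_session()` context manager that this plan does not define. Below is a minimal sketch of that helper, assuming a plain SQLAlchemy `sessionmaker`; the engine URL and whether `backend.core.database_registry` already exposes an equivalent factory are assumptions to confirm before implementing:

```python
# backend/services/database_storage_service.py (session plumbing sketch)
from contextlib import contextmanager

from sqlalchemy import create_engine
from sqlalchemy.orm import sessionmaker

# Assumption: placeholder SQLite URL; reuse whatever URL the CLI's
# SummaryManager already uses so both interfaces hit the same database file.
engine = create_engine("sqlite:///summaries.db")

# expire_on_commit=False keeps returned objects (e.g. saved_summary.id)
# readable after the session closes, which the pipeline change in 1.2 relies on.
SessionLocal = sessionmaker(bind=engine, expire_on_commit=False)


class DatabaseStorageService:
    """Session plumbing only; save_summary_to_db from 1.1 sits alongside this."""

    @contextmanager
    def get_session(self):
        """Yield a session, roll back on error, and always close it."""
        session = SessionLocal()
        try:
            yield session
        except Exception:
            session.rollback()
            raise
        finally:
            session.close()
```

If the database registry already provides a scoped session or context manager, `get_session()` should simply delegate to it rather than creating a second engine.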
### Phase 2: Update API Endpoints (1-2 hours)

#### 2.1 Add Summary Retrieval Endpoints

```python
# backend/api/summaries.py
from fastapi import APIRouter, Depends
from sqlalchemy.orm import Session

from backend.models import Summary

router = APIRouter()


@router.get("/summaries")
async def list_summaries(
    limit: int = 10,
    skip: int = 0,
    db: Session = Depends(get_db)  # get_db dependency sketched under Step 4 below
):
    """List all summaries from the database."""
    return db.query(Summary).offset(skip).limit(limit).all()


@router.get("/summaries/{summary_id}")
async def get_summary(
    summary_id: str,
    db: Session = Depends(get_db)
):
    """Get a specific summary by ID."""
    return db.query(Summary).filter_by(id=summary_id).first()
```

#### 2.2 Update Frontend API Client

```typescript
// frontend/src/api/summaryClient.ts
export const getSummaryHistory = async (): Promise<Summary[]> => {
  const response = await fetch('/api/summaries');
  return response.json();
};
```

### Phase 3: Migrate Existing Data (1 hour)

#### 3.1 Create Migration Script

```python
# scripts/migrate_file_summaries_to_db.py
def migrate_summaries():
    """Migrate existing file-based summaries to the database."""
    storage = SummaryStorageService()
    db_service = DatabaseStorageService()

    for video_id in storage.get_videos_with_summaries():
        summaries = storage.list_summaries(video_id)
        for summary_data in summaries:
            # Convert file format to database format
            db_service.save_summary_to_db(summary_data)
```

### Phase 4: Enhanced Features (Optional - 2-3 hours)

#### 4.1 Add User Association

```python
# Track which interface created the summary
summary.source = "frontend"  # or "cli", "api"
summary.user_id = current_user.id if authenticated else None
```

#### 4.2 Add Search and Filtering

```python
@router.get("/summaries/search")
async def search_summaries(
    query: str = None,
    model: str = None,
    date_from: datetime = None,
    date_to: datetime = None
):
    """Advanced search across summaries."""
```

#### 4.3 Real-time Updates

```python
# Use WebSocket to notify the CLI when the frontend creates a summary
async def broadcast_new_summary(summary: Summary):
    await websocket_manager.broadcast({
        "type": "new_summary",
        "summary": summary.dict()
    })
```

## Implementation Steps

### Step 1: Database Setup ✅

- [x] Ensure database tables are created
- [x] Fix any SQLAlchemy relationship issues

### Step 2: Create Unified Storage Service

```bash
# Create the service
touch backend/services/database_storage_service.py

# Test the service
python3 -c "from backend.services.database_storage_service import DatabaseStorageService; ..."
```

### Step 3: Update Pipeline

1. Import DatabaseStorageService in summary_pipeline.py
2. Add the database save after successful completion
3. Test with a real video URL

### Step 4: Update API

1. Create a new router for summary management
2. Add endpoints for listing and retrieving summaries
3. Update main.py to include the new router (see the sketch below)
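The endpoints in Phase 2.1 and step 3 just above rely on plumbing the plan doesn't spell out: a `get_db` dependency and registering the router in `main.py`. Here is a minimal sketch of both, assuming the router lives in `backend/api/summaries.py`, that a `SessionLocal` session factory is exposed by the storage-service module, and that `main.py` already creates the FastAPI `app`; all three are assumptions to verify against the codebase:

```python
# backend/api/summaries.py (dependency sketch)
from backend.services.database_storage_service import SessionLocal  # assumed factory location


def get_db():
    """FastAPI dependency: open one session per request and always close it."""
    db = SessionLocal()
    try:
        yield db
    finally:
        db.close()
```

```python
# backend/main.py (excerpt)
from backend.api.summaries import router as summaries_router

# Assumption: the frontend fetches /api/summaries, so the router is mounted under /api.
app.include_router(summaries_router, prefix="/api")
```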
### Step 5: Test Integration

```bash
# Test frontend flow
curl -X POST http://localhost:8000/api/process \
  -H "Content-Type: application/json" \
  -d '{"video_url": "https://youtube.com/watch?v=test"}'

# Check that the CLI sees it
python3 backend/interactive_cli.py
# Choose option 2 (List Summaries)
```

### Step 6: Frontend Updates

1. Add a history page using the new API endpoints
2. Update the dashboard to show recent summaries
3. Add search/filter capabilities

## Benefits of This Integration

1. **Single Source of Truth**: All summaries live in one database
2. **Cross-Interface Visibility**: Frontend and CLI see the same data
3. **Better Analytics**: Query across all summaries easily
4. **User Tracking**: Associate summaries with users
5. **Search Capabilities**: Full-text search across all summaries
6. **Audit Trail**: Track creation source and timestamps
7. **Scalability**: Easy to migrate to PostgreSQL later

## Backward Compatibility

### Option 1: Dual Storage (Recommended Initially)

- Keep file storage for existing integrations
- Write to both database and files
- Gradually deprecate file storage

### Option 2: File Storage as Cache

- Database as primary storage
- Files for export/backup only
- Periodic sync from DB to files

## Testing Plan

### Unit Tests

```python
def test_database_storage_service():
    service = DatabaseStorageService()
    result = create_mock_pipeline_result()

    saved = service.save_summary_to_db(result)

    assert saved.id is not None
    assert saved.video_id == result.video_id
```

### Integration Tests

```python
def test_frontend_to_cli_flow():
    # Create via API
    response = client.post("/api/process", json={...})
    job_id = response.json()["job_id"]

    # Wait for completion
    summary = wait_for_summary(job_id)

    # Verify in database
    db_summary = session.query(Summary).filter_by(
        video_id=summary["video_id"]
    ).first()
    assert db_summary is not None
```

## Migration Timeline

| Phase | Task | Duration | Priority |
|-------|------|----------|----------|
| 1 | Create DatabaseStorageService | 2 hours | High |
| 2 | Update SummaryPipeline | 1 hour | High |
| 3 | Add API endpoints | 2 hours | High |
| 4 | Test integration | 1 hour | High |
| 5 | Migrate existing data | 1 hour | Medium |
| 6 | Update frontend | 2 hours | Medium |
| 7 | Add search features | 2 hours | Low |
| 8 | Documentation | 1 hour | Medium |

**Total Estimated Time**: 8-12 hours for full implementation

## Quick Win Implementation (2 hours)

For immediate results with minimal changes:

1. **Add one line to SummaryPipeline** to save to the database
2. **The CLI automatically sees new summaries**
3. **No frontend changes needed initially**

```python
# Quick fix in summary_pipeline.py
if result.status == "completed":
    # Add this line
    SummaryManager().save_summary(result.__dict__)
    # Existing code continues...
```

This provides immediate integration while the full implementation is planned.

## Next Steps

1. Review and approve this plan
2. Create a feature branch: `git checkout -b feature/unified-storage`
3. Implement Phase 1 (DatabaseStorageService)
4. Test with both the frontend and the CLI
5. Gradually implement the remaining phases

## Questions to Consider

1. Should we keep file storage for backward compatibility?
2. Do we need user authentication before associating summaries with users?
3. Should we add an admin interface for managing all summaries?
4. What retention policy should apply to old summaries?
5. Should we add export/import capabilities?

---

This plan ensures that summaries created through any interface (Frontend, CLI, or API) are visible everywhere, providing a seamless user experience.