# Frontend-CLI Integration Plan for YouTube Summarizer

## Current State Analysis

### Problem

The YouTube Summarizer has two separate storage systems:

- **Frontend/API**: uses file-based storage (`SummaryStorageService`) in `video_storage/summaries/`
- **Interactive CLI**: uses a SQLite database via `SummaryManager` with SQLAlchemy models

**Result**: summaries created via the frontend don't appear in the CLI, and vice versa.
### Current Architecture

```mermaid
graph TB
    subgraph "Frontend Flow"
        FE[React Frontend] --> API[FastAPI /api/process]
        API --> SP[SummaryPipeline]
        SP --> SSS[SummaryStorageService]
        SSS --> FS[File System: video_storage/]
    end
    subgraph "CLI Flow"
        CLI[Interactive CLI] --> SM[SummaryManager]
        SM --> DB[SQLite Database]
    end
    style FS fill:#ffcccc
    style DB fill:#ccccff
```
## Proposed Solution: Unified Database Storage

### Architecture Changes

```mermaid
graph TB
    subgraph "Unified Flow"
        FE[React Frontend] --> API[FastAPI /api/process]
        CLI[Interactive CLI] --> SM[SummaryManager]
        API --> SP[SummaryPipeline]
        SP --> DSS[DatabaseStorageService]
        SM --> DSS
        DSS --> DB[(SQLite Database)]
        DSS --> FS[File System - Optional Cache]
    end
    style DB fill:#90EE90
```
## Implementation Plan

### Phase 1: Add Database Storage to Pipeline (2-3 hours)

#### 1.1 Create DatabaseStorageService
```python
# backend/services/database_storage_service.py
from backend.models import Summary
from backend.core.database_registry import registry
# NOTE: import PipelineResult from wherever the pipeline defines it


class DatabaseStorageService:
    """Unified storage service for summaries."""

    def save_summary_to_db(self, pipeline_result: PipelineResult) -> Summary:
        """Save a pipeline result to the database."""
        # get_session() is assumed to be a context-manager factory
        # backed by `registry`
        with self.get_session() as session:
            summary = Summary(
                video_id=pipeline_result.video_id,
                video_url=pipeline_result.video_url,
                video_title=pipeline_result.metadata.get('title'),
                transcript=pipeline_result.transcript,
                summary=pipeline_result.summary.get('content'),
                key_points=pipeline_result.summary.get('key_points'),
                main_themes=pipeline_result.summary.get('main_themes'),
                model_used=pipeline_result.model_used,
                processing_time=pipeline_result.processing_time,
                quality_score=pipeline_result.quality_metrics.overall_score,
            )
            session.add(summary)
            session.commit()
            return summary
```
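The service above calls `self.get_session()`, which is not defined in the snippet. A minimal sketch of such a factory follows, assuming the registry exposes a SQLAlchemy `sessionmaker`; the commit/rollback/close lifecycle shown is the standard pattern, but the factory name and wiring are assumptions:

```python
# Hypothetical session helper for DatabaseStorageService. The real
# factory would come from backend.core.database_registry; here it is
# passed in explicitly so the lifecycle is visible.
from contextlib import contextmanager


@contextmanager
def get_session(session_factory):
    """Open a session, commit on success, roll back on error, always close."""
    session = session_factory()
    try:
        yield session
        session.commit()
    except Exception:
        session.rollback()
        raise
    finally:
        session.close()
```

With this shape, a failed write never leaves a half-committed transaction behind, and callers cannot forget to close the session.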
#### 1.2 Modify SummaryPipeline

```python
# backend/services/summary_pipeline.py
async def _execute_pipeline(self, job_id: str, config: PipelineConfig):
    # ... existing pipeline code ...

    # After successful completion
    if result.status == "completed":
        # Save to database
        db_service = DatabaseStorageService()
        saved_summary = db_service.save_summary_to_db(result)
        result.summary_id = saved_summary.id

        # Optional: keep file storage for backward compatibility
        if self.enable_file_storage:
            storage_service.save_summary(...)
```
### Phase 2: Update API Endpoints (1-2 hours)

#### 2.1 Add Summary Retrieval Endpoints

```python
# backend/api/summaries.py
from fastapi import APIRouter, Depends
from sqlalchemy.orm import Session

from backend.models import Summary

router = APIRouter()


@router.get("/summaries")
async def list_summaries(
    limit: int = 10,
    skip: int = 0,
    db: Session = Depends(get_db),
):
    """List all summaries from the database."""
    return db.query(Summary).offset(skip).limit(limit).all()


@router.get("/summaries/{summary_id}")
async def get_summary(summary_id: str, db: Session = Depends(get_db)):
    """Get a specific summary by ID."""
    return db.query(Summary).filter_by(id=summary_id).first()
```
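The endpoints above depend on a `get_db` callable that is not shown. A minimal sketch of the usual FastAPI generator-dependency pattern follows, written against stdlib `sqlite3` so it is self-contained; the real version would yield a SQLAlchemy `Session`, and `DB_PATH` is an assumption:

```python
# Hypothetical get_db dependency for the endpoints above. FastAPI runs
# generator dependencies like this one, executing the code after
# `yield` during request teardown, so the handle is always closed.
import sqlite3

DB_PATH = ":memory:"  # assumption: the real path comes from app config


def get_db():
    conn = sqlite3.connect(DB_PATH)
    try:
        yield conn
    finally:
        conn.close()
```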
#### 2.2 Update Frontend API Client

```typescript
// frontend/src/api/summaryClient.ts
export const getSummaryHistory = async (): Promise<Summary[]> => {
  const response = await fetch('/api/summaries');
  return response.json();
};
```
### Phase 3: Migrate Existing Data (1 hour)

#### 3.1 Create Migration Script

```python
# scripts/migrate_file_summaries_to_db.py
def migrate_summaries():
    """Migrate existing file-based summaries to the database."""
    storage = SummaryStorageService()
    db_service = DatabaseStorageService()

    for video_id in storage.get_videos_with_summaries():
        for summary_data in storage.list_summaries(video_id):
            # NOTE: the file format must first be adapted to the
            # PipelineResult shape that save_summary_to_db expects
            db_service.save_summary_to_db(summary_data)
```
### Phase 4: Enhanced Features (Optional - 2-3 hours)

#### 4.1 Add User Association

```python
# Track which interface created the summary
summary.source = "frontend"  # or "cli", "api"
summary.user_id = current_user.id if authenticated else None
```
#### 4.2 Add Search and Filtering

```python
@router.get("/summaries/search")
async def search_summaries(
    query: str = None,
    model: str = None,
    date_from: datetime = None,
    date_to: datetime = None,
):
    """Advanced search across summaries."""
```
#### 4.3 Real-time Updates

```python
# Use a WebSocket to notify the CLI when the frontend creates a summary
async def broadcast_new_summary(summary: Summary):
    await websocket_manager.broadcast({
        "type": "new_summary",
        "summary": summary.dict(),
    })
```
## Implementation Steps

### Step 1: Database Setup ✅

- Ensure database tables are created
- Fix any SQLAlchemy relationship issues
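The table-creation check in Step 1 should be idempotent so it can run at every startup. The project presumably calls `Base.metadata.create_all(engine)` on its SQLAlchemy models; the same pattern is sketched below with stdlib `sqlite3`, and the column names are illustrative, not the real schema:

```python
# Idempotent table setup, safe to run at every startup.
import sqlite3


def ensure_tables(conn: sqlite3.Connection) -> None:
    conn.execute(
        """CREATE TABLE IF NOT EXISTS summaries (
               id INTEGER PRIMARY KEY,
               video_id TEXT NOT NULL,
               summary TEXT
           )"""
    )
    conn.commit()


conn = sqlite3.connect(":memory:")
ensure_tables(conn)
ensure_tables(conn)  # second call is a no-op thanks to IF NOT EXISTS
tables = {row[0] for row in conn.execute(
    "SELECT name FROM sqlite_master WHERE type='table'")}
```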
### Step 2: Create Unified Storage Service

```bash
# Create the service
touch backend/services/database_storage_service.py

# Smoke-test the import
python3 -c "from backend.services.database_storage_service import DatabaseStorageService; ..."
```
### Step 3: Update Pipeline

- Import DatabaseStorageService in summary_pipeline.py
- Add the database save after successful completion
- Test with a real video URL
### Step 4: Update API

- Create a new router for summary management
- Add endpoints for listing and retrieving summaries
- Update main.py to include the new router
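In the real main.py the wiring would presumably be a single `app.include_router(router, prefix="/api")` call. The mechanism behind it is sketched below without FastAPI so the snippet is self-contained; a router collects routes, and the app merges them in under a prefix (class and route names here are illustrative):

```python
# Minimal model of FastAPI-style router registration.
class Router:
    def __init__(self):
        self.routes = {}

    def get(self, path):
        def register(handler):
            self.routes[("GET", path)] = handler
            return handler
        return register


class App(Router):
    def include_router(self, router, prefix=""):
        # Copy the router's routes into the app, prepending the prefix
        for (method, path), handler in router.routes.items():
            self.routes[(method, prefix + path)] = handler


summaries_router = Router()


@summaries_router.get("/summaries")
def list_summaries():
    return []


app = App()
app.include_router(summaries_router, prefix="/api")
```

The point of the pattern: the summaries module never needs to know its final URL prefix, which keeps the router reusable and testable in isolation.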
### Step 5: Test Integration

```bash
# Test the frontend flow
curl -X POST http://localhost:8000/api/process \
  -H "Content-Type: application/json" \
  -d '{"video_url": "https://youtube.com/watch?v=test"}'

# Check that the CLI sees it
python3 backend/interactive_cli.py
# Choose option 2 (List Summaries)
```
### Step 6: Frontend Updates

- Add a history page using the new API endpoints
- Update the dashboard to show recent summaries
- Add search/filter capabilities
## Benefits of This Integration

- **Single source of truth**: all summaries in one database
- **Cross-interface visibility**: frontend and CLI see the same data
- **Better analytics**: query across all summaries easily
- **User tracking**: associate summaries with users
- **Search capabilities**: full-text search across all summaries
- **Audit trail**: track creation source and timestamps
- **Scalability**: easy to migrate to PostgreSQL later
## Backward Compatibility

### Option 1: Dual Storage (Recommended Initially)

- Keep file storage for existing integrations
- Write to both the database and files
- Gradually deprecate file storage
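The dual-storage option can be sketched as a write path where the database is authoritative and the file write is best-effort: a file failure is logged but does not fail the request. The dict standing in for the database and the JSON file layout below are assumptions, not the project's real `DatabaseStorageService` / `SummaryStorageService` interfaces:

```python
# Dual-write sketch for Option 1.
import json
import logging
import tempfile
from pathlib import Path

log = logging.getLogger(__name__)


def save_dual(summary: dict, db: dict, file_root: Path) -> None:
    # 1. Database first -- failures here should fail the request.
    db[summary["video_id"]] = summary
    # 2. File mirror for backward compatibility -- log and continue.
    try:
        (file_root / f"{summary['video_id']}.json").write_text(
            json.dumps(summary))
    except OSError:
        log.warning("file mirror failed for %s", summary["video_id"])


db: dict = {}
with tempfile.TemporaryDirectory() as d:
    save_dual({"video_id": "abc123", "summary": "..."}, db, Path(d))
    mirrored = json.loads((Path(d) / "abc123.json").read_text())
```

Ordering matters here: writing the database first means a crash between the two writes leaves the source of truth correct and only the legacy mirror stale.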
### Option 2: File Storage as Cache

- Database as primary storage
- Files for export/backup only
- Periodic sync from DB to files
## Testing Plan

### Unit Tests

```python
def test_database_storage_service():
    service = DatabaseStorageService()
    result = create_mock_pipeline_result()
    saved = service.save_summary_to_db(result)
    assert saved.id is not None
    assert saved.video_id == result.video_id
```
### Integration Tests

```python
def test_frontend_to_cli_flow():
    # Create via API
    response = client.post("/api/process", json={...})
    job_id = response.json()["job_id"]

    # Wait for completion
    summary = wait_for_summary(job_id)

    # Verify in the database
    db_summary = session.query(Summary).filter_by(
        video_id=summary["video_id"]
    ).first()
    assert db_summary is not None
```
## Migration Timeline
| Phase | Task | Duration | Priority |
|---|---|---|---|
| 1 | Create DatabaseStorageService | 2 hours | High |
| 2 | Update SummaryPipeline | 1 hour | High |
| 3 | Add API endpoints | 2 hours | High |
| 4 | Test integration | 1 hour | High |
| 5 | Migrate existing data | 1 hour | Medium |
| 6 | Update frontend | 2 hours | Medium |
| 7 | Add search features | 2 hours | Low |
| 8 | Documentation | 1 hour | Medium |
**Total estimated time**: 8-12 hours for full implementation
## Quick Win Implementation (2 hours)

For immediate results with minimal changes:

- Add one line to SummaryPipeline to save to the database
- The CLI automatically sees new summaries
- No frontend changes needed initially

```python
# Quick fix in summary_pipeline.py
if result.status == "completed":
    # Add this line
    SummaryManager().save_summary(result.__dict__)
    # Existing code continues...
```
This provides immediate integration while planning the full implementation.
## Next Steps

1. Review and approve this plan
2. Create a feature branch: `git checkout -b feature/unified-storage`
3. Implement Phase 1 (DatabaseStorageService)
4. Test with both the frontend and the CLI
5. Gradually implement the remaining phases
## Questions to Consider

- Should we keep file storage for backward compatibility?
- Do we need user authentication before associating summaries with users?
- Should we add an admin interface for managing all summaries?
- What retention policy should apply to old summaries?
- Should we add export/import capabilities?
This plan ensures that summaries created through any interface (Frontend, CLI, or API) are visible everywhere, providing a seamless user experience.