# Frontend-CLI Integration Plan for YouTube Summarizer

## Current State Analysis

### Problem
The YouTube Summarizer has **two separate storage systems**:
1. **Frontend/API**: Uses file-based storage (`SummaryStorageService`) in `video_storage/summaries/`
2. **Interactive CLI**: Uses a SQLite database via `SummaryManager` with SQLAlchemy models

**Result**: Summaries created via the frontend don't appear in the CLI, and vice versa.

### Current Architecture

```mermaid
graph TB
    subgraph "Frontend Flow"
        FE[React Frontend] --> API[FastAPI /api/process]
        API --> SP[SummaryPipeline]
        SP --> SSS[SummaryStorageService]
        SSS --> FS[File System: video_storage/]
    end

    subgraph "CLI Flow"
        CLI[Interactive CLI] --> SM[SummaryManager]
        SM --> DB[SQLite Database]
    end

    style FS fill:#ffcccc
    style DB fill:#ccccff
```

## Proposed Solution: Unified Database Storage

### Architecture Changes

```mermaid
graph TB
    subgraph "Unified Flow"
        FE[React Frontend] --> API[FastAPI /api/process]
        CLI[Interactive CLI] --> SM[SummaryManager]
        API --> SP[SummaryPipeline]
        SP --> DSS[DatabaseStorageService]
        SM --> DSS
        DSS --> DB[(SQLite Database)]
        DSS --> FS[File System - Optional Cache]
    end

    style DB fill:#90EE90
```

## Implementation Plan

### Phase 1: Add Database Storage to Pipeline (2-3 hours)

#### 1.1 Create DatabaseStorageService
```python
# backend/services/database_storage_service.py
from backend.models import Summary
from backend.core.database_registry import registry

class DatabaseStorageService:
    """Unified storage service for summaries."""

    # PipelineResult is the pipeline's result type; import it from wherever
    # it is defined. get_session() is expected to be a context manager that
    # yields a SQLAlchemy session; it is not defined in this plan.
    def save_summary_to_db(self, pipeline_result: "PipelineResult") -> Summary:
        """Save pipeline result to database."""
        with self.get_session() as session:
            summary = Summary(
                video_id=pipeline_result.video_id,
                video_url=pipeline_result.video_url,
                video_title=pipeline_result.metadata.get('title'),
                transcript=pipeline_result.transcript,
                summary=pipeline_result.summary.get('content'),
                key_points=pipeline_result.summary.get('key_points'),
                main_themes=pipeline_result.summary.get('main_themes'),
                model_used=pipeline_result.model_used,
                processing_time=pipeline_result.processing_time,
                quality_score=pipeline_result.quality_metrics.overall_score
            )
            session.add(summary)
            session.commit()
            # commit() expires loaded attributes; refresh while the session is
            # still open so summary.id is readable after the session closes.
            session.refresh(summary)
            return summary
```

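`save_summary_to_db` depends on a `get_session()` context manager that this plan does not define. The following is a minimal sketch of the commit-or-rollback scope such a helper would typically provide. It uses the standard library's `sqlite3` as a stand-in for a SQLAlchemy session so that it runs on its own; `SessionScope`, `save_summary`, `count_summaries`, and the two-column `summaries` table are hypothetical names chosen for illustration, not part of the existing codebase.

```python
import sqlite3
from contextlib import contextmanager
from typing import Iterator


class SessionScope:
    """Hypothetical helper: one unit of work per `with` block."""

    def __init__(self, db_path: str = ":memory:") -> None:
        # A single shared connection keeps an in-memory database alive
        # between calls; a real service would use a session factory instead.
        self._conn = sqlite3.connect(db_path)
        self._conn.execute(
            "CREATE TABLE IF NOT EXISTS summaries ("
            "id INTEGER PRIMARY KEY AUTOINCREMENT, "
            "video_id TEXT NOT NULL, "
            "summary TEXT NOT NULL)"
        )
        self._conn.commit()

    @contextmanager
    def get_session(self) -> Iterator[sqlite3.Connection]:
        """Yield the connection; commit on success, roll back on any error."""
        try:
            yield self._conn
            self._conn.commit()
        except Exception:
            self._conn.rollback()
            raise


def save_summary(scope: SessionScope, video_id: str, summary: str) -> int:
    """Insert one row and return its generated primary key."""
    with scope.get_session() as session:
        cursor = session.execute(
            "INSERT INTO summaries (video_id, summary) VALUES (?, ?)",
            (video_id, summary),
        )
        return cursor.lastrowid


def count_summaries(scope: SessionScope) -> int:
    """Return the number of stored rows."""
    with scope.get_session() as session:
        return session.execute("SELECT COUNT(*) FROM summaries").fetchone()[0]
```

Putting commit and rollback in one place means callers such as the pipeline and the CLI cannot leave a half-written summary behind if an exception interrupts the save.
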
#### 1.2 Modify SummaryPipeline
```python
# backend/services/summary_pipeline.py
from backend.services.database_storage_service import DatabaseStorageService

async def _execute_pipeline(self, job_id: str, config: PipelineConfig):
    # ... existing pipeline code ...

    # After successful completion
    if result.status == "completed":
        # Save to database
        db_service = DatabaseStorageService()
        saved_summary = db_service.save_summary_to_db(result)
        result.summary_id = saved_summary.id

        # Optional: Keep file storage for backward compatibility
        if self.enable_file_storage:
            storage_service.save_summary(...)
```

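### Phase 2: Update API Endpoints (1-2 hours)

#### 2.1 Add Summary Retrieval Endpoints
```python
# backend/api/summaries.py
from fastapi import APIRouter, Depends, HTTPException
from sqlalchemy.orm import Session

from backend.models import Summary
# get_db is the project's session dependency; its import is not shown in this plan.

# Included from main.py (see Step 4); the frontend calls these routes under /api.
router = APIRouter()

@router.get("/summaries")
async def list_summaries(
    limit: int = 10,
    skip: int = 0,
    db: Session = Depends(get_db)
):
    """List all summaries from database."""
    return db.query(Summary).offset(skip).limit(limit).all()

@router.get("/summaries/{summary_id}")
async def get_summary(
    summary_id: str,
    db: Session = Depends(get_db)
):
    """Get specific summary by ID."""
    summary = db.query(Summary).filter_by(id=summary_id).first()
    if summary is None:
        # Without this check a missing ID returns 200 with a null body.
        raise HTTPException(status_code=404, detail="Summary not found")
    return summary
```
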
#### 2.2 Update Frontend API Client
```typescript
// frontend/src/api/summaryClient.ts
export const getSummaryHistory = async (): Promise<Summary[]> => {
  const response = await fetch('/api/summaries');
  // fetch() resolves on HTTP errors too, so check the status explicitly.
  if (!response.ok) {
    throw new Error(`Failed to load summaries: ${response.status}`);
  }
  return response.json();
};
```

### Phase 3: Migrate Existing Data (1 hour)

#### 3.1 Create Migration Script
```python
# scripts/migrate_file_summaries_to_db.py
from backend.services.database_storage_service import DatabaseStorageService
# SummaryStorageService must also be imported; its module path is not shown in this plan.

def migrate_summaries():
    """Migrate existing file-based summaries to database."""
    storage = SummaryStorageService()
    db_service = DatabaseStorageService()

    for video_id in storage.get_videos_with_summaries():
        summaries = storage.list_summaries(video_id)
        for summary_data in summaries:
            # Convert file format to database format.
            # Note: save_summary_to_db() as written in 1.1 reads PipelineResult
            # attributes, so summary_data must be converted before this call.
            db_service.save_summary_to_db(summary_data)
```

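The migration loop leaves the "convert file format to database format" step open, and a migration script is usually run more than once while it is being tested. The sketch below shows one way both concerns could be handled: a small mapping function plus an insert that skips files already imported. The JSON layout (`metadata.title`, `summary.content`, and so on) is an assumption modelled on the `PipelineResult` fields from 1.1, the `source_file` key is a hypothetical deduplication column, and `sqlite3` is used in place of SQLAlchemy to keep the example self-contained.

```python
import json
import sqlite3
from pathlib import Path
from typing import Any, Dict


def file_record_to_row(record: Dict[str, Any]) -> Dict[str, Any]:
    """Map an assumed file-format record onto database column names."""
    return {
        "video_id": record["video_id"],
        "video_url": record.get("video_url"),
        "video_title": record.get("metadata", {}).get("title"),
        "summary": record.get("summary", {}).get("content"),
        "model_used": record.get("model_used"),
    }


def migrate_directory(summaries_dir: Path, conn: sqlite3.Connection) -> int:
    """Import every *.json file once; return how many rows were inserted."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS summaries ("
        "source_file TEXT PRIMARY KEY, video_id TEXT NOT NULL, video_url TEXT, "
        "video_title TEXT, summary TEXT, model_used TEXT)"
    )
    inserted = 0
    for path in sorted(summaries_dir.glob("*.json")):
        row = file_record_to_row(json.loads(path.read_text(encoding="utf-8")))
        cursor = conn.execute(
            "INSERT OR IGNORE INTO summaries "
            "(source_file, video_id, video_url, video_title, summary, model_used) "
            "VALUES (?, ?, ?, ?, ?, ?)",
            (
                path.name,
                row["video_id"],
                row["video_url"],
                row["video_title"],
                row["summary"],
                row["model_used"],
            ),
        )
        # rowcount is 1 for a new row and 0 when the file was already imported.
        inserted += cursor.rowcount
    conn.commit()
    return inserted
```

Keying on the source file name makes a second run a no-op, so the script can be re-run safely after a partial failure.
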
### Phase 4: Enhanced Features (Optional - 2-3 hours)

#### 4.1 Add User Association
```python
# Track which interface created the summary
summary.source = "frontend"  # or "cli", "api"
summary.user_id = current_user.id if authenticated else None
```

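#### 4.2 Add Search and Filtering
```python
from datetime import datetime
from typing import Optional

# Register this route before /summaries/{summary_id}; otherwise FastAPI
# matches "search" as a summary_id.
@router.get("/summaries/search")
async def search_summaries(
    query: Optional[str] = None,
    model: Optional[str] = None,
    date_from: Optional[datetime] = None,
    date_to: Optional[datetime] = None
):
    """Advanced search across summaries."""
```
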
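The search endpoint is declared without a body. One plausible shape for it is a query builder that adds a condition only for the filters actually supplied and binds every value as a parameter. The sketch below uses plain SQL over `sqlite3` rather than the SQLAlchemy query API; `build_search_query` is a hypothetical helper, and the `created_at` column is an assumption, since the plan does not list a timestamp field on `Summary`.

```python
import sqlite3
from datetime import datetime
from typing import Any, List, Optional, Tuple


def build_search_query(
    query: Optional[str] = None,
    model: Optional[str] = None,
    date_from: Optional[datetime] = None,
    date_to: Optional[datetime] = None,
) -> Tuple[str, List[Any]]:
    """Return a parameterized SQL statement covering only the filters supplied."""
    clauses: List[str] = []
    params: List[Any] = []
    if query:
        # Substring match on title or summary text; values are bound, not interpolated.
        clauses.append("(video_title LIKE ? OR summary LIKE ?)")
        params.extend([f"%{query}%", f"%{query}%"])
    if model:
        clauses.append("model_used = ?")
        params.append(model)
    if date_from:
        clauses.append("created_at >= ?")
        params.append(date_from.isoformat())
    if date_to:
        clauses.append("created_at <= ?")
        params.append(date_to.isoformat())
    where = f" WHERE {' AND '.join(clauses)}" if clauses else ""
    sql = f"SELECT id, video_title FROM summaries{where} ORDER BY created_at DESC"
    return sql, params
```

Only the fixed clause strings are concatenated into the SQL text; user input travels exclusively through the parameter list, which avoids SQL injection. `LIKE` scans every row, so a dedicated full-text index would be worth considering if the table grows large.
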
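#### 4.3 Real-time Updates
```python
# Use WebSocket to notify CLI when frontend creates summary
async def broadcast_new_summary(summary: Summary):
    await websocket_manager.broadcast({
        "type": "new_summary",
        # Summary is a SQLAlchemy model, which has no .dict() method,
        # so the payload is built from its columns explicitly.
        "summary": {
            "id": summary.id,
            "video_id": summary.video_id,
            "video_title": summary.video_title,
        }
    })
```
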
## Implementation Steps

### Step 1: Database Setup ✅
- [x] Ensure database tables are created
- [x] Fix any SQLAlchemy relationship issues

### Step 2: Create Unified Storage Service
```bash
# Create the service
touch backend/services/database_storage_service.py

# Test the service
python3 -c "from backend.services.database_storage_service import DatabaseStorageService; ..."
```

### Step 3: Update Pipeline
1. Import `DatabaseStorageService` in `summary_pipeline.py`
2. Add database save after successful completion
3. Test with a real video URL

### Step 4: Update API
1. Create new router for summary management
2. Add endpoints for listing and retrieving
3. Update `main.py` to include new router

### Step 5: Test Integration
```bash
# Test frontend flow
curl -X POST http://localhost:8000/api/process \
  -H "Content-Type: application/json" \
  -d '{"video_url": "https://youtube.com/watch?v=test"}'

# Check CLI sees it
python3 backend/interactive_cli.py
# Choose option 2 (List Summaries)
```

### Step 6: Frontend Updates
1. Add history page using new API endpoints
2. Update dashboard to show recent summaries
3. Add search/filter capabilities

## Benefits of This Integration

1. **Single Source of Truth**: All summaries live in one database
2. **Cross-Interface Visibility**: Frontend and CLI see the same data
3. **Better Analytics**: Query across all summaries easily
4. **User Tracking**: Associate summaries with users
5. **Search Capabilities**: Full-text search across all summaries
6. **Audit Trail**: Track creation source and timestamps
7. **Scalability**: Easy to migrate to PostgreSQL later

## Backward Compatibility

### Option 1: Dual Storage (Recommended Initially)
- Keep file storage for existing integrations
- Write to both database and files
- Gradually deprecate file storage

### Option 2: File Storage as Cache
- Database as primary storage
- Files for export/backup only
- Periodic sync from DB to files

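Both options treat the database as authoritative and the files as a secondary copy. A minimal sketch of that ordering follows: the database write happens first and must succeed, while the file mirror is best-effort and can be switched off with a flag. `DualWriteStorage` and its single-table schema are hypothetical; only the `enable_file_storage` flag name is taken from the pipeline snippet in 1.2, and `sqlite3` again stands in for SQLAlchemy.

```python
import json
import logging
import sqlite3
from pathlib import Path
from typing import Any, Dict

logger = logging.getLogger(__name__)


class DualWriteStorage:
    """Hypothetical wrapper: database is authoritative, file copy is best-effort."""

    def __init__(
        self,
        conn: sqlite3.Connection,
        export_dir: Path,
        enable_file_storage: bool = True,
    ) -> None:
        self.conn = conn
        self.export_dir = export_dir
        self.enable_file_storage = enable_file_storage
        conn.execute(
            "CREATE TABLE IF NOT EXISTS summaries "
            "(video_id TEXT PRIMARY KEY, payload TEXT NOT NULL)"
        )
        conn.commit()

    def save(self, summary: Dict[str, Any]) -> bool:
        """Write to the database, then mirror to disk.

        Returns True only if the file copy was written.
        """
        payload = json.dumps(summary, sort_keys=True)
        # Database first: an error here propagates and nothing is mirrored.
        self.conn.execute(
            "INSERT OR REPLACE INTO summaries (video_id, payload) VALUES (?, ?)",
            (summary["video_id"], payload),
        )
        self.conn.commit()
        if not self.enable_file_storage:
            return False
        try:
            self.export_dir.mkdir(parents=True, exist_ok=True)
            target = self.export_dir / f"{summary['video_id']}.json"
            target.write_text(payload, encoding="utf-8")
            return True
        except OSError as exc:
            # A failed mirror is logged, not raised: the database row already exists.
            logger.warning("File mirror failed for %s: %s", summary["video_id"], exc)
            return False
```

With this ordering, deprecating file storage later amounts to flipping `enable_file_storage` to `False`; no caller has to change.
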
## Testing Plan

### Unit Tests
```python
def test_database_storage_service():
    service = DatabaseStorageService()
    result = create_mock_pipeline_result()
    saved = service.save_summary_to_db(result)
    assert saved.id is not None
    assert saved.video_id == result.video_id
```

### Integration Tests
```python
def test_frontend_to_cli_flow():
    # Create via API
    response = client.post("/api/process", json={...})
    job_id = response.json()["job_id"]

    # Wait for completion
    summary = wait_for_summary(job_id)

    # Verify in database
    db_summary = session.query(Summary).filter_by(
        video_id=summary["video_id"]
    ).first()
    assert db_summary is not None
```

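The integration test calls a `wait_for_summary` helper that the plan does not define. A possible polling implementation is sketched below. Unlike the one-argument call in the test, this version takes the status lookup as an explicit `fetch_status` callable so it can be exercised without a running server; in the real test it could be bound to the job-status endpoint with `functools.partial`. The status payload shape (`status`, `result`, `error` keys) is an assumption; only the `"completed"` value comes from the pipeline code in 1.2.

```python
import time
from typing import Any, Callable, Dict


def wait_for_summary(
    job_id: str,
    fetch_status: Callable[[str], Dict[str, Any]],
    timeout: float = 30.0,
    interval: float = 0.5,
) -> Dict[str, Any]:
    """Poll fetch_status(job_id) until the job completes, fails, or times out."""
    # monotonic() is unaffected by system clock adjustments during the wait.
    deadline = time.monotonic() + timeout
    while True:
        status = fetch_status(job_id)
        if status.get("status") == "completed":
            return status["result"]
        if status.get("status") == "failed":
            raise RuntimeError(f"Job {job_id} failed: {status.get('error')}")
        if time.monotonic() >= deadline:
            raise TimeoutError(f"Job {job_id} did not finish within {timeout} seconds")
        time.sleep(interval)
```

Raising on both failure and timeout keeps a stuck pipeline from hanging the test suite and produces a clearer error than a later `None` lookup would.
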
## Migration Timeline

| Phase | Task | Duration | Priority |
|-------|------|----------|----------|
| 1 | Create DatabaseStorageService | 2 hours | High |
| 2 | Update SummaryPipeline | 1 hour | High |
| 3 | Add API endpoints | 2 hours | High |
| 4 | Test integration | 1 hour | High |
| 5 | Migrate existing data | 1 hour | Medium |
| 6 | Update frontend | 2 hours | Medium |
| 7 | Add search features | 2 hours | Low |
| 8 | Documentation | 1 hour | Medium |

**Total Estimated Time**: 8-12 hours for full implementation

## Quick Win Implementation (2 hours)

For immediate results with minimal changes:

1. **Add one line to SummaryPipeline** to save to database
2. **CLI automatically sees new summaries**
3. **No frontend changes needed initially**

```python
# Quick fix in summary_pipeline.py
if result.status == "completed":
    # Add this line
    SummaryManager().save_summary(result.__dict__)
    # Existing code continues...
```

This provides immediate integration while planning the full implementation.

## Next Steps

1. Review and approve this plan
2. Create feature branch: `git checkout -b feature/unified-storage`
3. Implement Phase 1 (DatabaseStorageService)
4. Test with both frontend and CLI
5. Gradually implement remaining phases

## Questions to Consider

1. Should we keep file storage for backward compatibility?
2. Do we need user authentication before associating summaries?
3. Should we add an admin interface for managing all summaries?
4. What retention policy should apply to old summaries?
5. Should we add export/import capabilities?

---

This plan ensures that summaries created through any interface (Frontend, CLI, or API) are visible everywhere, providing a seamless user experience.