# YouTube Thumbnail Watcher Service
## 🎯 Overview
A Python backend service that polls the Directus `media_items` collection and downloads YouTube thumbnails for items that do not have one yet. It is intended as a more robust alternative to Directus Flows.
## ✅ Features Implemented
- **Automatic Polling**: Checks Directus every 30 seconds for unprocessed YouTube items
- **Smart Filtering**: Only processes items with `type='youtube_video'` or `type='youtube'` that have URLs but no thumbnails
- **Quality Fallback**: Downloads best available thumbnail (maxres → high → medium → default)
- **Robust Error Handling**: Continues processing on individual failures
- **Comprehensive Logging**: Detailed logs with statistics and error tracking
- **Stateless Design**: Can be restarted anytime without data loss
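
The "continues processing on individual failures" behaviour can be sketched roughly as below. This is an illustration under assumptions, not the service's actual code: `process_batch`, the `handler` callable, and the logger name `youtube_watcher` are hypothetical.

```python
import logging
from typing import Callable, Iterable

# Hypothetical logger name, chosen for this sketch only.
logger = logging.getLogger("youtube_watcher")


def process_batch(items: Iterable[dict], handler: Callable[[dict], object]) -> tuple[int, int]:
    """Run `handler` on every item; one failure never aborts the batch.

    Returns (succeeded, failed). `handler` stands in for whatever downloads
    and attaches the thumbnail for a single media item.
    """
    succeeded = failed = 0
    for item in items:
        try:
            handler(item)
            succeeded += 1
        except Exception:
            # Record the traceback, then carry on with the next item.
            logger.exception("Failed to process item %s", item.get("id"))
            failed += 1
    return succeeded, failed
```

Because nothing is kept in memory between batches, an item that fails simply stays without a thumbnail in Directus and is picked up again on a later poll, which is what makes a restart safe.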
## 🏗️ Architecture
```
projects/youtube-automation/
├── config.py # Configuration management
├── src/
│ ├── directus_client.py # Directus API wrapper
│ ├── youtube_processor.py # YouTube thumbnail logic
│ └── watcher_service.py # Main polling service
├── run_watcher.sh # Startup script
├── requirements.txt # Dependencies
└── .env # Environment variables
```
## 🚀 Usage
### Start the Service
```bash
# Make sure you have a .env file configured
./run_watcher.sh
```
### Environment Variables (.env)
```bash
DIRECTUS_URL="https://enias.zeabur.app/"
DIRECTUS_TOKEN="your_token"
YOUTUBE_API_KEY="optional_youtube_api_key"
POLL_INTERVAL=30
BATCH_SIZE=10
LOG_LEVEL=INFO
```
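
One way these variables might be read in `config.py` is sketched below. The class and function names are hypothetical, and the fallback defaults (30, 10, `INFO`) are assumed from the sample values above rather than confirmed from the code.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class WatcherConfig:
    directus_url: str
    directus_token: str
    youtube_api_key: Optional[str]
    poll_interval: int
    batch_size: int
    log_level: str


def load_config(env: Mapping[str, str] = os.environ) -> WatcherConfig:
    """Build a config object from environment-style key/value pairs."""
    return WatcherConfig(
        # Required values: a missing key raises KeyError at startup.
        directus_url=env["DIRECTUS_URL"].rstrip("/"),
        directus_token=env["DIRECTUS_TOKEN"],
        # Optional values fall back to assumed defaults.
        youtube_api_key=env.get("YOUTUBE_API_KEY") or None,
        poll_interval=int(env.get("POLL_INTERVAL", "30")),
        batch_size=int(env.get("BATCH_SIZE", "10")),
        log_level=env.get("LOG_LEVEL", "INFO").upper(),
    )
```

Failing fast on a missing `DIRECTUS_URL` or `DIRECTUS_TOKEN` surfaces configuration problems at startup instead of on the first poll.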
### Monitor Logs
```bash
tail -f /tmp/youtube_watcher.log
```
## 📊 Service Statistics
The service tracks and reports:
- Items processed
- Success/failure rates
- Uptime
- Processing speed

Example output:
```
📊 YouTube Thumbnail Watcher Statistics
Uptime: 0:02:15
Items Processed: 5
Succeeded: 3
Failed: 2
Success Rate: 60.0%
```
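
A report in this shape could be produced by a small counter object like the one below. This is a sketch only; `WatcherStats` and its fields are hypothetical names, not taken from the service.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class WatcherStats:
    started_at: datetime = field(default_factory=datetime.now)
    succeeded: int = 0
    failed: int = 0

    @property
    def processed(self) -> int:
        return self.succeeded + self.failed

    @property
    def success_rate(self) -> float:
        # Guard against division by zero before the first item is processed.
        return 100.0 * self.succeeded / self.processed if self.processed else 0.0

    def report(self, now: Optional[datetime] = None) -> str:
        elapsed = (now or datetime.now()) - self.started_at
        uptime = timedelta(seconds=int(elapsed.total_seconds()))  # drop microseconds
        return "\n".join([
            "📊 YouTube Thumbnail Watcher Statistics",
            f"Uptime: {uptime}",
            f"Items Processed: {self.processed}",
            f"Succeeded: {self.succeeded}",
            f"Failed: {self.failed}",
            f"Success Rate: {self.success_rate:.1f}%",
        ])
```

The counters live only in memory, so they reset on restart; that is consistent with the stateless design, since the source of truth for pending work is Directus itself.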
## 🔄 Processing Flow
1. **Poll Directus**: Query `media_items` for unprocessed YouTube videos
2. **Extract Video IDs**: Parse YouTube URLs to get video identifiers
3. **Download Thumbnails**: Try multiple quality levels until success
4. **Upload to Directus**: Create file entries in Directus files collection
5. **Update Items**: Link thumbnails to original media items
6. **Log Results**: Track success/failure for monitoring
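
Step 2 of the flow above (extracting the video ID) might look like the following sketch. The function name and the set of supported URL shapes are assumptions, and it relies on the conventional 11-character ID format rather than any documented guarantee.

```python
import re
from typing import Optional
from urllib.parse import parse_qs, urlparse

# Conventional YouTube video ID shape: 11 URL-safe characters (an assumption).
_VIDEO_ID = re.compile(r"[A-Za-z0-9_-]{11}")


def extract_video_id(url: str) -> Optional[str]:
    """Return the video ID from common YouTube URL shapes, or None."""
    parsed = urlparse(url.strip())
    host = (parsed.hostname or "").lower().removeprefix("www.")
    candidate = None
    if host == "youtu.be":
        # Short links: https://youtu.be/<id>
        candidate = parsed.path.lstrip("/").split("/")[0]
    elif host in {"youtube.com", "m.youtube.com", "music.youtube.com"}:
        if parsed.path == "/watch":
            # Standard links: https://www.youtube.com/watch?v=<id>
            candidate = parse_qs(parsed.query).get("v", [None])[0]
        else:
            # Path-based links: /embed/<id>, /shorts/<id>, /v/<id>, /live/<id>
            parts = parsed.path.strip("/").split("/")
            if len(parts) >= 2 and parts[0] in {"embed", "shorts", "v", "live"}:
                candidate = parts[1]
    if candidate and _VIDEO_ID.fullmatch(candidate):
        return candidate
    return None
```

Returning `None` for unrecognised URLs lets the caller count the item as failed and move on, rather than raising inside the parsing step.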
## 🧪 Testing Results
Successfully tested with:
- ✅ Rick Roll video (dQw4w9WgXcQ) - 65KB thumbnail
- ✅ LLM Introduction video (zjkBMFhNj_g) - 184KB thumbnail
- ✅ RAG tutorial video - Full processing pipeline

Failed gracefully with:
- ❌ Private/deleted videos - Proper error handling
- ❌ Age-restricted videos - Continues to next item
## 🔧 Configuration
### Database Query
```python
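# Directus filter: YouTube items that have a URL but no thumbnail yet.
filter = {
    "_and": [
        # Accept both type values used for YouTube items.
        {"_or": [{"type": {"_eq": "youtube_video"}}, {"type": {"_eq": "youtube"}}]},
        {"url": {"_nnull": True}},               # URL is present
        {"youtube_thumbnail": {"_null": True}},  # thumbnail not linked yet
    ]
}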
```
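
To show how a filter like this reaches Directus, the sketch below serialises it into the `filter` query parameter of the items endpoint. The helper name `build_items_url` and the example host are hypothetical; how `directus_client.py` actually issues the request is not shown in this README.

```python
import json
from urllib.parse import urlencode

# Same filter as above, under a name that does not shadow the built-in.
UNPROCESSED_FILTER = {
    "_and": [
        {"_or": [{"type": {"_eq": "youtube_video"}}, {"type": {"_eq": "youtube"}}]},
        {"url": {"_nnull": True}},
        {"youtube_thumbnail": {"_null": True}},
    ]
}


def build_items_url(base_url: str, collection: str, query_filter: dict, limit: int) -> str:
    """Build a Directus GET /items/<collection> URL with a JSON filter."""
    params = {
        # json.dumps turns Python True into JSON true, as Directus expects.
        "filter": json.dumps(query_filter, separators=(",", ":")),
        "limit": limit,
    }
    return f"{base_url.rstrip('/')}/items/{collection}?{urlencode(params)}"


if __name__ == "__main__":
    # Hypothetical host; BATCH_SIZE would normally supply the limit.
    print(build_items_url("https://directus.example.com/", "media_items", UNPROCESSED_FILTER, 10))
```

The request itself would also carry the token, typically as an `Authorization: Bearer <DIRECTUS_TOKEN>` header.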
### Thumbnail Quality Priority
1. **maxres**: 1280x720 (best quality)
2. **high**: 480x360
3. **medium**: 320x180
4. **default**: 120x90 (fallback)
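
The fallback order above can be sketched as a simple loop over candidates. This version assumes thumbnails are fetched from YouTube's public image host (`i.ytimg.com`) by file name; the service may instead use the YouTube Data API. All function names here are hypothetical, and `fetch` is an injected callable so the sketch stays independent of any HTTP library.

```python
from typing import Callable, Optional

# Quality label -> file name on the image host, best first.
THUMBNAIL_FILES = [
    ("maxres", "maxresdefault.jpg"),
    ("high", "hqdefault.jpg"),
    ("medium", "mqdefault.jpg"),
    ("default", "default.jpg"),
]


def thumbnail_url(video_id: str, filename: str) -> str:
    return f"https://i.ytimg.com/vi/{video_id}/{filename}"


def download_best_thumbnail(
    video_id: str,
    fetch: Callable[[str], Optional[bytes]],
) -> Optional[tuple[str, bytes]]:
    """Try each quality in priority order; return (quality, image bytes).

    `fetch` should return the image bytes, or None when the URL is
    unavailable (for example on a 404).
    """
    for quality, filename in THUMBNAIL_FILES:
        data = fetch(thumbnail_url(video_id, filename))
        if data:
            return quality, data
    return None  # no quality level was available
```

Returning the quality label alongside the bytes makes it easy to log which level was actually used for each item.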
## 🎉 Benefits Over Directus Flows
- **Better Error Handling**: Individual failures don't stop the service
- **Comprehensive Logging**: Full visibility into processing
- **Easy Testing**: Can test individual components
- **Flexible Deployment**: Run anywhere, not tied to Directus
- **Stateless Recovery**: Restart anytime without issues
- **Performance Monitoring**: Built-in statistics and metrics
## 🔍 Monitoring
### Key Metrics to Watch
- Processing success rate (should be >80% for public videos)
- Queue size (items waiting for processing)
- Error patterns (404s vs network issues)
- Processing speed (items per minute)
### Log Levels
- **INFO**: Normal operation and successful processing
- **WARNING**: Failed downloads, retries
- **ERROR**: Critical failures, configuration issues
- **DEBUG**: Detailed processing information
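
A minimal sketch of wiring `LOG_LEVEL` to Python's standard `logging` module follows. The function names, logger name, and log format are assumptions; the default file path matches the one used in the Monitor Logs section.

```python
import logging

_LEVELS = {
    "DEBUG": logging.DEBUG,
    "INFO": logging.INFO,
    "WARNING": logging.WARNING,
    "ERROR": logging.ERROR,
}


def resolve_level(name: str) -> int:
    """Map a LOG_LEVEL string to a logging constant, defaulting to INFO."""
    return _LEVELS.get(name.strip().upper(), logging.INFO)


def configure_logging(level_name: str = "INFO",
                      log_file: str = "/tmp/youtube_watcher.log") -> logging.Logger:
    """Attach a file handler once and apply the requested level."""
    logger = logging.getLogger("youtube_watcher")  # hypothetical logger name
    logger.setLevel(resolve_level(level_name))
    if not logger.handlers:  # avoid duplicate handlers on repeated calls
        handler = logging.FileHandler(log_file)
        handler.setFormatter(logging.Formatter("%(asctime)s %(levelname)s %(message)s"))
        logger.addHandler(handler)
    return logger
```

Falling back to `INFO` on an unrecognised value keeps a typo in `.env` from preventing startup.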
## 🚀 Production Deployment
### Systemd Service (Linux)
```ini
# Save as /etc/systemd/system/youtube-watcher.service, for example with:
#   sudo nano /etc/systemd/system/youtube-watcher.service
# Then load and start it:
#   sudo systemctl daemon-reload
#   sudo systemctl enable --now youtube-watcher
[Unit]
Description=YouTube Thumbnail Watcher
After=network.target

[Service]
Type=simple
User=www-data
WorkingDirectory=/path/to/youtube-automation
ExecStart=/path/to/youtube-automation/run_watcher.sh
Restart=always

[Install]
WantedBy=multi-user.target
```
### Docker Alternative
```dockerfile
FROM python:3.11-slim
WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
COPY . .
CMD ["python", "src/watcher_service.py"]
```
## 🎯 Next Steps
The core YouTube thumbnail automation is complete and working! The service successfully:
1. ✅ Polls Directus for unprocessed YouTube items
2. ✅ Downloads thumbnails with quality fallback
3. ✅ Uploads files to Directus
4. ✅ Updates media items with thumbnail references
5. ✅ Handles errors gracefully
6. ✅ Provides comprehensive logging and statistics

The service is ready for production use!