# Enhanced CLI Documentation
## Overview
The Enhanced CLI (`src/cli/enhanced_cli.py`) is a command-line interface for the Trax transcription platform. It adds real-time progress reporting, system performance monitoring, concurrent batch processing, multiple export formats, speaker diarization, and domain adaptation.
## Key Features
### 🎯 Real-Time Progress Reporting
- **Rich Progress Bars**: Beautiful progress bars with time estimates
- **Live Updates**: Real-time transcription progress updates
- **Time Remaining**: Accurate time-to-completion estimates
- **File Processing**: Individual file progress in batch operations
### 📊 Performance Monitoring
- **CPU Usage**: Real-time CPU utilization tracking
- **Memory Usage**: Current and total memory monitoring
- **Temperature**: CPU temperature monitoring (when available)
- **System Stats**: Live system resource statistics
### 🚀 Intelligent Batch Processing
- **Concurrent Execution**: Configurable parallel processing
- **Size-Based Queuing**: Smaller files processed first for faster feedback
- **Resource Management**: Automatic resource monitoring and optimization
- **Error Recovery**: Graceful error handling without stopping batch
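
The queuing and error-recovery behavior listed above can be illustrated with a minimal sketch. It is not the actual implementation in `src/cli/enhanced_cli.py`: the names `queue_by_size`, `run_batch`, and `worker` are hypothetical, and a thread pool is assumed only for simplicity.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from pathlib import Path
from typing import Any, Callable, Dict, List, Tuple


def queue_by_size(files: List[Path]) -> List[Path]:
    """Order files so that the smallest are submitted first."""
    return sorted(files, key=lambda p: p.stat().st_size)


def run_batch(
    files: List[Path],
    worker: Callable[[Path], Any],  # hypothetical per-file transcription function
    concurrency: int = 4,
) -> Tuple[Dict[Path, Any], Dict[Path, Exception]]:
    """Run `worker` on every file; a failing file does not stop the batch."""
    results: Dict[Path, Any] = {}
    errors: Dict[Path, Exception] = {}
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        # Submit in size order so small files finish (and report) early.
        futures = {pool.submit(worker, f): f for f in queue_by_size(files)}
        for future in as_completed(futures):
            path = futures[future]
            try:
                results[path] = future.result()
            except Exception as exc:  # record the failure and keep going
                errors[path] = exc
    return results, errors
```

Collecting failures in a separate dictionary lets the caller print one summary at the end of the batch instead of aborting on the first bad file.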
### 🛠️ Enhanced Error Handling
- **User-Friendly Messages**: Clear, actionable error messages
- **Contextual Guidance**: Specific suggestions for common issues
- **Error Categories**: File, memory, GPU, permission, and generic errors
- **Recovery Suggestions**: Automatic recommendations for resolution
### 📁 Multiple Export Formats
- **JSON**: Structured data with metadata
- **TXT**: Plain text for readability
- **SRT**: SubRip subtitles for video players
- **VTT**: WebVTT subtitles for web applications
### 🔧 Advanced Features
- **Speaker Diarization**: Identify and separate speakers
- **Domain Adaptation**: Optimize for specific content types
- **Model Selection**: Choose from tiny to large models
- **Device Selection**: CPU or CUDA processing
## Installation & Setup
The Enhanced CLI is included with the standard Trax installation:
```bash
# Install dependencies
uv pip install -e ".[dev]"
# Verify installation
uv run python -m src.cli.enhanced_cli --help
```
## Command Reference
### Main CLI
```bash
uv run python -m src.cli.enhanced_cli [OPTIONS] COMMAND [ARGS]...
```
**Global Options:**
- `--help` - Show help message and exit
**Available Commands:**
- `transcribe` - Transcribe a single audio file
- `batch` - Process multiple files in batch
### Transcribe Command
```bash
uv run python -m src.cli.enhanced_cli transcribe [OPTIONS] INPUT
```
**Arguments:**
- `INPUT` - Input audio/video file path
**Options:**
- `-o, --output PATH` - Output directory (default: current directory)
- `-f, --format [json|txt|srt|vtt]` - Output format (default: json)
- `-m, --model [tiny|base|small|medium|large]` - Model size (default: base)
- `-d, --device [cpu|cuda]` - Processing device (default: cpu)
- `--domain [general|technical|medical|academic]` - Domain adaptation
- `--diarize` - Enable speaker diarization
- `--speakers INTEGER` - Number of speakers (for diarization)
- `--help` - Show help message and exit
**Examples:**
```bash
# Basic transcription
uv run python -m src.cli.enhanced_cli transcribe lecture.mp3
# High-quality transcription with large model
uv run python -m src.cli.enhanced_cli transcribe podcast.mp3 -m large
# Medical content with domain adaptation
uv run python -m src.cli.enhanced_cli transcribe medical_audio.wav --domain medical
# Speaker diarization with SRT output
uv run python -m src.cli.enhanced_cli transcribe interview.mp4 --diarize --speakers 2 -f srt
# GPU processing with VTT output
uv run python -m src.cli.enhanced_cli transcribe video.mp4 -d cuda -f vtt
```
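
To make the option set above concrete, here is a minimal sketch that mirrors the documented `transcribe` options with the standard-library `argparse` module. The real CLI may be built with a different library; the function name `build_transcribe_parser` is hypothetical, and the `general` default for `--domain` is taken from the domain list in this guide.

```python
import argparse


def build_transcribe_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the documented transcribe options."""
    parser = argparse.ArgumentParser(prog="transcribe")
    parser.add_argument("input", help="Input audio/video file path")
    parser.add_argument("-o", "--output", default=".", help="Output directory")
    parser.add_argument("-f", "--format", default="json",
                        choices=["json", "txt", "srt", "vtt"])
    parser.add_argument("-m", "--model", default="base",
                        choices=["tiny", "base", "small", "medium", "large"])
    parser.add_argument("-d", "--device", default="cpu", choices=["cpu", "cuda"])
    parser.add_argument("--domain", default="general",
                        choices=["general", "technical", "medical", "academic"])
    parser.add_argument("--diarize", action="store_true",
                        help="Enable speaker diarization")
    parser.add_argument("--speakers", type=int, default=None,
                        help="Number of speakers (for diarization)")
    return parser


if __name__ == "__main__":
    args = build_transcribe_parser().parse_args(
        ["interview.mp4", "--diarize", "--speakers", "2", "-f", "srt"]
    )
    print(args)
```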
### Batch Command
```bash
uv run python -m src.cli.enhanced_cli batch [OPTIONS] INPUT
```
**Arguments:**
- `INPUT` - Input directory containing audio/video files
**Options:**
- `-o, --output PATH` - Output directory (default: current directory)
- `-c, --concurrency INTEGER` - Number of concurrent processes (default: 4)
- `-f, --format [json|txt|srt|vtt]` - Output format (default: json)
- `-m, --model [tiny|base|small|medium|large]` - Model size (default: base)
- `-d, --device [cpu|cuda]` - Processing device (default: cpu)
- `--domain [general|technical|medical|academic]` - Domain adaptation
- `--diarize` - Enable speaker diarization
- `--speakers INTEGER` - Number of speakers (for diarization)
- `--help` - Show help message and exit
**Examples:**
```bash
# Batch process with 8 workers
uv run python -m src.cli.enhanced_cli batch ~/Podcasts -c 8
# Academic lectures with domain adaptation
uv run python -m src.cli.enhanced_cli batch ~/Lectures --domain academic -m large
# Conservative processing for memory-constrained systems
uv run python -m src.cli.enhanced_cli batch ~/Audio -c 2 -m small
# High-quality batch processing with speaker diarization
uv run python -m src.cli.enhanced_cli batch ~/Interviews -m large -f srt --diarize --speakers 3
# GPU batch processing
uv run python -m src.cli.enhanced_cli batch ~/Videos -d cuda -c 4
```
## Performance Monitoring
### Real-Time Metrics
The Enhanced CLI displays live performance metrics during processing:
```bash
CPU: 45.2% | Memory: 2.1GB/8GB (26%) | Temp: 65°C
```
**Metrics Explained:**
- **CPU**: Current CPU utilization percentage
- **Memory**: Used memory / Total memory (percentage)
- **Temperature**: CPU temperature in Celsius (when available)
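
As an illustration of how such a status line could be assembled, the sketch below formats values that the caller has already collected (for example with a system-monitoring library). The function name `format_metrics` and its parameters are assumptions, not the CLI's actual API.

```python
from typing import Optional


def format_metrics(
    cpu_percent: float,
    mem_used_gb: float,
    mem_total_gb: float,
    temp_c: Optional[float] = None,
) -> str:
    """Build a one-line status string; temperature is omitted when unavailable."""
    if mem_total_gb <= 0:
        raise ValueError("mem_total_gb must be positive")
    mem_percent = mem_used_gb / mem_total_gb * 100
    parts = [
        f"CPU: {cpu_percent:.1f}%",
        # ":g" drops a trailing ".0", so 8.0 is shown as "8"
        f"Memory: {mem_used_gb:.1f}GB/{mem_total_gb:g}GB ({mem_percent:.0f}%)",
    ]
    if temp_c is not None:
        parts.append(f"Temp: {temp_c:.0f}°C")
    return " | ".join(parts)


if __name__ == "__main__":
    print(format_metrics(45.2, 2.1, 8.0, 65.0))
```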
### Performance Guidelines
#### System Recommendations
- **Conservative**: 2-4 concurrent workers, small model
- **Balanced**: 4-6 concurrent workers, base model
- **Aggressive**: 6-8 concurrent workers, large model
#### Memory Usage
- **Small Model**: ~1GB per process
- **Base Model**: ~1.5GB per process
- **Large Model**: ~2GB per process
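
Because each worker loads its own model, total memory grows roughly with the concurrency level. The sketch below turns the approximate per-process figures above into a quick estimate; the helper names are hypothetical and the numbers are the rough values from this list, not measurements.

```python
import math

# Approximate per-process memory in GB, taken from the list above.
MEMORY_PER_PROCESS_GB = {"small": 1.0, "base": 1.5, "large": 2.0}


def estimated_memory_gb(model: str, concurrency: int) -> float:
    """Rough total memory needed for `concurrency` workers of a given model."""
    return MEMORY_PER_PROCESS_GB[model] * concurrency


def max_workers_for(model: str, available_gb: float) -> int:
    """Largest worker count that fits in `available_gb` (at least 1)."""
    return max(1, math.floor(available_gb / MEMORY_PER_PROCESS_GB[model]))


if __name__ == "__main__":
    print(estimated_memory_gb("large", 8))  # 8 workers with the large model
    print(max_workers_for("base", 8.0))     # workers that fit in 8 GB
```

By this estimate, eight workers with the large model need about 16GB, so the "Aggressive" profile assumes a machine with memory to spare.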
#### Processing Speed
- **v1 Pipeline**: <30 seconds for 5-minute audio
- **Real-time Factor**: <0.1 (much faster than real-time)
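
The real-time factor is conventionally processing time divided by audio duration, so the two figures above express the same target. A small sketch (the function name is illustrative):

```python
def real_time_factor(processing_seconds: float, audio_seconds: float) -> float:
    """Processing time divided by audio duration; below 1.0 is faster than real time."""
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return processing_seconds / audio_seconds


if __name__ == "__main__":
    # 30 seconds of processing for a 5-minute (300-second) file
    print(real_time_factor(30, 300))
```

Thirty seconds for 300 seconds of audio gives a factor of 0.1, which is the upper bound quoted above.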
## Error Handling
### Error Categories
#### File Errors
```bash
❌ File not found: lecture.mp3
💡 Check that the input file path is correct and the file exists.
```
#### Memory Errors
```bash
❌ Memory error. Try using a smaller model with --model small or reduce concurrency.
```
#### GPU Errors
```bash
❌ CUDA out of memory
💡 GPU-related error. Try using --device cpu instead.
```
#### Permission Errors
```bash
❌ Permission denied: protected.wav
💡 Check file permissions or run with administrator privileges.
```
#### Generic Errors
```bash
❌ Invalid parameter
💡 Check input parameters and try again.
```
### Error Recovery
The Enhanced CLI provides specific guidance for each error type:
1. **File Issues**: Path validation and existence checks
2. **Memory Issues**: Model size and concurrency suggestions
3. **GPU Issues**: Device fallback recommendations
4. **Permission Issues**: File access guidance
5. **Generic Issues**: General troubleshooting tips
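
One way to map exceptions onto these five categories is sketched below. This is an assumption about how such a mapping could look, not the CLI's actual logic: `categorize_error` is a hypothetical name, and detecting GPU errors by searching the message for "cuda" is a simplification. The suggestion strings are copied from the messages shown above.

```python
from typing import Tuple


def categorize_error(exc: BaseException) -> Tuple[str, str]:
    """Return (category, suggestion) for an exception raised during processing."""
    # Check concrete exception types first so a file name containing
    # "cuda" is not mistaken for a GPU problem.
    if isinstance(exc, FileNotFoundError):
        return "file", "Check that the input file path is correct and the file exists."
    if isinstance(exc, PermissionError):
        return "permission", "Check file permissions or run with administrator privileges."
    if "cuda" in str(exc).lower():
        return "gpu", "GPU-related error. Try using --device cpu instead."
    if isinstance(exc, MemoryError):
        return "memory", "Try using a smaller model with --model small or reduce concurrency."
    return "generic", "Check input parameters and try again."


if __name__ == "__main__":
    for error in (FileNotFoundError("lecture.mp3"), RuntimeError("CUDA out of memory")):
        category, suggestion = categorize_error(error)
        print(f"[{category}] {error}: {suggestion}")
```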
## Output Formats
### JSON Format (Default)
```json
{
  "text_content": "Never gonna give you up...",
  "segments": [
    {
      "start": 0.0,
      "end": 2.5,
      "text": "Never gonna give you up"
    }
  ],
  "confidence": 0.95,
  "processing_time": 5.2
}
```
### Text Format
```
Never gonna give you up
Never gonna let you down
Never gonna run around and desert you
...
```
### SRT Subtitles
```
1
00:00:00,000 --> 00:00:02,500
Never gonna give you up

2
00:00:02,500 --> 00:00:05,000
Never gonna let you down
```
### VTT Subtitles
```
WEBVTT

00:00:00.000 --> 00:00:02.500
Never gonna give you up

00:00:02.500 --> 00:00:05.000
Never gonna let you down
```
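
The subtitle formats differ mainly in the header, cue numbering, and the millisecond separator (`,` in SRT, `.` in VTT). The sketch below converts segments shaped like the JSON example into both formats; the function names are illustrative and are not taken from the Trax codebase.

```python
from typing import Dict, List


def format_timestamp(seconds: float, ms_separator: str) -> str:
    """Render seconds as HH:MM:SS<sep>mmm."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}{ms_separator}{ms:03d}"


def to_srt(segments: List[Dict]) -> str:
    """SubRip: numbered cues, comma before milliseconds, blank line between cues."""
    cues = []
    for index, seg in enumerate(segments, start=1):
        start = format_timestamp(seg["start"], ",")
        end = format_timestamp(seg["end"], ",")
        cues.append(f"{index}\n{start} --> {end}\n{seg['text']}")
    return "\n\n".join(cues) + "\n"


def to_vtt(segments: List[Dict]) -> str:
    """WebVTT: 'WEBVTT' header, dot before milliseconds, blank line between cues."""
    cues = []
    for seg in segments:
        start = format_timestamp(seg["start"], ".")
        end = format_timestamp(seg["end"], ".")
        cues.append(f"{start} --> {end}\n{seg['text']}")
    return "WEBVTT\n\n" + "\n\n".join(cues) + "\n"


if __name__ == "__main__":
    demo = [
        {"start": 0.0, "end": 2.5, "text": "Never gonna give you up"},
        {"start": 2.5, "end": 5.0, "text": "Never gonna let you down"},
    ]
    print(to_srt(demo))
    print(to_vtt(demo))
```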
## Advanced Features
### Speaker Diarization
Identify and separate different speakers in audio:
```bash
# Enable diarization with 2 speakers
uv run python -m src.cli.enhanced_cli transcribe interview.mp4 --diarize --speakers 2
# Batch processing with diarization
uv run python -m src.cli.enhanced_cli batch ~/Interviews --diarize --speakers 3
```
**Requirements:**
- pyannote.audio library installed
- HuggingFace token for speaker diarization models
### Domain Adaptation
Optimize transcription for specific content types:
```bash
# Medical content
uv run python -m src.cli.enhanced_cli transcribe medical_audio.wav --domain medical
# Academic lectures
uv run python -m src.cli.enhanced_cli transcribe lecture.mp3 --domain academic
# Technical content
uv run python -m src.cli.enhanced_cli transcribe tech_podcast.mp3 --domain technical
```
**Available Domains:**
- `general` - General purpose (default)
- `technical` - Technical and scientific content
- `medical` - Medical and healthcare content
- `academic` - Academic and educational content
## Common Workflows
### Research Workflow
```bash
# 1. Extract metadata from YouTube playlist
uv run python -m src.cli.main batch-urls research_videos.txt
# 2. Download selected videos
uv run python -m src.cli.main youtube https://youtube.com/watch?v=interesting --download
# 3. Enhanced transcription with progress monitoring
uv run python -m src.cli.enhanced_cli transcribe downloaded_video.mp4 -m large --domain academic
# 4. Batch process with intelligent queuing
uv run python -m src.cli.enhanced_cli batch ~/Downloads/research_audio -c 6 -f srt
```
### Academic Lecture Processing
```bash
# Process academic lectures with domain adaptation
uv run python -m src.cli.enhanced_cli batch ~/Lectures \
  --domain academic \
  -m large \
  -f srt \
  -c 4 \
  --diarize \
  --speakers 1
```
### Podcast Production
```bash
# High-quality podcast transcription with speaker diarization
uv run python -m src.cli.enhanced_cli batch ~/Podcasts \
  -m large \
  -f vtt \
  --diarize \
  --speakers 3 \
  -c 2
```
## Integration with Taskmaster
Track CLI operations using Taskmaster:
```bash
# Create task for batch processing
./scripts/tm_master.sh add "Process podcast archive with enhanced CLI"
# Track progress
./scripts/tm_workflow.sh update 15 "Processed 50 files, 10 remaining"
# Mark complete
./scripts/tm_master.sh done 15
```
## Troubleshooting
### Common Issues
#### Import Errors
```bash
ModuleNotFoundError: No module named 'pyannote'
```
**Solution**: Install optional dependencies for diarization
```bash
uv pip install pyannote.audio
```
#### Memory Issues
```bash
Memory error. Try using a smaller model with --model small or reduce concurrency.
```
**Solution**: Use smaller model or reduce concurrency
```bash
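# Single file: use a smaller model (transcribe has no -c option)
uv run python -m src.cli.enhanced_cli transcribe file.wav -m small
# Batch: use a smaller model and reduce concurrency
uv run python -m src.cli.enhanced_cli batch ~/Audio -m small -c 1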
```
#### GPU Issues
```bash
CUDA out of memory
```
**Solution**: Switch to CPU processing
```bash
uv run python -m src.cli.enhanced_cli transcribe file.wav -d cpu
```
### Performance Optimization
#### For Memory-Constrained Systems
- Use `-m small` or `-m tiny` models
- Reduce concurrency with `-c 1` or `-c 2`
- Process smaller files first
#### For High-Performance Systems
- Use `-m large` models for best accuracy
- Increase concurrency with `-c 8` or higher
- Enable GPU processing with `-d cuda`
#### For Batch Processing
- Start with conservative settings
- Monitor system resources
- Adjust concurrency based on performance
## Development
### Architecture
The Enhanced CLI follows a modular, protocol-based architecture:
```python
class EnhancedCLI:
    """Main CLI with error handling and performance monitoring"""

class EnhancedTranscribeCommand:
    """Single file transcription with progress reporting"""

class EnhancedBatchCommand:
    """Batch processing with intelligent queuing"""
```
### Testing
Comprehensive test suite with 19 test cases:
```bash
# Run all enhanced CLI tests
uv run pytest tests/test_enhanced_cli.py -v
# Run specific test categories
uv run pytest tests/test_enhanced_cli.py::TestEnhancedCLI -v
uv run pytest tests/test_enhanced_cli.py::TestEnhancedTranscribeCommand -v
uv run pytest tests/test_enhanced_cli.py::TestEnhancedBatchCommand -v
```
### Code Quality
- **Lines of Code**: 483 lines
- **Test Results**: 100% pass rate across the 19 test cases
- **Type Hints**: Full type annotation
- **Error Handling**: Comprehensive error management
- **Documentation**: Inline documentation and examples
## Future Enhancements
### Planned Features
- **WebSocket Integration**: Real-time progress updates via WebSocket
- **Plugin System**: Extensible CLI with custom commands
- **Configuration Files**: Persistent settings and preferences
- **Advanced Metrics**: Detailed performance analytics
- **Cloud Integration**: Direct cloud storage support
### API Integration
- **REST API**: HTTP endpoints for programmatic access
- **GraphQL API**: Flexible query interface
- **Webhook Support**: Event-driven processing
- **SDK Development**: Client libraries for multiple languages
## Support
For issues and questions:
1. **Check Documentation**: Review this guide and other docs
2. **Run Tests**: Verify installation with test suite
3. **Check Logs**: Review error messages and system logs
4. **Community Support**: Use project issue tracker
5. **Performance Tuning**: Adjust settings based on system capabilities
---
*Enhanced CLI v1.0 - Comprehensive transcription interface with real-time progress reporting and performance monitoring.*