# Enhanced CLI Documentation

## Overview

The Enhanced CLI (`src/cli/enhanced_cli.py`) is the command-line interface for the Trax transcription platform. It adds real-time progress reporting, system performance monitoring, concurrent batch processing, and multiple export formats.

## Key Features

### 🎯 Real-Time Progress Reporting
- **Rich Progress Bars**: Progress bars with time estimates
- **Live Updates**: Real-time transcription progress updates
- **Time Remaining**: Time-to-completion estimates
- **File Processing**: Individual file progress in batch operations

### 📊 Performance Monitoring
- **CPU Usage**: Real-time CPU utilization tracking
- **Memory Usage**: Current and total memory monitoring
- **Temperature**: CPU temperature monitoring (when available)
- **System Stats**: Live system resource statistics

### 🚀 Intelligent Batch Processing
- **Concurrent Execution**: Configurable parallel processing
- **Size-Based Queuing**: Smaller files processed first for faster feedback
- **Resource Management**: Automatic resource monitoring and optimization
- **Error Recovery**: A failed file is reported without stopping the batch

### 🛠️ Enhanced Error Handling
- **User-Friendly Messages**: Clear, actionable error messages
- **Contextual Guidance**: Specific suggestions for common issues
- **Error Categories**: File, memory, GPU, permission, and generic errors
- **Recovery Suggestions**: Automatic recommendations for resolution

### 📁 Multiple Export Formats
- **JSON**: Structured data with metadata
- **TXT**: Plain text for readability
- **SRT**: SubRip subtitles for video players
- **VTT**: WebVTT subtitles for web applications

### 🔧 Advanced Features
- **Speaker Diarization**: Identify and separate speakers
- **Domain Adaptation**: Optimize for specific content types
- **Model Selection**: Choose from tiny to large models
- **Device Selection**: CPU or CUDA processing

## Installation & Setup

The Enhanced CLI is included with the standard Trax installation:

```bash
# Install dependencies
uv pip install -e ".[dev]"

# Verify installation
uv run python -m src.cli.enhanced_cli --help
```

## Command Reference

### Main CLI

```bash
uv run python -m src.cli.enhanced_cli [OPTIONS] COMMAND [ARGS]...
```

**Global Options:**
- `--help` - Show help message and exit

**Available Commands:**
- `transcribe` - Transcribe a single audio file
- `batch` - Process multiple files in batch

### Transcribe Command

```bash
uv run python -m src.cli.enhanced_cli transcribe [OPTIONS] INPUT
```

**Arguments:**
- `INPUT` - Input audio/video file path

**Options:**
- `-o, --output PATH` - Output directory (default: current directory)
- `-f, --format [json|txt|srt|vtt]` - Output format (default: json)
- `-m, --model [tiny|base|small|medium|large]` - Model size (default: base)
- `-d, --device [cpu|cuda]` - Processing device (default: cpu)
- `--domain [general|technical|medical|academic]` - Domain adaptation
- `--diarize` - Enable speaker diarization
- `--speakers INTEGER` - Number of speakers (for diarization)
- `--help` - Show help message and exit

**Examples:**

```bash
# Basic transcription
uv run python -m src.cli.enhanced_cli transcribe lecture.mp3

# High-quality transcription with large model
uv run python -m src.cli.enhanced_cli transcribe podcast.mp3 -m large

# Medical content with domain adaptation
uv run python -m src.cli.enhanced_cli transcribe medical_audio.wav --domain medical

# Speaker diarization with SRT output
uv run python -m src.cli.enhanced_cli transcribe interview.mp4 --diarize --speakers 2 -f srt

# GPU processing with VTT output
uv run python -m src.cli.enhanced_cli transcribe video.mp4 -d cuda -f vtt
```

### Batch Command

```bash
uv run python -m src.cli.enhanced_cli batch [OPTIONS] INPUT
```

**Arguments:**
- `INPUT` - Input directory containing audio/video files

**Options:**
- `-o, --output PATH` - Output directory (default: current directory)
- `-c, --concurrency INTEGER` - Number of concurrent processes (default: 4)
- `-f, --format [json|txt|srt|vtt]` - Output format (default: json)
- `-m, --model [tiny|base|small|medium|large]` - Model size (default: base)
- `-d, --device [cpu|cuda]` - Processing device (default: cpu)
- `--domain [general|technical|medical|academic]` - Domain adaptation
- `--diarize` - Enable speaker diarization
- `--speakers INTEGER` - Number of speakers (for diarization)
- `--help` - Show help message and exit

**Examples:**

```bash
# Batch process with 8 workers
uv run python -m src.cli.enhanced_cli batch ~/Podcasts -c 8

# Academic lectures with domain adaptation
uv run python -m src.cli.enhanced_cli batch ~/Lectures --domain academic -m large

# Conservative processing for memory-constrained systems
uv run python -m src.cli.enhanced_cli batch ~/Audio -c 2 -m small

# High-quality batch processing with speaker diarization
uv run python -m src.cli.enhanced_cli batch ~/Interviews -m large -f srt --diarize --speakers 3

# GPU batch processing
uv run python -m src.cli.enhanced_cli batch ~/Videos -d cuda -c 4
```

## Performance Monitoring

### Real-Time Metrics

The Enhanced CLI displays live performance metrics during processing:

```bash
CPU: 45.2% | Memory: 2.1GB/8GB (26%) | Temp: 65°C
```

**Metrics Explained:**
- **CPU**: Current CPU utilization percentage
- **Memory**: Used memory / Total memory (percentage)
- **Temperature**: CPU temperature in Celsius (when available)

### Performance Guidelines

#### System Recommendations
- **Conservative**: 2-4 concurrent workers, small model
- **Balanced**: 4-6 concurrent workers, base model
- **Aggressive**: 6-8 concurrent workers, large model

#### Memory Usage
- **Small Model**: ~1GB per process
- **Base Model**: ~1.5GB per process
- **Large Model**: ~2GB per process

#### Processing Speed
- **v1 Pipeline**: <30 seconds for 5-minute audio
- **Real-time Factor**: <0.1 (processing takes less than a tenth of the audio duration)

## Error Handling

### Error Categories

#### File Errors

```bash
❌ File not found: lecture.mp3
💡 Check that the input file path is correct and the file exists.
```

#### Memory Errors

```bash
❌ Memory error. Try using a smaller model with --model small or reduce concurrency.
```

#### GPU Errors

```bash
❌ CUDA out of memory
💡 GPU-related error. Try using --device cpu instead.
```

#### Permission Errors

```bash
❌ Permission denied: protected.wav
💡 Check file permissions or run with administrator privileges.
```

#### Generic Errors

```bash
❌ Invalid parameter
💡 Check input parameters and try again.
```

### Error Recovery

The Enhanced CLI provides specific guidance for each error type:

1. **File Issues**: Path validation and existence checks
2. **Memory Issues**: Model size and concurrency suggestions
3. **GPU Issues**: Device fallback recommendations
4. **Permission Issues**: File access guidance
5. **Generic Issues**: General troubleshooting tips

## Output Formats

### JSON Format (Default)

```json
{
  "text_content": "Never gonna give you up...",
  "segments": [
    {
      "start": 0.0,
      "end": 2.5,
      "text": "Never gonna give you up"
    }
  ],
  "confidence": 0.95,
  "processing_time": 5.2
}
```

### Text Format

```
Never gonna give you up
Never gonna let you down
Never gonna run around and desert you
...
```

### SRT Subtitles

```
1
00:00:00,000 --> 00:00:02,500
Never gonna give you up

2
00:00:02,500 --> 00:00:05,000
Never gonna let you down
```

### VTT Subtitles

```
WEBVTT

00:00:00.000 --> 00:00:02.500
Never gonna give you up

00:00:02.500 --> 00:00:05.000
Never gonna let you down
```

## Advanced Features

### Speaker Diarization

Identify and separate different speakers in audio:

```bash
# Enable diarization with 2 speakers
uv run python -m src.cli.enhanced_cli transcribe interview.mp4 --diarize --speakers 2

# Batch processing with diarization
uv run python -m src.cli.enhanced_cli batch ~/Interviews --diarize --speakers 3
```

**Requirements:**
- pyannote.audio library installed
- HuggingFace token for speaker diarization models

### Domain Adaptation

Optimize transcription for specific content types:

```bash
# Medical content
uv run python -m src.cli.enhanced_cli transcribe medical_audio.wav --domain medical

# Academic lectures
uv run python -m src.cli.enhanced_cli transcribe lecture.mp3 --domain academic

# Technical content
uv run python -m src.cli.enhanced_cli transcribe tech_podcast.mp3 --domain technical
```

**Available Domains:**
- `general` - General purpose (default)
- `technical` - Technical and scientific content
- `medical` - Medical and healthcare content
- `academic` - Academic and educational content

## Common Workflows

### Research Workflow

```bash
# 1. Extract metadata from YouTube playlist
uv run python -m src.cli.main batch-urls research_videos.txt

# 2. Download selected videos (URL quoted so the shell does not expand "?")
uv run python -m src.cli.main youtube "https://youtube.com/watch?v=interesting" --download

# 3. Enhanced transcription with progress monitoring
uv run python -m src.cli.enhanced_cli transcribe downloaded_video.mp4 -m large --domain academic

# 4. Batch process with intelligent queuing
uv run python -m src.cli.enhanced_cli batch ~/Downloads/research_audio -c 6 -f srt
```

### Academic Lecture Processing

```bash
# Process academic lectures with domain adaptation
uv run python -m src.cli.enhanced_cli batch ~/Lectures \
  --domain academic \
  -m large \
  -f srt \
  -c 4 \
  --diarize \
  --speakers 1
```

### Podcast Production

```bash
# High-quality podcast transcription with speaker diarization
uv run python -m src.cli.enhanced_cli batch ~/Podcasts \
  -m large \
  -f vtt \
  --diarize \
  --speakers 3 \
  -c 2
```

## Integration with Taskmaster

Track CLI operations using Taskmaster:

```bash
# Create task for batch processing
./scripts/tm_master.sh add "Process podcast archive with enhanced CLI"

# Track progress
./scripts/tm_workflow.sh update 15 "Processed 50 files, 10 remaining"

# Mark complete
./scripts/tm_master.sh done 15
```

## Troubleshooting

### Common Issues

#### Import Errors

```bash
ModuleNotFoundError: No module named 'pyannote'
```

**Solution**: Install optional dependencies for diarization

```bash
uv pip install pyannote.audio
```

#### Memory Issues

```bash
Memory error. Try using a smaller model with --model small or reduce concurrency.
```

**Solution**: Use a smaller model or, for batch runs, reduce concurrency

```bash
# Single file: use a smaller model
uv run python -m src.cli.enhanced_cli transcribe file.wav -m small

# Batch: also lower concurrency (-c is a batch option)
uv run python -m src.cli.enhanced_cli batch ~/Audio -m small -c 1
```

#### GPU Issues

```bash
CUDA out of memory
```

**Solution**: Switch to CPU processing

```bash
uv run python -m src.cli.enhanced_cli transcribe file.wav -d cpu
```

### Performance Optimization

#### For Memory-Constrained Systems
- Use `-m small` or `-m tiny` models
- Reduce concurrency with `-c 1` or `-c 2`
- Process smaller files first

#### For High-Performance Systems
- Use `-m large` models for best accuracy
- Increase concurrency with `-c 8` or higher
- Enable GPU processing with `-d cuda`

#### For Batch Processing
- Start with conservative settings
- Monitor system resources
- Adjust concurrency based on performance

## Development

### Architecture

The Enhanced CLI follows a modular, protocol-based architecture:

```python
class EnhancedCLI:
    """Main CLI with error handling and performance monitoring"""


class EnhancedTranscribeCommand:
    """Single file transcription with progress reporting"""


class EnhancedBatchCommand:
    """Batch processing with intelligent queuing"""
```

### Testing

Comprehensive test suite with 19 test cases:

```bash
# Run all enhanced CLI tests
uv run pytest tests/test_enhanced_cli.py -v

# Run specific test categories
uv run pytest tests/test_enhanced_cli.py::TestEnhancedCLI -v
uv run pytest tests/test_enhanced_cli.py::TestEnhancedTranscribeCommand -v
uv run pytest tests/test_enhanced_cli.py::TestEnhancedBatchCommand -v
```

### Code Quality
- **Lines of Code**: 483 lines
- **Tests**: 19 test cases, 100% pass rate
- **Type Hints**: Full type annotation
- **Error Handling**: Comprehensive error management
- **Documentation**: Inline documentation and examples

## Future Enhancements

### Planned Features
- **WebSocket Integration**: Real-time progress updates via WebSocket
- **Plugin System**: Extensible CLI with custom commands
- **Configuration Files**: Persistent settings and preferences
- **Advanced Metrics**: Detailed performance analytics
- **Cloud Integration**: Direct cloud storage support

### API Integration
- **REST API**: HTTP endpoints for programmatic access
- **GraphQL API**: Flexible query interface
- **Webhook Support**: Event-driven processing
- **SDK Development**: Client libraries for multiple languages

## Support

For issues and questions:

1. **Check Documentation**: Review this guide and other docs
2. **Run Tests**: Verify installation with test suite
3. **Check Logs**: Review error messages and system logs
4. **Community Support**: Use project issue tracker
5. **Performance Tuning**: Adjust settings based on system capabilities

---

*Enhanced CLI v1.0 - Comprehensive transcription interface with real-time progress reporting and performance monitoring.*
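
As a rough aid for the performance-tuning step in the Support list, a starting value for the batch `-c` option can be estimated from the per-process memory figures given under Memory Usage. The following is a minimal sketch and not part of the Trax codebase: the helper name `suggest_concurrency`, the `headroom_gb` reserve, and the cap of 8 workers are illustrative assumptions, and only the three model sizes with documented estimates are covered.

```python
# Hypothetical helper for choosing a starting -c value; not part of Trax.
# Approximate per-process memory (GB), copied from the Memory Usage section.
MODEL_MEMORY_GB = {"small": 1.0, "base": 1.5, "large": 2.0}


def suggest_concurrency(
    available_gb: float,
    model: str = "base",
    headroom_gb: float = 1.0,  # assumed reserve for the OS and other programs
    max_workers: int = 8,      # assumed cap, matching the "Aggressive" guideline
) -> int:
    """Return a conservative worker count for the batch command's -c option."""
    if model not in MODEL_MEMORY_GB:
        raise ValueError(f"No memory estimate documented for model {model!r}")
    usable_gb = max(available_gb - headroom_gb, 0.0)
    workers = int(usable_gb // MODEL_MEMORY_GB[model])
    # Always allow at least one worker, and never exceed the cap.
    return max(1, min(workers, max_workers))


if __name__ == "__main__":
    for model_name in MODEL_MEMORY_GB:
        workers = suggest_concurrency(8.0, model_name)
        print(f"8GB available, {model_name} model: try -c {workers}")
```

The result is only a starting point; the live CPU and memory readout shown during processing remains the better guide for adjusting concurrency up or down.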