Enhanced CLI Documentation
Overview
The Enhanced CLI (src/cli/enhanced_cli.py) provides a modern, feature-rich interface for the Trax transcription platform with real-time progress reporting, performance monitoring, and advanced capabilities.
Key Features
🎯 Real-Time Progress Reporting
- Rich Progress Bars: Beautiful progress bars with time estimates
- Live Updates: Real-time transcription progress updates
- Time Remaining: Accurate time-to-completion estimates
- File Processing: Individual file progress in batch operations
📊 Performance Monitoring
- CPU Usage: Real-time CPU utilization tracking
- Memory Usage: Current and total memory monitoring
- Temperature: CPU temperature monitoring (when available)
- System Stats: Live system resource statistics
🚀 Intelligent Batch Processing
- Concurrent Execution: Configurable parallel processing
- Size-Based Queuing: Smaller files processed first for faster feedback
- Resource Management: Automatic resource monitoring and optimization
- Error Recovery: Graceful error handling that does not stop the rest of the batch
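The queuing and recovery behaviour listed above can be illustrated with a minimal sketch. It is not taken from src/cli/enhanced_cli.py: the names order_by_size and run_batch, and the use of a thread pool, are assumptions made for illustration only.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from pathlib import Path
from typing import Callable, Iterable, Optional


def order_by_size(
    paths: Iterable[Path],
    size_of: Optional[Callable[[Path], int]] = None,
) -> list[Path]:
    """Return paths sorted smallest-first so short jobs report back early."""
    # By default the size comes from the filesystem; a custom lookup can be
    # passed in (hypothetical hook, handy for dry runs).
    size_of = size_of or (lambda p: p.stat().st_size)
    return sorted(paths, key=size_of)


def run_batch(
    ordered_paths: Iterable[Path],
    worker: Callable[[Path], str],
    concurrency: int = 4,
) -> tuple[dict[Path, str], dict[Path, Exception]]:
    """Run worker on every file and collect failures instead of aborting."""
    results: dict[Path, str] = {}
    errors: dict[Path, Exception] = {}
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        futures = {pool.submit(worker, p): p for p in ordered_paths}
        for future in as_completed(futures):
            path = futures[future]
            try:
                results[path] = future.result()
            except Exception as exc:  # one bad file must not stop the batch
                errors[path] = exc
    return results, errors
```

Submitting the smallest files first means the first completed results arrive sooner, and keeping errors in a separate dictionary lets a summary be printed once every file has been attempted.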
🛠️ Enhanced Error Handling
- User-Friendly Messages: Clear, actionable error messages
- Contextual Guidance: Specific suggestions for common issues
- Error Categories: File, memory, GPU, permission, and generic errors
- Recovery Suggestions: Automatic recommendations for resolution
📁 Multiple Export Formats
- JSON: Structured data with metadata
- TXT: Plain text for readability
- SRT: SubRip subtitles for video players
- VTT: WebVTT subtitles for web applications
🔧 Advanced Features
- Speaker Diarization: Identify and separate speakers
- Domain Adaptation: Optimize for specific content types
- Model Selection: Choose from tiny to large models
- Device Selection: CPU or CUDA processing
Installation & Setup
The Enhanced CLI is included with the standard Trax installation:
# Install dependencies
uv pip install -e ".[dev]"
# Verify installation
uv run python -m src.cli.enhanced_cli --help
Command Reference
Main CLI
uv run python -m src.cli.enhanced_cli [OPTIONS] COMMAND [ARGS]...
Global Options:
--help - Show help message and exit
Available Commands:
transcribe - Transcribe a single audio file
batch - Process multiple files in batch
Transcribe Command
uv run python -m src.cli.enhanced_cli transcribe [OPTIONS] INPUT
Arguments:
INPUT - Input audio/video file path
Options:
-o, --output PATH - Output directory (default: current directory)
-f, --format [json|txt|srt|vtt] - Output format (default: json)
-m, --model [tiny|base|small|medium|large] - Model size (default: base)
-d, --device [cpu|cuda] - Processing device (default: cpu)
--domain [general|technical|medical|academic] - Domain adaptation
--diarize - Enable speaker diarization
--speakers INTEGER - Number of speakers (for diarization)
--help - Show help message and exit
Examples:
# Basic transcription
uv run python -m src.cli.enhanced_cli transcribe lecture.mp3
# High-quality transcription with large model
uv run python -m src.cli.enhanced_cli transcribe podcast.mp3 -m large
# Medical content with domain adaptation
uv run python -m src.cli.enhanced_cli transcribe medical_audio.wav --domain medical
# Speaker diarization with SRT output
uv run python -m src.cli.enhanced_cli transcribe interview.mp4 --diarize --speakers 2 -f srt
# GPU processing with VTT output
uv run python -m src.cli.enhanced_cli transcribe video.mp4 -d cuda -f vtt
Batch Command
uv run python -m src.cli.enhanced_cli batch [OPTIONS] INPUT
Arguments:
INPUT - Input directory containing audio/video files
Options:
-o, --output PATH - Output directory (default: current directory)
-c, --concurrency INTEGER - Number of concurrent processes (default: 4)
-f, --format [json|txt|srt|vtt] - Output format (default: json)
-m, --model [tiny|base|small|medium|large] - Model size (default: base)
-d, --device [cpu|cuda] - Processing device (default: cpu)
--domain [general|technical|medical|academic] - Domain adaptation
--diarize - Enable speaker diarization
--speakers INTEGER - Number of speakers (for diarization)
--help - Show help message and exit
Examples:
# Batch process with 8 workers
uv run python -m src.cli.enhanced_cli batch ~/Podcasts -c 8
# Academic lectures with domain adaptation
uv run python -m src.cli.enhanced_cli batch ~/Lectures --domain academic -m large
# Conservative processing for memory-constrained systems
uv run python -m src.cli.enhanced_cli batch ~/Audio -c 2 -m small
# High-quality batch processing with speaker diarization
uv run python -m src.cli.enhanced_cli batch ~/Interviews -m large -f srt --diarize --speakers 3
# GPU batch processing
uv run python -m src.cli.enhanced_cli batch ~/Videos -d cuda -c 4
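When the batch command is driven from a script rather than typed by hand, the argument list can be assembled programmatically. The sketch below uses only the flags documented above; the helper name build_batch_command is hypothetical and not part of Trax.

```python
from typing import Optional

MODULE = "src.cli.enhanced_cli"
FORMATS = ("json", "txt", "srt", "vtt")
MODELS = ("tiny", "base", "small", "medium", "large")
DEVICES = ("cpu", "cuda")


def build_batch_command(
    input_dir: str,
    *,
    concurrency: int = 4,
    fmt: str = "json",
    model: str = "base",
    device: str = "cpu",
    diarize: bool = False,
    speakers: Optional[int] = None,
) -> list[str]:
    """Assemble the argv list for the documented batch command."""
    if fmt not in FORMATS or model not in MODELS or device not in DEVICES:
        raise ValueError("unsupported format, model, or device")
    cmd = [
        "uv", "run", "python", "-m", MODULE, "batch", input_dir,
        "-c", str(concurrency), "-f", fmt, "-m", model, "-d", device,
    ]
    if diarize:
        cmd.append("--diarize")
        # --speakers is only meaningful together with --diarize
        if speakers is not None:
            cmd += ["--speakers", str(speakers)]
    return cmd
```

The resulting list can be passed to subprocess.run(cmd, check=True). Note that subprocess does not expand ~ without a shell, so a path such as ~/Podcasts should go through os.path.expanduser first.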
Performance Monitoring
Real-Time Metrics
The Enhanced CLI displays live performance metrics during processing:
CPU: 45.2% | Memory: 2.1GB/8GB (26%) | Temp: 65°C
Metrics Explained:
- CPU: Current CPU utilization percentage
- Memory: Used memory / Total memory (percentage)
- Temperature: CPU temperature in Celsius (when available)
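As an illustration of how such a status line could be assembled, here is a minimal sketch. The function name format_metrics is hypothetical, and the readings are passed in as plain numbers; how the CLI actually samples them (for example through a library such as psutil) is not stated in this guide.

```python
from typing import Optional


def format_metrics(
    cpu_percent: float,
    used_gb: float,
    total_gb: float,
    temp_c: Optional[float] = None,
) -> str:
    """Render one status line in the layout shown above."""
    mem_percent = used_gb / total_gb * 100
    line = (
        f"CPU: {cpu_percent:.1f}% | "
        f"Memory: {used_gb:g}GB/{total_gb:g}GB ({mem_percent:.0f}%)"
    )
    # Temperature is optional because not every platform exposes a sensor.
    if temp_c is not None:
        line += f" | Temp: {temp_c:.0f}°C"
    return line


print(format_metrics(45.2, 2.1, 8, 65))
# CPU: 45.2% | Memory: 2.1GB/8GB (26%) | Temp: 65°C
```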
Performance Guidelines
System Recommendations
- Conservative: 2-4 concurrent workers, small model
- Balanced: 4-6 concurrent workers, base model
- Aggressive: 6-8 concurrent workers, large model
Memory Usage
- Small Model: ~1GB per process
- Base Model: ~1.5GB per process
- Large Model: ~2GB per process
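The per-process figures above can be turned into a rough concurrency estimate. This is a back-of-the-envelope sketch: the helper max_workers_for, the 1GB reserve for the rest of the system, and the cap of 8 workers are assumptions, and tiny and medium are omitted because the guide gives no figure for them.

```python
# Approximate per-process memory in GB, taken from the list above.
MODEL_MEMORY_GB = {"small": 1.0, "base": 1.5, "large": 2.0}


def max_workers_for(
    model: str,
    available_gb: float,
    reserve_gb: float = 1.0,  # assumed headroom for the OS and other programs
    cap: int = 8,             # assumed upper bound on workers
) -> int:
    """Estimate how many workers fit in memory, never returning less than 1."""
    per_worker = MODEL_MEMORY_GB[model]
    usable = max(available_gb - reserve_gb, 0.0)
    return max(1, min(cap, int(usable // per_worker)))


print(max_workers_for("large", 8))  # 3
print(max_workers_for("base", 8))   # 4
```

On a machine with 8GB available this suggests about 3 workers for the large model, which is why the aggressive profile of 6-8 workers with the large model assumes considerably more memory.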
Processing Speed
- v1 Pipeline: <30 seconds for 5-minute audio
- Real-time Factor: <0.1 (much faster than real-time)
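The two figures are consistent with each other: the real-time factor is processing time divided by audio duration, so 30 seconds of processing for 5 minutes (300 seconds) of audio gives exactly the 0.1 bound. A small sketch, with the function name real_time_factor chosen for illustration:

```python
def real_time_factor(processing_seconds: float, audio_seconds: float) -> float:
    """Processing time divided by audio duration; below 1.0 is faster than real time."""
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return processing_seconds / audio_seconds


# 5 minutes of audio processed in 30 seconds sits right at the stated bound.
print(real_time_factor(30, 5 * 60))  # 0.1
```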
Error Handling
Error Categories
File Errors
❌ File not found: lecture.mp3
💡 Check that the input file path is correct and the file exists.
Memory Errors
❌ Memory error. Try using a smaller model with --model small or reduce concurrency.
GPU Errors
❌ CUDA out of memory
💡 GPU-related error. Try using --device cpu instead.
Permission Errors
❌ Permission denied: protected.wav
💡 Check file permissions or run with administrator privileges.
Generic Errors
❌ Invalid parameter
💡 Check input parameters and try again.
Error Recovery
The Enhanced CLI provides specific guidance for each error type:
- File Issues: Path validation and existence checks
- Memory Issues: Model size and concurrency suggestions
- GPU Issues: Device fallback recommendations
- Permission Issues: File access guidance
- Generic Issues: General troubleshooting tips
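One way to map exceptions onto these five categories is sketched below. The function categorize_error is hypothetical, and the rule of detecting GPU problems by looking for "cuda" in the message is an assumption rather than the CLI's documented logic; the suggestion strings are the ones shown above.

```python
def categorize_error(exc: BaseException) -> tuple[str, str]:
    """Map an exception to a (category, suggestion) pair."""
    # FileNotFoundError and PermissionError are both OSError subclasses,
    # so the specific types are checked before anything broader.
    if isinstance(exc, FileNotFoundError):
        return "file", "Check that the input file path is correct and the file exists."
    if isinstance(exc, PermissionError):
        return "permission", "Check file permissions or run with administrator privileges."
    # Assumed heuristic: GPU failures mention CUDA in their message.
    if "cuda" in str(exc).lower():
        return "gpu", "GPU-related error. Try using --device cpu instead."
    if isinstance(exc, MemoryError):
        return "memory", "Try using a smaller model with --model small or reduce concurrency."
    return "generic", "Check input parameters and try again."


category, hint = categorize_error(RuntimeError("CUDA out of memory"))
print(category)  # gpu
```

The GPU check runs before the memory check so that a CUDA out-of-memory error leads to the device fallback suggestion rather than the model size suggestion.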
Output Formats
JSON Format (Default)
{
"text_content": "Never gonna give you up...",
"segments": [
{
"start": 0.0,
"end": 2.5,
"text": "Never gonna give you up"
}
],
"confidence": 0.95,
"processing_time": 5.2
}
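A downstream script can read this structure with the standard json module. The sketch below relies only on the fields shown in the example above (segments, start, end, confidence); the helper name summarize is hypothetical.

```python
import json

SAMPLE = """
{
  "text_content": "Never gonna give you up...",
  "segments": [
    {"start": 0.0, "end": 2.5, "text": "Never gonna give you up"}
  ],
  "confidence": 0.95,
  "processing_time": 5.2
}
"""


def summarize(raw: str) -> dict:
    """Return the segment count, total segment duration, and confidence."""
    data = json.loads(raw)
    segments = data["segments"]
    return {
        "segment_count": len(segments),
        "speech_seconds": sum(s["end"] - s["start"] for s in segments),
        "confidence": data["confidence"],
    }


print(summarize(SAMPLE))
```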
Text Format
Never gonna give you up
Never gonna let you down
Never gonna run around and desert you
...
SRT Subtitles
1
00:00:00,000 --> 00:00:02,500
Never gonna give you up
2
00:00:02,500 --> 00:00:05,000
Never gonna let you down
VTT Subtitles
WEBVTT
00:00:00.000 --> 00:00:02.500
Never gonna give you up
00:00:02.500 --> 00:00:05.000
Never gonna let you down
Advanced Features
Speaker Diarization
Identify and separate different speakers in audio:
# Enable diarization with 2 speakers
uv run python -m src.cli.enhanced_cli transcribe interview.mp4 --diarize --speakers 2
# Batch processing with diarization
uv run python -m src.cli.enhanced_cli batch ~/Interviews --diarize --speakers 3
Requirements:
- pyannote.audio library installed
- HuggingFace token for speaker diarization models
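Both requirements can be checked before starting a long run. The sketch below is an assumption-laden illustration: the helper names are hypothetical, and it assumes the token is supplied through the HF_TOKEN environment variable, which this guide does not confirm for Trax.

```python
import importlib.util
import os
from typing import Mapping


def module_available(name: str) -> bool:
    """True if the module can be located without fully importing it."""
    try:
        return importlib.util.find_spec(name) is not None
    except ModuleNotFoundError:
        # Raised when the parent package of a dotted name is missing.
        return False


def diarization_problems(env: Mapping[str, str] = os.environ) -> list[str]:
    """Return human-readable problems; an empty list means ready to diarize."""
    problems = []
    if not module_available("pyannote.audio"):
        problems.append("pyannote.audio is not installed (uv pip install pyannote.audio)")
    # Assumption: the HuggingFace token is read from HF_TOKEN.
    if not env.get("HF_TOKEN"):
        problems.append("HF_TOKEN is not set")
    return problems


for problem in diarization_problems():
    print(problem)
```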
Domain Adaptation
Optimize transcription for specific content types:
# Medical content
uv run python -m src.cli.enhanced_cli transcribe medical_audio.wav --domain medical
# Academic lectures
uv run python -m src.cli.enhanced_cli transcribe lecture.mp3 --domain academic
# Technical content
uv run python -m src.cli.enhanced_cli transcribe tech_podcast.mp3 --domain technical
Available Domains:
general - General purpose (default)
technical - Technical and scientific content
medical - Medical and healthcare content
academic - Academic and educational content
Common Workflows
Research Workflow
# 1. Extract metadata from YouTube playlist
uv run python -m src.cli.main batch-urls research_videos.txt
# 2. Download selected videos
uv run python -m src.cli.main youtube https://youtube.com/watch?v=interesting --download
# 3. Enhanced transcription with progress monitoring
uv run python -m src.cli.enhanced_cli transcribe downloaded_video.mp4 -m large --domain academic
# 4. Batch process with intelligent queuing
uv run python -m src.cli.enhanced_cli batch ~/Downloads/research_audio -c 6 -f srt
Academic Lecture Processing
# Process academic lectures with domain adaptation
uv run python -m src.cli.enhanced_cli batch ~/Lectures \
--domain academic \
-m large \
-f srt \
-c 4 \
--diarize \
--speakers 1
Podcast Production
# High-quality podcast transcription with speaker diarization
uv run python -m src.cli.enhanced_cli batch ~/Podcasts \
-m large \
-f vtt \
--diarize \
--speakers 3 \
-c 2
Integration with Taskmaster
Track CLI operations using Taskmaster:
# Create task for batch processing
./scripts/tm_master.sh add "Process podcast archive with enhanced CLI"
# Track progress
./scripts/tm_workflow.sh update 15 "Processed 50 files, 10 remaining"
# Mark complete
./scripts/tm_master.sh done 15
Troubleshooting
Common Issues
Import Errors
ModuleNotFoundError: No module named 'pyannote'
Solution: Install optional dependencies for diarization
uv pip install pyannote.audio
Memory Issues
Memory error. Try using a smaller model with --model small or reduce concurrency.
Solution: Use smaller model or reduce concurrency
uv run python -m src.cli.enhanced_cli transcribe file.wav -m small
uv run python -m src.cli.enhanced_cli batch ~/Audio -m small -c 1
GPU Issues
CUDA out of memory
Solution: Switch to CPU processing
uv run python -m src.cli.enhanced_cli transcribe file.wav -d cpu
Performance Optimization
For Memory-Constrained Systems
- Use -m small or -m tiny models
- Reduce concurrency with -c 1 or -c 2
- Process smaller files first
For High-Performance Systems
- Use -m large models for best accuracy
- Increase concurrency with -c 8 or higher
- Enable GPU processing with -d cuda
For Batch Processing
- Start with conservative settings
- Monitor system resources
- Adjust concurrency based on performance
Development
Architecture
The Enhanced CLI follows a modular, protocol-based architecture:
class EnhancedCLI:
"""Main CLI with error handling and performance monitoring"""
class EnhancedTranscribeCommand:
"""Single file transcription with progress reporting"""
class EnhancedBatchCommand:
"""Batch processing with intelligent queuing"""
Testing
Comprehensive test suite with 19 test cases:
# Run all enhanced CLI tests
uv run pytest tests/test_enhanced_cli.py -v
# Run specific test categories
uv run pytest tests/test_enhanced_cli.py::TestEnhancedCLI -v
uv run pytest tests/test_enhanced_cli.py::TestEnhancedTranscribeCommand -v
uv run pytest tests/test_enhanced_cli.py::TestEnhancedBatchCommand -v
Code Quality
- Lines of Code: 483 lines
- Test Results: 100% pass rate
- Type Hints: Full type annotation
- Error Handling: Comprehensive error management
- Documentation: Inline documentation and examples
Future Enhancements
Planned Features
- WebSocket Integration: Real-time progress updates via WebSocket
- Plugin System: Extensible CLI with custom commands
- Configuration Files: Persistent settings and preferences
- Advanced Metrics: Detailed performance analytics
- Cloud Integration: Direct cloud storage support
API Integration
- REST API: HTTP endpoints for programmatic access
- GraphQL API: Flexible query interface
- Webhook Support: Event-driven processing
- SDK Development: Client libraries for multiple languages
Support
For issues and questions:
- Check Documentation: Review this guide and other docs
- Run Tests: Verify installation with test suite
- Check Logs: Review error messages and system logs
- Community Support: Use project issue tracker
- Performance Tuning: Adjust settings based on system capabilities
Enhanced CLI v1.0 - Comprehensive transcription interface with real-time progress reporting and performance monitoring.