Enhanced CLI Documentation

Overview

The Enhanced CLI (src/cli/enhanced_cli.py) is a command-line interface for the Trax transcription platform. It adds real-time progress reporting, system performance monitoring, concurrent batch processing, multiple export formats, and optional speaker diarization and domain adaptation.

Key Features

🎯 Real-Time Progress Reporting

  • Rich Progress Bars: Beautiful progress bars with time estimates
  • Live Updates: Real-time transcription progress updates
  • Time Remaining: Accurate time-to-completion estimates
  • File Processing: Individual file progress in batch operations

📊 Performance Monitoring

  • CPU Usage: Real-time CPU utilization tracking
  • Memory Usage: Current and total memory monitoring
  • Temperature: CPU temperature monitoring (when available)
  • System Stats: Live system resource statistics

🚀 Intelligent Batch Processing

  • Concurrent Execution: Configurable parallel processing
  • Size-Based Queuing: Smaller files processed first for faster feedback
  • Resource Management: Automatic resource monitoring and optimization
  • Error Recovery: Graceful error handling without stopping the batch
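
The queuing behaviour listed above can be illustrated with a short Python sketch. This is not the code in src/cli/enhanced_cli.py; the function name run_batch and the process callback are hypothetical, and a thread pool stands in for whatever worker mechanism the CLI actually uses. It only shows the idea: sort by file size, run a bounded number of workers, and record failures instead of aborting.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from pathlib import Path
from typing import Callable, Iterable


def run_batch(
    files: Iterable[Path],
    process: Callable[[Path], str],  # hypothetical per-file transcription callback
    concurrency: int = 4,
) -> tuple[list[Path], dict[Path, str], dict[Path, Exception]]:
    """Process files smallest-first and collect failures instead of aborting."""
    # Size-based queuing: smaller files are submitted first for faster feedback.
    queue = sorted(files, key=lambda path: path.stat().st_size)
    results: dict[Path, str] = {}
    errors: dict[Path, Exception] = {}
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        futures = {pool.submit(process, path): path for path in queue}
        for future in as_completed(futures):
            path = futures[future]
            try:
                results[path] = future.result()
            except Exception as exc:
                # Error recovery: one failing file does not stop the batch.
                errors[path] = exc
    return queue, results, errors
```

The returned errors mapping can be reported at the end of the run, so a single unreadable file costs one entry in the summary and not the whole batch.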

🛠️ Enhanced Error Handling

  • User-Friendly Messages: Clear, actionable error messages
  • Contextual Guidance: Specific suggestions for common issues
  • Error Categories: File, memory, GPU, permission, and generic errors
  • Recovery Suggestions: Automatic recommendations for resolution

📁 Multiple Export Formats

  • JSON: Structured data with metadata
  • TXT: Plain text for readability
  • SRT: SubRip subtitles for video players
  • VTT: WebVTT subtitles for web applications

🔧 Advanced Features

  • Speaker Diarization: Identify and separate speakers
  • Domain Adaptation: Optimize for specific content types
  • Model Selection: Choose from tiny to large models
  • Device Selection: CPU or CUDA processing

Installation & Setup

The Enhanced CLI is included with the standard Trax installation:

# Install dependencies
uv pip install -e ".[dev]"

# Verify installation
uv run python -m src.cli.enhanced_cli --help

Command Reference

Main CLI

uv run python -m src.cli.enhanced_cli [OPTIONS] COMMAND [ARGS]...

Global Options:

  • --help - Show help message and exit

Available Commands:

  • transcribe - Transcribe a single audio file
  • batch - Process multiple files in batch

Transcribe Command

uv run python -m src.cli.enhanced_cli transcribe [OPTIONS] INPUT

Arguments:

  • INPUT - Input audio/video file path

Options:

  • -o, --output PATH - Output directory (default: current directory)
  • -f, --format [json|txt|srt|vtt] - Output format (default: json)
  • -m, --model [tiny|base|small|medium|large] - Model size (default: base)
  • -d, --device [cpu|cuda] - Processing device (default: cpu)
  • --domain [general|technical|medical|academic] - Domain adaptation
  • --diarize - Enable speaker diarization
  • --speakers INTEGER - Number of speakers (for diarization)
  • --help - Show help message and exit

Examples:

# Basic transcription
uv run python -m src.cli.enhanced_cli transcribe lecture.mp3

# High-quality transcription with large model
uv run python -m src.cli.enhanced_cli transcribe podcast.mp3 -m large

# Medical content with domain adaptation
uv run python -m src.cli.enhanced_cli transcribe medical_audio.wav --domain medical

# Speaker diarization with SRT output
uv run python -m src.cli.enhanced_cli transcribe interview.mp4 --diarize --speakers 2 -f srt

# GPU processing with VTT output
uv run python -m src.cli.enhanced_cli transcribe video.mp4 -d cuda -f vtt

Batch Command

uv run python -m src.cli.enhanced_cli batch [OPTIONS] INPUT

Arguments:

  • INPUT - Input directory containing audio/video files

Options:

  • -o, --output PATH - Output directory (default: current directory)
  • -c, --concurrency INTEGER - Number of concurrent processes (default: 4)
  • -f, --format [json|txt|srt|vtt] - Output format (default: json)
  • -m, --model [tiny|base|small|medium|large] - Model size (default: base)
  • -d, --device [cpu|cuda] - Processing device (default: cpu)
  • --domain [general|technical|medical|academic] - Domain adaptation
  • --diarize - Enable speaker diarization
  • --speakers INTEGER - Number of speakers (for diarization)
  • --help - Show help message and exit

Examples:

# Batch process with 8 workers
uv run python -m src.cli.enhanced_cli batch ~/Podcasts -c 8

# Academic lectures with domain adaptation
uv run python -m src.cli.enhanced_cli batch ~/Lectures --domain academic -m large

# Conservative processing for memory-constrained systems
uv run python -m src.cli.enhanced_cli batch ~/Audio -c 2 -m small

# High-quality batch processing with speaker diarization
uv run python -m src.cli.enhanced_cli batch ~/Interviews -m large -f srt --diarize --speakers 3

# GPU batch processing
uv run python -m src.cli.enhanced_cli batch ~/Videos -d cuda -c 4
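
When the batch command is launched from another Python script, it can help to build the argument list programmatically. The sketch below is an illustration only: build_batch_command is a hypothetical helper, not part of Trax, and it uses nothing beyond the options documented above.

```python
from typing import Optional


def build_batch_command(
    input_dir: str,
    *,
    output: Optional[str] = None,
    concurrency: int = 4,
    fmt: str = "json",
    model: str = "base",
    device: str = "cpu",
    domain: Optional[str] = None,
    diarize: bool = False,
    speakers: Optional[int] = None,
) -> list[str]:
    """Build an argv list for the documented `batch` command (hypothetical helper)."""
    cmd = [
        "uv", "run", "python", "-m", "src.cli.enhanced_cli", "batch", input_dir,
        "-c", str(concurrency), "-f", fmt, "-m", model, "-d", device,
    ]
    if output is not None:
        cmd += ["-o", output]
    if domain is not None:
        cmd += ["--domain", domain]
    if diarize:
        cmd.append("--diarize")
        if speakers is not None:
            # --speakers only applies when diarization is enabled
            cmd += ["--speakers", str(speakers)]
    return cmd
```

The resulting list can be passed to subprocess.run(cmd, check=True). No shell is involved in that case, so a path such as ~/Podcasts should be expanded first with os.path.expanduser.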

Performance Monitoring

Real-Time Metrics

The Enhanced CLI displays live performance metrics during processing:

CPU: 45.2% | Memory: 2.1GB/8GB (26%) | Temp: 65°C

Metrics Explained:

  • CPU: Current CPU utilization percentage
  • Memory: Used memory / Total memory (percentage)
  • Temperature: CPU temperature in Celsius (when available)
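
As a rough illustration of how such a status line is assembled, the sketch below formats already-collected values into the layout shown above. The function name format_metrics is hypothetical, and how the CLI actually gathers the numbers is not covered here.

```python
from typing import Optional


def format_metrics(
    cpu_percent: float,
    used_gb: float,
    total_gb: float,
    temp_c: Optional[float] = None,
) -> str:
    """Format a status line like the one shown above (hypothetical helper)."""
    memory_percent = used_gb / total_gb * 100
    parts = [
        f"CPU: {cpu_percent:.1f}%",
        f"Memory: {used_gb:g}GB/{total_gb:g}GB ({memory_percent:.0f}%)",
    ]
    # Temperature is optional because not every system exposes a sensor.
    if temp_c is not None:
        parts.append(f"Temp: {temp_c:.0f}°C")
    return " | ".join(parts)


print(format_metrics(45.2, 2.1, 8, 65))
```

Leaving temp_c as None drops the temperature field, which matches the "when available" behaviour described above.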

Performance Guidelines

System Recommendations

  • Conservative: 2-4 concurrent workers, small model
  • Balanced: 4-6 concurrent workers, base model
  • Aggressive: 6-8 concurrent workers, large model

Memory Usage

  • Small Model: ~1GB per process
  • Base Model: ~1.5GB per process
  • Large Model: ~2GB per process
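
These per-process figures give a quick way to pick a concurrency value: divide the memory you can spare by the per-process estimate. The sketch below is a back-of-the-envelope helper, not part of the CLI. The name max_workers_for and the 1 GB reserve for the operating system are assumptions.

```python
# Approximate per-process memory from the list above, in GB.
MODEL_MEMORY_GB = {"small": 1.0, "base": 1.5, "large": 2.0}


def max_workers_for(model: str, available_gb: float, reserve_gb: float = 1.0) -> int:
    """Estimate how many workers fit in memory (hypothetical helper).

    reserve_gb is an assumed safety margin left for the OS and other programs.
    """
    per_process = MODEL_MEMORY_GB[model]
    usable = max(available_gb - reserve_gb, 0.0)
    # Always allow at least one worker, even on very small systems.
    return max(int(usable // per_process), 1)


print(max_workers_for("large", available_gb=8.0))
```

With 8 GB available and the large model this suggests 3 workers, which you would then pass as -c 3 to the batch command.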

Processing Speed

  • v1 Pipeline: <30 seconds for 5-minute audio
  • Real-time Factor: <0.1 (much faster than real-time)
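
The two figures above are the same statement in different units: the real-time factor is processing time divided by audio duration, so 30 seconds for 5 minutes (300 seconds) of audio gives 30 / 300 = 0.1. A minimal sketch of the calculation (the function name is illustrative):

```python
def real_time_factor(processing_seconds: float, audio_seconds: float) -> float:
    """Processing time divided by audio duration; below 1.0 is faster than real time."""
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return processing_seconds / audio_seconds


# 30 seconds of processing for a 5-minute (300-second) file
print(real_time_factor(30, 300))
```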

Error Handling

Error Categories

File Errors

❌ File not found: lecture.mp3
💡 Check that the input file path is correct and the file exists.

Memory Errors

❌ Memory error. Try using a smaller model with --model small or reduce concurrency.

GPU Errors

❌ CUDA out of memory
💡 GPU-related error. Try using --device cpu instead.

Permission Errors

❌ Permission denied: protected.wav
💡 Check file permissions or run with administrator privileges.

Generic Errors

❌ Invalid parameter
💡 Check input parameters and try again.

Error Recovery

The Enhanced CLI provides specific guidance for each error type:

  1. File Issues: Path validation and existence checks
  2. Memory Issues: Model size and concurrency suggestions
  3. GPU Issues: Device fallback recommendations
  4. Permission Issues: File access guidance
  5. Generic Issues: General troubleshooting tips
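
The mapping from error to guidance can be sketched as follows. This is an assumption about how such a classifier might look, not the CLI's actual implementation. The function name categorize_error is hypothetical, and the suggestion strings are copied from the examples above.

```python
def categorize_error(exc: Exception) -> tuple[str, str]:
    """Return (category, suggestion) for an exception (hypothetical classifier)."""
    message = str(exc).lower()
    if isinstance(exc, FileNotFoundError):
        return "file", "Check that the input file path is correct and the file exists."
    if isinstance(exc, PermissionError):
        return "permission", "Check file permissions or run with administrator privileges."
    # Check GPU before memory so "CUDA out of memory" is reported as a GPU error.
    if "cuda" in message or "gpu" in message:
        return "gpu", "GPU-related error. Try using --device cpu instead."
    if isinstance(exc, MemoryError) or "memory" in message:
        return "memory", "Try using a smaller model with --model small or reduce concurrency."
    return "generic", "Check input parameters and try again."


category, suggestion = categorize_error(RuntimeError("CUDA out of memory"))
print(f"❌ {category}: 💡 {suggestion}")
```

The order of the checks matters: a message can mention both CUDA and memory, and the device fallback is the more specific advice in that case.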

Output Formats

JSON Format (Default)

{
  "text_content": "Never gonna give you up...",
  "segments": [
    {
      "start": 0.0,
      "end": 2.5,
      "text": "Never gonna give you up"
    }
  ],
  "confidence": 0.95,
  "processing_time": 5.2
}
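
Downstream scripts can read this output with the standard json module. The sketch below assumes the field names shown in the example above. The helper name summarize_transcript is hypothetical.

```python
import json


def summarize_transcript(json_text: str) -> dict:
    """Extract plain text and basic stats from the JSON output (hypothetical helper)."""
    data = json.loads(json_text)
    segments = data["segments"]
    return {
        # One line per segment, as in the plain-text export
        "text": "\n".join(seg["text"].strip() for seg in segments),
        "segment_count": len(segments),
        # End time of the last segment approximates the audio length
        "audio_seconds": max((seg["end"] for seg in segments), default=0.0),
        "confidence": data.get("confidence"),
    }
```

In practice json_text would come from reading the .json file that the transcribe or batch command wrote to the output directory.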

Text Format

Never gonna give you up
Never gonna let you down
Never gonna run around and desert you
...

SRT Subtitles

1
00:00:00,000 --> 00:00:02,500
Never gonna give you up

2
00:00:02,500 --> 00:00:05,000
Never gonna let you down

VTT Subtitles

WEBVTT

00:00:00.000 --> 00:00:02.500
Never gonna give you up

00:00:02.500 --> 00:00:05.000
Never gonna let you down
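
The two subtitle formats differ mainly in the millisecond separator (comma in SRT, period in VTT), the numbered cues in SRT, and the WEBVTT header. The following sketch converts JSON-style segments into both formats. It illustrates the formats themselves and is not the exporter used by the CLI; all function names are illustrative.

```python
def format_timestamp(seconds: float, separator: str) -> str:
    """Render seconds as HH:MM:SS<separator>mmm."""
    total_ms = round(seconds * 1000)
    hours, rest = divmod(total_ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    secs, millis = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}{separator}{millis:03d}"


def to_srt(segments: list[dict]) -> str:
    """SRT: numbered cues, comma before the milliseconds."""
    cues = [
        f"{index}\n"
        f"{format_timestamp(seg['start'], ',')} --> {format_timestamp(seg['end'], ',')}\n"
        f"{seg['text']}"
        for index, seg in enumerate(segments, start=1)
    ]
    return "\n\n".join(cues) + "\n"


def to_vtt(segments: list[dict]) -> str:
    """WebVTT: WEBVTT header, period before the milliseconds."""
    cues = [
        f"{format_timestamp(seg['start'], '.')} --> {format_timestamp(seg['end'], '.')}\n"
        f"{seg['text']}"
        for seg in segments
    ]
    return "WEBVTT\n\n" + "\n\n".join(cues) + "\n"


segments = [
    {"start": 0.0, "end": 2.5, "text": "Never gonna give you up"},
    {"start": 2.5, "end": 5.0, "text": "Never gonna let you down"},
]
print(to_srt(segments))
print(to_vtt(segments))
```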

Advanced Features

Speaker Diarization

Identify and separate different speakers in audio:

# Enable diarization with 2 speakers
uv run python -m src.cli.enhanced_cli transcribe interview.mp4 --diarize --speakers 2

# Batch processing with diarization
uv run python -m src.cli.enhanced_cli batch ~/Interviews --diarize --speakers 3

Requirements:

  • pyannote.audio library installed
  • HuggingFace token for speaker diarization models
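
Both requirements can be checked before starting a long run. The sketch below is a hypothetical preflight helper, not part of Trax. It assumes the token is supplied through the HF_TOKEN environment variable, which may differ from how your Trax configuration reads it.

```python
import importlib.util
import os
from typing import Mapping, Optional


def diarization_preflight(env: Optional[Mapping[str, str]] = None) -> list[str]:
    """Return a list of missing diarization requirements (hypothetical helper)."""
    env = os.environ if env is None else env
    problems: list[str] = []
    try:
        # find_spec raises ModuleNotFoundError when the parent package is absent
        installed = importlib.util.find_spec("pyannote.audio") is not None
    except (ImportError, ValueError):
        installed = False
    if not installed:
        problems.append("pyannote.audio is not installed (uv pip install pyannote.audio)")
    # Assumption: the HuggingFace token is provided via HF_TOKEN
    if not env.get("HF_TOKEN"):
        problems.append("HF_TOKEN is not set; a HuggingFace token is required")
    return problems


for problem in diarization_preflight():
    print(f"❌ {problem}")
```

An empty list means both checks passed and --diarize can be used.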

Domain Adaptation

Optimize transcription for specific content types:

# Medical content
uv run python -m src.cli.enhanced_cli transcribe medical_audio.wav --domain medical

# Academic lectures
uv run python -m src.cli.enhanced_cli transcribe lecture.mp3 --domain academic

# Technical content
uv run python -m src.cli.enhanced_cli transcribe tech_podcast.mp3 --domain technical

Available Domains:

  • general - General purpose (default)
  • technical - Technical and scientific content
  • medical - Medical and healthcare content
  • academic - Academic and educational content

Common Workflows

Research Workflow

# 1. Extract metadata from YouTube playlist
uv run python -m src.cli.main batch-urls research_videos.txt

# 2. Download selected videos
uv run python -m src.cli.main youtube "https://youtube.com/watch?v=interesting" --download

# 3. Enhanced transcription with progress monitoring
uv run python -m src.cli.enhanced_cli transcribe downloaded_video.mp4 -m large --domain academic

# 4. Batch process with intelligent queuing
uv run python -m src.cli.enhanced_cli batch ~/Downloads/research_audio -c 6 -f srt

Academic Lecture Processing

# Process academic lectures with domain adaptation
uv run python -m src.cli.enhanced_cli batch ~/Lectures \
  --domain academic \
  -m large \
  -f srt \
  -c 4 \
  --diarize \
  --speakers 1

Podcast Production

# High-quality podcast transcription with speaker diarization
uv run python -m src.cli.enhanced_cli batch ~/Podcasts \
  -m large \
  -f vtt \
  --diarize \
  --speakers 3 \
  -c 2

Integration with Taskmaster

Track CLI operations using Taskmaster:

# Create task for batch processing
./scripts/tm_master.sh add "Process podcast archive with enhanced CLI"

# Track progress
./scripts/tm_workflow.sh update 15 "Processed 50 files, 10 remaining"

# Mark complete
./scripts/tm_master.sh done 15

Troubleshooting

Common Issues

Import Errors

ModuleNotFoundError: No module named 'pyannote'

Solution: Install optional dependencies for diarization

uv pip install pyannote.audio

Memory Issues

Memory error. Try using a smaller model with --model small or reduce concurrency.

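Solution: Use a smaller model, or reduce concurrency when running a batch (the -c option applies only to the batch command)

# Single file: use a smaller model
uv run python -m src.cli.enhanced_cli transcribe file.wav -m small

# Batch: use a smaller model and a single worker
uv run python -m src.cli.enhanced_cli batch ~/Audio -m small -c 1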

GPU Issues

CUDA out of memory

Solution: Switch to CPU processing

uv run python -m src.cli.enhanced_cli transcribe file.wav -d cpu

Performance Optimization

For Memory-Constrained Systems

  • Use -m small or -m tiny models
  • Reduce concurrency with -c 1 or -c 2
  • Process smaller files first

For High-Performance Systems

  • Use -m large models for best accuracy
  • Increase concurrency with -c 8 or higher
  • Enable GPU processing with -d cuda

For Batch Processing

  • Start with conservative settings
  • Monitor system resources
  • Adjust concurrency based on performance

Development

Architecture

The Enhanced CLI follows a modular, protocol-based architecture:

class EnhancedCLI:
    """Main CLI with error handling and performance monitoring"""
    
class EnhancedTranscribeCommand:
    """Single file transcription with progress reporting"""
    
class EnhancedBatchCommand:
    """Batch processing with intelligent queuing"""

Testing

Comprehensive test suite with 19 test cases:

# Run all enhanced CLI tests
uv run pytest tests/test_enhanced_cli.py -v

# Run specific test categories
uv run pytest tests/test_enhanced_cli.py::TestEnhancedCLI -v
uv run pytest tests/test_enhanced_cli.py::TestEnhancedTranscribeCommand -v
uv run pytest tests/test_enhanced_cli.py::TestEnhancedBatchCommand -v

Code Quality

  • Lines of Code: 483 lines
  • Test Results: 100% pass rate (19 test cases)
  • Type Hints: Full type annotation
  • Error Handling: Comprehensive error management
  • Documentation: Inline documentation and examples

Future Enhancements

Planned Features

  • WebSocket Integration: Real-time progress updates via WebSocket
  • Plugin System: Extensible CLI with custom commands
  • Configuration Files: Persistent settings and preferences
  • Advanced Metrics: Detailed performance analytics
  • Cloud Integration: Direct cloud storage support

API Integration

  • REST API: HTTP endpoints for programmatic access
  • GraphQL API: Flexible query interface
  • Webhook Support: Event-driven processing
  • SDK Development: Client libraries for multiple languages

Support

For issues and questions:

  1. Check Documentation: Review this guide and other docs
  2. Run Tests: Verify installation with test suite
  3. Check Logs: Review error messages and system logs
  4. Community Support: Use project issue tracker
  5. Performance Tuning: Adjust settings based on system capabilities

Enhanced CLI v1.0 - Comprehensive transcription interface with real-time progress reporting and performance monitoring.