Enhanced CLI Documentation

Overview

The Enhanced CLI (src/cli/enhanced_cli.py) provides a modern, feature-rich interface for the Trax transcription platform with real-time progress reporting, performance monitoring, and advanced capabilities.

Key Features

🎯 Real-Time Progress Reporting

Rich Progress Bars: Beautiful progress bars with time estimates
Live Updates: Real-time transcription progress updates
Time Remaining: Accurate time-to-completion estimates
File Processing: Individual file progress in batch operations

📊 Performance Monitoring

CPU Usage: Real-time CPU utilization tracking
Memory Usage: Current and total memory monitoring
Temperature: CPU temperature monitoring (when available)
System Stats: Live system resource statistics

🚀 Intelligent Batch Processing

Concurrent Execution: Configurable parallel processing
Size-Based Queuing: Smaller files processed first for faster feedback
Resource Management: Automatic resource monitoring and optimization
Error Recovery: Graceful error handling that does not stop the rest of the batch
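
The queuing and error-recovery behavior listed above can be sketched in a few lines. This is a minimal illustration, not the code in src/cli/enhanced_cli.py: the names order_by_size and run_batch are hypothetical, the worker callable stands in for the real transcription call, and a thread pool is assumed here even though the CLI may use processes.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from pathlib import Path
from typing import Callable, Dict, Iterable, List, Tuple


def order_by_size(paths: Iterable[Path]) -> List[Path]:
    """Return paths sorted smallest-first so short jobs report back early."""
    return sorted(paths, key=lambda p: p.stat().st_size)


def run_batch(
    paths: Iterable[Path],
    worker: Callable[[Path], str],
    concurrency: int = 4,
) -> Tuple[Dict[Path, str], Dict[Path, Exception]]:
    """Run `worker` on every path and collect failures instead of aborting.

    `worker` is a hypothetical stand-in for the transcription call.
    """
    results: Dict[Path, str] = {}
    errors: Dict[Path, Exception] = {}
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        # Submit in size order; the pool starts the smallest files first.
        futures = {pool.submit(worker, p): p for p in order_by_size(paths)}
        for future in as_completed(futures):
            path = futures[future]
            try:
                results[path] = future.result()
            except Exception as exc:  # one bad file must not stop the batch
                errors[path] = exc
    return results, errors
```

Returning the failures alongside the results lets the caller print a summary at the end of the run rather than losing the whole batch to the first exception.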

🛠️ Enhanced Error Handling

User-Friendly Messages: Clear, actionable error messages
Contextual Guidance: Specific suggestions for common issues
Error Categories: File, memory, GPU, permission, and generic errors
Recovery Suggestions: Automatic recommendations for resolution

📁 Multiple Export Formats

JSON: Structured data with metadata
TXT: Plain text for readability
SRT: SubRip subtitles for video players
VTT: WebVTT subtitles for web applications

🔧 Advanced Features

Speaker Diarization: Identify and separate speakers
Domain Adaptation: Optimize for specific content types
Model Selection: Choose from tiny to large models
Device Selection: CPU or CUDA processing

Installation & Setup

The Enhanced CLI is included with the standard Trax installation:

# Install dependencies
uv pip install -e ".[dev]"

# Verify installation
uv run python -m src.cli.enhanced_cli --help

Command Reference

Main CLI

uv run python -m src.cli.enhanced_cli [OPTIONS] COMMAND [ARGS]...

Global Options:

--help - Show help message and exit

Available Commands:

transcribe - Transcribe a single audio file
batch - Process multiple files in batch

Transcribe Command

uv run python -m src.cli.enhanced_cli transcribe [OPTIONS] INPUT

Arguments:

INPUT - Input audio/video file path

Options:

-o, --output PATH - Output directory (default: current directory)
-f, --format [json|txt|srt|vtt] - Output format (default: json)
-m, --model [tiny|base|small|medium|large] - Model size (default: base)
-d, --device [cpu|cuda] - Processing device (default: cpu)
--domain [general|technical|medical|academic] - Domain adaptation (default: general)
--diarize - Enable speaker diarization
--speakers INTEGER - Number of speakers (for diarization)
--help - Show help message and exit

Examples:

# Basic transcription
uv run python -m src.cli.enhanced_cli transcribe lecture.mp3

# High-quality transcription with large model
uv run python -m src.cli.enhanced_cli transcribe podcast.mp3 -m large

# Medical content with domain adaptation
uv run python -m src.cli.enhanced_cli transcribe medical_audio.wav --domain medical

# Speaker diarization with SRT output
uv run python -m src.cli.enhanced_cli transcribe interview.mp4 --diarize --speakers 2 -f srt

# GPU processing with VTT output
uv run python -m src.cli.enhanced_cli transcribe video.mp4 -d cuda -f vtt

Batch Command

uv run python -m src.cli.enhanced_cli batch [OPTIONS] INPUT

Arguments:

INPUT - Input directory containing audio/video files

Options:

-o, --output PATH - Output directory (default: current directory)
-c, --concurrency INTEGER - Number of concurrent processes (default: 4)
-f, --format [json|txt|srt|vtt] - Output format (default: json)
-m, --model [tiny|base|small|medium|large] - Model size (default: base)
-d, --device [cpu|cuda] - Processing device (default: cpu)
--domain [general|technical|medical|academic] - Domain adaptation (default: general)
--diarize - Enable speaker diarization
--speakers INTEGER - Number of speakers (for diarization)
--help - Show help message and exit

Examples:

# Batch process with 8 workers
uv run python -m src.cli.enhanced_cli batch ~/Podcasts -c 8

# Academic lectures with domain adaptation
uv run python -m src.cli.enhanced_cli batch ~/Lectures --domain academic -m large

# Conservative processing for memory-constrained systems
uv run python -m src.cli.enhanced_cli batch ~/Audio -c 2 -m small

# High-quality batch processing with speaker diarization
uv run python -m src.cli.enhanced_cli batch ~/Interviews -m large -f srt --diarize --speakers 3

# GPU batch processing
uv run python -m src.cli.enhanced_cli batch ~/Videos -d cuda -c 4
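
When the batch command is driven from a script, it can help to assemble the argument list in one place and validate the choices before launching. The helper below is a sketch under that assumption; build_batch_command is a hypothetical name, and it uses only the flags and defaults documented above.

```python
from typing import List, Optional

FORMATS = ("json", "txt", "srt", "vtt")
MODELS = ("tiny", "base", "small", "medium", "large")
DEVICES = ("cpu", "cuda")
DOMAINS = ("general", "technical", "medical", "academic")


def build_batch_command(
    input_dir: str,
    output: Optional[str] = None,
    concurrency: int = 4,
    fmt: str = "json",
    model: str = "base",
    device: str = "cpu",
    domain: Optional[str] = None,
    diarize: bool = False,
    speakers: Optional[int] = None,
) -> List[str]:
    """Build the argv list for the documented `batch` command."""
    if fmt not in FORMATS or model not in MODELS or device not in DEVICES:
        raise ValueError("unsupported format, model, or device")
    if domain is not None and domain not in DOMAINS:
        raise ValueError(f"unsupported domain: {domain}")
    # Note: "~" is not expanded without a shell; pass an absolute path.
    cmd = ["uv", "run", "python", "-m", "src.cli.enhanced_cli", "batch", input_dir,
           "-c", str(concurrency), "-f", fmt, "-m", model, "-d", device]
    if output is not None:
        cmd += ["-o", output]
    if domain is not None:
        cmd += ["--domain", domain]
    if diarize:
        cmd.append("--diarize")
        if speakers is not None:
            cmd += ["--speakers", str(speakers)]
    return cmd
```

The resulting list can be passed to subprocess.run(cmd, check=True) from the project root.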

Performance Monitoring

Real-Time Metrics

The Enhanced CLI displays live performance metrics during processing:

CPU: 45.2% | Memory: 2.1GB/8GB (26%) | Temp: 65°C

Metrics Explained:

CPU: Current CPU utilization percentage
Memory: Used memory / Total memory (percentage)
Temperature: CPU temperature in Celsius (when available)
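
For readers who want to reproduce the status line in their own tooling, the formatting can be sketched as a pure function. How the CLI gathers these values is not covered in this guide, so format_metrics below is a hypothetical helper that takes the numbers as arguments and omits the temperature when it is unavailable.

```python
from typing import Optional


def format_metrics(
    cpu_percent: float,
    mem_used_gb: float,
    mem_total_gb: float,
    temp_c: Optional[float] = None,
) -> str:
    """Render a status line in the layout shown above."""
    mem_percent = 100.0 * mem_used_gb / mem_total_gb
    parts = [
        f"CPU: {cpu_percent:.1f}%",
        f"Memory: {mem_used_gb:.1f}GB/{mem_total_gb:g}GB ({mem_percent:.0f}%)",
    ]
    # Temperature sensors are not present on every system.
    if temp_c is not None:
        parts.append(f"Temp: {temp_c:.0f}°C")
    return " | ".join(parts)


print(format_metrics(45.2, 2.1, 8, 65))
# CPU: 45.2% | Memory: 2.1GB/8GB (26%) | Temp: 65°C
```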

Performance Guidelines

System Recommendations

Conservative: 2-4 concurrent workers, small model
Balanced: 4-6 concurrent workers, base model
Aggressive: 6-8 concurrent workers, large model

Memory Usage

Small Model: ~1GB per process
Base Model: ~1.5GB per process
Large Model: ~2GB per process
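
These per-process figures give a rough way to choose a concurrency value: multiply by the worker count and compare with available memory. The sketch below assumes memory scales linearly with the number of workers, which is an approximation; the function names are hypothetical, and only the three models listed above are covered.

```python
# Approximate per-process memory from the list above, in GB.
MEMORY_PER_PROCESS_GB = {"small": 1.0, "base": 1.5, "large": 2.0}


def estimated_memory_gb(model: str, concurrency: int) -> float:
    """Rough total memory for `concurrency` workers of the given model."""
    return MEMORY_PER_PROCESS_GB[model] * concurrency


def max_concurrency(model: str, available_gb: float, requested: int = 4) -> int:
    """Cap the requested worker count so the estimate fits in memory."""
    fit = int(available_gb // MEMORY_PER_PROCESS_GB[model])
    return max(1, min(requested, fit))


print(estimated_memory_gb("large", 8))    # 16.0
print(max_concurrency("large", 8.0, 8))   # 4
```

By this estimate, the aggressive profile (8 workers, large model) needs about 16GB for the transcription processes alone, so it does not fit on an 8GB machine.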

Processing Speed

v1 Pipeline: <30 seconds for 5-minute audio
Real-time Factor: <0.1 (much faster than real-time)
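
The two figures above are the same target expressed two ways: the real-time factor is processing time divided by audio duration, so 30 seconds for a 5-minute (300-second) file is a factor of 0.1. A small helper, with the hypothetical name real_time_factor, makes the arithmetic explicit.

```python
def real_time_factor(processing_seconds: float, audio_seconds: float) -> float:
    """Processing time divided by audio duration; lower is faster."""
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return processing_seconds / audio_seconds


# The documented target: under 30 s for a 5-minute file.
print(real_time_factor(30, 5 * 60))   # 0.1
```

The processing_time field in the JSON output can be used as the numerator when checking a run against this target.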

Error Handling

Error Categories

File Errors

❌ File not found: lecture.mp3
💡 Check that the input file path is correct and the file exists.

Memory Errors

❌ Memory error. Try using a smaller model with --model small or reduce concurrency.

GPU Errors

❌ CUDA out of memory
💡 GPU-related error. Try using --device cpu instead.

Permission Errors

❌ Permission denied: protected.wav
💡 Check file permissions or run with administrator privileges.

Generic Errors

❌ Invalid parameter
💡 Check input parameters and try again.

Error Recovery

The Enhanced CLI provides specific guidance for each error type:

File Issues: Path validation and existence checks
Memory Issues: Model size and concurrency suggestions
GPU Issues: Device fallback recommendations
Permission Issues: File access guidance
Generic Issues: General troubleshooting tips
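
One way to implement this kind of category-to-guidance mapping is sketched below. It is an illustration rather than the CLI's actual logic: categorize_error is a hypothetical name, and detecting GPU errors by looking for "cuda" or "gpu" in the message is an assumption. The suggestion texts are taken from the messages shown above.

```python
SUGGESTIONS = {
    "file": "Check that the input file path is correct and the file exists.",
    "memory": "Try using a smaller model with --model small or reduce concurrency.",
    "gpu": "GPU-related error. Try using --device cpu instead.",
    "permission": "Check file permissions or run with administrator privileges.",
    "generic": "Check input parameters and try again.",
}


def categorize_error(exc: BaseException) -> str:
    """Map an exception to one of the five documented categories."""
    if isinstance(exc, FileNotFoundError):
        return "file"
    if isinstance(exc, PermissionError):
        return "permission"
    if isinstance(exc, MemoryError):
        return "memory"
    # Assumption: GPU failures surface as errors whose text mentions CUDA/GPU.
    message = str(exc).lower()
    if "cuda" in message or "gpu" in message:
        return "gpu"
    return "generic"


def suggestion_for(exc: BaseException) -> str:
    """Return the user-facing hint for an exception."""
    return SUGGESTIONS[categorize_error(exc)]
```

Checking exception types before message text keeps "CUDA out of memory" in the GPU category rather than the memory category, which matches the guidance shown for that error.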

Output Formats

JSON Format (Default)

{
  "text_content": "Never gonna give you up...",
  "segments": [
    {
      "start": 0.0,
      "end": 2.5,
      "text": "Never gonna give you up"
    }
  ],
  "confidence": 0.95,
  "processing_time": 5.2
}
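
Because the JSON output is plain structured data, downstream scripts can read it with the standard library. The sketch below parses the sample above and prints one line per segment; describe_segments is a hypothetical helper, and it assumes every segment carries the start, end, and text keys shown in the sample.

```python
import json
from typing import List

SAMPLE = """
{
  "text_content": "Never gonna give you up...",
  "segments": [
    {"start": 0.0, "end": 2.5, "text": "Never gonna give you up"}
  ],
  "confidence": 0.95,
  "processing_time": 5.2
}
"""


def describe_segments(raw_json: str) -> List[str]:
    """Return one human-readable line per transcript segment."""
    data = json.loads(raw_json)
    return [
        f"[{seg['start']:.1f}s - {seg['end']:.1f}s] {seg['text']}"
        for seg in data["segments"]
    ]


for line in describe_segments(SAMPLE):
    print(line)
# [0.0s - 2.5s] Never gonna give you up
```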

Text Format

Never gonna give you up
Never gonna let you down
Never gonna run around and desert you
...

SRT Subtitles

1
00:00:00,000 --> 00:00:02,500
Never gonna give you up

2
00:00:02,500 --> 00:00:05,000
Never gonna let you down

VTT Subtitles

WEBVTT

00:00:00.000 --> 00:00:02.500
Never gonna give you up

00:00:02.500 --> 00:00:05.000
Never gonna let you down
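
The SRT and VTT samples differ mainly in the millisecond separator (comma versus period), the numbered cues in SRT, and the WEBVTT header. The sketch below shows how segments in the JSON layout could be rendered into both formats; the function names are hypothetical and this is not necessarily how the CLI's exporters are written.

```python
from typing import Dict, List


def format_timestamp(seconds: float, separator: str = ",") -> str:
    """Format seconds as HH:MM:SS<sep>mmm (',' for SRT, '.' for VTT)."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}{separator}{ms:03d}"


def to_srt(segments: List[Dict]) -> str:
    """Render segments as numbered SubRip cues."""
    blocks = [
        f"{i}\n{format_timestamp(s['start'])} --> {format_timestamp(s['end'])}\n{s['text']}"
        for i, s in enumerate(segments, start=1)
    ]
    return "\n\n".join(blocks) + "\n"


def to_vtt(segments: List[Dict]) -> str:
    """Render segments as WebVTT cues under the required header."""
    cues = [
        f"{format_timestamp(s['start'], '.')} --> {format_timestamp(s['end'], '.')}\n{s['text']}"
        for s in segments
    ]
    return "WEBVTT\n\n" + "\n\n".join(cues) + "\n"
```

Converting to integer milliseconds before splitting into hours, minutes, and seconds avoids floating-point drift in the final field.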

Advanced Features

Speaker Diarization

Identify and separate different speakers in audio:

# Enable diarization with 2 speakers
uv run python -m src.cli.enhanced_cli transcribe interview.mp4 --diarize --speakers 2

# Batch processing with diarization
uv run python -m src.cli.enhanced_cli batch ~/Interviews --diarize --speakers 3

Requirements:

pyannote.audio library installed
HuggingFace token for speaker diarization models
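
Both requirements can be checked before starting a long run. The sketch below is a hypothetical pre-flight check: it assumes the token is supplied through an HF_TOKEN environment variable, which this guide does not confirm, so adjust the variable name to match your setup.

```python
import importlib.util
import os
from typing import List, Mapping, Optional


def diarization_problems(env: Optional[Mapping[str, str]] = None) -> List[str]:
    """Return a list of unmet diarization prerequisites (empty if none)."""
    env = os.environ if env is None else env
    problems: List[str] = []
    try:
        # find_spec imports the parent package, which raises if it is absent.
        spec = importlib.util.find_spec("pyannote.audio")
    except ImportError:
        spec = None
    if spec is None:
        problems.append("pyannote.audio is not installed")
    # Assumption: the token is read from HF_TOKEN.
    if not env.get("HF_TOKEN"):
        problems.append("HF_TOKEN is not set")
    return problems


for problem in diarization_problems():
    print(f"Missing prerequisite: {problem}")
```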

Domain Adaptation

Optimize transcription for specific content types:

# Medical content
uv run python -m src.cli.enhanced_cli transcribe medical_audio.wav --domain medical

# Academic lectures
uv run python -m src.cli.enhanced_cli transcribe lecture.mp3 --domain academic

# Technical content
uv run python -m src.cli.enhanced_cli transcribe tech_podcast.mp3 --domain technical

Available Domains:

general - General purpose (default)
technical - Technical and scientific content
medical - Medical and healthcare content
academic - Academic and educational content

Common Workflows

Research Workflow

# 1. Extract metadata from YouTube playlist
uv run python -m src.cli.main batch-urls research_videos.txt

# 2. Download selected videos
uv run python -m src.cli.main youtube https://youtube.com/watch?v=interesting --download

# 3. Enhanced transcription with progress monitoring
uv run python -m src.cli.enhanced_cli transcribe downloaded_video.mp4 -m large --domain academic

# 4. Batch process with intelligent queuing
uv run python -m src.cli.enhanced_cli batch ~/Downloads/research_audio -c 6 -f srt

Academic Lecture Processing

# Process academic lectures with domain adaptation
uv run python -m src.cli.enhanced_cli batch ~/Lectures \
  --domain academic \
  -m large \
  -f srt \
  -c 4 \
  --diarize \
  --speakers 1

Podcast Production

# High-quality podcast transcription with speaker diarization
uv run python -m src.cli.enhanced_cli batch ~/Podcasts \
  -m large \
  -f vtt \
  --diarize \
  --speakers 3 \
  -c 2

Integration with Taskmaster

Track CLI operations using Taskmaster:

# Create task for batch processing
./scripts/tm_master.sh add "Process podcast archive with enhanced CLI"

# Track progress
./scripts/tm_workflow.sh update 15 "Processed 50 files, 10 remaining"

# Mark complete
./scripts/tm_master.sh done 15

Troubleshooting

Common Issues

Import Errors

ModuleNotFoundError: No module named 'pyannote'

Solution: Install optional dependencies for diarization

uv pip install pyannote.audio

Memory Issues

Memory error. Try using a smaller model with --model small or reduce concurrency.

Solution: Use smaller model or reduce concurrency

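# Single file: use a smaller model
uv run python -m src.cli.enhanced_cli transcribe file.wav -m small

# Batch: also reduce concurrency (-c is only available on the batch command)
uv run python -m src.cli.enhanced_cli batch ~/Audio -m small -c 1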

GPU Issues

CUDA out of memory

Solution: Switch to CPU processing

uv run python -m src.cli.enhanced_cli transcribe file.wav -d cpu

Performance Optimization

For Memory-Constrained Systems

Use -m small or -m tiny models
Reduce concurrency with -c 1 or -c 2
Process smaller files first

For High-Performance Systems

Use -m large models for best accuracy
Increase concurrency with -c 8 or higher
Enable GPU processing with -d cuda

For Batch Processing

Start with conservative settings
Monitor system resources
Adjust concurrency based on performance

Development

Architecture

The Enhanced CLI follows a modular, protocol-based architecture:

class EnhancedCLI:
    """Main CLI with error handling and performance monitoring"""
    
class EnhancedTranscribeCommand:
    """Single file transcription with progress reporting"""
    
class EnhancedBatchCommand:
    """Batch processing with intelligent queuing"""

Testing

Comprehensive test suite with 19 test cases:

# Run all enhanced CLI tests
uv run pytest tests/test_enhanced_cli.py -v

# Run specific test categories
uv run pytest tests/test_enhanced_cli.py::TestEnhancedCLI -v
uv run pytest tests/test_enhanced_cli.py::TestEnhancedTranscribeCommand -v
uv run pytest tests/test_enhanced_cli.py::TestEnhancedBatchCommand -v

Code Quality

Lines of Code: 483
Test Results: 100% pass rate
Type Hints: Full type annotation
Error Handling: Comprehensive error management
Documentation: Inline documentation and examples

Future Enhancements

Planned Features

WebSocket Integration: Real-time progress updates via WebSocket
Plugin System: Extensible CLI with custom commands
Configuration Files: Persistent settings and preferences
Advanced Metrics: Detailed performance analytics
Cloud Integration: Direct cloud storage support

API Integration

REST API: HTTP endpoints for programmatic access
GraphQL API: Flexible query interface
Webhook Support: Event-driven processing
SDK Development: Client libraries for multiple languages

Support

For issues and questions:

Check Documentation: Review this guide and other docs
Run Tests: Verify installation with test suite
Check Logs: Review error messages and system logs
Community Support: Use project issue tracker
Performance Tuning: Adjust settings based on system capabilities

Enhanced CLI v1.0 - Comprehensive transcription interface with real-time progress reporting and performance monitoring.
