trax/docs/enhanced-cli.md

# Enhanced CLI Documentation

## Overview

The Enhanced CLI (`src/cli/enhanced_cli.py`) provides a modern, feature-rich interface for the Trax transcription platform with real-time progress reporting, performance monitoring, and advanced capabilities.

## Key Features

### 🎯 Real-Time Progress Reporting
- **Rich Progress Bars**: Beautiful progress bars with time estimates
- **Live Updates**: Real-time transcription progress updates
- **Time Remaining**: Accurate time-to-completion estimates
- **File Processing**: Individual file progress in batch operations

### 📊 Performance Monitoring
- **CPU Usage**: Real-time CPU utilization tracking
- **Memory Usage**: Current and total memory monitoring
- **Temperature**: CPU temperature monitoring (when available)
- **System Stats**: Live system resource statistics

### 🚀 Intelligent Batch Processing
- **Concurrent Execution**: Configurable parallel processing
- **Size-Based Queuing**: Smaller files processed first for faster feedback
- **Resource Management**: Automatic resource monitoring and optimization
- **Error Recovery**: Graceful error handling without stopping batch

### 🛠️ Enhanced Error Handling
- **User-Friendly Messages**: Clear, actionable error messages
- **Contextual Guidance**: Specific suggestions for common issues
- **Error Categories**: File, memory, GPU, permission, and generic errors
- **Recovery Suggestions**: Automatic recommendations for resolution

### 📁 Multiple Export Formats
- **JSON**: Structured data with metadata
- **TXT**: Plain text for readability
- **SRT**: SubRip subtitles for video players
- **VTT**: WebVTT subtitles for web applications

### 🔧 Advanced Features
- **Speaker Diarization**: Identify and separate speakers
- **Domain Adaptation**: Optimize for specific content types
- **Model Selection**: Choose from tiny to large models
- **Device Selection**: CPU or CUDA processing

## Installation & Setup

The Enhanced CLI is included with the standard Trax installation:

```bash
# Install dependencies
uv pip install -e ".[dev]"

# Verify installation
uv run python -m src.cli.enhanced_cli --help
```

## Command Reference

### Main CLI

```bash
uv run python -m src.cli.enhanced_cli [OPTIONS] COMMAND [ARGS]...
```

**Global Options:**
- `--help` - Show help message and exit

**Available Commands:**
- `transcribe` - Transcribe a single audio file
- `batch` - Process multiple files in batch

### Transcribe Command

```bash
uv run python -m src.cli.enhanced_cli transcribe [OPTIONS] INPUT
```

**Arguments:**
- `INPUT` - Input audio/video file path

**Options:**
- `-o, --output PATH` - Output directory (default: current directory)
- `-f, --format [json|txt|srt|vtt]` - Output format (default: json)
- `-m, --model [tiny|base|small|medium|large]` - Model size (default: base)
- `-d, --device [cpu|cuda]` - Processing device (default: cpu)
- `--domain [general|technical|medical|academic]` - Domain adaptation
- `--diarize` - Enable speaker diarization
- `--speakers INTEGER` - Number of speakers (for diarization)
- `--help` - Show help message and exit

**Examples:**
```bash
# Basic transcription
uv run python -m src.cli.enhanced_cli transcribe lecture.mp3

# High-quality transcription with large model
uv run python -m src.cli.enhanced_cli transcribe podcast.mp3 -m large

# Academic content with domain adaptation
uv run python -m src.cli.enhanced_cli transcribe medical_audio.wav --domain medical

# Speaker diarization with SRT output
uv run python -m src.cli.enhanced_cli transcribe interview.mp4 --diarize --speakers 2 -f srt

# GPU processing with VTT output
uv run python -m src.cli.enhanced_cli transcribe video.mp4 -d cuda -f vtt
```

### Batch Command

```bash
uv run python -m src.cli.enhanced_cli batch [OPTIONS] INPUT
```

**Arguments:**
- `INPUT` - Input directory containing audio/video files

**Options:**
- `-o, --output PATH` - Output directory (default: current directory)
- `-c, --concurrency INTEGER` - Number of concurrent processes (default: 4)
- `-f, --format [json|txt|srt|vtt]` - Output format (default: json)
- `-m, --model [tiny|base|small|medium|large]` - Model size (default: base)
- `-d, --device [cpu|cuda]` - Processing device (default: cpu)
- `--domain [general|technical|medical|academic]` - Domain adaptation
- `--diarize` - Enable speaker diarization
- `--speakers INTEGER` - Number of speakers (for diarization)
- `--help` - Show help message and exit

**Examples:**
```bash
# Batch process with 8 workers
uv run python -m src.cli.enhanced_cli batch ~/Podcasts -c 8

# Academic lectures with domain adaptation
uv run python -m src.cli.enhanced_cli batch ~/Lectures --domain academic -m large

# Conservative processing for memory-constrained systems
uv run python -m src.cli.enhanced_cli batch ~/Audio -c 2 -m small

# High-quality batch processing with speaker diarization
uv run python -m src.cli.enhanced_cli batch ~/Interviews -m large -f srt --diarize --speakers 3

# GPU batch processing
uv run python -m src.cli.enhanced_cli batch ~/Videos -d cuda -c 4
```

## Performance Monitoring

### Real-Time Metrics

The Enhanced CLI displays live performance metrics during processing:

```bash
CPU: 45.2% | Memory: 2.1GB/8GB (26%) | Temp: 65°C
```

**Metrics Explained:**
- **CPU**: Current CPU utilization percentage
- **Memory**: Used memory / Total memory (percentage)
- **Temperature**: CPU temperature in Celsius (when available)

### Performance Guidelines

#### System Recommendations
- **Conservative**: 2-4 concurrent workers, small model
- **Balanced**: 4-6 concurrent workers, base model
- **Aggressive**: 6-8 concurrent workers, large model

#### Memory Usage
- **Small Model**: ~1GB per process
- **Base Model**: ~1.5GB per process
- **Large Model**: ~2GB per process

#### Processing Speed
- **v1 Pipeline**: <30 seconds for 5-minute audio
- **Real-time Factor**: <0.1 (much faster than real-time)

## Error Handling

### Error Categories

#### File Errors
```bash
❌ File not found: lecture.mp3
💡 Check that the input file path is correct and the file exists.
```

#### Memory Errors
```bash
❌ Memory error. Try using a smaller model with --model small or reduce concurrency.
```

#### GPU Errors
```bash
❌ CUDA out of memory
💡 GPU-related error. Try using --device cpu instead.
```

#### Permission Errors
```bash
❌ Permission denied: protected.wav
💡 Check file permissions or run with administrator privileges.
```

#### Generic Errors
```bash
❌ Invalid parameter
💡 Check input parameters and try again.
```

### Error Recovery

The Enhanced CLI provides specific guidance for each error type:

1. **File Issues**: Path validation and existence checks
2. **Memory Issues**: Model size and concurrency suggestions
3. **GPU Issues**: Device fallback recommendations
4. **Permission Issues**: File access guidance
5. **Generic Issues**: General troubleshooting tips

## Output Formats

### JSON Format (Default)
```json
{
  "text_content": "Never gonna give you up...",
  "segments": [
    {
      "start": 0.0,
      "end": 2.5,
      "text": "Never gonna give you up"
    }
  ],
  "confidence": 0.95,
  "processing_time": 5.2
}
```

### Text Format
```
Never gonna give you up
Never gonna let you down
Never gonna run around and desert you
...
```

### SRT Subtitles
```
1
00:00:00,000 --> 00:00:02,500
Never gonna give you up

2
00:00:02,500 --> 00:00:05,000
Never gonna let you down
```

### VTT Subtitles
```
WEBVTT

00:00:00.000 --> 00:00:02.500
Never gonna give you up

00:00:02.500 --> 00:00:05.000
Never gonna let you down
```

## Advanced Features

### Speaker Diarization

Identify and separate different speakers in audio:

```bash
# Enable diarization with 2 speakers
uv run python -m src.cli.enhanced_cli transcribe interview.mp4 --diarize --speakers 2

# Batch processing with diarization
uv run python -m src.cli.enhanced_cli batch ~/Interviews --diarize --speakers 3
```

**Requirements:**
- pyannote.audio library installed
- HuggingFace token for speaker diarization models

### Domain Adaptation

Optimize transcription for specific content types:

```bash
# Medical content
uv run python -m src.cli.enhanced_cli transcribe medical_audio.wav --domain medical

# Academic lectures
uv run python -m src.cli.enhanced_cli transcribe lecture.mp3 --domain academic

# Technical content
uv run python -m src.cli.enhanced_cli transcribe tech_podcast.mp3 --domain technical
```

**Available Domains:**
- `general` - General purpose (default)
- `technical` - Technical and scientific content
- `medical` - Medical and healthcare content
- `academic` - Academic and educational content

## Common Workflows

### Research Workflow
```bash
# 1. Extract metadata from YouTube playlist
uv run python -m src.cli.main batch-urls research_videos.txt

# 2. Download selected videos
uv run python -m src.cli.main youtube https://youtube.com/watch?v=interesting --download

# 3. Enhanced transcription with progress monitoring
uv run python -m src.cli.enhanced_cli transcribe downloaded_video.mp4 -m large --domain academic

# 4. Batch process with intelligent queuing
uv run python -m src.cli.enhanced_cli batch ~/Downloads/research_audio -c 6 -f srt
```

### Academic Lecture Processing
```bash
# Process academic lectures with domain adaptation
uv run python -m src.cli.enhanced_cli batch ~/Lectures \
  --domain academic \
  -m large \
  -f srt \
  -c 4 \
  --diarize \
  --speakers 1
```

### Podcast Production
```bash
# High-quality podcast transcription with speaker diarization
uv run python -m src.cli.enhanced_cli batch ~/Podcasts \
  -m large \
  -f vtt \
  --diarize \
  --speakers 3 \
  -c 2
```

## Integration with Taskmaster

Track CLI operations using Taskmaster:

```bash
# Create task for batch processing
./scripts/tm_master.sh add "Process podcast archive with enhanced CLI"

# Track progress
./scripts/tm_workflow.sh update 15 "Processed 50 files, 10 remaining"

# Mark complete
./scripts/tm_master.sh done 15
```

## Troubleshooting

### Common Issues

#### Import Errors
```bash
ModuleNotFoundError: No module named 'pyannote'
```
**Solution**: Install optional dependencies for diarization
```bash
uv pip install pyannote.audio
```

#### Memory Issues
```bash
Memory error. Try using a smaller model with --model small or reduce concurrency.
```
**Solution**: Use smaller model or reduce concurrency
```bash
uv run python -m src.cli.enhanced_cli transcribe file.wav -m small -c 1
```

#### GPU Issues
```bash
CUDA out of memory
```
**Solution**: Switch to CPU processing
```bash
uv run python -m src.cli.enhanced_cli transcribe file.wav -d cpu
```

### Performance Optimization

#### For Memory-Constrained Systems
- Use `-m small` or `-m tiny` models
- Reduce concurrency with `-c 1` or `-c 2`
- Process smaller files first

#### For High-Performance Systems
- Use `-m large` models for best accuracy
- Increase concurrency with `-c 8` or higher
- Enable GPU processing with `-d cuda`

#### For Batch Processing
- Start with conservative settings
- Monitor system resources
- Adjust concurrency based on performance

## Development

### Architecture

The Enhanced CLI follows a modular, protocol-based architecture:

```python
class EnhancedCLI:
    """Main CLI with error handling and performance monitoring"""

class EnhancedTranscribeCommand:
    """Single file transcription with progress reporting"""

class EnhancedBatchCommand:
    """Batch processing with intelligent queuing"""
```

### Testing

Comprehensive test suite with 19 test cases:

```bash
# Run all enhanced CLI tests
uv run pytest tests/test_enhanced_cli.py -v

# Run specific test categories
uv run pytest tests/test_enhanced_cli.py::TestEnhancedCLI -v
uv run pytest tests/test_enhanced_cli.py::TestEnhancedTranscribeCommand -v
uv run pytest tests/test_enhanced_cli.py::TestEnhancedBatchCommand -v
```

### Code Quality

- **Lines of Code**: 483 lines
- **Test Coverage**: 100% pass rate
- **Type Hints**: Full type annotation
- **Error Handling**: Comprehensive error management
- **Documentation**: Inline documentation and examples

## Future Enhancements

### Planned Features
- **WebSocket Integration**: Real-time progress updates via WebSocket
- **Plugin System**: Extensible CLI with custom commands
- **Configuration Files**: Persistent settings and preferences
- **Advanced Metrics**: Detailed performance analytics
- **Cloud Integration**: Direct cloud storage support

### API Integration
- **REST API**: HTTP endpoints for programmatic access
- **GraphQL API**: Flexible query interface
- **Webhook Support**: Event-driven processing
- **SDK Development**: Client libraries for multiple languages

## Support

For issues and questions:

1. **Check Documentation**: Review this guide and other docs
2. **Run Tests**: Verify installation with test suite
3. **Check Logs**: Review error messages and system logs
4. **Community Support**: Use project issue tracker
5. **Performance Tuning**: Adjust settings based on system capabilities

---

*Enhanced CLI v1.0 - Comprehensive transcription interface with real-time progress reporting and performance monitoring.*