CLI Command Reference

Complete reference for all Trax CLI commands with examples and options.

Command Structure

Trax provides two CLI interfaces:

Standard CLI

uv run python -m src.cli.main <command> [options] [arguments]

Enhanced CLI

uv run python -m src.cli.enhanced_cli <command> [options] [arguments]

The enhanced CLI provides:

  • Real-time progress reporting with Rich progress bars
  • Performance monitoring (CPU, memory, temperature)
  • Intelligent batch processing with concurrent execution
  • Enhanced error handling with user-friendly guidance
  • Multiple export formats (JSON, TXT, SRT, VTT)
  • Advanced features (speaker diarization, domain adaptation)

Enhanced CLI Commands

Enhanced CLI Overview

The enhanced CLI (src.cli.enhanced_cli) provides a modern, feature-rich interface with real-time progress reporting and advanced capabilities.

Key Features:

  • Rich Progress Bars: Real-time transcription progress with time estimates
  • Performance Monitoring: Live CPU, memory, and temperature tracking
  • Intelligent Queuing: Batch processing with size-based prioritization
  • Advanced Export: Multiple formats including SRT and VTT subtitles
  • Error Guidance: Helpful suggestions for common issues
  • Optional Features: Speaker diarization and domain adaptation

transcribe <input>

Enhanced single-file transcription with progress reporting.

Usage:

uv run python -m src.cli.enhanced_cli transcribe input.wav

Options:

  • -o, --output PATH - Output directory (default: current directory)
  • -f, --format [json|txt|srt|vtt] - Output format (default: json)
  • -m, --model [tiny|base|small|medium|large] - Model size (default: base)
  • -d, --device [cpu|cuda] - Processing device (default: cpu)
  • --domain [general|technical|medical|academic] - Domain adaptation
  • --diarize - Enable speaker diarization
  • --speakers INTEGER - Number of speakers (for diarization)

Examples:

# Basic transcription with progress bar
uv run python -m src.cli.enhanced_cli transcribe lecture.mp3

# Enhanced transcription with domain adaptation
uv run python -m src.cli.enhanced_cli transcribe medical_audio.wav --domain medical

# Speaker diarization with SRT output
uv run python -m src.cli.enhanced_cli transcribe interview.mp4 --diarize --speakers 2 -f srt

# High-quality transcription with large model
uv run python -m src.cli.enhanced_cli transcribe podcast.mp3 -m large -f vtt

batch <input>

Enhanced batch processing with intelligent queuing and concurrent execution.

Usage:

uv run python -m src.cli.enhanced_cli batch /path/to/audio/files

Options:

  • -o, --output PATH - Output directory (default: current directory)
  • -c, --concurrency INTEGER - Number of concurrent processes (default: 4)
  • -f, --format [json|txt|srt|vtt] - Output format (default: json)
  • -m, --model [tiny|base|small|medium|large] - Model size (default: base)
  • -d, --device [cpu|cuda] - Processing device (default: cpu)
  • --domain [general|technical|medical|academic] - Domain adaptation
  • --diarize - Enable speaker diarization
  • --speakers INTEGER - Number of speakers (for diarization)

Examples:

# Batch process with 8 concurrent workers
uv run python -m src.cli.enhanced_cli batch ~/Podcasts -c 8

# Process with domain adaptation and speaker diarization
uv run python -m src.cli.enhanced_cli batch ~/Lectures --domain academic --diarize

# Conservative processing for memory-constrained systems
uv run python -m src.cli.enhanced_cli batch ~/Audio -c 2 -m small

# High-quality batch processing
uv run python -m src.cli.enhanced_cli batch ~/Interviews -m large -f srt --diarize --speakers 3

Intelligent Queuing: The enhanced batch processor automatically:

  • Sorts files by size (smaller files first for faster feedback)
  • Monitors system resources in real-time
  • Provides detailed progress for each file
  • Handles errors gracefully without stopping the batch
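
The size-based ordering described above can be sketched in a few lines. This is a minimal illustration, not the Trax implementation: the function name order_by_size is hypothetical, and it assumes plain file size on disk is the only sort key.

```python
import os
import tempfile


def order_by_size(paths):
    """Return paths sorted by file size on disk, smallest first."""
    return sorted(paths, key=os.path.getsize)


# Demo on throwaway files of known sizes (hypothetical file names).
with tempfile.TemporaryDirectory() as tmp:
    paths = []
    for name, size in [("long.wav", 3000), ("short.wav", 10), ("medium.wav", 500)]:
        path = os.path.join(tmp, name)
        with open(path, "wb") as fh:
            fh.write(b"\0" * size)
        paths.append(path)
    queue = [os.path.basename(p) for p in order_by_size(paths)]
    print(queue)
```

Processing the smallest files first means the first results appear quickly, which is the "faster feedback" behavior the list mentions.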

Enhanced Progress Tracking Features

Multi-Pass Pipeline Progress Visualization

When using the --multi-pass option, the CLI provides detailed progress tracking for each stage of the multi-pass transcription pipeline:

Stage 1: Fast Transcription Pass

  • Real-time progress with confidence scoring
  • Segment generation and quality assessment
  • Low-confidence segment identification

Stage 2: Refinement Pass

  • Progress tracking for low-confidence segments
  • Audio slicing and re-transcription
  • Quality improvement monitoring

Stage 3: Enhancement Pass

  • Domain-specific enhancement progress
  • Content optimization tracking
  • Final quality validation

Stage 4: Speaker Diarization (if enabled)

  • Parallel speaker identification
  • Speaker count and segmentation progress
  • Integration with transcription results
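
To make the hand-off between Stage 1 and Stage 2 concrete, the sketch below selects the segments a refinement pass would revisit. It rests on assumptions not stated in this reference: that each segment carries its own confidence value, and that the cutoff is 0.8. The function name low_confidence_segments is hypothetical.

```python
def low_confidence_segments(segments, threshold=0.8):
    """Return the segments whose confidence falls below the threshold.

    Both the per-segment "confidence" key and the default threshold are
    assumptions made for illustration.
    """
    return [seg for seg in segments if seg["confidence"] < threshold]


first_pass = [
    {"start": 0.0, "end": 2.5, "text": "Never gonna give you up", "confidence": 0.95},
    {"start": 2.5, "end": 5.0, "text": "Never gonna let you down", "confidence": 0.62},
]
to_refine = low_confidence_segments(first_pass)
print(len(to_refine), "segment(s) queued for refinement")
```

Only the selected segments would need audio slicing and re-transcription, which keeps the refinement pass cheaper than a full second run.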

System Resource Monitoring

The enhanced CLI includes real-time system resource monitoring:

CPU Usage Monitoring

  • Current and peak CPU utilization
  • Performance warnings at 80%+ and 95%+ thresholds
  • Processing optimization recommendations
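
The two warning thresholds can be expressed as a small classifier. This is a sketch under assumptions: the 80% and 95% cutoffs come from the list above, but the level names and the function cpu_status are hypothetical.

```python
def cpu_status(percent: float) -> str:
    """Map a CPU utilization percentage to an assumed warning level."""
    if percent >= 95:
        return "critical"  # 95%+ threshold
    if percent >= 80:
        return "warning"   # 80%+ threshold
    return "ok"


for sample in (45.2, 80.0, 97.5):
    print(f"{sample:5.1f}% -> {cpu_status(sample)}")
```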

Memory Usage Tracking

  • Real-time memory consumption
  • Peak memory usage during processing
  • Memory optimization suggestions

Disk and Network I/O

  • Storage usage monitoring
  • Network activity tracking
  • Performance bottleneck identification

Temperature Monitoring

  • CPU temperature tracking (when available)
  • Thermal throttling warnings
  • Performance impact assessment

Error Recovery and Export Progress

Error Recovery Tracking

  • Automatic error detection and classification
  • Recovery attempt progress monitoring
  • Success/failure rate reporting
  • User guidance for common issues

Multi-Format Export Progress

  • Concurrent export to multiple formats
  • Individual format progress tracking
  • Export success rate monitoring
  • Output file path reporting

Progress Display Features

Rich Visual Interface

  • Progress bars rendered with the Rich library
  • Real-time stage and sub-stage updates
  • Time remaining estimates
  • Spinner animations for active operations

Status Indicators

  • 🟢 Healthy resource usage
  • 🟡 Moderate resource usage (warning)
  • 🔴 High resource usage (critical)
  • ✅ Completed operations
  • ⚠️ Warnings and issues
  • ❌ Errors and failures

Progress Callbacks

  • Stage transition notifications
  • Quality metric updates
  • Performance benchmark reporting
  • User guidance and tips

Standard CLI Commands

youtube <url>

Extract metadata from YouTube URLs without requiring API access.

Usage:

uv run python -m src.cli.main youtube https://youtube.com/watch?v=VIDEO_ID

Options:

  • --download - Download media after metadata extraction
  • --queue - Add to batch queue for processing
  • --json - Output as JSON (default)
  • --txt - Output as plain text

Examples:

# Extract metadata only
uv run python -m src.cli.main youtube https://youtube.com/watch?v=dQw4w9WgXcQ

# Extract and download immediately ✅ WORKING
uv run python -m src.cli.main youtube https://youtube.com/watch?v=dQw4w9WgXcQ --download

# Plain text output
uv run python -m src.cli.main youtube https://youtube.com/watch?v=dQw4w9WgXcQ --txt

Download Pipeline Status: FULLY FUNCTIONAL

  • Media download with progress tracking
  • Automatic file format detection
  • Downloaded files saved to data/media/downloads/
  • File hash generation for integrity verification
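
A file hash for integrity verification can be computed as sketched below. The hash algorithm used by Trax is not stated here; SHA-256 is an assumption, and file_sha256 is a hypothetical helper name.

```python
import hashlib
import os
import tempfile


def file_sha256(path, chunk_size=1 << 20):
    """Hash a file in chunks so large media files are never fully loaded."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


# Demo on a throwaway file standing in for a downloaded media file.
with tempfile.TemporaryDirectory() as tmp:
    sample = os.path.join(tmp, "sample.bin")
    with open(sample, "wb") as fh:
        fh.write(b"trax")
    print(file_sha256(sample))
```

Comparing the digest recorded at download time with a freshly computed one shows whether the file has changed or been truncated since.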

Supported URL Formats:

  • https://www.youtube.com/watch?v=VIDEO_ID
  • https://youtu.be/VIDEO_ID
  • https://www.youtube.com/watch?v=VIDEO_ID&t=123s

batch-urls <file>

Process multiple YouTube URLs from a text file.

Usage:

uv run python -m src.cli.main batch-urls urls.txt

File Format:

https://youtube.com/watch?v=video1
https://youtube.com/watch?v=video2
https://youtu.be/video3

Options:

  • --download - Download all media after metadata extraction
  • --queue - Add all to batch processing queue
  • --workers <n> - Number of parallel workers (default: 4)

Examples:

# Process URLs file
uv run python -m src.cli.main batch-urls my_videos.txt

# Process and download with parallel processing ✅ WORKING
uv run python -m src.cli.main batch-urls my_videos.txt --download

# Download with text output format
uv run python -m src.cli.main batch-urls my_videos.txt --download --txt

Batch Download Status: FULLY FUNCTIONAL

  • Parallel processing of multiple URLs
  • Progress tracking for each download
  • Comprehensive success/failure reporting
  • Automatic error handling and retry logic

transcribe <file>

Transcribe a single audio or video file.

Usage:

uv run python -m src.cli.main transcribe path/to/audio.mp3

Options:

  • --v1 - Use v1 pipeline (Whisper only, default)
  • --v2 - Use v2 pipeline (Whisper + DeepSeek enhancement)
  • --json - Output as JSON (default)
  • --txt - Output as plain text
  • --min-accuracy <percent> - Minimum accuracy threshold (default: 80%)

Supported Formats:

  • Audio: MP3, WAV, M4A, FLAC, OGG
  • Video: MP4, AVI, MOV, MKV, WEBM

Examples:

# Basic transcription (v1 pipeline)
uv run python -m src.cli.main transcribe lecture.mp3

# Enhanced transcription (v2 pipeline)
uv run python -m src.cli.main transcribe podcast.mp4 --v2

# Plain text output with accuracy threshold
uv run python -m src.cli.main transcribe audio.wav --txt --min-accuracy 90

batch <folder>

Batch process multiple audio/video files in a directory.

Usage:

uv run python -m src.cli.main batch /path/to/audio/files

Options:

  • --v1 - Use v1 pipeline (default)
  • --v2 - Use v2 pipeline with enhancement
  • --workers <n> - Number of parallel workers (default: 8)
  • --min-accuracy <percent> - Minimum accuracy threshold (default: 80%)
  • --recursive - Process subdirectories recursively
  • --pattern <glob> - File pattern to match (e.g., "*.mp3")

Examples:

# Process all audio files with the default 8 workers
uv run python -m src.cli.main batch /Users/me/podcasts

# Enhanced processing with custom settings
uv run python -m src.cli.main batch /Users/me/lectures --v2 --workers 4 --min-accuracy 95

# Process only MP3 files recursively
uv run python -m src.cli.main batch /Users/me/audio --recursive --pattern "*.mp3"
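
The interaction between --recursive and --pattern can be pictured with pathlib. This is a sketch of plausible behavior, not the actual file discovery code: find_media is a hypothetical helper, and it assumes the pattern is applied to file names at every directory level when recursion is on.

```python
import tempfile
from pathlib import Path


def find_media(folder, pattern="*", recursive=False):
    """List files under folder matching a glob pattern, optionally recursing."""
    root = Path(folder)
    matches = root.rglob(pattern) if recursive else root.glob(pattern)
    return sorted(p for p in matches if p.is_file())


# Demo on a throwaway directory tree (hypothetical file names).
with tempfile.TemporaryDirectory() as tmp:
    base = Path(tmp)
    (base / "season1").mkdir()
    for rel in ("intro.mp3", "notes.wav", "season1/episode1.mp3"):
        (base / rel).touch()
    top_level = [p.name for p in find_media(base, "*.mp3")]
    everything = [p.name for p in find_media(base, "*.mp3", recursive=True)]
    print(top_level)
    print(everything)
```

Without recursion only the top-level MP3 is found; with recursion the file in the subdirectory is included as well.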

Enhanced CLI Features

Real-Time Performance Monitoring

The enhanced CLI provides live system monitoring during processing:

# Performance stats are displayed automatically
CPU: 45.2% | Memory: 2.1GB/8GB (26%) | Temp: 65°C

Monitored Metrics:

  • CPU Usage: Real-time CPU utilization percentage
  • Memory Usage: Current and total memory with percentage
  • Temperature: CPU temperature monitoring (when available)
  • Processing Speed: Time estimates and completion percentages

Enhanced Error Handling

The enhanced CLI provides intelligent error guidance:

# Memory error with helpful suggestions
❌ Memory error. Try using a smaller model with --model small or reduce concurrency.

# File not found with guidance
❌ File not found: lecture.mp3
💡 Check that the input file path is correct and the file exists.

# GPU error with alternatives
❌ CUDA out of memory
💡 GPU-related error. Try using --device cpu instead.

Error Categories:

  • File Errors: Path validation and existence checks
  • Memory Errors: Model size and concurrency suggestions
  • GPU Errors: Device fallback recommendations
  • Permission Errors: File access guidance
  • Generic Errors: General troubleshooting tips
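
One way to picture the mapping from error category to guidance is a small dispatch function. This is a sketch, not the enhanced CLI's handler: guidance_for is a hypothetical name, the first three messages are taken from the examples above, and the permission and generic messages are assumed wording.

```python
def guidance_for(error: BaseException) -> str:
    """Return a user-facing hint for an exception, by category."""
    if isinstance(error, RuntimeError) and "cuda" in str(error).lower():
        return "GPU-related error. Try using --device cpu instead."
    if isinstance(error, MemoryError):
        return ("Memory error. Try using a smaller model with --model small "
                "or reduce concurrency.")
    if isinstance(error, FileNotFoundError):
        return "Check that the input file path is correct and the file exists."
    if isinstance(error, PermissionError):
        # Assumed wording: the reference does not quote this message.
        return "Check read access to the input and write access to the output directory."
    # Assumed wording for the generic category.
    return "Unexpected error. Re-run the command and review the full error output."


print(guidance_for(RuntimeError("CUDA out of memory")))
print(guidance_for(FileNotFoundError("lecture.mp3")))
```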

Performance Guidelines

Enhanced CLI Optimization

  • Default Concurrency: 4 (balanced for most systems)
  • Memory Usage: <2GB per pipeline
  • Processing Speed: <30s for 5-minute audio (v1)
  • Real-time Factor: <0.1 (much faster than real-time)
  • Progress Updates: Every 2-5 seconds
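
The speed and real-time-factor targets above describe the same bound: the real-time factor is processing time divided by audio duration, so 30 seconds for a 5-minute (300-second) file corresponds to 0.1. A short worked check, with real_time_factor as a hypothetical helper name:

```python
def real_time_factor(processing_seconds: float, audio_seconds: float) -> float:
    """Processing time divided by audio duration; below 1.0 is faster than real time."""
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return processing_seconds / audio_seconds


# 30 s of processing for 5 minutes (300 s) of audio.
rtf = real_time_factor(30, 5 * 60)
print(f"RTF = {rtf:.2f}")
```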

M3 MacBook Optimization

  • Default Workers: 8 (optimal for M3 chip)
  • Memory Usage: <2GB per pipeline
  • Processing Speed: <30s for 5-minute audio (v1)
  • Real-time Factor: <0.1 (much faster than real-time)

Worker Configuration

# Conservative (low memory)
--workers 4

# Balanced (default)
--workers 8  

# Aggressive (high-end M3)
--workers 12

Output Formats

Enhanced CLI Formats

The enhanced CLI supports multiple output formats:

JSON Output (Default)

{
  "text_content": "Never gonna give you up...",
  "segments": [
    {
      "start": 0.0,
      "end": 2.5,
      "text": "Never gonna give you up"
    }
  ],
  "confidence": 0.95,
  "processing_time": 5.2
}

Text Output

Never gonna give you up
Never gonna let you down
Never gonna run around and desert you
...

SRT Subtitles

1
00:00:00,000 --> 00:00:02,500
Never gonna give you up

2
00:00:02,500 --> 00:00:05,000
Never gonna let you down

VTT Subtitles

WEBVTT

00:00:00.000 --> 00:00:02.500
Never gonna give you up

00:00:02.500 --> 00:00:05.000
Never gonna let you down
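
The subtitle formats differ from the JSON segments mainly in how timestamps are written: SRT uses a comma before the milliseconds and numbers each cue, while VTT uses a period. The sketch below converts segments shaped like the JSON example into SRT text; format_timestamp and segments_to_srt are hypothetical names, not Trax functions.

```python
def format_timestamp(seconds: float, separator: str = ",") -> str:
    """Format seconds as HH:MM:SS<sep>mmm ("," for SRT, "." for VTT)."""
    total_ms = int(round(seconds * 1000))
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}{separator}{ms:03d}"


def segments_to_srt(segments) -> str:
    """Render segments with start, end, and text keys as numbered SRT cues."""
    cues = []
    for index, seg in enumerate(segments, start=1):
        start = format_timestamp(seg["start"])
        end = format_timestamp(seg["end"])
        cues.append(f"{index}\n{start} --> {end}\n{seg['text']}")
    return "\n\n".join(cues) + "\n"


segments = [
    {"start": 0.0, "end": 2.5, "text": "Never gonna give you up"},
    {"start": 2.5, "end": 5.0, "text": "Never gonna let you down"},
]
print(segments_to_srt(segments))
```

Working in whole milliseconds avoids floating-point drift when splitting a time into hours, minutes, and seconds.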

Standard CLI Formats

JSON Output (Default)

{
  "youtube_id": "dQw4w9WgXcQ",
  "title": "Rick Astley - Never Gonna Give You Up",
  "channel": "Rick Astley",
  "duration_seconds": 212,
  "transcript": {
    "text": "Never gonna give you up...",
    "segments": [...],
    "confidence": 0.95
  }
}

Text Output

Title: Rick Astley - Never Gonna Give You Up
Channel: Rick Astley
Duration: 3:32

Transcript:
Never gonna give you up
Never gonna let you down
...
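
The text output shows the duration as 3:32, while the JSON output stores it as duration_seconds: 212. The conversion can be sketched as follows; format_duration is a hypothetical helper, and the hour format for longer media is an assumption.

```python
def format_duration(total_seconds: int) -> str:
    """Format a duration in seconds as M:SS, or H:MM:SS for an hour or more."""
    minutes, seconds = divmod(int(total_seconds), 60)
    hours, minutes = divmod(minutes, 60)
    if hours:
        return f"{hours}:{minutes:02d}:{seconds:02d}"  # assumed format
    return f"{minutes}:{seconds:02d}"


print("Duration:", format_duration(212))
```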

Common Workflows

Enhanced CLI Workflows

Research Workflow (Enhanced)

# 1. Extract metadata for the YouTube URLs listed in research_videos.txt
uv run python -m src.cli.main batch-urls research_videos.txt

# 2. Download selected videos
uv run python -m src.cli.main youtube https://youtube.com/watch?v=interesting --download

# 3. Enhanced transcription with progress monitoring
uv run python -m src.cli.enhanced_cli transcribe downloaded_video.mp4 -m large --domain academic

# 4. Batch process with intelligent queuing
uv run python -m src.cli.enhanced_cli batch ~/Downloads/research_audio -c 6 -f srt

Academic Lecture Processing

# Process academic lectures with domain adaptation
uv run python -m src.cli.enhanced_cli batch ~/Lectures \
  --domain academic \
  -m large \
  -f srt \
  -c 4 \
  --diarize \
  --speakers 1

Podcast Production

# High-quality podcast transcription with speaker diarization
uv run python -m src.cli.enhanced_cli batch ~/Podcasts \
  -m large \
  -f vtt \
  --diarize \
  --speakers 3 \
  -c 2

Standard CLI Workflows

Research Workflow (FUNCTIONAL)

# 1. Extract metadata for the YouTube URLs listed in research_videos.txt
uv run python -m src.cli.main batch-urls research_videos.txt

# 2. Download selected videos ✅ WORKING
uv run python -m src.cli.main youtube https://youtube.com/watch?v=interesting --download

# 3. Transcribe downloaded media
uv run python -m src.cli.main transcribe data/media/downloads/video.m4a --v2

# 4. Batch process entire folder
uv run python -m src.cli.main batch data/media/downloads --v2

Complete Pipeline Status:

  • YouTube metadata extraction - Working
  • Media download - Working with progress tracking
  • 🚧 Transcription - Ready for implementation
  • 🚧 Batch processing - Ready for implementation

Podcast Processing

# Process entire podcast folder with high accuracy
uv run python -m src.cli.main batch ~/Podcasts --v2 --min-accuracy 95 --workers 6

Academic Lectures

# Conservative processing for complex academic content
uv run python -m src.cli.main batch ~/Lectures --v2 --workers 4 --min-accuracy 99

Error Handling

Commands automatically handle common errors:

  • Network timeouts - Automatic retry with exponential backoff
  • File format issues - Automatic conversion to supported formats
  • Memory limits - Automatic chunking for large files
  • API rate limits - Automatic throttling and retry
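
Retry with exponential backoff, as mentioned for network timeouts, follows a simple pattern: wait, then double the wait after each failed attempt. The sketch below is illustrative only. The attempt count, base delay, and the set of retried exceptions are assumptions, and retry_with_backoff is a hypothetical name.

```python
import time


def retry_with_backoff(operation, max_attempts=4, base_delay=1.0, sleep=time.sleep):
    """Call operation, retrying on timeouts with delays of 1x, 2x, 4x the base delay."""
    for attempt in range(max_attempts):
        try:
            return operation()
        except (TimeoutError, ConnectionError):
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the last error
            sleep(base_delay * 2 ** attempt)


# Demo: an operation that times out twice, then succeeds.
state = {"calls": 0}


def flaky_fetch():
    state["calls"] += 1
    if state["calls"] < 3:
        raise TimeoutError("simulated network timeout")
    return "metadata"


recorded_delays = []
result = retry_with_backoff(flaky_fetch, sleep=recorded_delays.append)
print(result, recorded_delays)
```

Passing the sleep function as a parameter keeps the demo instant and makes the delay schedule easy to inspect.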

For troubleshooting specific errors, see TROUBLESHOOTING.md.

Integration with Taskmaster

All CLI operations can be tracked using Taskmaster:

# Create task for batch processing
./scripts/tm_master.sh add "Process podcast archive with v2 pipeline"

# Track progress
./scripts/tm_workflow.sh update 15 "Processed 50 files, 10 remaining"

# Mark complete
./scripts/tm_master.sh done 15

See Taskmaster Helper Scripts for the complete integration guide.