# Trax: Personal Research Transcription Tool
A deterministic, iterative media transcription platform that transforms raw audio/video into structured, enhanced, and searchable text content through progressive AI-powered processing.
## Overview
Trax is a personal research tool designed for batch-processing tech podcasts, academic lectures, and audiobooks. It provides high-accuracy transcription with AI enhancement, is optimized for M3 MacBook performance, and uses a download-first architecture.
## Key Features
- **95%+ Accuracy Transcription** using the Whisper distil-large-v3 model
- **99%+ Enhanced Transcription** with DeepSeek AI post-processing
- **Download-First Architecture** - Always download media locally before processing
- **Batch Processing** with 8 parallel workers by default (tuned for M3)
- **YouTube Metadata** extraction via curl (no API required)
- **Real-time Progress** tracking with memory/CPU monitoring
- **Comprehensive Testing** suite with real audio files (no mocks)
- **Protocol-Based Services** for clean interfaces and testability
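Download-first means a remote source is always materialized as a local file before any transcription step touches it. The rule can be sketched as follows, assuming a hypothetical `ensure_local()` helper and an injected `fetch` callable (neither is Trax's actual API). Cached copies are keyed by a hash of the URL so repeated runs reuse the earlier download:

```python
import hashlib
from pathlib import Path
from typing import Callable
from urllib.parse import urlparse


def ensure_local(source: str, cache_dir: Path, fetch: Callable[[str, Path], None]) -> Path:
    """Return a local path for `source`, downloading it first if it is remote.

    `fetch(url, dest)` is a hypothetical downloader that writes the media to `dest`.
    """
    if urlparse(source).scheme not in ("http", "https"):
        # Already a local file: process it in place.
        return Path(source)

    cache_dir.mkdir(parents=True, exist_ok=True)
    # Key the cached copy by a hash of the URL so repeat runs reuse the download.
    digest = hashlib.sha256(source.encode("utf-8")).hexdigest()[:16]
    suffix = Path(urlparse(source).path).suffix or ".media"
    dest = cache_dir / f"{digest}{suffix}"
    if not dest.exists():
        fetch(source, dest)
    return dest
```

Transcription then only ever receives the returned local path, so a network problem fails the download step instead of interrupting a half-finished transcription.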
## Project Structure
```
trax/
├── src/ # Source code
│ ├── services/ # Core services (transcription, enhancement, batch)
│ ├── repositories/ # Data access layer
│ ├── database/ # Database models and migrations
│ ├── cli/ # Command-line interface
│ └── config.py # Centralized configuration
├── tests/ # Test files
├── docs/ # Documentation
├── data/ # Data files
├── scripts/ # Utility scripts (including Taskmaster helpers)
├── pyproject.toml # Project configuration
└── .env.example # Environment variables documentation
```
## Installation
### Prerequisites
- **Python 3.11+** (required for advanced type annotations)
- **PostgreSQL 15+** (for JSONB and UUID support)
- **FFmpeg 6.0+** (for audio preprocessing)
- **curl** (for YouTube metadata extraction)
### Setup
```bash
# Navigate to project
cd apps/trax
# Install with uv (ultra-fast package manager)
uv pip install -e ".[dev]"
# Setup database
./scripts/setup_postgresql.sh
# Run database migrations
uv run alembic upgrade head
```
### Configuration
API keys are automatically inherited from the root project's `../../.env` file. For local overrides, create `.env.local`:
```bash
# Optional: add a local config override (>> appends, so existing entries are kept)
echo "DEEPSEEK_API_KEY=your_key_here" >> .env.local
```
## Quick Start
### Standard CLI
```bash
# Extract YouTube metadata (no API required)
uv run python -m src.cli.main youtube "https://youtube.com/watch?v=example"
# Transcribe single file (v1 pipeline)
uv run python -m src.cli.main transcribe audio.mp3
# Enhanced transcription (v2 pipeline)
uv run python -m src.cli.main transcribe audio.mp3 --v2
# Batch process folder
uv run python -m src.cli.main batch /path/to/audio/files
```
### Enhanced CLI (Recommended)
```bash
# Enhanced transcription with progress reporting
uv run python -m src.cli.enhanced_cli transcribe audio.mp3 -m large -f srt
# Multi-pass transcription with confidence threshold
uv run python -m src.cli.enhanced_cli transcribe audio.mp3 --multi-pass --confidence-threshold 0.9
# Domain-specific enhancement with multi-pass
uv run python -m src.cli.enhanced_cli transcribe lecture.mp3 --multi-pass --domain academic
# Speaker diarization with VTT output
uv run python -m src.cli.enhanced_cli transcribe interview.mp4 --diarize --speakers 2 -f vtt
# Full v2.0 feature set
uv run python -m src.cli.enhanced_cli transcribe technical_content.mp3 --multi-pass --confidence-threshold 0.9 --domain technical --diarize
# Batch processing with multi-pass
uv run python -m src.cli.enhanced_cli batch /path/to/audio/files --multi-pass -c 8
```
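The `-f srt` option writes SubRip subtitles: numbered cues, each with an `HH:MM:SS,mmm --> HH:MM:SS,mmm` time range, separated by blank lines. A minimal sketch of that formatting, assuming segments are plain `(start_seconds, end_seconds, text)` tuples (Trax's own segment model may differ):

```python
def format_srt_timestamp(seconds: float) -> str:
    """Format seconds as an SRT timestamp, e.g. 3725.5 -> '01:02:05,500'."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, millis = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"


def to_srt(segments: list[tuple[float, float, str]]) -> str:
    """Render (start, end, text) segments as numbered SRT cues."""
    cues = []
    for index, (start, end, text) in enumerate(segments, start=1):
        cues.append(
            f"{index}\n"
            f"{format_srt_timestamp(start)} --> {format_srt_timestamp(end)}\n"
            f"{text.strip()}\n"
        )
    # Cues are separated by a blank line.
    return "\n".join(cues)
```

WebVTT (`-f vtt`) uses the same cue layout, but with a `.` before the milliseconds and a leading `WEBVTT` header line.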
### Advanced Batch Processing
```bash
# Process with enhancement and custom settings
trax batch /path/to/files --enhance --workers 6 --memory-limit 2048
# Monitor progress with custom intervals
trax batch /path/to/files --progress-interval 2 --cpu-limit 80
# Use a specific model and chunk size
trax batch /path/to/files --model whisper-1 --chunk-size 600
```
## Documentation
### CLI Documentation
- **[Enhanced CLI Guide](docs/enhanced-cli.md)** - Comprehensive guide to the enhanced CLI with progress reporting
- **[CLI Reference](docs/CLI.md)** - Complete command reference for both standard and enhanced CLIs
### Quick Reference
- **[API Documentation](docs/API.md)** - Service protocols and API reference
- **[Database Schema](docs/DATABASE.md)** - PostgreSQL schema with JSONB examples
- **[Troubleshooting](docs/TROUBLESHOOTING.md)** - Common issues and security guide
### Architecture
- **[Development Patterns](docs/architecture/development-patterns.md)** - Historical learnings
- **[Error Handling](docs/architecture/error-handling-and-logging.md)** - Comprehensive error system
- **[Audio Processing](docs/architecture/audio-processing.md)** - Media pipeline details
## Pipeline Versions
### v1 Pipeline (Current)
- **Whisper distil-large-v3** transcription only
- **95%+ accuracy** on clear audio
- **<30 seconds** processing time for 5-minute audio
- **<2GB memory** usage
### v2 Pipeline (In Development)
- **Whisper + DeepSeek** enhancement
- **99%+ accuracy** with AI post-processing
- **<35 seconds** total processing time for 5-minute audio
- **Grammar and punctuation** correction
### v3-v4 Pipeline (Future)
- **Multi-pass optimization** (v3)
- **Speaker diarization** (v4)
- **Advanced analysis** features
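Each pipeline version layers a stage on top of the previous one: v1 stops after Whisper, and v2 passes the raw transcript through an enhancement step. A hedged sketch of that staging, with hypothetical `Transcriber` and `Enhancer` callables standing in for the Whisper and DeepSeek services (see docs/API.md for the actual service protocols):

```python
from dataclasses import dataclass, field
from typing import Callable, Optional

# Hypothetical stage signatures: a transcriber maps an audio path to text,
# an enhancer maps raw text to cleaned-up text.
Transcriber = Callable[[str], str]
Enhancer = Callable[[str], str]


@dataclass
class PipelineResult:
    raw_text: str
    text: str
    stages: list[str] = field(default_factory=list)


def run_pipeline(audio_path: str, transcribe: Transcriber,
                 enhance: Optional[Enhancer] = None) -> PipelineResult:
    """v1 = transcribe only; v2 = transcribe, then enhance the raw text."""
    raw = transcribe(audio_path)
    result = PipelineResult(raw_text=raw, text=raw, stages=["transcribe"])
    if enhance is not None:
        # Keep the raw transcript next to the enhanced one so v1 output is never lost.
        result.text = enhance(raw)
        result.stages.append("enhance")
    return result
```

Keeping the raw text makes it possible to compare or re-run enhancement without repeating the expensive transcription step.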
## Configuration
### API Keys
The project automatically inherits all API tokens from the root project's `.env` file:
- **AI Services**: Anthropic, DeepSeek, OpenAI, OpenRouter, Perplexity
- **Google Services**: OAuth, APIs
- **Other Services**: Slack, GitHub, Gitea, YouTube, Directus
### Local Overrides
Create `.env.local` in the trax directory for project-specific environment overrides.
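The precedence is: values from the root `.env` load first, and `.env.local` overrides them. A minimal standard-library sketch of that layering, assuming simple `KEY=VALUE` lines (the project may use a library such as python-dotenv instead, and `load_env_layers()` is a hypothetical name):

```python
from pathlib import Path


def parse_env_file(path: Path) -> dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blanks and # comments."""
    values: dict[str, str] = {}
    if not path.exists():
        return values
    for line in path.read_text(encoding="utf-8").splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Strip surrounding quote characters: KEY="value" or KEY='value'.
        values[key.strip()] = value.strip().strip("'\"")
    return values


def load_env_layers(*paths: Path) -> dict[str, str]:
    """Merge env files in order; later files override earlier ones."""
    merged: dict[str, str] = {}
    for path in paths:
        merged.update(parse_env_file(path))
    return merged
```

For example, `load_env_layers(Path("../../.env"), Path(".env.local"))` gives `.env.local` the final say, and a missing file is simply skipped.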
## Development
### Taskmaster Helper Scripts
The project includes helper scripts for managing development tasks via the Taskmaster CLI:
```bash
# Quick project overview
./scripts/tm_master.sh overview
# Get next task to work on
./scripts/tm_master.sh next
# Start working on a task
./scripts/tm_master.sh start 15
# Complete a task
./scripts/tm_master.sh done 15
# Search for tasks
./scripts/tm_master.sh search whisper
# Run analysis
./scripts/tm_master.sh analyze
```
**Available Scripts:**
- `tm_master.sh` - Master interface to all helper scripts
- `tm_status.sh` - Status checking and project overviews
- `tm_search.sh` - Search tasks by various criteria
- `tm_workflow.sh` - Workflow management and progress tracking
- `tm_analyze.sh` - Analysis and insights generation
- `tm_quick.sh` - Quick operations

For detailed documentation, see [Taskmaster Helper Scripts](scripts/README_taskmaster_helpers.md).

**Quick Reference**: [Taskmaster Quick Reference](scripts/TASKMASTER_QUICK_REFERENCE.md)
### Commands
```bash
# Run tests
uv run pytest
# Format and lint code
uv run black src/ tests/
uv run ruff check --fix src/ tests/
# Type checking
uv run mypy src/
# Install a new dependency (uv pip install does not record it in pyproject.toml)
uv pip install package-name
# Compile pinned requirements from pyproject.toml
uv pip compile pyproject.toml -o requirements.txt
```
### Architecture
Trax follows a protocol-based architecture with clean separation of concerns:
- **Services Layer**: Core business logic (transcription, enhancement, batch processing)
- **Repository Layer**: Data access with protocol-based interfaces
- **Database Layer**: PostgreSQL with SQLAlchemy registry pattern
- **CLI Layer**: User interface with Click and Rich
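In a protocol-based layout, services depend on `typing.Protocol` interfaces rather than concrete classes, so a PostgreSQL-backed repository and an in-memory implementation are interchangeable. A minimal sketch; `TranscriptRepository`, `InMemoryTranscriptRepository`, and `TranscriptionService` are illustrative names, not Trax's actual classes:

```python
from typing import Callable, Optional, Protocol


class TranscriptRepository(Protocol):
    """Data-access interface the service layer depends on."""

    def save(self, media_id: str, text: str) -> None: ...
    def get(self, media_id: str) -> Optional[str]: ...


class InMemoryTranscriptRepository:
    """Satisfies TranscriptRepository structurally; no inheritance needed."""

    def __init__(self) -> None:
        self._rows: dict[str, str] = {}

    def save(self, media_id: str, text: str) -> None:
        self._rows[media_id] = text

    def get(self, media_id: str) -> Optional[str]:
        return self._rows.get(media_id)


class TranscriptionService:
    """Service layer: business logic only, storage injected via the protocol."""

    def __init__(self, repo: TranscriptRepository, transcribe: Callable[[str], str]) -> None:
        self._repo = repo
        self._transcribe = transcribe

    def transcribe(self, media_id: str, audio_path: str) -> str:
        cached = self._repo.get(media_id)
        if cached is not None:
            return cached  # already processed; skip the expensive model call
        text = self._transcribe(audio_path)
        self._repo.save(media_id, text)
        return text
```

Any class exposing the same two methods, for example a hypothetical SQLAlchemy-backed repository, can be passed to the service without changing it, and mypy checks the match structurally.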
### Error Handling and Logging
The application implements a comprehensive error handling and logging system designed for production reliability:
#### Core Features
- **Structured Logging**: JSON and human-readable formats with contextual information
- **Error Classification**: Hierarchical error system with standardized error codes
- **Retry Logic**: Exponential backoff with jitter and circuit breaker patterns
- **Recovery Strategies**: Fallback mechanisms, graceful degradation, and state recovery
- **Performance Monitoring**: Operation timing, resource usage, and system health metrics
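Exponential backoff with jitter draws the delay before retry *n* at random from [0, min(cap, base · 2^n)], so many clients failing at once do not retry in lockstep. A minimal asyncio sketch of such a decorator using this "full jitter" variant; it illustrates the technique and is not the implementation in `src/retry/` (`retry_with_backoff` is a hypothetical name):

```python
import asyncio
import functools
import random
from typing import Awaitable, Callable, TypeVar

T = TypeVar("T")


def retry_with_backoff(max_retries: int = 3, base_delay: float = 0.5,
                       max_delay: float = 30.0,
                       retry_on: tuple[type[BaseException], ...] = (Exception,)):
    """Retry an async function with exponential backoff and full jitter."""

    def decorator(func: Callable[..., Awaitable[T]]) -> Callable[..., Awaitable[T]]:
        @functools.wraps(func)
        async def wrapper(*args, **kwargs) -> T:
            for attempt in range(max_retries + 1):
                try:
                    return await func(*args, **kwargs)
                except retry_on:
                    if attempt == max_retries:
                        raise  # retries exhausted: surface the last error
                    # Full jitter: sleep a random time in [0, min(cap, base * 2**attempt)].
                    delay = random.uniform(0, min(max_delay, base_delay * 2 ** attempt))
                    await asyncio.sleep(delay)
            raise AssertionError("unreachable")

        return wrapper

    return decorator
```

Narrowing `retry_on` to transient errors (timeouts, connection resets) avoids retrying failures that will never succeed, such as an invalid API key.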
#### Key Components
- `src/logging/` - Structured logging with file rotation and performance metrics
- `src/errors/` - Error classification system with standardized error codes
- `src/retry/` - Retry mechanisms with multiple strategies and circuit breakers
- `src/recovery/` - Recovery strategies for different error scenarios
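A circuit breaker complements retries: after a run of consecutive failures it "opens" and rejects calls immediately for a cool-down period, then lets a trial call through ("half-open"). A minimal synchronous sketch of the pattern; `CircuitBreaker`, its thresholds, and `CircuitOpenError` are illustrative and not taken from `src/retry/`:

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class CircuitOpenError(RuntimeError):
    """Raised when a call is rejected because the circuit is open."""


class CircuitBreaker:
    def __init__(self, failure_threshold: int = 5, reset_timeout: float = 60.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self._clock = clock
        self._failures = 0
        self._opened_at: float | None = None

    @property
    def state(self) -> str:
        if self._opened_at is None:
            return "closed"
        if self._clock() - self._opened_at >= self.reset_timeout:
            return "half-open"  # cool-down elapsed: allow a trial call
        return "open"

    def call(self, func: Callable[[], T]) -> T:
        if self.state == "open":
            raise CircuitOpenError("circuit open; failing fast")
        try:
            result = func()
        except Exception:
            self._failures += 1
            if self.state == "half-open" or self._failures >= self.failure_threshold:
                self._opened_at = self._clock()  # (re)open and restart the cool-down
            raise
        # Success closes the circuit and clears the failure count.
        self._failures = 0
        self._opened_at = None
        return result
```

Injecting the clock keeps the breaker deterministic in tests; in production the default `time.monotonic` is unaffected by wall-clock changes.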
#### Usage Examples
```python
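import logging

# Import paths are assumed from the component layout above (src/logging/, src/retry/)
from src.logging import timing_context
from src.retry import async_retry

# external_api, transcribe_audio and audio_file are placeholders for your own objects
logger = logging.getLogger(__name__)

# Structured logging with context
logger.info("Processing started", extra={
    "operation": "transcription",
    "file_size": "15.2MB",
    "correlation_id": "req-123",
})

# Retry with exponential backoff
@async_retry(max_retries=3)
async def api_call():
    return await external_api.request()

# Performance monitoring
with timing_context("transcription_operation"):
    result = transcribe_audio(audio_file)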
```
For detailed documentation, see [Error Handling and Logging System](docs/architecture/error-handling-and-logging.md).
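The `timing_context` helper in the usage examples presumably lives in the logging package. A minimal stand-in built on `contextlib.contextmanager` and `time.perf_counter()` shows the idea; `timing_context_sketch` is a hypothetical name, and the real helper may record more (for example, memory usage):

```python
import logging
import time
from contextlib import contextmanager
from typing import Iterator

logger = logging.getLogger("trax.performance")


@contextmanager
def timing_context_sketch(operation: str) -> Iterator[dict[str, float]]:
    """Time the wrapped block and log its duration, even if it raises."""
    metrics: dict[str, float] = {}
    start = time.perf_counter()
    try:
        yield metrics
    finally:
        metrics["duration_s"] = time.perf_counter() - start
        logger.info("Operation finished", extra={
            "operation": operation,
            "duration_s": round(metrics["duration_s"], 3),
        })
```

Logging in `finally` means failed operations are timed too, which is often exactly when the duration matters.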
### Testing
The project includes comprehensive unit tests for all components:
```bash
# Run all tests
uv run pytest
# Run specific test file
uv run pytest tests/test_batch_processor.py
# Run with coverage
uv run pytest --cov=src
```
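Because the suite exercises real audio rather than mocks, short deterministic audio files make convenient fixtures for plumbing tests (format handling, duration, chunking); accuracy tests still need real speech recordings. One standard-library way to generate such a file, as a sketch: the `write_sine_wav` helper and the 16 kHz mono format are assumptions, chosen because Whisper models operate on 16 kHz audio:

```python
import math
import struct
import wave
from pathlib import Path


def write_sine_wav(path: Path, seconds: float = 1.0, freq_hz: float = 440.0,
                   sample_rate: int = 16_000) -> Path:
    """Write a mono 16-bit PCM sine tone to `path` and return the path."""
    n_frames = int(seconds * sample_rate)
    amplitude = 0.5 * 32767  # half scale to avoid clipping
    frames = b"".join(
        struct.pack("<h", int(amplitude * math.sin(2 * math.pi * freq_hz * i / sample_rate)))
        for i in range(n_frames)
    )
    with wave.open(str(path), "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)  # 16-bit samples
        wav.setframerate(sample_rate)
        wav.writeframes(frames)
    return path


def test_generated_audio_has_expected_duration(tmp_path: Path) -> None:
    audio = write_sine_wav(tmp_path / "tone.wav", seconds=2.0)
    with wave.open(str(audio), "rb") as wav:
        assert wav.getframerate() == 16_000
        assert wav.getnframes() / wav.getframerate() == 2.0
```

Placed under `tests/`, pytest's built-in `tmp_path` fixture supplies a fresh temporary directory, so `uv run pytest` collects the test function directly.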
## Performance
### Optimizations
- **M3 MacBook Optimized**: Default 8 workers for optimal performance
- **Memory Management**: Configurable memory limits and monitoring
- **Resource Tracking**: Real-time CPU and memory usage monitoring
- **Async Processing**: Non-blocking operations throughout
- **Caching**: Intelligent caching for expensive operations
### Benchmarks
- **Transcription**: 95%+ accuracy, <30s for 5-minute audio
- **Enhancement**: 99%+ accuracy, <35s processing time
- **Batch Processing**: Parallel processing with configurable workers
- **Resource Usage**: <2GB memory, optimized for M3 architecture
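The timing targets translate into a real-time factor (RTF: processing time divided by audio duration). For the 5-minute (300 s) reference clip, and assuming the <35 s enhancement figure refers to the same clip:

```math
\mathrm{RTF}_{v1} < \frac{30\ \text{s}}{300\ \text{s}} = 0.1,
\qquad
\mathrm{RTF}_{v2} < \frac{35\ \text{s}}{300\ \text{s}} \approx 0.117
```

That is, v1 runs more than 10× faster than real time, and the enhanced pipeline more than about 8.5× faster (300 / 35 ≈ 8.57).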
## Project Status
### 🎉 **v1.0 COMPLETE - Production Ready**
**Release Date:** December 2024

**Version:** 1.0.0

**Status:** Production Ready
### ✅ **Complete Platform Implementation**
**Core Platform:**
- Development environment setup with uv package manager
- API key configuration and inheritance from root project
- PostgreSQL database with SQLAlchemy registry pattern
- YouTube metadata extraction via curl (no API required)
- Media download and preprocessing with download-first architecture
- Whisper transcription service (v1) with 95%+ accuracy
- DeepSeek enhancement service (v2) with 99%+ accuracy
- CLI interface with Click and Rich progress tracking
- Batch processing system with 8 parallel workers (M3 optimized)

**Advanced Features:**
- Export functionality (JSON, TXT, SRT, Markdown)
- Comprehensive error handling and logging system
- Security features (encrypted storage, input validation)
- Protocol-based architecture for clean interfaces
- Performance optimization for M3 MacBook
- Quality assessment system with accuracy metrics

**Quality Assurance:**
- Comprehensive testing suite with real audio files
- Complete documentation and user guides
### 🎯 **Production Ready Features**
The Trax transcription platform is now fully functional and ready for production use with:
- **95%+ transcription accuracy** on clear audio
- **<30 seconds processing** for 5-minute audio files
- **<2GB memory usage** optimized for M3 architecture
- **Download-first architecture** for reliable processing
- **Comprehensive error handling** and recovery mechanisms
- **Enterprise security** with encrypted storage and input validation
- **Protocol-based architecture** for clean interfaces and testability
### 📋 **Release Documentation**
- **[Release Notes](RELEASE_NOTES_v1.0.md)** - Comprehensive feature overview
- **[Technical Changelog](CHANGELOG_v1.0.md)** - Detailed implementation changes
- **[Task Archive](v1_0_completed)** - Archived v1.0 tasks in Taskmaster
### 🔮 **Next Phase: v2.0 Planning**
- Speaker diarization with 90%+ speaker accuracy
- Multi-language support for international content
- Advanced analytics and content insights
- Web interface for browser-based access
## License
This project is part of the my-ai-projects ecosystem.