# Trax: Personal Research Transcription Tool

A deterministic, iterative media transcription platform that transforms raw audio/video into structured, enhanced, and searchable text content through progressive AI-powered processing.

## Overview

Trax is a personal research tool designed for batch-processing tech podcasts, academic lectures, and audiobooks. It provides high-accuracy transcription with AI enhancement, is optimized for M3 MacBook performance, and uses a download-first architecture.

## Key Features

- **95%+ Accuracy Transcription** using Whisper distil-large-v3 model
- **99%+ Enhanced Transcription** with DeepSeek AI post-processing
- **Download-First Architecture** - Always download media locally before processing
- **Batch Processing** with 8 parallel workers (optimized for M3)
- **YouTube Metadata** extraction via curl (no API required)
- **Real-time Progress** tracking with memory/CPU monitoring
- **Comprehensive Testing** suite with real audio files (no mocks)
- **Protocol-Based Services** for clean interfaces and testability

## Project Structure

```
trax/
├── src/                    # Source code
│   ├── services/           # Core services (transcription, enhancement, batch)
│   ├── repositories/       # Data access layer
│   ├── database/           # Database models and migrations
│   ├── cli/                # Command-line interface
│   └── config.py           # Centralized configuration
├── tests/                  # Test files
├── docs/                   # Documentation
├── data/                   # Data files
├── scripts/                # Utility scripts (including Taskmaster helpers)
├── pyproject.toml          # Project configuration
└── .env.example            # Environment variables documentation
```

## Installation

### Prerequisites

- **Python 3.11+** (required for advanced type annotations)
- **PostgreSQL 15+** (for JSONB and UUID support)
- **FFmpeg 6.0+** (for audio preprocessing)
- **curl** (for YouTube metadata extraction)

### Setup

```bash
# Navigate to project
cd apps/trax

# Install with uv (ultra-fast package manager)
uv pip install -e ".[dev]"

# Set up database
./scripts/setup_postgresql.sh

# Run database migrations
uv run alembic upgrade head
```

### Configuration

API keys are automatically inherited from the `../../.env` file. For local overrides, create `.env.local`:

```bash
# Optional: Create local config overrides
echo "DEEPSEEK_API_KEY=your_key_here" > .env.local
```

## Quick Start

### Standard CLI

```bash
# Extract YouTube metadata (no API required)
# Quote the URL so the shell does not treat "?" as a glob character
uv run python -m src.cli.main youtube "https://youtube.com/watch?v=example"

# Transcribe single file (v1 pipeline)
uv run python -m src.cli.main transcribe audio.mp3

# Enhanced transcription (v2 pipeline)
uv run python -m src.cli.main transcribe audio.mp3 --v2

# Batch process folder
uv run python -m src.cli.main batch /path/to/audio/files
```

### Enhanced CLI (Recommended)

```bash
# Enhanced transcription with progress reporting
uv run python -m src.cli.enhanced_cli transcribe audio.mp3 -m large -f srt

# Multi-pass transcription with confidence threshold
uv run python -m src.cli.enhanced_cli transcribe audio.mp3 --multi-pass --confidence-threshold 0.9

# Domain-specific enhancement with multi-pass
uv run python -m src.cli.enhanced_cli transcribe lecture.mp3 --multi-pass --domain academic

# Speaker diarization with VTT output
uv run python -m src.cli.enhanced_cli transcribe interview.mp4 --diarize --speakers 2 -f vtt

# Full v2.0 feature set
uv run python -m src.cli.enhanced_cli transcribe technical_content.mp3 --multi-pass --confidence-threshold 0.9 --domain technical --diarize

# Batch processing with multi-pass
uv run python -m src.cli.enhanced_cli batch /path/to/audio/files --multi-pass -c 8
```

### Advanced Batch Processing

```bash
# Process with enhancement and custom settings
trax batch /path/to/files --enhance --workers 6 --memory-limit 2048

# Monitor progress with custom intervals
trax batch /path/to/files --progress-interval 2 --cpu-limit 80

# Process with a specific model and chunk size
trax batch /path/to/files --model whisper-1 --chunk-size 600
```

## Documentation

### CLI Documentation

- **[Enhanced CLI Guide](docs/enhanced-cli.md)** - Comprehensive guide to the enhanced CLI with progress reporting
- **[CLI Reference](docs/CLI.md)** - Complete command reference for both standard and enhanced CLIs

### Quick Reference

- **[CLI Commands](docs/CLI.md)** - Complete command reference with examples
- **[API Documentation](docs/API.md)** - Service protocols and API reference
- **[Database Schema](docs/DATABASE.md)** - PostgreSQL schema with JSONB examples
- **[Troubleshooting](docs/TROUBLESHOOTING.md)** - Common issues and security guide

### Architecture

- **[Development Patterns](docs/architecture/development-patterns.md)** - Historical learnings
- **[Error Handling](docs/architecture/error-handling-and-logging.md)** - Comprehensive error system
- **[Audio Processing](docs/architecture/audio-processing.md)** - Media pipeline details

## Pipeline Versions

### v1 Pipeline (Current)

- **Whisper distil-large-v3** transcription only
- **95%+ accuracy** on clear audio
- **<30 seconds** processing time for 5-minute audio
- **<2GB memory** usage

### v2 Pipeline (In Development)

- **Whisper + DeepSeek** enhancement
- **99%+ accuracy** with AI post-processing
- **<35 seconds** total processing time
- **Grammar and punctuation** correction

### v3-v4 Pipeline (Future)

- **Multi-pass optimization** (v3)
- **Speaker diarization** (v4)
- **Advanced analysis** features

## Configuration

### API Keys

The project automatically inherits all API tokens from the root project's `.env` file:

- **AI Services**: Anthropic, DeepSeek, OpenAI, OpenRouter, Perplexity
- **Google Services**: OAuth, APIs
- **Other Services**: Slack, GitHub, Gitea, YouTube, Directus

### Local Overrides

Create `.env.local` in the trax directory for project-specific environment overrides.
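
The override behavior described above amounts to a two-layer lookup: values from the root `.env` load first, and any key that also appears in `.env.local` replaces the inherited value. The following is a minimal sketch of that idea using only the standard library. The function names `parse_env_file` and `load_layered_env` are hypothetical and are not taken from `src/config.py`, and the parser assumes simple `KEY=value` lines only.

```python
from pathlib import Path


def parse_env_file(path: Path) -> dict[str, str]:
    """Parse simple KEY=value lines; skip blanks, comments, and malformed lines."""
    values: dict[str, str] = {}
    if not path.is_file():
        return values  # A missing file simply contributes nothing.
    for raw_line in path.read_text(encoding="utf-8").splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding whitespace and optional quotes around the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def load_layered_env(root_env: Path, local_env: Path) -> dict[str, str]:
    """Merge two env files (hypothetical helper); keys in local_env win."""
    merged = parse_env_file(root_env)
    merged.update(parse_env_file(local_env))
    return merged
```

Under these assumptions, calling `load_layered_env(Path("../../.env"), Path(".env.local"))` from the trax directory would return the inherited keys with any local overrides applied.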
## Development

### Taskmaster Helper Scripts

The project includes comprehensive helper scripts for managing development tasks via Taskmaster CLI:

```bash
# Quick project overview
./scripts/tm_master.sh overview

# Get next task to work on
./scripts/tm_master.sh next

# Start working on a task
./scripts/tm_master.sh start 15

# Complete a task
./scripts/tm_master.sh done 15

# Search for tasks
./scripts/tm_master.sh search whisper

# Run analysis
./scripts/tm_master.sh analyze
```

**Available Scripts:**

- `tm_master.sh` - Master interface to all helper scripts
- `tm_status.sh` - Status checking and project overviews
- `tm_search.sh` - Search tasks by various criteria
- `tm_workflow.sh` - Workflow management and progress tracking
- `tm_analyze.sh` - Analysis and insights generation
- `tm_quick.sh` - Quick operations

For detailed documentation, see [Taskmaster Helper Scripts](scripts/README_taskmaster_helpers.md).

**Quick Reference**: [Taskmaster Quick Reference](scripts/TASKMASTER_QUICK_REFERENCE.md)

### Commands

```bash
# Run tests
uv run pytest

# Format code
uv run black src/ tests/
uv run ruff check --fix src/ tests/

# Type checking
uv run mypy src/

# Install new dependency
uv pip install package-name

# Update dependencies
uv pip compile pyproject.toml -o requirements.txt
```

### Architecture

Trax follows a protocol-based architecture with clean separation of concerns:

- **Services Layer**: Core business logic (transcription, enhancement, batch processing)
- **Repository Layer**: Data access with protocol-based interfaces
- **Database Layer**: PostgreSQL with SQLAlchemy registry pattern
- **CLI Layer**: User interface with Click and Rich

### Error Handling and Logging

The application implements a comprehensive error handling and logging system designed for production reliability:

#### Core Features

- **Structured Logging**: JSON and human-readable formats with contextual information
- **Error Classification**: Hierarchical error system with standardized error codes
- **Retry Logic**: Exponential backoff with jitter and circuit breaker patterns
- **Recovery Strategies**: Fallback mechanisms, graceful degradation, and state recovery
- **Performance Monitoring**: Operation timing, resource usage, and system health metrics

#### Key Components

- `src/logging/` - Structured logging with file rotation and performance metrics
- `src/errors/` - Error classification system with standardized error codes
- `src/retry/` - Retry mechanisms with multiple strategies and circuit breakers
- `src/recovery/` - Recovery strategies for different error scenarios

#### Usage Examples

```python
# Structured logging with context
logger.info("Processing started", extra={
    "operation": "transcription",
    "file_size": "15.2MB",
    "correlation_id": "req-123"
})

# Retry with exponential backoff
@async_retry(max_retries=3)
async def api_call():
    return await external_api.request()

# Performance monitoring
with timing_context("transcription_operation"):
    result = transcribe_audio(audio_file)
```

For detailed documentation, see [Error Handling and Logging System](docs/architecture/error-handling-and-logging.md).
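
To make the "exponential backoff with jitter" behavior concrete, here is a minimal sketch of how a decorator like `async_retry` could work. This is an illustration under stated assumptions, not the code in `src/retry/`: only the `max_retries` parameter appears in the usage example above, while `base_delay`, `max_delay`, and `exceptions` are hypothetical parameters, and the circuit breaker is left out.

```python
import asyncio
import functools
import random


def async_retry(max_retries=3, base_delay=0.5, max_delay=30.0, exceptions=(Exception,)):
    """Retry an async callable with exponential backoff and full jitter.

    Hypothetical sketch: all parameters except max_retries are assumptions.
    """

    def decorator(func):
        @functools.wraps(func)
        async def wrapper(*args, **kwargs):
            for attempt in range(max_retries + 1):
                try:
                    return await func(*args, **kwargs)
                except exceptions:
                    if attempt == max_retries:
                        raise  # Retries exhausted: surface the last error.
                    # The backoff ceiling doubles per attempt, capped at max_delay.
                    ceiling = min(max_delay, base_delay * 2 ** attempt)
                    # Full jitter: sleep a random duration in [0, ceiling].
                    await asyncio.sleep(random.uniform(0, ceiling))

        return wrapper

    return decorator
```

Randomizing the sleep spreads out retries from parallel workers so they do not all hit a recovering service at the same moment, which matters when a batch run uses several workers against the same API.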
### Testing

The project includes comprehensive unit tests for all components:

```bash
# Run all tests
uv run pytest

# Run specific test file
uv run pytest tests/test_batch_processor.py

# Run with coverage
uv run pytest --cov=src
```

## Performance

### Optimizations

- **M3 MacBook Optimized**: Default 8 workers for optimal performance
- **Memory Management**: Configurable memory limits and monitoring
- **Resource Tracking**: Real-time CPU and memory usage monitoring
- **Async Processing**: Non-blocking operations throughout
- **Caching**: Intelligent caching for expensive operations

### Benchmarks

- **Transcription**: 95%+ accuracy, <30s for 5-minute audio
- **Enhancement**: 99%+ accuracy, <35s processing time
- **Batch Processing**: Parallel processing with configurable workers
- **Resource Usage**: <2GB memory, optimized for M3 architecture

## Project Status

### 🎉 **v1.0 COMPLETE - Production Ready**

- **Release Date:** December 2024
- **Version:** 1.0.0
- **Status:** Production Ready

### ✅ **Complete Platform Implementation**

**Core Platform:**

- ✅ Development environment setup with uv package manager
- ✅ API key configuration and inheritance from root project
- ✅ PostgreSQL database with SQLAlchemy registry pattern
- ✅ YouTube metadata extraction via curl (no API required)
- ✅ Media download and preprocessing with download-first architecture
- ✅ Whisper transcription service (v1) with 95%+ accuracy
- ✅ DeepSeek enhancement service (v2) with 99%+ accuracy
- ✅ CLI interface with Click and Rich progress tracking
- ✅ Batch processing system with 8 parallel workers (M3 optimized)

**Advanced Features:**

- ✅ Export functionality (JSON, TXT, SRT, Markdown)
- ✅ Comprehensive error handling and logging system
- ✅ Security features (encrypted storage, input validation)
- ✅ Protocol-based architecture for clean interfaces
- ✅ Performance optimization for M3 MacBook
- ✅ Quality assessment system with accuracy metrics

**Quality Assurance:**

- ✅ Comprehensive testing suite with real audio files
- ✅ Complete documentation and user guides

### 🎯 **Production Ready Features**

The Trax transcription platform is now fully functional and ready for production use with:

- **95%+ transcription accuracy** on clear audio
- **<30 seconds processing** for 5-minute audio files
- **<2GB memory usage** optimized for M3 architecture
- **Download-first architecture** for reliable processing
- **Comprehensive error handling** and recovery mechanisms
- **Enterprise security** with encrypted storage and input validation
- **Protocol-based architecture** for clean interfaces and testability

### 📋 **Release Documentation**

- **[Release Notes](RELEASE_NOTES_v1.0.md)** - Comprehensive feature overview
- **[Technical Changelog](CHANGELOG_v1.0.md)** - Detailed implementation changes
- **[Task Archive](v1_0_completed)** - Archived v1.0 tasks in Taskmaster

### 🔮 **Next Phase: v2.0 Planning**

- Speaker diarization with 90%+ speaker accuracy
- Multi-language support for international content
- Advanced analytics and content insights
- Web interface for browser-based access

## License

This project is part of the my-ai-projects ecosystem.