Trax: Personal Research Transcription Tool
A deterministic, iterative media transcription platform that transforms raw audio/video into structured, enhanced, and searchable text content through progressive AI-powered processing.
Overview
Trax is a personal research tool designed for batch-processing tech podcasts, academic lectures, and audiobooks. It provides high-accuracy transcription with AI enhancement, optimized for M3 MacBook performance with a download-first architecture.
Key Features
- 95%+ Accuracy Transcription using Whisper distil-large-v3 model
- 99%+ Enhanced Transcription with DeepSeek AI post-processing
- Download-First Architecture - Always download media locally before processing
- Batch Processing with 8 parallel workers (optimized for M3)
- YouTube Metadata extraction via curl (no API required)
- Real-time Progress tracking with memory/CPU monitoring
- Comprehensive Testing suite with real audio files (no mocks)
- Protocol-Based Services for clean interfaces and testability
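The download-first rule listed above can be sketched as follows. This is a minimal illustration and not Trax's actual code: `ensure_local`, `process_media`, and the injected `fetch` and `transcribe` callables are hypothetical names chosen for this example.

```python
from pathlib import Path
from typing import Callable
from urllib.parse import urlparse


def ensure_local(url: str, cache_dir: Path, fetch: Callable[[str, Path], None]) -> Path:
    """Return a local copy of `url`, downloading it only if it is not cached yet."""
    cache_dir.mkdir(parents=True, exist_ok=True)
    # Derive a file name from the URL path; fall back to a generic name.
    name = Path(urlparse(url).path).name or "media.bin"
    target = cache_dir / name
    if not target.exists():
        fetch(url, target)  # hypothetical downloader that writes the media to `target`
    return target


def process_media(
    url: str,
    cache_dir: Path,
    fetch: Callable[[str, Path], None],
    transcribe: Callable[[Path], str],
) -> str:
    """Download first, then hand only the local path to the transcriber."""
    local_path = ensure_local(url, cache_dir, fetch)
    return transcribe(local_path)
```

Because the transcriber only ever sees a local path, a failed or slow network transfer cannot interrupt processing halfway through, and a second run over the same URL reuses the cached file.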
Project Structure
trax/
├── src/ # Source code
│ ├── services/ # Core services (transcription, enhancement, batch)
│ ├── repositories/ # Data access layer
│ ├── database/ # Database models and migrations
│ ├── cli/ # Command-line interface
│ └── config.py # Centralized configuration
├── tests/ # Test files
├── docs/ # Documentation
├── data/ # Data files
├── scripts/ # Utility scripts (including Taskmaster helpers)
├── pyproject.toml # Project configuration
└── .env.example # Environment variables documentation
Installation
Prerequisites
- Python 3.11+ (required for advanced type annotations)
- PostgreSQL 15+ (for JSONB and UUID support)
- FFmpeg 6.0+ (for audio preprocessing)
- curl (for YouTube metadata extraction)
Setup
# Navigate to project
cd apps/trax
# Install with uv (ultra-fast package manager)
uv pip install -e ".[dev]"
# Setup database
./scripts/setup_postgresql.sh
# Run database migrations
uv run alembic upgrade head
Configuration
API keys are automatically inherited from the ../../.env file. For local overrides, create .env.local:
# Optional: Create local config overrides
echo "DEEPSEEK_API_KEY=your_key_here" > .env.local
Quick Start
Standard CLI
# Extract YouTube metadata (no API required)
uv run python -m src.cli.main youtube "https://youtube.com/watch?v=example"
# Transcribe single file (v1 pipeline)
uv run python -m src.cli.main transcribe audio.mp3
# Enhanced transcription (v2 pipeline)
uv run python -m src.cli.main transcribe audio.mp3 --v2
# Batch process folder
uv run python -m src.cli.main batch /path/to/audio/files
Enhanced CLI (Recommended)
# Enhanced transcription with progress reporting
uv run python -m src.cli.enhanced_cli transcribe audio.mp3 -m large -f srt
# Multi-pass transcription with confidence threshold
uv run python -m src.cli.enhanced_cli transcribe audio.mp3 --multi-pass --confidence-threshold 0.9
# Domain-specific enhancement with multi-pass
uv run python -m src.cli.enhanced_cli transcribe lecture.mp3 --multi-pass --domain academic
# Speaker diarization with VTT output
uv run python -m src.cli.enhanced_cli transcribe interview.mp4 --diarize --speakers 2 -f vtt
# Full v2.0 feature set
uv run python -m src.cli.enhanced_cli transcribe technical_content.mp3 --multi-pass --confidence-threshold 0.9 --domain technical --diarize
# Batch processing with multi-pass
uv run python -m src.cli.enhanced_cli batch /path/to/audio/files --multi-pass -c 8
Advanced Batch Processing
# Process with enhancement and custom settings
trax batch /path/to/files --enhance --workers 6 --memory-limit 2048
# Monitor progress with custom intervals
trax batch /path/to/files --progress-interval 2 --cpu-limit 80
# Process with a specific model and chunk size
trax batch /path/to/files --model whisper-1 --chunk-size 600
Documentation
CLI Documentation
- Enhanced CLI Guide - Comprehensive guide to the enhanced CLI with progress reporting
- CLI Reference - Complete command reference for both standard and enhanced CLIs
Quick Reference
- CLI Commands - Complete command reference with examples
- API Documentation - Service protocols and API reference
- Database Schema - PostgreSQL schema with JSONB examples
- Troubleshooting - Common issues and security guide
Architecture
- Development Patterns - Historical learnings
- Error Handling - Comprehensive error system
- Audio Processing - Media pipeline details
Pipeline Versions
v1 Pipeline (Current)
- Whisper distil-large-v3 transcription only
- 95%+ accuracy on clear audio
- <30 seconds processing time for 5-minute audio
- <2GB memory usage
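The v1 speed target can be restated as a real-time factor (processing time divided by audio duration). A 5-minute file is 300 seconds of audio, so a 30-second ceiling corresponds to a factor below 0.1, or more than 10x faster than real time. The helper below is a hypothetical illustration of that arithmetic, not part of Trax:

```python
def real_time_factor(processing_seconds: float, audio_seconds: float) -> float:
    """Processing time divided by audio duration; lower is faster."""
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return processing_seconds / audio_seconds


# Upper bound implied by the v1 target: 30 s of processing for 5 minutes of audio.
rtf = real_time_factor(30, 5 * 60)
print(f"RTF = {rtf:.2f}, speed = {1 / rtf:.0f}x real time")
```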
v2 Pipeline (In Development)
- Whisper + DeepSeek enhancement
- 99%+ accuracy with AI post-processing
- <35 seconds total processing time
- Grammar and punctuation correction
v3-v4 Pipeline (Future)
- Multi-pass optimization (v3)
- Speaker diarization (v4)
- Advanced analysis features
Configuration
API Keys
The project automatically inherits all API tokens from the root project's .env file:
- AI Services: Anthropic, DeepSeek, OpenAI, OpenRouter, Perplexity
- Google Services: OAuth, APIs
- Other Services: Slack, GitHub, Gitea, YouTube, Directus
Local Overrides
Create .env.local in the trax directory for project-specific environment overrides.
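As a rough sketch of how inherited values and local overrides could be combined, assuming simple `KEY=VALUE` files and that `.env.local` wins on conflicts (the actual Trax loader is not shown here and may behave differently; `parse_env_file` and `load_settings` are hypothetical names):

```python
from pathlib import Path


def parse_env_file(path: Path) -> dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments."""
    values: dict[str, str] = {}
    if not path.exists():
        return values  # a missing file simply contributes nothing
    for raw in path.read_text(encoding="utf-8").splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def load_settings(root_env: Path, local_env: Path) -> dict[str, str]:
    """Merge inherited values with local overrides; local values take precedence."""
    settings = parse_env_file(root_env)
    settings.update(parse_env_file(local_env))
    return settings
```

For example, `load_settings(Path("../../.env"), Path(".env.local"))` would return the root project's keys with any locally defined `DEEPSEEK_API_KEY` replacing the inherited one.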
Development
Taskmaster Helper Scripts
The project includes comprehensive helper scripts for managing development tasks via Taskmaster CLI:
# Quick project overview
./scripts/tm_master.sh overview
# Get next task to work on
./scripts/tm_master.sh next
# Start working on a task
./scripts/tm_master.sh start 15
# Complete a task
./scripts/tm_master.sh done 15
# Search for tasks
./scripts/tm_master.sh search whisper
# Run analysis
./scripts/tm_master.sh analyze
Available Scripts:
- tm_master.sh - Master interface to all helper scripts
- tm_status.sh - Status checking and project overviews
- tm_search.sh - Search tasks by various criteria
- tm_workflow.sh - Workflow management and progress tracking
- tm_analyze.sh - Analysis and insights generation
- tm_quick.sh - Quick operations
For detailed documentation, see Taskmaster Helper Scripts.
Quick Reference: Taskmaster Quick Reference
Commands
# Run tests
uv run pytest
# Format and lint code
uv run black src/ tests/
uv run ruff check --fix src/ tests/
# Type checking
uv run mypy src/
# Install new dependency
uv pip install package-name
# Update dependencies
uv pip compile pyproject.toml -o requirements.txt
Architecture
Trax follows a protocol-based architecture with clean separation of concerns:
- Services Layer: Core business logic (transcription, enhancement, batch processing)
- Repository Layer: Data access with protocol-based interfaces
- Database Layer: PostgreSQL with SQLAlchemy registry pattern
- CLI Layer: User interface with Click and Rich
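To illustrate what a protocol-based interface looks like in Python, here is a minimal sketch using `typing.Protocol`. The names `TranscriptionService`, `TranscriptResult`, `EchoTranscriber`, and `run` are assumptions made for this example; Trax's real protocols may differ in name and shape.

```python
from dataclasses import dataclass
from pathlib import Path
from typing import Protocol, runtime_checkable


@dataclass
class TranscriptResult:
    text: str
    confidence: float


@runtime_checkable
class TranscriptionService(Protocol):
    """Structural interface: any object with a matching `transcribe` method qualifies."""

    def transcribe(self, audio_path: Path) -> TranscriptResult: ...


class EchoTranscriber:
    """Stand-in implementation; note that it does not inherit from the protocol."""

    def transcribe(self, audio_path: Path) -> TranscriptResult:
        return TranscriptResult(text=f"transcript of {audio_path.name}", confidence=1.0)


def run(service: TranscriptionService, audio_path: Path) -> str:
    """Callers depend only on the protocol, so implementations can be swapped freely."""
    return service.transcribe(audio_path).text
```

Since conformance is structural, a test double or an alternative backend can be passed to `run` without touching the calling code, which is the testability benefit the architecture aims for.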
Error Handling and Logging
The application implements a comprehensive error handling and logging system designed for production reliability:
Core Features
- Structured Logging: JSON and human-readable formats with contextual information
- Error Classification: Hierarchical error system with standardized error codes
- Retry Logic: Exponential backoff with jitter and circuit breaker patterns
- Recovery Strategies: Fallback mechanisms, graceful degradation, and state recovery
- Performance Monitoring: Operation timing, resource usage, and system health metrics
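The retry behavior named in the list above (exponential backoff with jitter) can be sketched as an async decorator. This is a generic illustration under stated assumptions, not the implementation in `src/retry/`: `retry_with_backoff` and its parameters are hypothetical, it uses the "full jitter" variant, and it does not include a circuit breaker.

```python
import asyncio
import functools
import random


def retry_with_backoff(max_retries=3, base_delay=0.5, max_delay=30.0, exceptions=(Exception,)):
    """Retry an async callable, doubling the delay cap after each failure."""

    def decorator(func):
        @functools.wraps(func)
        async def wrapper(*args, **kwargs):
            for attempt in range(max_retries + 1):
                try:
                    return await func(*args, **kwargs)
                except exceptions:
                    if attempt == max_retries:
                        raise  # retries exhausted: surface the last error
                    # Cap grows as base_delay * 2**attempt, bounded by max_delay.
                    cap = min(max_delay, base_delay * 2 ** attempt)
                    # Full jitter: sleep a random time in [0, cap] to spread out retries.
                    await asyncio.sleep(random.uniform(0, cap))

        return wrapper

    return decorator
```

Randomizing the delay keeps many failing workers from retrying in lockstep against the same external API, which matters when a batch of parallel jobs hits the same outage.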
Key Components
- src/logging/ - Structured logging with file rotation and performance metrics
- src/errors/ - Error classification system with standardized error codes
- src/retry/ - Retry mechanisms with multiple strategies and circuit breakers
- src/recovery/ - Recovery strategies for different error scenarios
Usage Examples
# Structured logging with context
logger.info("Processing started", extra={
    "operation": "transcription",
    "file_size": "15.2MB",
    "correlation_id": "req-123"
})
# Retry with exponential backoff
@async_retry(max_retries=3)
async def api_call():
    return await external_api.request()
# Performance monitoring
with timing_context("transcription_operation"):
    result = transcribe_audio(audio_file)
For detailed documentation, see Error Handling and Logging System.
Testing
The project includes comprehensive unit tests for all components:
# Run all tests
uv run pytest
# Run specific test file
uv run pytest tests/test_batch_processor.py
# Run with coverage
uv run pytest --cov=src
Performance
Optimizations
- M3 MacBook Optimized: Default 8 workers for optimal performance
- Memory Management: Configurable memory limits and monitoring
- Resource Tracking: Real-time CPU and memory usage monitoring
- Async Processing: Non-blocking operations throughout
- Caching: Intelligent caching for expensive operations
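One common way to combine async processing with a fixed worker count is to bound concurrency with a semaphore. The sketch below assumes that approach; Trax's batch processor may be built differently, and `process_batch` and `worker` are hypothetical names.

```python
import asyncio
from typing import Awaitable, Callable, Iterable, TypeVar

T = TypeVar("T")
R = TypeVar("R")


async def process_batch(
    items: Iterable[T],
    worker: Callable[[T], Awaitable[R]],
    max_workers: int = 8,
) -> list[R]:
    """Run `worker` over all items with at most `max_workers` in flight at once."""
    semaphore = asyncio.Semaphore(max_workers)

    async def guarded(item: T) -> R:
        async with semaphore:  # waits here while max_workers tasks are already running
            return await worker(item)

    # gather returns results in the same order as the input items.
    return await asyncio.gather(*(guarded(item) for item in items))
```

With `max_workers=8`, a folder of any size is processed eight files at a time, which keeps memory and CPU use predictable instead of scaling with the number of queued files.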
Benchmarks
- Transcription: 95%+ accuracy, <30s for 5-minute audio
- Enhancement: 99%+ accuracy, <35s processing time
- Batch Processing: Parallel processing with configurable workers
- Resource Usage: <2GB memory, optimized for M3 architecture
Project Status
🎉 v1.0 COMPLETE - Production Ready
Release Date: December 2024
Version: 1.0.0
Status: Production Ready
✅ Complete Platform Implementation
Core Platform:
- ✅ Development environment setup with uv package manager
- ✅ API key configuration and inheritance from root project
- ✅ PostgreSQL database with SQLAlchemy registry pattern
- ✅ YouTube metadata extraction via curl (no API required)
- ✅ Media download and preprocessing with download-first architecture
- ✅ Whisper transcription service (v1) with 95%+ accuracy
- ✅ DeepSeek enhancement service (v2) with 99%+ accuracy
- ✅ CLI interface with Click and Rich progress tracking
- ✅ Batch processing system with 8 parallel workers (M3 optimized)
Advanced Features:
- ✅ Export functionality (JSON, TXT, SRT, Markdown)
- ✅ Comprehensive error handling and logging system
- ✅ Security features (encrypted storage, input validation)
- ✅ Protocol-based architecture for clean interfaces
- ✅ Performance optimization for M3 MacBook
- ✅ Quality assessment system with accuracy metrics
Quality Assurance:
- ✅ Comprehensive testing suite with real audio files
- ✅ Complete documentation and user guides
🎯 Production Ready Features
The Trax transcription platform is now fully functional and ready for production use with:
- 95%+ transcription accuracy on clear audio
- <30 seconds processing for 5-minute audio files
- <2GB memory usage optimized for M3 architecture
- Download-first architecture for reliable processing
- Comprehensive error handling and recovery mechanisms
- Enterprise security with encrypted storage and input validation
- Protocol-based architecture for clean interfaces and testability
📋 Release Documentation
- Release Notes - Comprehensive feature overview
- Technical Changelog - Detailed implementation changes
- Task Archive - Archived v1.0 tasks in Taskmaster
🔮 Next Phase: v2.0 Planning
- Speaker diarization with 90%+ speaker accuracy
- Multi-language support for international content
- Advanced analytics and content insights
- Web interface for browser-based access
License
This project is part of the my-ai-projects ecosystem.