Ultra-fast media transcription platform - Deterministic, iterative processing pipeline

enias 61af8153a5 feat: Integrate parallel processing with adaptive chunking	2025-09-02 03:50:19 -04:00
- Created OptimizedTranscriptionPipeline combining both optimizations
- Achieves 3-8x speed improvement (2-4x parallel + 1.5-2x adaptive)
- Added CLI command with rich progress display
- Memory usage stays under 2GB target
- M3-optimized with distil-large-v3 model
- Implements all HIGH and MEDIUM priority optimizations from handoff
.claude	feat: Setup parallel development with Git worktrees and documentation	2025-09-02 03:16:23 -04:00
.cursor	Initial commit: Trax media transcription platform	2025-09-02 03:05:36 -04:00
.taskmaster	feat: Integrate parallel processing with adaptive chunking	2025-09-02 03:50:19 -04:00
docs	Initial commit: Trax media transcription platform	2025-09-02 03:05:36 -04:00
examples	Initial commit: Trax media transcription platform	2025-09-02 03:05:36 -04:00
migrations	Initial commit: Trax media transcription platform	2025-09-02 03:05:36 -04:00
scripts	Initial commit: Trax media transcription platform	2025-09-02 03:05:36 -04:00
src	feat: Integrate parallel processing with adaptive chunking	2025-09-02 03:50:19 -04:00
tests	feat: TDD implementation of adaptive chunking (task 13)	2025-09-02 03:44:56 -04:00
.cursorignore	Initial commit: Trax media transcription platform	2025-09-02 03:05:36 -04:00
.env.example	Initial commit: Trax media transcription platform	2025-09-02 03:05:36 -04:00
.gitignore	feat: Integrate parallel processing with adaptive chunking	2025-09-02 03:50:19 -04:00
.mcp.json	Initial commit: Trax media transcription platform	2025-09-02 03:05:36 -04:00
.pre-commit-config.yaml	Initial commit: Trax media transcription platform	2025-09-02 03:05:36 -04:00
AGENTS.md	Initial commit: Trax media transcription platform	2025-09-02 03:05:36 -04:00
BACKEND_DEVELOPER_AGENT_SUMMARY.md	Initial commit: Trax media transcription platform	2025-09-02 03:05:36 -04:00
BAP_South_Meeting_Transcript.txt	Initial commit: Trax media transcription platform	2025-09-02 03:05:36 -04:00
CHANGELOG.md	Initial commit: Trax media transcription platform	2025-09-02 03:05:36 -04:00
CHANGELOG_v1.0.md	Initial commit: Trax media transcription platform	2025-09-02 03:05:36 -04:00
CLAUDE.md	feat: Setup parallel development with Git worktrees and documentation	2025-09-02 03:16:23 -04:00
DB-SCHEMA.md	Initial commit: Trax media transcription platform	2025-09-02 03:05:36 -04:00
DEV_HANDOFF_TRANSCRIPTION_OPTIMIZATION.md	feat: Setup parallel development with Git worktrees and documentation	2025-09-02 03:16:23 -04:00
EXECUTIVE-SUMMARY.md	Initial commit: Trax media transcription platform	2025-09-02 03:05:36 -04:00
HANDOFF_SUMMARY.md	feat: Setup parallel development with Git worktrees and documentation	2025-09-02 03:16:23 -04:00
PROJECT-DIRECTORY.md	Initial commit: Trax media transcription platform	2025-09-02 03:05:36 -04:00
README.md	Initial commit: Trax media transcription platform	2025-09-02 03:05:36 -04:00
RELEASE_NOTES_v1.0.md	Initial commit: Trax media transcription platform	2025-09-02 03:05:36 -04:00
RELEASE_NOTES_v2.0.md	Initial commit: Trax media transcription platform	2025-09-02 03:05:36 -04:00
RESEARCH_AGENT_SUMMARY.md	Initial commit: Trax media transcription platform	2025-09-02 03:05:36 -04:00
TRAX_V2_TASKMASTER_SUMMARY.md	Initial commit: Trax media transcription platform	2025-09-02 03:05:36 -04:00
TRAX_v2.0_COMPLETION_PLAN.md	Initial commit: Trax media transcription platform	2025-09-02 03:05:36 -04:00
Trax v2 Research Analysis.html	Initial commit: Trax media transcription platform	2025-09-02 03:05:36 -04:00
Trax v2 Research Analysis.pdf	Initial commit: Trax media transcription platform	2025-09-02 03:05:36 -04:00
Trax_v2_Research_Analysis_followup.md	Initial commit: Trax media transcription platform	2025-09-02 03:05:36 -04:00
alembic.ini	Initial commit: Trax media transcription platform	2025-09-02 03:05:36 -04:00
launch_research_agent.py	Initial commit: Trax media transcription platform	2025-09-02 03:05:36 -04:00
lib	Initial commit: Trax media transcription platform	2025-09-02 03:05:36 -04:00
process_videos_csv.py	Initial commit: Trax media transcription platform	2025-09-02 03:05:36 -04:00
pyproject.toml	Initial commit: Trax media transcription platform	2025-09-02 03:05:36 -04:00
requirements-youtube.txt	Initial commit: Trax media transcription platform	2025-09-02 03:05:36 -04:00
requirements.txt	Initial commit: Trax media transcription platform	2025-09-02 03:05:36 -04:00
scratchpad.md	Initial commit: Trax media transcription platform	2025-09-02 03:05:36 -04:00
simple_transcribe.py	Initial commit: Trax media transcription platform	2025-09-02 03:05:36 -04:00
test_config.py	Initial commit: Trax media transcription platform	2025-09-02 03:05:36 -04:00
test_database_setup.py	Initial commit: Trax media transcription platform	2025-09-02 03:05:36 -04:00
test_enhanced_media_service.py	Initial commit: Trax media transcription platform	2025-09-02 03:05:36 -04:00
test_media_service_integration.py	Initial commit: Trax media transcription platform	2025-09-02 03:05:36 -04:00
test_mps.py	Initial commit: Trax media transcription platform	2025-09-02 03:05:36 -04:00
text.md	Initial commit: Trax media transcription platform	2025-09-02 03:05:36 -04:00
transcribe_bap.py	Initial commit: Trax media transcription platform	2025-09-02 03:05:36 -04:00
trax-demo.ipynb	Initial commit: Trax media transcription platform	2025-09-02 03:05:36 -04:00
videos.csv	Initial commit: Trax media transcription platform	2025-09-02 03:05:36 -04:00
videos_urls.txt	Initial commit: Trax media transcription platform	2025-09-02 03:05:36 -04:00
view.md	Initial commit: Trax media transcription platform	2025-09-02 03:05:36 -04:00

README.md

Trax: Personal Research Transcription Tool

A deterministic, iterative media transcription platform that transforms raw audio/video into structured, enhanced, and searchable text content through progressive AI-powered processing.

Overview

Trax is a personal research tool for batch-processing tech podcasts, academic lectures, and audiobooks. It provides high-accuracy transcription with AI enhancement and is optimized for M3 MacBooks. Media is always downloaded locally before processing (download-first architecture).

Key Features

95%+ Accuracy Transcription using Whisper distil-large-v3 model
99%+ Enhanced Transcription with DeepSeek AI post-processing
Download-First Architecture - Always download media locally before processing
Batch Processing with 8 parallel workers (optimized for M3)
YouTube Metadata extraction via curl (no API required)
Real-time Progress tracking with memory/CPU monitoring
Comprehensive Testing suite with real audio files (no mocks)
Protocol-Based Services for clean interfaces and testability

Project Structure

trax/
├── src/              # Source code
│   ├── services/     # Core services (transcription, enhancement, batch)
│   ├── repositories/ # Data access layer
│   ├── database/     # Database models and migrations
│   ├── cli/          # Command-line interface
│   └── config.py     # Centralized configuration
├── tests/            # Test files
├── docs/             # Documentation
├── data/             # Data files
├── scripts/          # Utility scripts (including Taskmaster helpers)
├── pyproject.toml    # Project configuration
└── .env.example      # Environment variables documentation

Installation

Prerequisites

Python 3.11+ (required for advanced type annotations)
PostgreSQL 15+ (for JSONB and UUID support)
FFmpeg 6.0+ (for audio preprocessing)
curl (for YouTube metadata extraction)
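The prerequisites above can be checked before installation with a small script along these lines. This is a sketch only: check_prerequisites is a hypothetical helper, not part of Trax. It verifies only that each tool is on PATH (not its version) and assumes the psql client is a reasonable stand-in for a PostgreSQL installation.

```python
import shutil
import sys

# Hypothetical helper, not part of Trax: report which prerequisites are missing.
# "psql" stands in for PostgreSQL here (assumption).
REQUIRED_TOOLS = ("ffmpeg", "curl", "psql")


def check_prerequisites(min_python=(3, 11), tools=REQUIRED_TOOLS):
    """Return a list of human-readable problems; an empty list means all good."""
    problems = []
    if sys.version_info[:2] < min_python:
        problems.append(
            f"Python {min_python[0]}.{min_python[1]}+ required, "
            f"found {sys.version_info[0]}.{sys.version_info[1]}"
        )
    for tool in tools:
        # shutil.which returns None when the executable is not on PATH.
        if shutil.which(tool) is None:
            problems.append(f"{tool} not found on PATH")
    return problems


if __name__ == "__main__":
    for problem in check_prerequisites():
        print(problem)
```

Version checks for FFmpeg and PostgreSQL are left out on purpose, since parsing their version output is tool-specific.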

Setup

# Navigate to project
cd apps/trax

# Install with uv (ultra-fast package manager)
uv pip install -e ".[dev]"

# Setup database
./scripts/setup_postgresql.sh

# Run database migrations
uv run alembic upgrade head

Configuration

API keys are inherited automatically from the ../../.env file. For local overrides, create .env.local:

# Optional: Create local config overrides
echo "DEEPSEEK_API_KEY=your_key_here" > .env.local
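The override behaviour described above can be pictured as layered loading: read the root .env first, then .env.local, letting later values win. The sketch below illustrates that idea with the standard library only. The names parse_env_text and load_layered_env are hypothetical, and the actual loader in src/config.py is not shown here and may work differently.

```python
from pathlib import Path


def parse_env_text(text):
    """Parse KEY=VALUE lines, skipping blank lines and # comments."""
    values = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Strip optional surrounding quotes from the value.
        values[key.strip()] = value.strip().strip("'\"")
    return values


def load_layered_env(*paths):
    """Merge env files in order; files listed later override earlier ones."""
    merged = {}
    for path in paths:
        p = Path(path)
        if p.is_file():  # missing files are simply skipped
            merged.update(parse_env_text(p.read_text(encoding="utf-8")))
    return merged


# Root .env first, then the local override file (paths taken from the text above).
settings = load_layered_env("../../.env", ".env.local")
```

With this ordering, a DEEPSEEK_API_KEY set in .env.local replaces the inherited one, while every other inherited key stays in place.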

Quick Start

Standard CLI

# Extract YouTube metadata (no API required)
uv run python -m src.cli.main youtube "https://youtube.com/watch?v=example"

# Transcribe single file (v1 pipeline)
uv run python -m src.cli.main transcribe audio.mp3

# Enhanced transcription (v2 pipeline)
uv run python -m src.cli.main transcribe audio.mp3 --v2

# Batch process folder
uv run python -m src.cli.main batch /path/to/audio/files
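For readers curious how metadata might be extracted with curl and no API key, the following sketch fetches a page with curl and reads its Open Graph meta tags. It is an illustration under stated assumptions, not Trax's implementation: the class and function names are hypothetical, and reading og: tags is an assumed strategy that the README does not confirm.

```python
import subprocess
from html.parser import HTMLParser


class OpenGraphParser(HTMLParser):
    """Collect <meta property="og:..." content="..."> tags into a dict."""

    def __init__(self):
        super().__init__()
        self.metadata = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr_map = dict(attrs)
        prop = attr_map.get("property") or ""
        if prop.startswith("og:") and attr_map.get("content") is not None:
            # Store "og:title" under the key "title", and so on.
            self.metadata[prop[3:]] = attr_map["content"]


def parse_metadata(html):
    """Return the Open Graph properties found in an HTML document."""
    parser = OpenGraphParser()
    parser.feed(html)
    return parser.metadata


def fetch_metadata(url, timeout=30):
    """Download the page with curl and parse it (hypothetical helper)."""
    result = subprocess.run(
        ["curl", "--silent", "--location", "--max-time", str(timeout), url],
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError if curl exits non-zero
    )
    return parse_metadata(result.stdout)
```

Keeping the parsing step separate from the network call means it can be exercised against saved HTML without touching the network.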

Enhanced CLI (Recommended)

# Enhanced transcription with progress reporting
uv run python -m src.cli.enhanced_cli transcribe audio.mp3 -m large -f srt

# Multi-pass transcription with confidence threshold
uv run python -m src.cli.enhanced_cli transcribe audio.mp3 --multi-pass --confidence-threshold 0.9

# Domain-specific enhancement with multi-pass
uv run python -m src.cli.enhanced_cli transcribe lecture.mp3 --multi-pass --domain academic

# Speaker diarization with VTT output
uv run python -m src.cli.enhanced_cli transcribe interview.mp4 --diarize --speakers 2 -f vtt

# Full v2.0 feature set
uv run python -m src.cli.enhanced_cli transcribe technical_content.mp3 --multi-pass --confidence-threshold 0.9 --domain technical --diarize

# Batch processing with multi-pass
uv run python -m src.cli.enhanced_cli batch /path/to/audio/files --multi-pass -c 8
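The -f srt option above selects SubRip output. As a rough illustration of what that format involves, the sketch below renders timed segments as SRT text. The (start, end, text) tuple shape and both function names are assumptions made for this example; Trax's own exporter is not shown here.

```python
def format_srt_timestamp(seconds):
    """Render seconds as HH:MM:SS,mmm (SRT uses a comma before milliseconds)."""
    total_ms = int(round(seconds * 1000))
    hours, remainder = divmod(total_ms, 3_600_000)
    minutes, remainder = divmod(remainder, 60_000)
    secs, millis = divmod(remainder, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"


def segments_to_srt(segments):
    """Convert (start_seconds, end_seconds, text) tuples to SRT text."""
    blocks = []
    for index, (start, end, text) in enumerate(segments, start=1):
        # Each cue: a 1-based counter, a timing line, then the caption text.
        blocks.append(
            f"{index}\n"
            f"{format_srt_timestamp(start)} --> {format_srt_timestamp(end)}\n"
            f"{text.strip()}\n"
        )
    # Cues are separated by a blank line.
    return "\n".join(blocks)


print(segments_to_srt([(0.0, 2.5, "Hello"), (2.5, 4.0, "world")]))
```

WebVTT (-f vtt) is similar but starts with a WEBVTT header line and uses a period rather than a comma before the milliseconds.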

Advanced Batch Processing

# Process with enhancement and custom settings
trax batch /path/to/files --enhance --workers 6 --memory-limit 2048

# Monitor progress with custom intervals
trax batch /path/to/files --progress-interval 2 --cpu-limit 80

# Process with a specific model and chunk size
trax batch /path/to/files --model whisper-1 --chunk-size 600
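The --workers option above caps how many files are processed at once. A minimal sketch of that pattern, using a thread pool from the standard library, might look like the following. The run_batch function and the process_file callback are hypothetical; Trax's batch processor may use a different concurrency model (for example asyncio).

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def run_batch(files, process_file, workers=8):
    """Run process_file over files with at most `workers` in flight.

    Returns (results, failures), both keyed by input file, so that one
    bad file does not abort the rest of the batch.
    """
    results, failures = {}, {}
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Map each future back to the file it was submitted for.
        futures = {pool.submit(process_file, f): f for f in files}
        for future in as_completed(futures):
            source = futures[future]
            try:
                results[source] = future.result()
            except Exception as exc:  # record the failure and continue
                failures[source] = exc
    return results, failures
```

Collecting failures separately, rather than raising on the first error, lets a long batch finish and report every problem file at the end.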

Documentation

CLI Documentation

Enhanced CLI Guide - Comprehensive guide to the enhanced CLI with progress reporting
CLI Reference - Complete command reference for both standard and enhanced CLIs

Quick Reference

CLI Commands - Complete command reference with examples
API Documentation - Service protocols and API reference
Database Schema - PostgreSQL schema with JSONB examples
Troubleshooting - Common issues and security guide

Architecture

Development Patterns - Historical learnings
Error Handling - Comprehensive error system
Audio Processing - Media pipeline details

Pipeline Versions

v1 Pipeline (Current)

Whisper distil-large-v3 transcription only
95%+ accuracy on clear audio
<30 seconds processing time for 5-minute audio
<2GB memory usage

v2 Pipeline (In Development)

Whisper + DeepSeek enhancement
99%+ accuracy with AI post-processing
<35 seconds total processing time
Grammar and punctuation correction

v3-v4 Pipeline (Future)

Multi-pass optimization (v3)
Speaker diarization (v4)
Advanced analysis features

Configuration

API Keys

The project automatically inherits all API tokens from the root project's .env file:

AI Services: Anthropic, DeepSeek, OpenAI, OpenRouter, Perplexity
Google Services: OAuth, APIs
Other Services: Slack, GitHub, Gitea, YouTube, Directus

Local Overrides

Create .env.local in the trax directory for project-specific environment overrides.

Development

Taskmaster Helper Scripts

The project includes comprehensive helper scripts for managing development tasks via Taskmaster CLI:

# Quick project overview
./scripts/tm_master.sh overview

# Get next task to work on
./scripts/tm_master.sh next

# Start working on a task
./scripts/tm_master.sh start 15

# Complete a task
./scripts/tm_master.sh done 15

# Search for tasks
./scripts/tm_master.sh search whisper

# Run analysis
./scripts/tm_master.sh analyze

Available Scripts:

tm_master.sh - Master interface to all helper scripts
tm_status.sh - Status checking and project overviews
tm_search.sh - Search tasks by various criteria
tm_workflow.sh - Workflow management and progress tracking
tm_analyze.sh - Analysis and insights generation
tm_quick.sh - Quick operations

For detailed documentation, see Taskmaster Helper Scripts.

Quick Reference: Taskmaster Quick Reference

Commands

# Run tests
uv run pytest

# Format and lint code
uv run black src/ tests/
uv run ruff check --fix src/ tests/

# Type checking
uv run mypy src/

# Install new dependency
uv pip install package-name

# Regenerate pinned requirements.txt from pyproject.toml
uv pip compile pyproject.toml -o requirements.txt

Architecture

Trax follows a protocol-based architecture with clean separation of concerns:

Services Layer: Core business logic (transcription, enhancement, batch processing)
Repository Layer: Data access with protocol-based interfaces
Database Layer: PostgreSQL with SQLAlchemy registry pattern
CLI Layer: User interface with Click and Rich
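To make "protocol-based" concrete, the sketch below shows the general pattern using typing.Protocol. Callers depend on a structural interface, and any class with matching methods satisfies it without inheriting from it. The TranscriptionService, EchoTranscriber, and transcribe_all names are hypothetical and chosen for illustration; the project's real protocols are not reproduced here.

```python
from typing import Protocol, runtime_checkable


@runtime_checkable
class TranscriptionService(Protocol):
    """Hypothetical interface; not the actual Trax protocol definition."""

    def transcribe(self, audio_path: str) -> str: ...


class EchoTranscriber:
    """Satisfies the protocol structurally, without inheriting from it."""

    def transcribe(self, audio_path: str) -> str:
        return f"transcript of {audio_path}"


def transcribe_all(service: TranscriptionService, paths: list[str]) -> list[str]:
    """Depend only on the protocol, so any conforming service can be passed in."""
    return [service.transcribe(p) for p in paths]


print(transcribe_all(EchoTranscriber(), ["lecture.mp3"]))
```

Because conformance is structural, swapping one implementation for another requires no change to callers such as transcribe_all.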

Error Handling and Logging

The application implements a comprehensive error handling and logging system designed for production reliability:

Core Features

Structured Logging: JSON and human-readable formats with contextual information
Error Classification: Hierarchical error system with standardized error codes
Retry Logic: Exponential backoff with jitter and circuit breaker patterns
Recovery Strategies: Fallback mechanisms, graceful degradation, and state recovery
Performance Monitoring: Operation timing, resource usage, and system health metrics
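As an illustration of the structured-logging idea, the sketch below formats each log record as one JSON object and copies selected context fields into it. JsonFormatter and its CONTEXT_FIELDS list are assumptions made for this example, built on the standard logging module; they are not the contents of src/logging/.

```python
import json
import logging


class JsonFormatter(logging.Formatter):
    """Render each record as one JSON object, including selected context fields."""

    # Context keys copied from the record when present (hypothetical choice).
    CONTEXT_FIELDS = ("operation", "file_size", "correlation_id")

    def format(self, record):
        payload = {
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        }
        for field in self.CONTEXT_FIELDS:
            # Values passed via `extra=` become attributes on the record.
            if hasattr(record, field):
                payload[field] = getattr(record, field)
        return json.dumps(payload)


handler = logging.StreamHandler()
handler.setFormatter(JsonFormatter())
logger = logging.getLogger("trax.example")
logger.addHandler(handler)
logger.setLevel(logging.INFO)
logger.info("Processing started", extra={"operation": "transcription"})
```

One JSON object per line keeps the output easy to filter by correlation_id or operation with standard log tooling.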

Key Components

src/logging/ - Structured logging with file rotation and performance metrics
src/errors/ - Error classification system with standardized error codes
src/retry/ - Retry mechanisms with multiple strategies and circuit breakers
src/recovery/ - Recovery strategies for different error scenarios

Usage Examples

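# Note: logger, async_retry, and timing_context are assumed to be provided by the
# project's src/logging/ and src/retry/ packages; import paths are not shown here.

# Structured logging with context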
logger.info("Processing started", extra={
    "operation": "transcription",
    "file_size": "15.2MB",
    "correlation_id": "req-123"
})

# Retry with exponential backoff
@async_retry(max_retries=3)
async def api_call():
    return await external_api.request()

# Performance monitoring
with timing_context("transcription_operation"):
    result = transcribe_audio(audio_file)

For detailed documentation, see Error Handling and Logging System.
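For intuition about what a decorator like async_retry does, here is a minimal sketch of retrying with exponential backoff and full jitter. It is not the implementation in src/retry/. Only the max_retries parameter appears in the usage example above; base_delay, max_delay, and exceptions are assumed for illustration, and circuit breaking is omitted.

```python
import asyncio
import functools
import random


def async_retry(max_retries=3, base_delay=0.5, max_delay=30.0, exceptions=(Exception,)):
    """Retry an async function with exponential backoff and full jitter."""

    def decorator(func):
        @functools.wraps(func)
        async def wrapper(*args, **kwargs):
            for attempt in range(max_retries + 1):
                try:
                    return await func(*args, **kwargs)
                except exceptions:
                    if attempt == max_retries:
                        raise  # out of retries: surface the last error
                    # The delay ceiling doubles each attempt: base, 2*base, 4*base, ...
                    cap = min(max_delay, base_delay * (2 ** attempt))
                    # Full jitter: sleep a random time between 0 and the ceiling.
                    await asyncio.sleep(random.uniform(0, cap))

        return wrapper

    return decorator
```

The jitter spreads retries from concurrent workers over time, so they do not all hit a recovering service at the same instant.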

Testing

The project includes a test suite that exercises its components against real audio files rather than mocks:

# Run all tests
uv run pytest

# Run specific test file
uv run pytest tests/test_batch_processor.py

# Run with coverage
uv run pytest --cov=src

Performance

Optimizations

M3 MacBook Optimized: Default 8 workers for optimal performance
Memory Management: Configurable memory limits and monitoring
Resource Tracking: Real-time CPU and memory usage monitoring
Async Processing: Non-blocking operations throughout
Caching: Intelligent caching for expensive operations

Benchmarks

Transcription: 95%+ accuracy, <30s for 5-minute audio
Enhancement: 99%+ accuracy, <35s processing time
Batch Processing: Parallel processing with configurable workers
Resource Usage: <2GB memory, optimized for M3 architecture
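The transcription benchmark above can be restated as a real-time factor (processing time divided by audio duration), which makes it easier to compare runs on files of different lengths. The short calculation below works this out for the stated target of under 30 seconds for 5 minutes of audio. The function name is hypothetical, and the figures are the README's targets, not new measurements.

```python
def real_time_factor(processing_seconds, audio_seconds):
    """Processing time divided by audio duration; lower is faster."""
    return processing_seconds / audio_seconds


# Target from the benchmarks: 30 s of processing for 5 minutes (300 s) of audio.
rtf = real_time_factor(30, 5 * 60)
print(f"RTF = {rtf:.2f}, speed-up = {1 / rtf:.0f}x real time")
```

A real-time factor below 0.1 means the pipeline runs more than ten times faster than playback speed.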

Project Status

🎉 v1.0 COMPLETE - Production Ready

Release Date: December 2024
Version: 1.0.0
Status: Production Ready

✅ Complete Platform Implementation

Core Platform:

✅ Development environment setup with uv package manager
✅ API key configuration and inheritance from root project
✅ PostgreSQL database with SQLAlchemy registry pattern
✅ YouTube metadata extraction via curl (no API required)
✅ Media download and preprocessing with download-first architecture
✅ Whisper transcription service (v1) with 95%+ accuracy
✅ DeepSeek enhancement service (v2) with 99%+ accuracy
✅ CLI interface with Click and Rich progress tracking
✅ Batch processing system with 8 parallel workers (M3 optimized)

Advanced Features:

✅ Export functionality (JSON, TXT, SRT, Markdown)
✅ Comprehensive error handling and logging system
✅ Security features (encrypted storage, input validation)
✅ Protocol-based architecture for clean interfaces
✅ Performance optimization for M3 MacBook
✅ Quality assessment system with accuracy metrics

Quality Assurance:

✅ Comprehensive testing suite with real audio files
✅ Complete documentation and user guides

🎯 Production Ready Features

The Trax transcription platform is fully functional and ready for production use, with:

95%+ transcription accuracy on clear audio
<30 seconds processing for 5-minute audio files
<2GB memory usage optimized for M3 architecture
Download-first architecture for reliable processing
Comprehensive error handling and recovery mechanisms
Enterprise security with encrypted storage and input validation
Protocol-based architecture for clean interfaces and testability

📋 Release Documentation

Release Notes - Comprehensive feature overview
Technical Changelog - Detailed implementation changes
Task Archive - Archived v1.0 tasks in Taskmaster

🔮 Next Phase: v2.0 Planning

Speaker diarization with 90%+ speaker accuracy
Multi-language support for international content
Advanced analytics and content insights
Web interface for browser-based access

License

This project is part of the my-ai-projects ecosystem.