# Project Directory Structure
This document provides an overview of the Trax Media Processing Platform directory structure and the purpose of each component.
## Root Directory
```
trax/
├── CLAUDE.md             # Project context for Claude Code
├── AGENTS.md             # Development rules for AI agents
├── EXECUTIVE-SUMMARY.md  # High-level project overview
├── CHANGELOG.md          # Version history and changes
├── PROJECT-DIRECTORY.md  # This file - directory structure
├── README.md             # Project introduction and quick start
├── pyproject.toml        # Project configuration and dependencies
├── requirements.txt      # Locked dependencies (generated)
├── scratchpad.md         # Temporary notes and ideas
└── test_config.py        # Configuration testing utilities
```
## Source Code (`src/`)
```
src/
├── __init__.py                  # Python package initialization
├── config.py                    # Centralized configuration system
├── main.py                      # Application entry point
├── cli/                         # Command-line interface
│   ├── __init__.py
│   └── main.py                  # Click-based CLI implementation
├── services/                    # Business logic services
│   ├── __init__.py
│   ├── transcription/           # Transcription services
│   │   ├── __init__.py
│   │   ├── protocols.py         # Service interfaces
│   │   ├── whisper_service.py   # Whisper implementation
│   │   └── enhancement.py       # AI enhancement service
│   ├── caching/                 # Caching layer
│   │   ├── __init__.py
│   │   ├── protocols.py         # Cache interfaces
│   │   └── sqlite_cache.py      # SQLite cache implementation
│   ├── batch/                   # Batch processing
│   │   ├── __init__.py
│   │   ├── processor.py         # Batch job processor
│   │   └── queue.py             # Job queue management
│   └── export/                  # Export functionality
│       ├── __init__.py
│       ├── protocols.py         # Export interfaces
│       ├── json_exporter.py     # JSON export
│       └── txt_exporter.py      # Text export
├── models/                      # Database models
│   ├── __init__.py
│   ├── base.py                  # Base model class
│   ├── media.py                 # Media file models
│   ├── transcript.py            # Transcript models
│   └── batch.py                 # Batch job models
├── database/                    # Database layer
│   ├── __init__.py
│   ├── registry.py              # Database registry pattern
│   ├── connection.py            # Connection management
│   └── migrations/              # Alembic migrations
├── utils/                       # Utility functions
│   ├── __init__.py
│   ├── audio.py                 # Audio processing utilities
│   ├── validation.py            # Input validation
│   └── logging.py               # Logging configuration
└── agents/                      # AI agent components
    ├── __init__.py
    └── rules/                   # Agent rule files
        ├── TRANSCRIPTION_RULES.md
        ├── BATCH_PROCESSING_RULES.md
        ├── DATABASE_RULES.md
        ├── CACHING_RULES.md
        └── EXPORT_RULES.md
```
## Documentation (`docs/`)
```
docs/
├── architecture/                   # Architecture documentation
│   ├── development-patterns.md     # Historical learnings and patterns
│   ├── audio-processing.md         # Audio pipeline architecture
│   └── iterative-pipeline.md       # Version progression details
├── reports/                        # Analysis reports
│   ├── 01-repository-inventory.md
│   ├── 02-historical-context.md
│   ├── 03-architecture-design.md
│   ├── 04-team-structure.md
│   ├── 05-technical-migration.md
│   └── 06-product-vision.md
└── team/                           # Team documentation
    └── job-descriptions.md         # Role definitions
```
## Tests (`tests/`)
```
tests/
├── __init__.py                    # Test package initialization
├── conftest.py                    # Pytest configuration and fixtures
├── factories/                     # Test data factories
│   ├── __init__.py
│   ├── media_factory.py           # Media file factories
│   ├── transcript_factory.py      # Transcript factories
│   └── batch_factory.py           # Batch job factories
├── fixtures/                      # Test fixtures and data
│   ├── audio/                     # Test audio files
│   │   ├── sample_5s.wav          # 5-second test file
│   │   ├── sample_30s.mp3         # 30-second test file
│   │   └── sample_2m.mp4          # 2-minute test file
│   └── transcripts/               # Expected transcript outputs
│       └── expected_outputs.json
├── unit/                          # Unit tests
│   ├── test_protocols.py          # Protocol interface tests
│   ├── test_models.py             # Database model tests
│   └── services/                  # Service unit tests
│       ├── test_batch.py          # Batch service tests
│       └── test_whisper.py        # Whisper service tests
└── integration/                   # Integration tests
    ├── test_pipeline_v1.py        # v1 pipeline tests
    ├── test_batch_processing.py   # Batch processing tests
    └── test_cli.py                # CLI integration tests
```
## Data (`data/`)
```
data/
├── media/              # Media file storage
│   ├── downloads/      # Downloaded media files
│   └── processed/      # Processed audio files
├── exports/            # Export output files
│   ├── json/           # JSON export files
│   └── txt/            # Text export files
└── cache/              # Cache storage
    ├── embeddings/     # Embedding cache
    ├── transcripts/    # Transcript cache
    └── analysis/       # Analysis cache
```
## Scripts (`scripts/`)
```
scripts/
├── setup_dev.sh  # Development environment setup
├── setup_db.sh   # Database initialization
├── run_tests.sh  # Test execution script
└── deploy.sh     # Deployment script
```
## Configuration Files
### `pyproject.toml`
- Project metadata and dependencies
- uv package manager configuration
- Development tools configuration (Black, Ruff, MyPy)
- Build system settings
### `.env` (inherited from root)
- API keys and secrets
- Database connection strings
- Service configuration
- Environment-specific settings
### `alembic.ini`
- Database migration configuration
- Alembic settings and paths
## Key File Purposes
### Core Documentation
- **CLAUDE.md**: Context for Claude Code to understand current state
- **AGENTS.md**: Development rules and workflows for AI agents
- **EXECUTIVE-SUMMARY.md**: High-level project overview and strategy
- **CHANGELOG.md**: Version history and change tracking
- **PROJECT-DIRECTORY.md**: This file - directory structure overview
### Configuration
- **src/config.py**: Centralized configuration with root .env inheritance
- **pyproject.toml**: Project dependencies and tooling configuration
- **requirements.txt**: Locked dependency versions (generated)
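
The "root .env inheritance" mentioned for `src/config.py` could be implemented along the following lines. This is a minimal, stdlib-only sketch and not the project's actual code: the function names `find_root_env`, `parse_env`, and `load_env` are hypothetical, as are the rules that the nearest `.env` found while walking upward is used and that variables already set in the environment take precedence over values from the file.

```python
import os
from pathlib import Path
from typing import Dict, Mapping, Optional


def find_root_env(start: Path) -> Optional[Path]:
    """Walk upward from `start` and return the first .env file found, if any."""
    for directory in [start, *start.parents]:
        candidate = directory / ".env"
        if candidate.is_file():
            return candidate
    return None


def parse_env(text: str) -> Dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blanks and # comments."""
    values: Dict[str, str] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def load_env(start: Path, environ: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Merge inherited .env values with the environment (assumed: environment wins)."""
    environ = os.environ if environ is None else environ
    env_file = find_root_env(start)
    values = parse_env(env_file.read_text(encoding="utf-8")) if env_file else {}
    return {**values, **environ}
```

Calling `load_env(Path(__file__).parent)` from a module under `src/` would then pick up a `.env` placed in a parent directory without a copy inside `trax/`.
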
### Architecture
- **docs/architecture/**: Detailed architecture patterns and decisions
- **docs/reports/**: Analysis reports from the YouTube Summarizer project
- **src/agents/rules/**: Agent rule files for consistency
### Testing
- **tests/fixtures/audio/**: Real audio files for testing (no mocks)
- **tests/conftest.py**: Pytest configuration and shared fixtures
- **tests/factories/**: Test data generation utilities
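
As an illustration of how a factory in `tests/factories/` might pair generated test data with the real files in `tests/fixtures/audio/`, here is a small sketch. The `MediaFile` dataclass is a hypothetical stand-in for whatever `src/models/media.py` defines, and `make_media_file` and its fields are assumptions made for this example only.

```python
from dataclasses import dataclass
from itertools import count
from pathlib import Path

# Directory taken from the layout above; resolved relative to the project root.
FIXTURE_DIR = Path("tests/fixtures/audio")

_ids = count(1)  # simple sequence so every generated record gets a unique id


@dataclass(frozen=True)
class MediaFile:
    """Hypothetical stand-in for a media model; the real fields may differ."""
    id: int
    path: Path
    duration_seconds: float


def make_media_file(name: str = "sample_5s.wav",
                    duration_seconds: float = 5.0,
                    **overrides) -> MediaFile:
    """Build a MediaFile that points at a real fixture file, with optional overrides."""
    fields = {
        "id": next(_ids),
        "path": FIXTURE_DIR / name,
        "duration_seconds": duration_seconds,
    }
    fields.update(overrides)
    return MediaFile(**fields)
```

A test can then call `make_media_file("sample_30s.mp3", duration_seconds=30.0)` and pass the resulting `path` to the code under test, so the test exercises a real audio file rather than a mock.
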
## Development Workflow
### File Organization Principles
1. **Separation of Concerns**: Each directory has a specific purpose
2. **Protocol-Based Design**: Interfaces defined in protocols.py files
3. **Real-File Testing**: Tests run against actual media files in `tests/fixtures/`, not mocks
4. **Documentation Limits**: Keep files under 600 LOC for AI comprehension
5. **Clear Naming**: Descriptive file and directory names
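
The protocol-based design principle can be sketched with `typing.Protocol`. The interface name `TranscriptionService`, its `transcribe` method, and the toy `EchoTranscriber` class are assumptions for illustration; the actual contents of the `protocols.py` files are not shown in this document.

```python
from pathlib import Path
from typing import Protocol, runtime_checkable


@runtime_checkable
class TranscriptionService(Protocol):
    """Hypothetical interface of the kind a protocols.py file might define."""

    def transcribe(self, audio_path: Path) -> str:
        ...


class EchoTranscriber:
    """Toy implementation; a real service would run a speech model here."""

    def transcribe(self, audio_path: Path) -> str:
        return f"transcript of {audio_path.name}"


def run(service: TranscriptionService, audio_path: Path) -> str:
    """Callers depend only on the protocol, not on a concrete class."""
    return service.transcribe(audio_path)
```

Because protocols use structural typing, `EchoTranscriber` satisfies `TranscriptionService` without inheriting from it, which keeps implementations such as `whisper_service.py` decoupled from the interface module.
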
### Adding New Components
1. **Services**: Add to `src/services/` with protocol interface
2. **Models**: Add to `src/models/` with database registry
3. **Tests**: Add to `tests/` with real file fixtures
4. **Documentation**: Add to `docs/` with clear structure
5. **Rules**: Add to `src/agents/rules/` for consistency
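
Step 2 mentions a database registry without describing it. One common reading of a "registry pattern" is a central mapping that model classes add themselves to when defined. The sketch below shows that idea only; `MODEL_REGISTRY`, `register_model`, and the placeholder classes are hypothetical and may not match what `src/database/registry.py` actually does.

```python
from typing import Callable, Dict, Type, TypeVar

T = TypeVar("T")

# Hypothetical central registry: logical model name -> model class.
MODEL_REGISTRY: Dict[str, type] = {}


def register_model(name: str) -> Callable[[Type[T]], Type[T]]:
    """Class decorator that records a model under `name`, rejecting duplicates."""
    def decorator(cls: Type[T]) -> Type[T]:
        if name in MODEL_REGISTRY:
            raise ValueError(f"model {name!r} is already registered")
        MODEL_REGISTRY[name] = cls
        return cls
    return decorator


@register_model("media_file")
class MediaFile:
    """Placeholder model; real models would define columns and relationships."""


@register_model("transcript")
class Transcript:
    """Placeholder model."""
```

Under this assumption, adding a new model means defining the class in `src/models/` and decorating it, after which other code can look it up by name through the registry instead of importing it directly.
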
### Migration Strategy
- **Database Changes**: Use Alembic migrations in `src/database/migrations/`
- **Schema Updates**: Update the models, then create a matching migration
- **Data Migration**: Scripts in `scripts/` directory
- **Version Tracking**: Update CHANGELOG.md with changes
---
*Last Updated: 2024-12-19*
*Project Structure Version: 1.0*