# Backend Developer Agent - Capabilities & Tools
## 🎯 Agent Overview
The **Backend Python Developer Agent** models the first backend developer hire for the Trax media processing platform. It has access to the tools and capabilities needed to build the protocol-based transcription pipeline from v1 to v4.
### Agent Profile
- **Name**: Backend Python Developer
- **Role**: Senior Backend Developer
- **Experience Level**: Senior
- **Salary Range**: $150,000 - $200,000
- **Current Focus**: Phase 1: Foundation (Weeks 1-2)
## 🛠️ Available Tools by Category
### 1. Core Development Tools
**Tools**: 3 | **Skills**: 8
#### Python 3.11+ Development
- **Async Programming**: Write async/await code for concurrent operations
- **Protocol Design**: Create protocol-based service interfaces
- **Type Hints**: Use comprehensive type hints throughout
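A minimal sketch of how async/await and type hints can combine for concurrent work. The `Transcript` dataclass, `fake_transcribe`, and `transcribe_all` are hypothetical names used only for illustration; the real Trax types are not shown in this document.

```python
import asyncio
from dataclasses import dataclass
from pathlib import Path


@dataclass(frozen=True)
class Transcript:
    """Hypothetical result type, not the project's actual model."""
    source: Path
    text: str


async def fake_transcribe(audio: Path) -> Transcript:
    # Placeholder for real work, such as a Whisper call run in a worker thread.
    await asyncio.sleep(0.01)
    return Transcript(source=audio, text=f"transcript of {audio.name}")


async def transcribe_all(paths: list[Path], max_concurrency: int = 2) -> list[Transcript]:
    # The semaphore caps how many transcriptions run at the same time.
    limit = asyncio.Semaphore(max_concurrency)

    async def bounded(path: Path) -> Transcript:
        async with limit:
            return await fake_transcribe(path)

    # gather() returns results in the same order as its inputs.
    return await asyncio.gather(*(bounded(p) for p in paths))


results = asyncio.run(transcribe_all([Path("a.wav"), Path("b.wav"), Path("c.wav")]))
```

Bounding concurrency this way is one option for keeping memory use predictable when each task loads audio.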
#### uv Package Manager
- **Install Dependencies**: Install project dependencies
- **Compile Requirements**: Generate requirements.txt from pyproject.toml
- **Run Commands**: Execute Python commands with uv
#### Click CLI Framework
- **Create transcription commands**
- **Build batch processing interface**
- **Implement export functionality**
### 2. Database Tools
**Tools**: 2 | **Skills**: 4
#### PostgreSQL + SQLAlchemy
- **Model Definition**: Define SQLAlchemy models with JSONB
- **Database Migrations**: Create and apply Alembic migrations
- **JSONB Operations**: Perform JSONB queries and operations
#### Database Registry Pattern
- **Implement centralized model registry**
- **Handle multiple database connections**
- **Manage model relationships**
### 3. ML Integration Tools
**Tools**: 3 | **Skills**: 6
#### Whisper Integration
- **Model Loading**: Load Whisper models with faster-whisper
- **Audio Transcription**: Transcribe audio files with Whisper
- **Chunking Strategy**: Handle large audio files with chunking
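The chunking idea can be sketched as plain arithmetic over time offsets. The helper below is hypothetical and independent of any Whisper library; it assumes 30-second chunks with 2 seconds of overlap, and only computes boundaries (slicing the audio is left out).

```python
def chunk_bounds(
    duration_s: float, chunk_s: float = 30.0, overlap_s: float = 2.0
) -> list[tuple[float, float]]:
    """Return (start, end) offsets in seconds that cover the whole file."""
    if chunk_s <= 0 or not 0 <= overlap_s < chunk_s:
        raise ValueError("need chunk_s > 0 and 0 <= overlap_s < chunk_s")
    bounds: list[tuple[float, float]] = []
    start = 0.0
    step = chunk_s - overlap_s  # each chunk starts this far after the previous one
    while start < duration_s:
        end = min(start + chunk_s, duration_s)
        bounds.append((start, end))
        if end >= duration_s:
            break  # the last chunk already reaches the end of the file
        start += step
    return bounds


bounds = chunk_bounds(70.0)
# [(0.0, 30.0), (28.0, 58.0), (56.0, 70.0)]
```

The overlap gives each chunk some context from its neighbor, so words that straddle a boundary appear in both chunks and can be reconciled when the transcripts are merged.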
#### Protocol-Based Services
- **Design service interfaces**
- **Implement version compatibility**
- **Create swappable components**
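A small sketch of what "swappable components" can mean in practice. The interface is kept synchronous and returns a plain string for brevity; `Transcriber`, `WhisperTranscriber`, and `EnhancedTranscriber` are illustrative names, not the project's classes.

```python
from pathlib import Path
from typing import Protocol, runtime_checkable


@runtime_checkable
class Transcriber(Protocol):
    def transcribe(self, audio: Path) -> str: ...


class WhisperTranscriber:
    """Stand-in for a v1 service; a real one would call a Whisper model."""

    def transcribe(self, audio: Path) -> str:
        return f"[whisper] {audio.stem}"


class EnhancedTranscriber:
    """Stand-in for a v2 service that wraps any other Transcriber."""

    def __init__(self, inner: Transcriber) -> None:
        self._inner = inner

    def transcribe(self, audio: Path) -> str:
        # Upper-casing is a placeholder for a real enhancement step.
        return self._inner.transcribe(audio).upper()


def run_pipeline(service: Transcriber, audio: Path) -> str:
    # Callers depend only on the protocol, so implementations can be swapped.
    return service.transcribe(audio)


v1_text = run_pipeline(WhisperTranscriber(), Path("talk.wav"))
v2_text = run_pipeline(EnhancedTranscriber(WhisperTranscriber()), Path("talk.wav"))
```

Neither implementation inherits from `Transcriber`; structural typing is enough, which keeps each pipeline version free of a shared base class.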
#### DeepSeek API Integration
- **Enhance transcript quality**
- **Implement structured outputs**
- **Handle API rate limits**
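Rate-limit handling is commonly done with retries and exponential backoff. The sketch below does not call the DeepSeek API; `RateLimitError` and `call_with_backoff` are hypothetical, and the request is any callable that raises that error on an HTTP 429 response.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class RateLimitError(Exception):
    """Hypothetical error a client would raise when the API answers HTTP 429."""


def call_with_backoff(
    request: Callable[[], T],
    max_attempts: int = 5,
    base_delay_s: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        try:
            return request()
        except RateLimitError:
            if attempt == max_attempts - 1:
                raise  # out of retries, surface the error to the caller
            # Delay doubles each attempt (0.5s, 1s, 2s, ...) plus up to 10% jitter.
            delay = base_delay_s * (2 ** attempt)
            sleep(delay + random.uniform(0, delay / 10))
    raise AssertionError("unreachable")


# Usage with a fake request that is rate limited twice before succeeding.
attempts = {"count": 0}


def flaky_enhance() -> str:
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise RateLimitError("429 Too Many Requests")
    return "enhanced transcript"


delays: list[float] = []
result = call_with_backoff(flaky_enhance, sleep=delays.append)
```

Passing `sleep` as a parameter keeps the retry logic testable without actually waiting.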
### 4. Testing Tools
**Tools**: 2 | **Skills**: 4
#### pytest with Real Files
- **Real File Testing**: Test with actual audio files instead of mocks
- **Test Fixtures**: Create reusable test fixtures with real files
- **Performance Testing**: Benchmark transcription performance
#### Coverage Reporting
- **Achieve >80% code coverage**
- **Identify untested code**
- **Track test quality**
### 5. Architecture Tools
**Tools**: 3 | **Skills**: 3
#### Iterative Pipeline Design
- **Version Management**: Manage different pipeline versions
- **Backward Compatibility**: Ensure new versions work with old data
- **Feature Flags**: Enable/disable features by version
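Version-gated feature flags could be modeled as a table of "introduced in" versions. The feature names below are assumptions derived from the phase descriptions in this document, and `PipelineVersion` is a hypothetical enum.

```python
from enum import IntEnum


class PipelineVersion(IntEnum):
    V1 = 1
    V2 = 2
    V3 = 3
    V4 = 4


# Version in which each (illustrative) feature first becomes available.
FEATURE_INTRODUCED_IN: dict[str, PipelineVersion] = {
    "whisper_transcription": PipelineVersion.V1,
    "ai_enhancement": PipelineVersion.V2,
    "multi_pass": PipelineVersion.V3,
    "speaker_diarization": PipelineVersion.V4,
}


def is_enabled(feature: str, version: PipelineVersion) -> bool:
    # A feature stays enabled in every version after the one that introduced it.
    return version >= FEATURE_INTRODUCED_IN[feature]


def enabled_features(version: PipelineVersion) -> list[str]:
    return [name for name, since in FEATURE_INTRODUCED_IN.items() if version >= since]


v2_features = enabled_features(PipelineVersion.V2)
# ['whisper_transcription', 'ai_enhancement']
```

Because later versions only add features here, data produced by an older version remains a valid subset of what a newer version understands.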
#### Batch Processing System
- **Process multiple files**
- **Handle independent failures**
- **Track progress**
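Independent failure handling means one bad file should not stop the rest of the batch. A minimal synchronous sketch, assuming a hypothetical `process_batch` helper and a worker callable that raises on failure:

```python
from dataclasses import dataclass, field
from pathlib import Path
from typing import Callable, Optional


@dataclass
class BatchReport:
    succeeded: dict[Path, str] = field(default_factory=dict)
    failed: dict[Path, str] = field(default_factory=dict)


def process_batch(
    paths: list[Path],
    worker: Callable[[Path], str],
    on_progress: Optional[Callable[[int, int], None]] = None,
) -> BatchReport:
    report = BatchReport()
    for done, path in enumerate(paths, start=1):
        try:
            report.succeeded[path] = worker(path)
        except Exception as exc:
            # Record the failure and keep going with the remaining files.
            report.failed[path] = f"{type(exc).__name__}: {exc}"
        if on_progress is not None:
            on_progress(done, len(paths))
    return report


def demo_worker(path: Path) -> str:
    if path.name == "corrupt.wav":
        raise ValueError("unreadable audio")
    return f"transcript of {path.name}"


progress: list[tuple[int, int]] = []
report = process_batch(
    [Path("a.wav"), Path("corrupt.wav"), Path("b.wav")],
    demo_worker,
    on_progress=lambda done, total: progress.append((done, total)),
)
```

The report keeps successes and failures separate, so failed files can be retried later without reprocessing the ones that worked.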
#### Caching Strategy
- **Cache expensive operations**
- **Implement different TTLs**
- **Handle cache invalidation**
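A per-entry TTL cache with explicit invalidation can be sketched in a few lines. `TTLCache` is a hypothetical in-memory class, and the keys and TTL values are made up for the example; a production setup might use an external store instead.

```python
import time
from typing import Any, Callable, Hashable


class TTLCache:
    """In-memory cache where each entry carries its own expiry time."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        self._entries: dict[Hashable, tuple[float, Any]] = {}

    def set(self, key: Hashable, value: Any, ttl_s: float) -> None:
        self._entries[key] = (self._clock() + ttl_s, value)

    def get(self, key: Hashable, default: Any = None) -> Any:
        entry = self._entries.get(key)
        if entry is None:
            return default
        expires_at, value = entry
        if self._clock() >= expires_at:
            del self._entries[key]  # drop expired entries lazily on read
            return default
        return value

    def invalidate(self, key: Hashable) -> None:
        self._entries.pop(key, None)


# Usage with a fake clock so expiry can be shown without waiting.
now = [0.0]
cache = TTLCache(clock=lambda: now[0])
cache.set("transcript:abc", "hello world", ttl_s=3600)   # expensive result, long TTL
cache.set("metadata:abc", {"duration_s": 300}, ttl_s=60)  # cheap result, short TTL
now[0] = 120.0  # two minutes later only the transcript is still cached
```

Injecting the clock keeps expiry deterministic in tests.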
### 6. Performance Tools
**Tools**: 2 | **Skills**: 3
#### Performance Profiling
- **Profile transcription speed**
- **Optimize memory usage**
- **Benchmark improvements**
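Profiling speed and memory can start with the standard library alone. In this sketch `simulate_work` is a stand-in for a transcription call and `profile_call` is a hypothetical helper. Note that `tracemalloc` only sees allocations made through Python's allocator, so memory held by a native inference library would need a different measurement.

```python
import cProfile
import io
import pstats
import tracemalloc
from typing import Any, Callable


def simulate_work(n: int = 50_000) -> int:
    # Stand-in workload; replace with a real transcription call.
    return sum(i * i for i in range(n))


def profile_call(func: Callable[..., Any], *args: Any) -> tuple[Any, int, str]:
    """Run func once and return (result, peak traced bytes, top-5 profile report)."""
    profiler = cProfile.Profile()
    tracemalloc.start()
    result = profiler.runcall(func, *args)
    _, peak_bytes = tracemalloc.get_traced_memory()
    tracemalloc.stop()

    buffer = io.StringIO()
    stats = pstats.Stats(profiler, stream=buffer)
    stats.sort_stats("cumulative").print_stats(5)  # five most expensive entries
    return result, peak_bytes, buffer.getvalue()


result, peak_bytes, report = profile_call(simulate_work)
```

Running the same helper before and after a change gives a rough benchmark of whether an optimization actually helped.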
#### M3 Hardware Optimization
- **Metal Performance Shaders**: Use M3 GPU for Whisper inference
- **Memory Optimization**: Optimize memory usage for large files
- **Performance Profiling**: Profile and optimize performance
### 7. Deployment Tools
**Tools**: 2 | **Skills**: 2
#### Docker Containerization
- **Create production images**
- **Handle dependencies**
- **Optimize image size**
#### CI/CD Pipeline
- **Automate testing**
- **Deploy to staging**
- **Monitor deployments**
## 📊 Agent Statistics
- **Total Tools Available**: 17
- **Required Skills**: 30
- **Categories**: 7
- **Development Phases**: 4 (v1, v2, v3, v4)
## 🎯 Phase-Specific Tool Availability
### Phase 1 (v1): Foundation
**Focus**: Basic Whisper transcription (95% accuracy, <30s for 5min audio)
**Tools**: Core Development, Database, Testing
### Phase 2 (v2): Enhancement
**Focus**: AI enhancement (99% accuracy, <35s processing)
**Tools**: + ML Integration
### Phase 3 (v3): Optimization
**Focus**: Multi-pass accuracy (99.5% accuracy, <25s processing)
**Tools**: + Performance
### Phase 4 (v4): Advanced Features
**Focus**: Speaker diarization (90% speaker accuracy)
**Tools**: + Deployment
## 🚀 Success Metrics
The agent must achieve these targets:
| Metric | Target |
|--------|--------|
| Processing Speed | 5-minute audio in <30 seconds |
| Accuracy | 99.5% transcription accuracy with multi-pass |
| Batch Capacity | Process 100+ files efficiently |
| Memory Usage | <4GB peak memory usage |
| Cost | <$0.01 per transcript |
| Code Coverage | >80% with real file testing |
| CLI Response | <1 second CLI response time |
| File Size | Handle files up to 500MB |
| Data Loss | Zero data loss on errors |
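The speed, batch, and cost targets in the table imply a few derived numbers. The arithmetic below is a worked example under two assumptions: files are processed one at a time, and every file is a 5-minute clip that just meets its target.

```python
TARGET_AUDIO_S = 5 * 60       # 5-minute clip, from the table
TARGET_PROCESSING_S = 30      # upper bound per clip, from the table
TARGET_COST_USD = 0.01        # upper bound per transcript, from the table
BATCH_SIZE = 100


def realtime_factor(audio_s: float, processing_s: float) -> float:
    """Seconds of audio handled per second of wall-clock time."""
    return audio_s / processing_s


# Minimum speed-up over real time needed to hit the target.
min_factor = realtime_factor(TARGET_AUDIO_S, TARGET_PROCESSING_S)  # 10.0

# Upper bound on sequential wall-clock time for a 100-file batch, in minutes.
batch_minutes = BATCH_SIZE * TARGET_PROCESSING_S / 60  # 50.0

# Upper bound on cost for the same batch, roughly one US dollar.
batch_cost_usd = BATCH_SIZE * TARGET_COST_USD
```

In other words, the pipeline has to run at 10x real time or faster, and a 100-file batch should finish in under 50 minutes even without parallelism.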
## 💻 Development Workflow
### 1. Environment Setup
```bash
uv venv
source .venv/bin/activate
uv pip install -e ".[dev]"  # quotes keep shells such as zsh from globbing the brackets
```
### 2. Database Setup
```bash
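# --autogenerate diffs the models against the database; omit it to write the migration by hand
alembic revision --autogenerate -m 'Initial schema'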
alembic upgrade head
```
### 3. Core Development
```python
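from pathlib import Path
from typing import Protocol

class TranscriptionService(Protocol):
    # "Transcript" is the project's result model, referenced here as a forward reference
    async def transcribe(self, audio: Path) -> "Transcript": ...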
```
### 4. ML Integration
```python
from faster_whisper import WhisperModel
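# faster-whisper runs on CTranslate2, which accepts "cpu", "cuda", or "auto" (there is no "mps" device)
model = WhisperModel('distil-large-v3', device='cpu', compute_type='int8')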
```
### 5. Testing
```bash
uv run pytest tests/
uv run pytest --cov=src
```
### 6. Performance Optimization
```python
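# Transcribe in 30-second chunks; faster-whisper's transcribe() has no `overlap` argument
segments, info = model.transcribe(str(audio_path), chunk_length=30)
# Profile from the shell, not from Python: python -m cProfile -s cumulative src/main.py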
```
## 🔧 Key Capabilities
### Protocol-Based Architecture
- Design clean service interfaces
- Implement dependency injection
- Create swappable components
- Maintain version compatibility
### Real File Testing
- Test with actual audio files
- No mocks in test suite
- Benchmark real performance
- Handle edge cases
### Performance Optimization
- M3 hardware acceleration
- Memory usage optimization
- Chunking for large files
- Profiling and benchmarking
### Batch Processing
- Handle 100+ files efficiently
- Independent failure handling
- Progress tracking
- Queue management
## 📁 File Structure
```
src/agents/
├── backend_developer_agent.py        # Main agent definition
├── tools/
│   └── backend_developer_tools.py    # Detailed tool definitions
└── demo_backend_developer.py         # Demo script
```
## 🎮 Usage Examples
### Running the Demo
```bash
cd src/agents
python demo_backend_developer.py
```
### Checking Tool Availability
```python
from agents.backend_developer_agent import check_tool_availability
# Check if agent can use a specific tool
can_use_whisper = check_tool_availability("Whisper Integration")
print(f"Can use Whisper: {can_use_whisper}")
```
### Getting Tools by Category
```python
from agents.tools.backend_developer_tools import get_tools_by_category
# Get all database tools
db_tools = get_tools_by_category("database")
for tool in db_tools:
    print(f"Database tool: {tool.name}")
```
### Getting Phase-Specific Tools
```python
from agents.tools.backend_developer_tools import get_tools_by_phase
# Get tools available in v1
v1_tools = get_tools_by_phase("v1")
for tool in v1_tools:
    print(f"v1 tool: {tool.name}")
```
## 🎯 Next Steps
1. **Run the demo script** to see all capabilities
2. **Review the job posting** for hiring
3. **Set up development environment** for the agent
4. **Begin Phase 1 development** with core tools
5. **Implement protocol-based architecture** from day one
---
**The Backend Developer Agent is ready to build the future of media processing with clean, scalable, and reliable architecture!** 🚀