Backend Developer Agent - Capabilities & Tools

🎯 Agent Overview

The Backend Python Developer Agent models the first backend developer hire for the Trax media processing platform. It has access to the tools and capabilities needed to build the protocol-based transcription pipeline from v1 through v4.

Agent Profile

  • Name: Backend Python Developer
  • Role: Senior Backend Developer
  • Experience Level: Senior
  • Salary Range: $150,000 - $200,000
  • Current Focus: Phase 1: Foundation (Weeks 1-2)

🛠️ Available Tools by Category

1. Core Development Tools

Tools: 3 | Skills: 8

Python 3.11+ Development

  • Async Programming: Write async/await code for concurrent operations
  • Protocol Design: Create protocol-based service interfaces
  • Type Hints: Use comprehensive type hints throughout

uv Package Manager

  • Install Dependencies: Install project dependencies
  • Compile Requirements: Generate requirements.txt from pyproject.toml
  • Run Commands: Execute Python commands with uv

Click CLI Framework

  • Create transcription commands
  • Build batch processing interface
  • Implement export functionality

2. Database Tools

Tools: 2 | Skills: 4

PostgreSQL + SQLAlchemy

  • Model Definition: Define SQLAlchemy models with JSONB
  • Database Migrations: Create and apply Alembic migrations
  • JSONB Operations: Perform JSONB queries and operations

Database Registry Pattern

  • Implement centralized model registry
  • Handle multiple database connections
  • Manage model relationships
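
A centralized registry can be pictured as a small lookup from names to model classes. The sketch below is illustrative only: `ModelRegistry`, `register`, and the `MediaFile` and `Transcript` placeholders are hypothetical names, and plain classes stand in for SQLAlchemy models.

```python
class ModelRegistry:
    """Hypothetical central lookup of model classes by name."""

    def __init__(self) -> None:
        self._models: dict[str, type] = {}

    def register(self, cls: type) -> type:
        """Class decorator: record the class under its name, rejecting duplicates."""
        name = cls.__name__
        if name in self._models:
            raise ValueError(f"model {name!r} is already registered")
        self._models[name] = cls
        return cls

    def get(self, name: str) -> type:
        return self._models[name]

    def names(self) -> list[str]:
        return sorted(self._models)


registry = ModelRegistry()


@registry.register
class MediaFile:  # placeholder; a real model would subclass the SQLAlchemy Base
    pass


@registry.register
class Transcript:  # placeholder
    pass
```

Registering through a decorator keeps every model discoverable from one place, which is the property the registry pattern is after.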

3. ML Integration Tools

Tools: 3 | Skills: 6

Whisper Integration

  • Model Loading: Load Whisper models with faster-whisper
  • Audio Transcription: Transcribe audio files with Whisper
  • Chunking Strategy: Handle large audio files with chunking
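
As a rough illustration of the chunking strategy, the helper below computes overlapping time windows for a long recording. The function name `chunk_spans` is hypothetical, and the 30-second window with 2-second overlap is assumed from the values used later in this document; slicing the audio itself is left out.

```python
def chunk_spans(
    duration_s: float, chunk_s: float = 30.0, overlap_s: float = 2.0
) -> list[tuple[float, float]]:
    """Return (start, end) windows in seconds that cover the whole recording."""
    if chunk_s <= 0 or not 0 <= overlap_s < chunk_s:
        raise ValueError("need chunk_s > 0 and 0 <= overlap_s < chunk_s")
    spans: list[tuple[float, float]] = []
    step = chunk_s - overlap_s  # each window starts this far after the previous one
    start = 0.0
    while start < duration_s:
        end = min(start + chunk_s, duration_s)
        spans.append((start, end))
        if end >= duration_s:
            break  # the last window reached the end of the audio
        start += step
    return spans
```

For a 70-second file this yields three windows, each sharing 2 seconds with its neighbor so words cut at a boundary appear whole in at least one chunk.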

Protocol-Based Services

  • Design service interfaces
  • Implement version compatibility
  • Create swappable components

DeepSeek API Integration

  • Enhance transcript quality
  • Implement structured outputs
  • Handle API rate limits
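
Rate limits are commonly handled with retries and exponential backoff. The following is a minimal sketch under assumptions: `RateLimitError` is a hypothetical stand-in for whatever the API client raises on an HTTP 429, and `call_with_backoff` is an illustrative helper, not part of any DeepSeek SDK.

```python
import asyncio
import random
from typing import Awaitable, Callable, TypeVar

T = TypeVar("T")


class RateLimitError(Exception):
    """Hypothetical error raised when the API rejects a call for rate limiting."""


async def call_with_backoff(
    call: Callable[[], Awaitable[T]],
    max_attempts: int = 5,
    base_delay_s: float = 0.5,
) -> T:
    """Retry `call` on RateLimitError, doubling the wait after each failure."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        try:
            return await call()
        except RateLimitError:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the error to the caller
            delay = base_delay_s * (2 ** attempt)
            # A little jitter keeps parallel workers from retrying in lockstep.
            await asyncio.sleep(delay + random.uniform(0, delay / 10))
    raise AssertionError("unreachable")
```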

4. Testing Tools

Tools: 2 | Skills: 4

pytest with Real Files

  • Real File Testing: Test with actual audio files instead of mocks
  • Test Fixtures: Create reusable test fixtures with real files
  • Performance Testing: Benchmark transcription performance
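
To show what testing against a real file on disk can look like, the sketch below writes an actual WAV file with the standard library and reads it back in a pytest-style test. The helper `write_sine_wav` is hypothetical, and a synthetic tone is only a stand-in: the approach described above would use recorded audio samples.

```python
import math
import struct
import wave
from pathlib import Path


def write_sine_wav(
    path: Path, seconds: float = 1.0, rate: int = 16_000, freq: float = 440.0
) -> Path:
    """Write a real mono 16-bit PCM WAV file containing a sine tone."""
    n_frames = int(seconds * rate)
    samples = (
        int(0.5 * 32767 * math.sin(2 * math.pi * freq * i / rate))
        for i in range(n_frames)
    )
    with wave.open(str(path), "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)  # 2 bytes per sample = 16-bit audio
        wav.setframerate(rate)
        wav.writeframes(b"".join(struct.pack("<h", s) for s in samples))
    return path


def test_generated_wav_is_readable(tmp_path: Path) -> None:
    """pytest injects tmp_path; the file is read back from disk, not mocked."""
    audio = write_sine_wav(tmp_path / "tone.wav", seconds=0.5)
    with wave.open(str(audio), "rb") as wav:
        assert wav.getframerate() == 16_000
        assert wav.getnframes() == 8_000
```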

Coverage Reporting

  • Achieve >80% code coverage
  • Identify untested code
  • Track test quality

5. Architecture Tools

Tools: 3 | Skills: 3

Iterative Pipeline Design

  • Version Management: Manage different pipeline versions
  • Backward Compatibility: Ensure new versions work with old data
  • Feature Flags: Enable/disable features by version
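
One simple way to gate features by version is a table of what each pipeline version introduces. This is a sketch under assumptions: the feature labels are hypothetical names derived from the phase descriptions in this document, and later versions are assumed to keep every earlier feature.

```python
from enum import Enum


class PipelineVersion(str, Enum):
    V1 = "v1"
    V2 = "v2"
    V3 = "v3"
    V4 = "v4"


# Feature first introduced by each version (labels are illustrative).
INTRODUCED_IN: dict[PipelineVersion, str] = {
    PipelineVersion.V1: "whisper_transcription",
    PipelineVersion.V2: "ai_enhancement",
    PipelineVersion.V3: "multi_pass",
    PipelineVersion.V4: "speaker_diarization",
}


def enabled_features(version: PipelineVersion) -> frozenset[str]:
    """Features of `version` plus everything from earlier versions."""
    order = list(PipelineVersion)  # Enum iteration follows definition order
    up_to = order[: order.index(version) + 1]
    return frozenset(INTRODUCED_IN[v] for v in up_to)


def is_enabled(feature: str, version: PipelineVersion) -> bool:
    return feature in enabled_features(version)
```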

Batch Processing System

  • Process multiple files
  • Handle independent failures
  • Track progress

Caching Strategy

  • Cache expensive operations
  • Implement different TTLs
  • Handle cache invalidation
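
A per-entry TTL with explicit invalidation can be sketched in a few lines. `TTLCache` is a hypothetical in-memory class for illustration, with an injectable clock so expiry can be tested without sleeping; it is not a claim about how the project stores cached results.

```python
import time
from typing import Any, Callable, Hashable


class TTLCache:
    """In-memory cache where every entry carries its own time-to-live."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        self._entries: dict[Hashable, tuple[float, Any]] = {}

    def set(self, key: Hashable, value: Any, ttl_s: float) -> None:
        self._entries[key] = (self._clock() + ttl_s, value)

    def get(self, key: Hashable, default: Any = None) -> Any:
        entry = self._entries.get(key)
        if entry is None:
            return default
        expires_at, value = entry
        if self._clock() >= expires_at:
            del self._entries[key]  # expired: drop lazily on read
            return default
        return value

    def invalidate(self, key: Hashable) -> None:
        """Remove an entry early, for example after its source file changes."""
        self._entries.pop(key, None)
```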

6. Performance Tools

Tools: 2 | Skills: 3

Performance Profiling

  • Profile transcription speed
  • Optimize memory usage
  • Benchmark improvements
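
For quick benchmarks, wall-clock time and peak Python memory can be captured around a single call. The helper `measure` below is a hypothetical sketch using only the standard library. Note that `tracemalloc` sees Python-level allocations only, so memory used inside native inference libraries would need a separate tool.

```python
import time
import tracemalloc
from typing import Any, Callable


def measure(fn: Callable[..., Any], *args: Any, **kwargs: Any) -> tuple[Any, float, int]:
    """Call fn and return (result, elapsed seconds, peak traced bytes)."""
    tracemalloc.start()
    started = time.perf_counter()
    try:
        result = fn(*args, **kwargs)
    finally:
        # Always stop tracing, even if fn raised.
        elapsed = time.perf_counter() - started
        _, peak_bytes = tracemalloc.get_traced_memory()
        tracemalloc.stop()
    return result, elapsed, peak_bytes
```

Running the same call before and after a change gives a like-for-like comparison of speed and memory.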

M3 Hardware Optimization

  • Metal Performance Shaders: Use M3 GPU for Whisper inference where the backend supports Metal (faster-whisper itself runs on CPU or CUDA only)
  • Memory Optimization: Optimize memory usage for large files
  • Performance Profiling: Profile and optimize performance

7. Deployment Tools

Tools: 2 | Skills: 2

Docker Containerization

  • Create production images
  • Handle dependencies
  • Optimize image size

CI/CD Pipeline

  • Automate testing
  • Deploy to staging
  • Monitor deployments

📊 Agent Statistics

  • Total Tools Available: 17
  • Required Skills: 30
  • Categories: 7
  • Development Phases: 4 (v1, v2, v3, v4)

🎯 Phase-Specific Tool Availability

Phase 1 (v1): Foundation

Focus: Basic Whisper transcription (95% accuracy, <30s for 5min audio)
Tools: Core Development, Database, Testing

Phase 2 (v2): Enhancement

Focus: AI enhancement (99% accuracy, <35s processing)
Tools: + ML Integration

Phase 3 (v3): Optimization

Focus: Multi-pass accuracy (99.5% accuracy, <25s processing)
Tools: + Performance

Phase 4 (v4): Advanced Features

Focus: Speaker diarization (90% speaker accuracy)
Tools: + Deployment

🚀 Success Metrics

The agent must achieve these targets:

| Metric | Target |
|---|---|
| Processing Speed | 5-minute audio in <30 seconds |
| Accuracy | 99.5% transcription accuracy with multi-pass |
| Batch Capacity | Process 100+ files efficiently |
| Memory Usage | <4GB peak memory usage |
| Cost | <$0.01 per transcript |
| Code Coverage | >80% with real file testing |
| CLI Response | <1 second CLI response time |
| File Size | Handle files up to 500MB |
| Data Loss | Zero data loss on errors |
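
The processing-speed target can be restated as a real-time factor: 30 seconds of compute for 300 seconds of audio is a factor of 0.1. The sketch below works through that arithmetic. The function names are hypothetical, and applying the same ratio to other audio lengths is an assumption, since the target above is stated only for 5-minute files.

```python
def real_time_factor(processing_s: float, audio_s: float) -> float:
    """Processing time divided by audio duration; lower is faster."""
    if audio_s <= 0:
        raise ValueError("audio_s must be positive")
    return processing_s / audio_s


# Target from the table: 5 minutes of audio in under 30 seconds.
TARGET_RTF = real_time_factor(30, 5 * 60)


def meets_speed_target(processing_s: float, audio_s: float) -> bool:
    return real_time_factor(processing_s, audio_s) < TARGET_RTF
```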

💻 Development Workflow

1. Environment Setup

uv venv
source .venv/bin/activate
uv pip install -e ".[dev]"

2. Database Setup

alembic revision --autogenerate -m 'Initial schema'
alembic upgrade head

3. Core Development

from pathlib import Path
from typing import Protocol

class TranscriptionService(Protocol):
    async def transcribe(self, audio: Path) -> "Transcript": ...
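
To make the protocol concrete, here is a minimal sketch of a swappable implementation. The `Transcript` dataclass shape, `EchoTranscriber`, and `run_pipeline` are hypothetical; a real service would call Whisper instead of returning placeholder text.

```python
import asyncio
from dataclasses import dataclass
from pathlib import Path
from typing import Protocol, runtime_checkable


@dataclass
class Transcript:  # assumed minimal shape, for illustration only
    text: str
    source: Path


@runtime_checkable
class TranscriptionService(Protocol):
    async def transcribe(self, audio: Path) -> Transcript: ...


class EchoTranscriber:
    """Stand-in service; it satisfies the protocol without inheriting from it."""

    async def transcribe(self, audio: Path) -> Transcript:
        return Transcript(text=f"<transcript of {audio.name}>", source=audio)


async def run_pipeline(service: TranscriptionService, audio: Path) -> Transcript:
    # Callers depend only on the protocol, so implementations can be swapped.
    return await service.transcribe(audio)


if __name__ == "__main__":
    print(asyncio.run(run_pipeline(EchoTranscriber(), Path("talk.wav"))).text)
```

Because the protocol is structural, a v2 service can replace the v1 one anywhere `run_pipeline` is used, as long as it exposes the same `transcribe` method.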

4. ML Integration

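from faster_whisper import WhisperModel
# faster-whisper (CTranslate2) accepts 'cpu', 'cuda', or 'auto'; 'mps' is not a supported device
model = WhisperModel('distil-large-v3', device='cpu', compute_type='int8')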

5. Testing

uv run pytest tests/
uv run pytest --cov=src

6. Performance Optimization

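# Python: transcribe() accepts chunk_length (seconds) but has no overlap parameter
segments, info = model.transcribe(audio_path, chunk_length=30)

# Shell: profile the entry point, sorted by cumulative time
python -m cProfile -s cumtime src/main.py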

🔧 Key Capabilities

Protocol-Based Architecture

  • Design clean service interfaces
  • Implement dependency injection
  • Create swappable components
  • Maintain version compatibility

Real File Testing

  • Test with actual audio files
  • No mocks in test suite
  • Benchmark real performance
  • Handle edge cases

Performance Optimization

  • M3 hardware acceleration
  • Memory usage optimization
  • Chunking for large files
  • Profiling and benchmarking

Batch Processing

  • Handle 100+ files efficiently
  • Independent failure handling
  • Progress tracking
  • Queue management

📁 File Structure

src/agents/
├── backend_developer_agent.py          # Main agent definition
├── tools/
│   └── backend_developer_tools.py      # Detailed tool definitions
└── demo_backend_developer.py           # Demo script

🎮 Usage Examples

Running the Demo

cd src/agents
python demo_backend_developer.py

Checking Tool Availability

from agents.backend_developer_agent import check_tool_availability

# Check if agent can use a specific tool
can_use_whisper = check_tool_availability("Whisper Integration")
print(f"Can use Whisper: {can_use_whisper}")

Getting Tools by Category

from agents.tools.backend_developer_tools import get_tools_by_category

# Get all database tools
db_tools = get_tools_by_category("database")
for tool in db_tools:
    print(f"Database tool: {tool.name}")

Getting Phase-Specific Tools

from agents.tools.backend_developer_tools import get_tools_by_phase

# Get tools available in v1
v1_tools = get_tools_by_phase("v1")
for tool in v1_tools:
    print(f"v1 tool: {tool.name}")

🎯 Next Steps

  1. Run the demo script to see all capabilities
  2. Review the job posting for hiring
  3. Set up development environment for the agent
  4. Begin Phase 1 development with core tools
  5. Implement protocol-based architecture from day one

The Backend Developer Agent is ready to build the future of media processing with clean, scalable, and reliable architecture! 🚀