# AI Assistant Library Integration Guide
## Overview
The Trax project leverages the **AI Assistant Class Library** - a comprehensive, production-tested library that provides common functionality for AI-powered applications. This guide explains how Trax uses the library and how to extend it for your needs.
## Library Components Used by Trax
### 1. Core Base Classes
#### BaseService
All Trax services extend `BaseService` for consistent service lifecycle management:
```python
from ai_assistant_lib import BaseService

class TraxService(BaseService):
    async def _initialize_impl(self):
        # Service-specific initialization
        pass
```
**Benefits** (see the lifecycle sketch below):
- Standardized initialization/shutdown
- Health checking
- Status tracking
- Error counting
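A minimal sketch of the lifecycle end to end. `initialize()` and `shutdown()` are assumed to be the public counterparts to `_initialize_impl`; the health call matches the one shown in the Monitoring section later in this guide:
```python
import asyncio

async def main():
    service = TraxService("TraxService")

    # Assumed public entry point that runs _initialize_impl
    await service.initialize()

    # Status and health tracking provided by BaseService
    print(service.get_health_status())

    # Assumed counterpart that tears the service down cleanly
    await service.shutdown()

asyncio.run(main())
```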
#### BaseRepository
Database access builds on `BaseRepository` for CRUD operations:
```python
from ai_assistant_lib import BaseRepository, TimestampedRepository

class MediaFileRepository(TimestampedRepository):
    # Inherits create, find_by_id, find_all, update, delete,
    # plus automatic timestamp management.
    pass
```
**Benefits** (see the usage sketch below):
- Type-safe CRUD operations
- Automatic timestamp handling
- Built-in pagination
- Error handling
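A rough sketch of the inherited CRUD surface. The exact signatures here (dict payloads, `limit`/`offset` pagination parameter names) are assumptions for illustration, not confirmed library API:
```python
async def crud_example(repo: MediaFileRepository):
    # Create, read, update, delete via the inherited methods
    media = await repo.create({"path": "/data/audio/sample.wav"})
    found = await repo.find_by_id(media.id)

    # Built-in pagination (parameter names assumed)
    page = await repo.find_all(limit=50, offset=0)

    await repo.update(media.id, {"path": "/data/audio/renamed.wav"})
    await repo.delete(media.id)
```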
### 2. Retry and Resilience Patterns
#### RetryHandler
Automatic retry with exponential backoff:
```python
from ai_assistant_lib import async_retry, RetryConfig

@async_retry(max_attempts=3, backoff_factor=2.0)
async def transcribe_with_retry(audio_path):
    return await transcribe(audio_path)
```
#### CircuitBreaker
Prevent cascading failures:
```python
from ai_assistant_lib import CircuitBreaker

breaker = CircuitBreaker(
    failure_threshold=5,
    recovery_timeout=60,
)

async with breaker:
    result = await risky_operation()
```
### 3. Caching Infrastructure
#### Multi-Layer Caching
```python
from ai_assistant_lib import MemoryCache, CacheManager, cached

# Memory cache for hot data
memory_cache = MemoryCache(default_ttl=3600)

# Decorator for automatic caching
@cached(ttl=7200)
async def expensive_operation(param):
    return await compute_result(param)
```
**Cache Layers** (see the composition sketch below):
1. **Memory Cache** - Fast, limited size
2. **Database Cache** - Persistent, searchable
3. **Filesystem Cache** - Large files
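One way the layers might be composed behind the `CacheManager` imported above. The `layers` argument and the `get`/`set` methods are assumptions for illustration, not confirmed library API:
```python
from ai_assistant_lib import CacheManager, MemoryCache

# Hypothetical composition: consult the fastest layer first,
# fall back to the slower, larger ones.
manager = CacheManager(layers=[
    MemoryCache(default_ttl=3600),  # 1. fast, limited size
    # DatabaseCache(...),           # 2. persistent, searchable
    # FilesystemCache(...),         # 3. large files
])

async def get_or_compute(key, compute):
    value = await manager.get(key)
    if value is None:
        value = await compute(key)
        await manager.set(key, value)
    return value
```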
### 4. AI Service Integration
#### BaseAIService
Standardized AI service integration:
```python
from ai_assistant_lib import BaseAIService, AIModelConfig

class EnhancementService(BaseAIService):
    def __init__(self):
        config = AIModelConfig(
            model_name="deepseek-chat",
            temperature=0.0,
            max_tokens=4096,
        )
        super().__init__("EnhancementService", config)
```
**Features** (see the usage sketch below):
- Unified API interface
- Automatic retry logic
- Cost tracking
- Model versioning
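Calling such a service might look like the sketch below. The `complete()` method name is an assumption standing in for whatever request method `BaseAIService` actually exposes:
```python
async def enhance_transcript(text: str) -> str:
    service = EnhancementService()
    await service.initialize()

    # Hypothetical request method; retry logic and cost
    # tracking are handled inside BaseAIService
    return await service.complete(
        prompt=f"Clean up this transcript:\n{text}"
    )
```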
## Trax-Specific Extensions
### 1. Protocol-Based Services
Trax extends the library with protocol definitions for maximum flexibility:
```python
from pathlib import Path
from typing import Any, Dict, Protocol

class TranscriptionProtocol(Protocol):
    async def transcribe(self, audio_path: Path) -> Dict[str, Any]:
        ...

    def can_handle(self, audio_path: Path) -> bool:
        ...
```
### 2. Pipeline Versioning
Trax adds pipeline version tracking to services:
```python
class TraxService(BaseService):
    def __init__(self, name, config=None):
        super().__init__(name, config)
        config = config or {}  # Guard against the default None
        self.pipeline_version = config.get("pipeline_version", "v1")
```
### 3. JSONB Support
PostgreSQL JSONB columns for flexible data:
```python
from sqlalchemy import Column
from sqlalchemy.dialects.postgresql import JSONB

class Transcript(TimestampedModel):
    raw_content = Column(JSONB, nullable=False)
    enhanced_content = Column(JSONB)
```
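JSONB columns remain queryable from SQLAlchemy. A small sketch using the standard PostgreSQL JSONB operators; the `language` key is illustrative and depends on your transcript schema:
```python
# Filter on a key inside the JSONB payload
english = (
    session.query(Transcript)
    .filter(Transcript.raw_content["language"].astext == "en")
    .all()
)
```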
## Usage Examples
### Example 1: Creating a New Service
```python
from ai_assistant_lib import BaseService, ServiceStatus
from src.base.services import TraxService

class WhisperService(TraxService):
    """Whisper transcription service."""

    async def _initialize_impl(self):
        """Load Whisper model."""
        self.model = await load_whisper_model()
        logger.info("Loaded Whisper model")

    async def transcribe(self, audio_path: Path):
        """Transcribe audio file."""
        if self.status != ServiceStatus.HEALTHY:
            raise ServiceUnavailableError("Service not ready")
        return await self.model.transcribe(audio_path)
```
### Example 2: Repository with Caching
```python
from ai_assistant_lib import TimestampedRepository, cached

class TranscriptRepository(TimestampedRepository):
    @cached(ttl=3600)
    async def find_by_media_file(self, media_file_id):
        """Find transcript with caching."""
        return self.session.query(Transcript).filter(
            Transcript.media_file_id == media_file_id
        ).first()
```
### Example 3: Batch Processing with Circuit Breaker
```python
from ai_assistant_lib import AsyncProcessor, CircuitBreaker, CircuitBreakerOpen

class BatchProcessor(AsyncProcessor):
    def __init__(self):
        super().__init__("BatchProcessor")
        self.breaker = CircuitBreaker(failure_threshold=5)

    async def process_batch(self, files):
        results = []
        for file in files:
            try:
                async with self.breaker:
                    result = await self.process_file(file)
                    results.append(result)
            except CircuitBreakerOpen:
                logger.error("Circuit breaker open, stopping batch")
                break
        return results
```
## Configuration
### Library Configuration
Configure the library globally:
```python
from ai_assistant_lib import LibraryConfig

LibraryConfig.configure(
    log_level="INFO",
    default_timeout_seconds=30,
    default_retry_attempts=3,
    enable_metrics=True,
    enable_tracing=False,
)
```
### Service Configuration
Each service can have custom configuration:
```python
config = {
    "pipeline_version": "v2",
    "max_retries": 5,
    "timeout": 60,
    "cache_ttl": 7200,
}
service = TranscriptionService(config=config)
```
## Testing with the Library
### Test Utilities
The library provides test utilities:
```python
from ai_assistant_lib.testing import AsyncTestCase, mock_service

class TestTranscription(AsyncTestCase):
    async def setUp(self):
        self.mock_ai = mock_service(BaseAIService)
        self.service = TranscriptionService()

    async def test_transcribe(self):
        result = await self.service.transcribe(test_file)
        self.assertIsNotNone(result)
```
### Mock Implementations
Create mock services for testing:
```python
class MockTranscriptionService(TranscriptionProtocol):
    async def transcribe(self, audio_path):
        return {"text": "Mock transcript", "duration": 10.0}

    def can_handle(self, audio_path):
        return True
```
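Because callers depend only on `TranscriptionProtocol`, the mock can stand in wherever a real transcriber is expected. A hypothetical wiring (the `transcriber` attribute name is an assumption):
```python
class PipelineUnderTest:
    def __init__(self, transcriber: TranscriptionProtocol):
        self.transcriber = transcriber

    async def run(self, audio_path):
        if self.transcriber.can_handle(audio_path):
            return await self.transcriber.transcribe(audio_path)
        raise ValueError(f"Unsupported file: {audio_path}")

# Swap the mock in without touching the pipeline code
pipeline = PipelineUnderTest(MockTranscriptionService())
```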
## Performance Optimization
### 1. Connection Pooling
The library provides connection pooling:
```python
from ai_assistant_lib import ConnectionPool

pool = ConnectionPool(
    max_connections=100,
    min_connections=10,
    timeout=30,
)
```
### 2. Batch Operations
Optimize database operations:
```python
from ai_assistant_lib import bulk_insert, bulk_update
# Insert many records efficiently
await bulk_insert(session, records)
# Update many records in one query
await bulk_update(session, updates)
```
### 3. Async Patterns
Use async throughout:
```python
import asyncio

# Process multiple files concurrently
results = await asyncio.gather(*[
    process_file(f) for f in files
])
```
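For large batches, an unbounded `gather` can exhaust connections or memory. A common refinement is to cap concurrency with a semaphore:
```python
import asyncio

# Allow at most 8 files in flight at once
semaphore = asyncio.Semaphore(8)

async def process_bounded(f):
    async with semaphore:
        return await process_file(f)

results = await asyncio.gather(*[process_bounded(f) for f in files])
```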
## Error Handling
### Exception Hierarchy
The library provides a comprehensive exception hierarchy:
```python
from ai_assistant_lib import (
    AIAssistantError,        # Base exception
    RetryableError,          # Can be retried
    NonRetryableError,       # Should not be retried
    ServiceUnavailableError,
    RateLimitError,
    ValidationError,
)
```
### Error Recovery
Built-in error recovery patterns:
```python
try:
    result = await service.process()
except RetryableError as e:
    # Will be retried automatically by the retry decorator
    logger.warning(f"Retryable error: {e}")
except NonRetryableError as e:
    # Fatal error, don't retry
    logger.error(f"Fatal error: {e}")
    raise
```
## Monitoring and Metrics
### Health Checks
All services provide health status:
```python
health = service.get_health_status()
# {
#     "status": "healthy",
#     "is_healthy": true,
#     "uptime_seconds": 3600,
#     "error_count": 0
# }
```
### Performance Metrics
Track performance with the metrics collector:
```python
from ai_assistant_lib import MetricsCollector
metrics = MetricsCollector()
metrics.track("transcription_time", elapsed)
metrics.track("cache_hit_rate", hit_rate)
report = metrics.get_report()
```
## Migration from YouTube Summarizer
### Pattern Mapping
| YouTube Summarizer Pattern | AI Assistant Library Equivalent |
|---------------------------|----------------------------------|
| Custom retry logic | `@async_retry` decorator |
| Manual cache management | `CacheManager` class |
| Database operations | `BaseRepository` |
| Service initialization | `BaseService` |
| Error handling | Exception hierarchy |
### Code Migration Example
**Before (YouTube Summarizer):**
```python
class TranscriptService:
    def __init__(self):
        self.cache = {}

    async def get_transcript(self, video_id):
        if video_id in self.cache:
            return self.cache[video_id]
        # Manual retry with exponential backoff
        for attempt in range(3):
            try:
                result = await self.fetch_transcript(video_id)
                self.cache[video_id] = result
                return result
            except Exception:
                if attempt == 2:
                    raise
                await asyncio.sleep(2 ** attempt)
```
**After (With Library):**
```python
from ai_assistant_lib import BaseService, cached, async_retry

class TranscriptService(BaseService):
    @cached(ttl=3600)
    @async_retry(max_attempts=3)
    async def get_transcript(self, video_id):
        return await self.fetch_transcript(video_id)
```
## Best Practices
### 1. Always Use Protocols
Define protocols for all services to enable easy swapping:
```python
from typing import Any, Protocol

class ProcessorProtocol(Protocol):
    async def process(self, data: Any) -> Any: ...
```
### 2. Leverage Type Hints
Use type hints for better IDE support:
```python
async def process_batch(
    self,
    files: List[Path],
    processor: ProcessorProtocol,
) -> Dict[str, Any]:
    ...
```
### 3. Configuration Over Code
Use configuration files instead of hardcoding:
```python
config = load_config("config.yaml")
service = MyService(config=config)
```
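`load_config` above is not a library function; a minimal sketch using PyYAML might look like this:
```python
import yaml  # PyYAML, assumed available

def load_config(path: str) -> dict:
    """Load a YAML file into a plain dict (hypothetical helper)."""
    with open(path) as f:
        return yaml.safe_load(f) or {}
```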
### 4. Test with Real Data
Use the library's support for real file testing:
```python
test_file = Path("tests/fixtures/audio/sample.wav")
result = await service.transcribe(test_file)
```
## Troubleshooting
### Common Issues
1. **Import Errors**
   - Ensure the symlink is created: `ln -s ../../lib lib`
   - Check that the Python path includes the library (see the sketch below)
2. **Type Errors**
   - The library requires Python 3.11+
   - Use proper type hints
3. **Async Errors**
   - Always use `async`/`await`
   - Don't mix sync and async code
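If the symlink is missing, the library path can also be added at runtime. A minimal sketch, assuming the repository layout implied by the symlink command above:
```python
import sys
from pathlib import Path

# Make ../../lib importable without a symlink (relative location assumed)
sys.path.insert(0, str(Path(__file__).resolve().parents[2] / "lib"))

import ai_assistant_lib  # noqa: E402
```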
### Debug Mode
Enable debug logging:
```python
import logging
logging.getLogger("ai_assistant_lib").setLevel(logging.DEBUG)
```
## Summary
The AI Assistant Library provides Trax with:
- **Production-tested components** - Used across multiple projects
- **Consistent patterns** - Same patterns everywhere
- **Built-in resilience** - Retry, circuit breaker, caching
- **Type safety** - Full typing support
- **Performance optimization** - Connection pooling, batch operations
- **Comprehensive testing** - Test utilities and fixtures
By leveraging this library, Trax can focus on its unique media processing capabilities while relying on proven infrastructure components.
---
For more information about the library, see:
- [Library Source](../../lib/)
- [Library Tests](../../lib/tests/)
- [Usage Examples](../examples/)