AI Assistant Library Integration Guide

Overview

The Trax project leverages the AI Assistant Class Library - a comprehensive, production-tested library that provides common functionality for AI-powered applications. This guide explains how Trax uses the library and how to extend it for your needs.

Library Components Used by Trax

1. Core Base Classes

BaseService

All Trax services extend BaseService for consistent service lifecycle management:

from ai_assistant_lib import BaseService

class TraxService(BaseService):
    async def _initialize_impl(self):
        # Service-specific initialization
        pass

Benefits:

  • Standardized initialization/shutdown
  • Health checking
  • Status tracking
  • Error counting
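
A minimal lifecycle sketch of how these benefits surface in practice. The wrapper names initialize() and shutdown() are assumptions; this guide only shows _initialize_impl and get_health_status:

service = TraxService("TraxService")
await service.initialize()            # assumed wrapper that invokes _initialize_impl
print(service.get_health_status())    # status tracking, e.g. {"status": "healthy", ...}
await service.shutdown()              # assumed teardown counterpart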

BaseRepository

Database access goes through BaseRepository, which supplies the standard CRUD operations:

from ai_assistant_lib import BaseRepository, TimestampedRepository

class MediaFileRepository(TimestampedRepository):
    """Inherits create, find_by_id, find_all, update, and delete,
    plus automatic timestamp management."""

Benefits:

  • Type-safe CRUD operations
  • Automatic timestamp handling
  • Built-in pagination
  • Error handling
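
A hedged sketch of the inherited CRUD surface; the session argument and the limit/offset pagination parameters are assumptions:

repo = MediaFileRepository(session)
media = await repo.create({"path": "audio/sample.wav"})
found = await repo.find_by_id(media.id)
page = await repo.find_all(limit=50, offset=0)   # built-in pagination (assumed signature)
await repo.update(media.id, {"status": "processed"})
await repo.delete(media.id)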

2. Retry and Resilience Patterns

RetryHandler

Automatic retry with exponential backoff:

from ai_assistant_lib import async_retry, RetryConfig

@async_retry(max_attempts=3, backoff_factor=2.0)
async def transcribe_with_retry(audio_path):
    return await transcribe(audio_path)
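
The import above also brings in RetryConfig; a hedged sketch of passing retry settings as an object instead of keyword arguments (the field names and the config= parameter are assumptions):

retry_config = RetryConfig(
    max_attempts=5,
    backoff_factor=2.0,
)

@async_retry(config=retry_config)    # assumed: decorator accepts a config object
async def fetch_metadata(url):
    return await http_get(url)       # hypothetical helper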

CircuitBreaker

Prevent cascading failures:

from ai_assistant_lib import CircuitBreaker

breaker = CircuitBreaker(
    failure_threshold=5,
    recovery_timeout=60
)

async with breaker:
    result = await risky_operation()

3. Caching Infrastructure

Multi-Layer Caching

from ai_assistant_lib import MemoryCache, CacheManager, cached

# Memory cache for hot data
memory_cache = MemoryCache(default_ttl=3600)

# Decorator for automatic caching
@cached(ttl=7200)
async def expensive_operation(param):
    return await compute_result(param)

Cache Layers:

  1. Memory Cache - Fast, limited size
  2. Database Cache - Persistent, searchable
  3. Filesystem Cache - Large files
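
CacheManager is imported above but not shown; a hedged sketch of how it might compose the three layers (the constructor signature and the non-memory cache class names are assumptions):

manager = CacheManager(layers=[
    MemoryCache(default_ttl=3600),       # layer 1: fast, limited size
    DatabaseCache(),                     # layer 2: persistent, searchable (hypothetical name)
    FilesystemCache("/var/cache/trax"),  # layer 3: large files (hypothetical name)
])

value = await manager.get("transcript:123")  # assumed: checks each layer in order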

4. AI Service Integration

BaseAIService

Standardized AI service integration:

from ai_assistant_lib import BaseAIService, AIModelConfig

class EnhancementService(BaseAIService):
    def __init__(self):
        config = AIModelConfig(
            model_name="deepseek-chat",
            temperature=0.0,
            max_tokens=4096
        )
        super().__init__("EnhancementService", config)

Features:

  • Unified API interface
  • Automatic retry logic
  • Cost tracking
  • Model versioning
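
A hedged sketch of calling the service; the complete() method and the cost-tracking attribute are assumptions, since this guide only specifies the constructor:

service = EnhancementService()
await service.initialize()           # assumed lifecycle wrapper

response = await service.complete("Clean up this transcript: ...")  # assumed API
print(service.total_cost)            # assumed attribute backing the cost-tracking feature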

Trax-Specific Extensions

1. Protocol-Based Services

Trax extends the library with protocol definitions for maximum flexibility:

from pathlib import Path
from typing import Any, Dict, Protocol

class TranscriptionProtocol(Protocol):
    async def transcribe(self, audio_path: Path) -> Dict[str, Any]:
        ...

    def can_handle(self, audio_path: Path) -> bool:
        ...
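
Because callers are typed against the protocol rather than a concrete class, any conforming implementation can be swapped in, including the MockTranscriptionService shown under Testing below:

async def run_transcription(transcriber: TranscriptionProtocol, audio_path: Path):
    if not transcriber.can_handle(audio_path):
        raise ValueError(f"Unsupported file: {audio_path}")
    return await transcriber.transcribe(audio_path)

# Works with WhisperService, MockTranscriptionService, or any other conformer.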

2. Pipeline Versioning

Trax adds pipeline version tracking to services:

class TraxService(BaseService):
    def __init__(self, name, config=None):
        super().__init__(name, config)
        config = config or {}  # guard against the default None
        self.pipeline_version = config.get("pipeline_version", "v1")

3. JSONB Support

PostgreSQL JSONB columns for flexible data:

from sqlalchemy import Column
from sqlalchemy.dialects.postgresql import JSONB

class Transcript(TimestampedModel):  # TimestampedModel: assumed library base model
    raw_content = Column(JSONB, nullable=False)
    enhanced_content = Column(JSONB)
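
JSONB columns accept plain Python dicts and support server-side filtering; a short sketch using standard SQLAlchemy operators (the session setup is assumed):

transcript = Transcript(raw_content={"language": "en", "segments": []})
session.add(transcript)
session.commit()

# Filter on a JSONB key without loading and parsing every document
english = session.query(Transcript).filter(
    Transcript.raw_content["language"].astext == "en"
).all()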

Usage Examples

Example 1: Creating a New Service

import logging
from pathlib import Path

from ai_assistant_lib import ServiceStatus, ServiceUnavailableError
from src.base.services import TraxService

logger = logging.getLogger(__name__)

class WhisperService(TraxService):
    """Whisper transcription service."""

    async def _initialize_impl(self):
        """Load the Whisper model."""
        self.model = await load_whisper_model()  # project-specific loader
        logger.info("Loaded Whisper model")

    async def transcribe(self, audio_path: Path):
        """Transcribe an audio file."""
        if self.status != ServiceStatus.HEALTHY:
            raise ServiceUnavailableError("Service not ready")

        return await self.model.transcribe(audio_path)

Example 2: Repository with Caching

from ai_assistant_lib import TimestampedRepository, cached

class TranscriptRepository(TimestampedRepository):
    
    @cached(ttl=3600)
    async def find_by_media_file(self, media_file_id):
        """Find transcript with caching."""
        return self.session.query(Transcript).filter(
            Transcript.media_file_id == media_file_id
        ).first()

Example 3: Batch Processing with Circuit Breaker

import logging

# CircuitBreakerOpen is assumed to be exported alongside CircuitBreaker
from ai_assistant_lib import AsyncProcessor, CircuitBreaker, CircuitBreakerOpen

logger = logging.getLogger(__name__)

class BatchProcessor(AsyncProcessor):
    def __init__(self):
        super().__init__("BatchProcessor")
        self.breaker = CircuitBreaker(failure_threshold=5)

    async def process_batch(self, files):
        results = []
        for file in files:
            try:
                async with self.breaker:
                    result = await self.process_file(file)
                    results.append(result)
            except CircuitBreakerOpen:
                logger.error("Circuit breaker open, stopping batch")
                break
        return results

Configuration

Library Configuration

Configure the library globally:

from ai_assistant_lib import LibraryConfig

LibraryConfig.configure(
    log_level="INFO",
    default_timeout_seconds=30,
    default_retry_attempts=3,
    enable_metrics=True,
    enable_tracing=False
)

Service Configuration

Each service can have custom configuration:

config = {
    "pipeline_version": "v2",
    "max_retries": 5,
    "timeout": 60,
    "cache_ttl": 7200
}

service = TranscriptionService(config=config)

Testing with the Library

Test Utilities

The library provides test utilities:

from pathlib import Path

from ai_assistant_lib import BaseAIService
from ai_assistant_lib.testing import AsyncTestCase, mock_service

class TestTranscription(AsyncTestCase):
    async def setUp(self):
        self.mock_ai = mock_service(BaseAIService)
        self.service = TranscriptionService()

    async def test_transcribe(self):
        test_file = Path("tests/fixtures/audio/sample.wav")
        result = await self.service.transcribe(test_file)
        self.assertIsNotNone(result)

Mock Implementations

Create mock services for testing:

class MockTranscriptionService(TranscriptionProtocol):
    async def transcribe(self, audio_path):
        return {"text": "Mock transcript", "duration": 10.0}
    
    def can_handle(self, audio_path):
        return True

Performance Optimization

1. Connection Pooling

The library provides connection pooling:

from ai_assistant_lib import ConnectionPool

pool = ConnectionPool(
    max_connections=100,
    min_connections=10,
    timeout=30
)
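
A hedged usage sketch; treating acquire() as an async context manager is an assumption, since this guide only specifies the constructor:

async with pool.acquire() as conn:       # assumed API
    rows = await conn.fetch("SELECT 1")  # assumed connection interface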

2. Batch Operations

Optimize database operations:

from ai_assistant_lib import bulk_insert, bulk_update

# Insert many records efficiently
await bulk_insert(session, records)

# Update many records in one query
await bulk_update(session, updates)

3. Async Patterns

Use async throughout:

import asyncio

# Process multiple files concurrently
results = await asyncio.gather(*[
    process_file(f) for f in files
])
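
Unbounded gather can overwhelm downstream services; a standard asyncio pattern caps concurrency with a semaphore:

semaphore = asyncio.Semaphore(4)  # at most 4 files in flight

async def process_bounded(f):
    async with semaphore:
        return await process_file(f)

results = await asyncio.gather(*[process_bounded(f) for f in files])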

Error Handling

Exception Hierarchy

The library provides a comprehensive exception hierarchy:

from ai_assistant_lib import (
    AIAssistantError,      # Base exception
    RetryableError,        # Can be retried
    NonRetryableError,     # Should not retry
    ServiceUnavailableError,
    RateLimitError,
    ValidationError
)

Error Recovery

Built-in error recovery patterns:

try:
    result = await service.process()
except RetryableError as e:
    # Retries happen inside the @async_retry decorator; reaching this
    # handler means the retry budget was exhausted
    logger.warning(f"Retryable error: {e}")
except NonRetryableError as e:
    # Fatal error, don't retry
    logger.error(f"Fatal error: {e}")
    raise

Monitoring and Metrics

Health Checks

All services provide health status:

health = service.get_health_status()
# {
#     "status": "healthy",
#     "is_healthy": true,
#     "uptime_seconds": 3600,
#     "error_count": 0
# }

Performance Metrics

Track performance automatically:

from ai_assistant_lib import MetricsCollector

metrics = MetricsCollector()
metrics.track("transcription_time", elapsed)
metrics.track("cache_hit_rate", hit_rate)

report = metrics.get_report()

Migration from YouTube Summarizer

Pattern Mapping

YouTube Summarizer Pattern    AI Assistant Library Equivalent
--------------------------    -------------------------------
Custom retry logic            @async_retry decorator
Manual cache management       CacheManager class
Database operations           BaseRepository
Service initialization        BaseService
Error handling                Exception hierarchy

Code Migration Example

Before (YouTube Summarizer):

import asyncio

class TranscriptService:
    def __init__(self):
        self.cache = {}

    async def get_transcript(self, video_id):
        if video_id in self.cache:
            return self.cache[video_id]

        # Hand-rolled retry with exponential backoff
        for attempt in range(3):
            try:
                result = await self.fetch_transcript(video_id)
                self.cache[video_id] = result
                return result
            except Exception:
                if attempt == 2:
                    raise
                await asyncio.sleep(2 ** attempt)

After (With Library):

from ai_assistant_lib import BaseService, cached, async_retry

class TranscriptService(BaseService):
    @cached(ttl=3600)
    @async_retry(max_attempts=3)
    async def get_transcript(self, video_id):
        return await self.fetch_transcript(video_id)

Best Practices

1. Always Use Protocols

Define protocols for all services to enable easy swapping:

from typing import Any, Protocol

class ProcessorProtocol(Protocol):
    async def process(self, data: Any) -> Any: ...

2. Leverage Type Hints

Use type hints for better IDE support:

async def process_batch(
    self,
    files: List[Path],
    processor: ProcessorProtocol
) -> Dict[str, Any]:
    ...

3. Configuration Over Code

Use configuration files instead of hardcoding:

config = load_config("config.yaml")
service = MyService(config=config)

4. Test with Real Data

Use the library's support for real file testing:

test_file = Path("tests/fixtures/audio/sample.wav")
result = await service.transcribe(test_file)

Troubleshooting

Common Issues

  1. Import Errors

    • Ensure the symlink is created: ln -s ../../lib lib
    • Check that the Python path includes the library (see the sketch after this list)
  2. Type Errors

    • Library requires Python 3.11+
    • Use proper type hints
  3. Async Errors

    • Always use async/await
    • Don't mix sync and async code
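
For issue 1, a quick check that the library resolves to the expected copy (paths are illustrative):

import sys
sys.path.insert(0, "lib")  # or rely on the symlink created above

import ai_assistant_lib
print(ai_assistant_lib.__file__)  # shows which copy is being imported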

Debug Mode

Enable debug logging:

import logging
logging.getLogger("ai_assistant_lib").setLevel(logging.DEBUG)

Summary

The AI Assistant Library provides Trax with:

  • Production-tested components - used across multiple projects
  • Consistent patterns - the same patterns everywhere
  • Built-in resilience - retry, circuit breaker, caching
  • Type safety - full typing support
  • Performance optimization - connection pooling, batch operations
  • Comprehensive testing - test utilities and fixtures

By leveraging this library, Trax can focus on its unique media processing capabilities while relying on proven infrastructure components.


For more information about the library, see: