# AI Assistant Library Integration Guide
## Overview
The Trax project leverages the **AI Assistant Class Library** - a comprehensive, production-tested library that provides common functionality for AI-powered applications. This guide explains how Trax uses the library and how to extend it for your needs.
## Library Components Used by Trax
### 1. Core Base Classes
#### BaseService
All Trax services extend `BaseService` for consistent service lifecycle management:
```python
from ai_assistant_lib import BaseService

class TraxService(BaseService):
    async def _initialize_impl(self):
        # Service-specific initialization
        pass
```
**Benefits** (see the lifecycle sketch below):
- Standardized initialization/shutdown
- Health checking
- Status tracking
- Error counting
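A minimal sketch of the lifecycle end to end. `initialize()` and `shutdown()` are assumed to be the public counterparts to `_initialize_impl`; the health call matches the one shown in the Monitoring section later in this guide:
```python
import asyncio

async def main():
    service = TraxService("TraxService")

    # Assumed public entry point that runs _initialize_impl
    await service.initialize()

    # Status and health tracking provided by BaseService
    print(service.get_health_status())

    # Assumed counterpart that tears the service down cleanly
    await service.shutdown()

asyncio.run(main())
```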
#### BaseRepository
Database access builds on `BaseRepository` for CRUD operations:
```python
from ai_assistant_lib import BaseRepository, TimestampedRepository

class MediaFileRepository(TimestampedRepository):
    # Inherits create, find_by_id, find_all, update, delete,
    # plus automatic timestamp management.
    pass
```
**Benefits** (see the usage sketch below):
- Type-safe CRUD operations
- Automatic timestamp handling
- Built-in pagination
- Error handling
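A rough sketch of the inherited CRUD surface. The exact signatures here (dict payloads, `limit`/`offset` pagination parameter names) are assumptions for illustration, not confirmed library API:
```python
async def crud_example(repo: MediaFileRepository):
    # Create, read, update, delete via the inherited methods
    media = await repo.create({"path": "/data/audio/sample.wav"})
    found = await repo.find_by_id(media.id)

    # Built-in pagination (parameter names assumed)
    page = await repo.find_all(limit=50, offset=0)

    await repo.update(media.id, {"path": "/data/audio/renamed.wav"})
    await repo.delete(media.id)
```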
### 2. Retry and Resilience Patterns
#### RetryHandler
Automatic retry with exponential backoff:
```python
from ai_assistant_lib import async_retry, RetryConfig

@async_retry(max_attempts=3, backoff_factor=2.0)
async def transcribe_with_retry(audio_path):
    return await transcribe(audio_path)
```
#### CircuitBreaker
Prevent cascading failures:
```python
from ai_assistant_lib import CircuitBreaker

breaker = CircuitBreaker(
    failure_threshold=5,
    recovery_timeout=60,
)

async with breaker:
    result = await risky_operation()
```
### 3. Caching Infrastructure
#### Multi-Layer Caching
```python
from ai_assistant_lib import MemoryCache, CacheManager, cached

# Memory cache for hot data
memory_cache = MemoryCache(default_ttl=3600)

# Decorator for automatic caching
@cached(ttl=7200)
async def expensive_operation(param):
    return await compute_result(param)
```
**Cache Layers** (see the composition sketch below):
1. **Memory Cache** - Fast, limited size
2. **Database Cache** - Persistent, searchable
3. **Filesystem Cache** - Large files
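One way the layers might be composed behind the `CacheManager` imported above. The `layers` argument and the `get`/`set` methods are assumptions for illustration, not confirmed library API:
```python
from ai_assistant_lib import CacheManager, MemoryCache

# Hypothetical composition: consult the fastest layer first,
# fall back to the slower, larger ones.
manager = CacheManager(layers=[
    MemoryCache(default_ttl=3600),  # 1. fast, limited size
    # DatabaseCache(...),           # 2. persistent, searchable
    # FilesystemCache(...),         # 3. large files
])

async def get_or_compute(key, compute):
    value = await manager.get(key)
    if value is None:
        value = await compute(key)
        await manager.set(key, value)
    return value
```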
### 4. AI Service Integration
#### BaseAIService
Standardized AI service integration:
```python
from ai_assistant_lib import BaseAIService, AIModelConfig

class EnhancementService(BaseAIService):
    def __init__(self):
        config = AIModelConfig(
            model_name="deepseek-chat",
            temperature=0.0,
            max_tokens=4096,
        )
        super().__init__("EnhancementService", config)
```
**Features** (see the usage sketch below):
- Unified API interface
- Automatic retry logic
- Cost tracking
- Model versioning
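Calling such a service might look like the sketch below. The `complete()` method name is an assumption standing in for whatever request method `BaseAIService` actually exposes:
```python
async def enhance_transcript(text: str) -> str:
    service = EnhancementService()
    await service.initialize()

    # Hypothetical request method; retry logic and cost
    # tracking are handled inside BaseAIService
    return await service.complete(
        prompt=f"Clean up this transcript:\n{text}"
    )
```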
## Trax-Specific Extensions
### 1. Protocol-Based Services
Trax extends the library with protocol definitions for maximum flexibility:
```python
from pathlib import Path
from typing import Any, Dict, Protocol

class TranscriptionProtocol(Protocol):
    async def transcribe(self, audio_path: Path) -> Dict[str, Any]:
        ...

    def can_handle(self, audio_path: Path) -> bool:
        ...
```
### 2. Pipeline Versioning
Trax adds pipeline version tracking to services:
```python
class TraxService(BaseService):
    def __init__(self, name, config=None):
        super().__init__(name, config)
        config = config or {}  # Guard against the default None
        self.pipeline_version = config.get("pipeline_version", "v1")
```
### 3. JSONB Support
PostgreSQL JSONB columns for flexible data:
```python
from sqlalchemy import Column
from sqlalchemy.dialects.postgresql import JSONB

class Transcript(TimestampedModel):
    raw_content = Column(JSONB, nullable=False)
    enhanced_content = Column(JSONB)
```
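JSONB columns remain queryable from SQLAlchemy. A small sketch using the standard PostgreSQL JSONB operators; the `language` key is illustrative and depends on your transcript schema:
```python
# Filter on a key inside the JSONB payload
english = (
    session.query(Transcript)
    .filter(Transcript.raw_content["language"].astext == "en")
    .all()
)
```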
## Usage Examples
### Example 1: Creating a New Service
```python
from ai_assistant_lib import BaseService, ServiceStatus
from src.base.services import TraxService

class WhisperService(TraxService):
    """Whisper transcription service."""

    async def _initialize_impl(self):
        """Load Whisper model."""
        self.model = await load_whisper_model()
        logger.info("Loaded Whisper model")

    async def transcribe(self, audio_path: Path):
        """Transcribe audio file."""
        if self.status != ServiceStatus.HEALTHY:
            raise ServiceUnavailableError("Service not ready")
        return await self.model.transcribe(audio_path)
```
### Example 2: Repository with Caching
```python
from ai_assistant_lib import TimestampedRepository, cached

class TranscriptRepository(TimestampedRepository):
    @cached(ttl=3600)
    async def find_by_media_file(self, media_file_id):
        """Find transcript with caching."""
        return self.session.query(Transcript).filter(
            Transcript.media_file_id == media_file_id
        ).first()
```
### Example 3: Batch Processing with Circuit Breaker
```python
from ai_assistant_lib import AsyncProcessor, CircuitBreaker, CircuitBreakerOpen

class BatchProcessor(AsyncProcessor):
    def __init__(self):
        super().__init__("BatchProcessor")
        self.breaker = CircuitBreaker(failure_threshold=5)

    async def process_batch(self, files):
        results = []
        for file in files:
            try:
                async with self.breaker:
                    result = await self.process_file(file)
                    results.append(result)
            except CircuitBreakerOpen:
                logger.error("Circuit breaker open, stopping batch")
                break
        return results
```
## Configuration
### Library Configuration
Configure the library globally:
```python
from ai_assistant_lib import LibraryConfig

LibraryConfig.configure(
    log_level="INFO",
    default_timeout_seconds=30,
    default_retry_attempts=3,
    enable_metrics=True,
    enable_tracing=False,
)
```
### Service Configuration
Each service can have custom configuration:
```python
config = {
    "pipeline_version": "v2",
    "max_retries": 5,
    "timeout": 60,
    "cache_ttl": 7200,
}
service = TranscriptionService(config=config)
```
## Testing with the Library
### Test Utilities
The library provides test utilities:
```python
from ai_assistant_lib.testing import AsyncTestCase, mock_service

class TestTranscription(AsyncTestCase):
    async def setUp(self):
        self.mock_ai = mock_service(BaseAIService)
        self.service = TranscriptionService()

    async def test_transcribe(self):
        result = await self.service.transcribe(test_file)
        self.assertIsNotNone(result)
```
### Mock Implementations
Create mock services for testing:
```python
class MockTranscriptionService(TranscriptionProtocol):
    async def transcribe(self, audio_path):
        return {"text": "Mock transcript", "duration": 10.0}

    def can_handle(self, audio_path):
        return True
```
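Because callers depend only on `TranscriptionProtocol`, the mock can stand in wherever a real transcriber is expected. A hypothetical wiring (the `transcriber` attribute name is an assumption):
```python
class PipelineUnderTest:
    def __init__(self, transcriber: TranscriptionProtocol):
        self.transcriber = transcriber

    async def run(self, audio_path):
        if self.transcriber.can_handle(audio_path):
            return await self.transcriber.transcribe(audio_path)
        raise ValueError(f"Unsupported file: {audio_path}")

# Swap the mock in without touching the pipeline code
pipeline = PipelineUnderTest(MockTranscriptionService())
```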
## Performance Optimization
### 1. Connection Pooling
The library provides connection pooling:
```python
from ai_assistant_lib import ConnectionPool

pool = ConnectionPool(
    max_connections=100,
    min_connections=10,
    timeout=30,
)
```
### 2. Batch Operations
Optimize database operations:
```python
from ai_assistant_lib import bulk_insert, bulk_update
# Insert many records efficiently
await bulk_insert(session, records)
# Update many records in one query
await bulk_update(session, updates)
```
### 3. Async Patterns
Use async throughout:
```python
import asyncio

# Process multiple files concurrently
results = await asyncio.gather(*[
    process_file(f) for f in files
])
```
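For large batches, an unbounded `gather` can exhaust connections or memory. A common refinement is to cap concurrency with a semaphore:
```python
import asyncio

# Allow at most 8 files in flight at once
semaphore = asyncio.Semaphore(8)

async def process_bounded(f):
    async with semaphore:
        return await process_file(f)

results = await asyncio.gather(*[process_bounded(f) for f in files])
```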
## Error Handling
### Exception Hierarchy
The library provides a comprehensive exception hierarchy:
```python
from ai_assistant_lib import (
    AIAssistantError,        # Base exception
    RetryableError,          # Can be retried
    NonRetryableError,       # Should not be retried
    ServiceUnavailableError,
    RateLimitError,
    ValidationError,
)
```
### Error Recovery
Built-in error recovery patterns:
```python
try:
    result = await service.process()
except RetryableError as e:
    # Will be retried automatically by the retry decorator
    logger.warning(f"Retryable error: {e}")
except NonRetryableError as e:
    # Fatal error, don't retry
    logger.error(f"Fatal error: {e}")
    raise
```
## Monitoring and Metrics
### Health Checks
All services provide health status:
```python
health = service.get_health_status()
# {
#     "status": "healthy",
#     "is_healthy": true,
#     "uptime_seconds": 3600,
#     "error_count": 0
# }
```
### Performance Metrics
Track performance with the metrics collector:
```python
from ai_assistant_lib import MetricsCollector
metrics = MetricsCollector()
metrics.track("transcription_time", elapsed)
metrics.track("cache_hit_rate", hit_rate)
report = metrics.get_report()
```
## Migration from YouTube Summarizer
### Pattern Mapping
| YouTube Summarizer Pattern | AI Assistant Library Equivalent |
|---------------------------|----------------------------------|
| Custom retry logic | `@async_retry` decorator |
| Manual cache management | `CacheManager` class |
| Database operations | `BaseRepository` |
| Service initialization | `BaseService` |
| Error handling | Exception hierarchy |
### Code Migration Example
**Before (YouTube Summarizer):**
```python
class TranscriptService:
    def __init__(self):
        self.cache = {}

    async def get_transcript(self, video_id):
        if video_id in self.cache:
            return self.cache[video_id]
        # Manual retry with exponential backoff
        for attempt in range(3):
            try:
                result = await self.fetch_transcript(video_id)
                self.cache[video_id] = result
                return result
            except Exception:
                if attempt == 2:
                    raise
                await asyncio.sleep(2 ** attempt)
```
**After (With Library):**
```python
from ai_assistant_lib import BaseService, cached, async_retry

class TranscriptService(BaseService):
    @cached(ttl=3600)
    @async_retry(max_attempts=3)
    async def get_transcript(self, video_id):
        return await self.fetch_transcript(video_id)
```
## Best Practices
### 1. Always Use Protocols
Define protocols for all services to enable easy swapping:
```python
from typing import Any, Protocol

class ProcessorProtocol(Protocol):
    async def process(self, data: Any) -> Any: ...
```
### 2. Leverage Type Hints
Use type hints for better IDE support:
```python
async def process_batch(
    self,
    files: List[Path],
    processor: ProcessorProtocol,
) -> Dict[str, Any]:
    ...
```
### 3. Configuration Over Code
Use configuration files instead of hardcoding:
```python
config = load_config("config.yaml")
service = MyService(config=config)
```
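`load_config` above is not a library function; a minimal sketch using PyYAML might look like this:
```python
import yaml  # PyYAML, assumed available

def load_config(path: str) -> dict:
    """Load a YAML file into a plain dict (hypothetical helper)."""
    with open(path) as f:
        return yaml.safe_load(f) or {}
```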
### 4. Test with Real Data
Use the library's support for real file testing:
```python
test_file = Path("tests/fixtures/audio/sample.wav")
result = await service.transcribe(test_file)
```
## Troubleshooting
### Common Issues
1. **Import Errors**
   - Ensure the symlink is created: `ln -s ../../lib lib`
   - Check that the Python path includes the library (see the sketch below)
2. **Type Errors**
   - The library requires Python 3.11+
   - Use proper type hints
3. **Async Errors**
   - Always use `async`/`await`
   - Don't mix sync and async code
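If the symlink is missing, the library path can also be added at runtime. A minimal sketch, assuming the repository layout implied by the symlink command above:
```python
import sys
from pathlib import Path

# Make ../../lib importable without a symlink (relative location assumed)
sys.path.insert(0, str(Path(__file__).resolve().parents[2] / "lib"))

import ai_assistant_lib  # noqa: E402
```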
### Debug Mode
Enable debug logging:
```python
import logging
logging.getLogger("ai_assistant_lib").setLevel(logging.DEBUG)
```
## Summary
The AI Assistant Library provides Trax with:
- **Production-tested components** - Used across multiple projects
- **Consistent patterns** - Same patterns everywhere
- **Built-in resilience** - Retry, circuit breaker, caching
- **Type safety** - Full typing support
- **Performance optimization** - Connection pooling, batch operations
- **Comprehensive testing** - Test utilities and fixtures
By leveraging this library, Trax can focus on its unique media processing capabilities while relying on proven infrastructure components.
---
For more information about the library, see:
- [Library Source](../../lib/)
- [Library Tests](../../lib/tests/)
- [Usage Examples](../examples/)