# AI Assistant Library Integration Guide

## Overview

The Trax project leverages the **AI Assistant Class Library** - a comprehensive, production-tested library that provides common functionality for AI-powered applications. This guide explains how Trax uses the library and how to extend it for your needs.

## Library Components Used by Trax

### 1. Core Base Classes

#### BaseService

All Trax services extend `BaseService` for consistent service lifecycle management:

```python
from ai_assistant_lib import BaseService


class TraxService(BaseService):
    async def _initialize_impl(self):
        # Service-specific initialization
        pass
```

**Benefits:**

- Standardized initialization/shutdown
- Health checking
- Status tracking
- Error counting

#### BaseRepository

Database operations use `BaseRepository` for CRUD operations:

```python
from ai_assistant_lib import BaseRepository, TimestampedRepository


class MediaFileRepository(TimestampedRepository):
    # Inherits create, find_by_id, find_all, update, delete,
    # plus automatic timestamp management.
    pass
```
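
To make that CRUD surface concrete, here is a dict-backed sketch of the create/find/update/delete contract, including the automatic timestamp handling. This is illustrative only; the library's real repositories are database-backed, and the `InMemoryRepository` name is invented here:

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from itertools import count
from typing import Any, Dict, Optional


@dataclass
class Record:
    id: int
    data: Dict[str, Any]
    created_at: datetime
    updated_at: datetime


class InMemoryRepository:
    """Dict-backed stand-in for the repository contract."""

    def __init__(self) -> None:
        self._rows: Dict[int, Record] = {}
        self._ids = count(1)

    def create(self, data: Dict[str, Any]) -> Record:
        now = datetime.now(timezone.utc)
        row = Record(next(self._ids), dict(data), now, now)
        self._rows[row.id] = row
        return row

    def find_by_id(self, row_id: int) -> Optional[Record]:
        return self._rows.get(row_id)

    def update(self, row_id: int, changes: Dict[str, Any]) -> Record:
        row = self._rows[row_id]
        row.data.update(changes)
        row.updated_at = datetime.now(timezone.utc)  # automatic timestamp
        return row

    def delete(self, row_id: int) -> bool:
        return self._rows.pop(row_id, None) is not None
```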

**Benefits:**

- Type-safe CRUD operations
- Automatic timestamp handling
- Built-in pagination
- Error handling

### 2. Retry and Resilience Patterns

#### RetryHandler

Automatic retry with exponential backoff:

```python
from ai_assistant_lib import async_retry, RetryConfig


@async_retry(max_attempts=3, backoff_factor=2.0)
async def transcribe_with_retry(audio_path):
    return await transcribe(audio_path)
```
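
Under the hood, exponential backoff works roughly like this. This is a self-contained sketch, not the library's actual code; the `async_retry_sketch` name and jitter details are invented for illustration:

```python
import asyncio
import random


def async_retry_sketch(max_attempts=3, backoff_factor=2.0, base_delay=0.01):
    """Illustrative exponential-backoff decorator (not the library's code)."""
    def decorator(func):
        async def wrapper(*args, **kwargs):
            for attempt in range(max_attempts):
                try:
                    return await func(*args, **kwargs)
                except Exception:
                    if attempt == max_attempts - 1:
                        raise  # exhausted: surface the last error
                    # delay grows as base_delay * factor^attempt, plus jitter
                    await asyncio.sleep(
                        base_delay * (backoff_factor ** attempt)
                        + random.uniform(0, base_delay)
                    )
        return wrapper
    return decorator
```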

#### CircuitBreaker

Prevent cascading failures:

```python
from ai_assistant_lib import CircuitBreaker

breaker = CircuitBreaker(
    failure_threshold=5,
    recovery_timeout=60,
)

async with breaker:
    result = await risky_operation()
```
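
A circuit breaker is a small state machine: it trips open after a run of consecutive failures, rejects calls while open, and lets a probe through once the recovery timeout elapses. A minimal illustrative sketch (synchronous for brevity; the library's `CircuitBreaker` is assumed to be async and richer):

```python
import time


class CircuitBreakerSketch:
    """Minimal closed/open state machine, for illustration only."""

    def __init__(self, failure_threshold=5, recovery_timeout=60.0):
        self.failure_threshold = failure_threshold
        self.recovery_timeout = recovery_timeout
        self.failures = 0
        self.opened_at = None  # None means the circuit is closed

    def allow(self) -> bool:
        if self.opened_at is None:
            return True
        # half-open: permit a probe once the recovery timeout has elapsed
        return time.monotonic() - self.opened_at >= self.recovery_timeout

    def record_success(self) -> None:
        self.failures = 0
        self.opened_at = None

    def record_failure(self) -> None:
        self.failures += 1
        if self.failures >= self.failure_threshold:
            self.opened_at = time.monotonic()  # trip open
```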

### 3. Caching Infrastructure

#### Multi-Layer Caching

```python
from ai_assistant_lib import MemoryCache, CacheManager, cached

# Memory cache for hot data
memory_cache = MemoryCache(default_ttl=3600)


# Decorator for automatic caching
@cached(ttl=7200)
async def expensive_operation(param):
    return await compute_result(param)
```

**Cache Layers:**

1. **Memory Cache** - Fast, limited size
2. **Database Cache** - Persistent, searchable
3. **Filesystem Cache** - Large files
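
The memory layer above boils down to TTL bookkeeping: store an expiry alongside each value and evict lazily on read. A minimal sketch, not the library's `MemoryCache` implementation:

```python
import time
from typing import Any, Dict, Optional, Tuple


class TTLCacheSketch:
    """Illustrative TTL memory cache with lazy eviction."""

    def __init__(self, default_ttl: float = 3600.0):
        self.default_ttl = default_ttl
        self._store: Dict[str, Tuple[float, Any]] = {}  # key -> (expires_at, value)

    def set(self, key: str, value: Any, ttl: Optional[float] = None) -> None:
        expires_at = time.monotonic() + (ttl if ttl is not None else self.default_ttl)
        self._store[key] = (expires_at, value)

    def get(self, key: str) -> Optional[Any]:
        entry = self._store.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if time.monotonic() >= expires_at:
            del self._store[key]  # expired: evict on read
            return None
        return value
```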

### 4. AI Service Integration

#### BaseAIService

Standardized AI service integration:

```python
from ai_assistant_lib import BaseAIService, AIModelConfig


class EnhancementService(BaseAIService):
    def __init__(self):
        config = AIModelConfig(
            model_name="deepseek-chat",
            temperature=0.0,
            max_tokens=4096,
        )
        super().__init__("EnhancementService", config)
```

**Features:**

- Unified API interface
- Automatic retry logic
- Cost tracking
- Model versioning

## Trax-Specific Extensions

### 1. Protocol-Based Services

Trax extends the library with protocol definitions for maximum flexibility:

```python
from pathlib import Path
from typing import Any, Dict, Protocol


class TranscriptionProtocol(Protocol):
    async def transcribe(self, audio_path: Path) -> Dict[str, Any]:
        ...

    def can_handle(self, audio_path: Path) -> bool:
        ...
```
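
Because protocols use structural typing, any class whose methods match satisfies the protocol without inheriting from it, and marking the protocol `@runtime_checkable` additionally enables `isinstance` checks. A self-contained sketch; the `StubTranscriber` class is invented for illustration:

```python
from pathlib import Path
from typing import Any, Dict, Protocol, runtime_checkable


@runtime_checkable
class TranscriptionProtocol(Protocol):
    async def transcribe(self, audio_path: Path) -> Dict[str, Any]: ...
    def can_handle(self, audio_path: Path) -> bool: ...


class StubTranscriber:
    """No inheritance needed: matching method names satisfy the protocol."""

    async def transcribe(self, audio_path: Path) -> Dict[str, Any]:
        return {"text": "stub"}

    def can_handle(self, audio_path: Path) -> bool:
        return audio_path.suffix == ".wav"
```

Note that `isinstance` against a runtime-checkable protocol only verifies that the method names exist, not their signatures.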

### 2. Pipeline Versioning

Trax adds pipeline version tracking to services:

```python
class TraxService(BaseService):
    def __init__(self, name, config=None):
        super().__init__(name, config)
        # Guard against config=None before calling .get()
        self.pipeline_version = (config or {}).get("pipeline_version", "v1")
```

### 3. JSONB Support

PostgreSQL JSONB columns for flexible data:

```python
from sqlalchemy import Column
from sqlalchemy.dialects.postgresql import JSONB


class Transcript(TimestampedModel):
    raw_content = Column(JSONB, nullable=False)
    enhanced_content = Column(JSONB)
```

## Usage Examples

### Example 1: Creating a New Service

```python
import logging
from pathlib import Path

from ai_assistant_lib import ServiceStatus, ServiceUnavailableError
from src.base.services import TraxService

logger = logging.getLogger(__name__)


class WhisperService(TraxService):
    """Whisper transcription service."""

    async def _initialize_impl(self):
        """Load Whisper model."""
        self.model = await load_whisper_model()
        logger.info("Loaded Whisper model")

    async def transcribe(self, audio_path: Path):
        """Transcribe audio file."""
        if self.status != ServiceStatus.HEALTHY:
            raise ServiceUnavailableError("Service not ready")

        return await self.model.transcribe(audio_path)
```

### Example 2: Repository with Caching

```python
from ai_assistant_lib import TimestampedRepository, cached


class TranscriptRepository(TimestampedRepository):

    @cached(ttl=3600)
    async def find_by_media_file(self, media_file_id):
        """Find transcript with caching."""
        return self.session.query(Transcript).filter(
            Transcript.media_file_id == media_file_id
        ).first()
```

### Example 3: Batch Processing with Circuit Breaker

```python
import logging

from ai_assistant_lib import AsyncProcessor, CircuitBreaker, CircuitBreakerOpen

logger = logging.getLogger(__name__)


class BatchProcessor(AsyncProcessor):
    def __init__(self):
        super().__init__("BatchProcessor")
        self.breaker = CircuitBreaker(failure_threshold=5)

    async def process_batch(self, files):
        results = []
        for file in files:
            try:
                async with self.breaker:
                    result = await self.process_file(file)
                    results.append(result)
            except CircuitBreakerOpen:
                logger.error("Circuit breaker open, stopping batch")
                break
        return results
```

## Configuration

### Library Configuration

Configure the library globally:

```python
from ai_assistant_lib import LibraryConfig

LibraryConfig.configure(
    log_level="INFO",
    default_timeout_seconds=30,
    default_retry_attempts=3,
    enable_metrics=True,
    enable_tracing=False,
)
```

### Service Configuration

Each service can have custom configuration:

```python
config = {
    "pipeline_version": "v2",
    "max_retries": 5,
    "timeout": 60,
    "cache_ttl": 7200,
}

service = TranscriptionService(config=config)
```
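
A common way to layer such per-service overrides on top of library defaults is a shallow dict merge, where later keys win. A sketch; the `DEFAULTS` values here are illustrative, not the library's actual defaults:

```python
from typing import Any, Dict, Optional

# Hypothetical library-wide defaults (illustrative values)
DEFAULTS: Dict[str, Any] = {
    "pipeline_version": "v1",
    "max_retries": 3,
    "timeout": 30,
    "cache_ttl": 3600,
}


def merged_config(overrides: Optional[Dict[str, Any]] = None) -> Dict[str, Any]:
    """Later dicts win, so service overrides shadow library defaults."""
    return {**DEFAULTS, **(overrides or {})}
```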

## Testing with the Library

### Test Utilities

The library provides test utilities:

```python
from ai_assistant_lib.testing import AsyncTestCase, mock_service


class TestTranscription(AsyncTestCase):
    async def setUp(self):
        self.mock_ai = mock_service(BaseAIService)
        self.service = TranscriptionService()

    async def test_transcribe(self):
        result = await self.service.transcribe(test_file)
        self.assertIsNotNone(result)
```

### Mock Implementations

Create mock services for testing:

```python
class MockTranscriptionService(TranscriptionProtocol):
    async def transcribe(self, audio_path):
        return {"text": "Mock transcript", "duration": 10.0}

    def can_handle(self, audio_path):
        return True
```

## Performance Optimization

### 1. Connection Pooling

The library provides connection pooling:

```python
from ai_assistant_lib import ConnectionPool

pool = ConnectionPool(
    max_connections=100,
    min_connections=10,
    timeout=30,
)
```

### 2. Batch Operations

Optimize database operations:

```python
from ai_assistant_lib import bulk_insert, bulk_update

# Insert many records efficiently
await bulk_insert(session, records)

# Update many records in one query
await bulk_update(session, updates)
```
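
The win from bulk operations is batching many rows into a single statement instead of one round trip per row. A stdlib sketch with `sqlite3.executemany`; the `media_files` schema here is invented for illustration:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE media_files (path TEXT, duration REAL)")

records = [("a.wav", 10.0), ("b.wav", 12.5), ("c.wav", 7.25)]

# executemany sends all rows through one prepared statement
# rather than issuing one INSERT per row
conn.executemany("INSERT INTO media_files VALUES (?, ?)", records)
conn.commit()

count = conn.execute("SELECT COUNT(*) FROM media_files").fetchone()[0]
```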

### 3. Async Patterns

Use async throughout:

```python
import asyncio

# Process multiple files concurrently
results = await asyncio.gather(*[
    process_file(f) for f in files
])
```
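
Unbounded `gather` can overwhelm downstream services when `files` is large; a semaphore caps concurrency while keeping the same gather shape. A self-contained sketch in which the `process_file` body is a stand-in for real I/O:

```python
import asyncio


async def process_file(path: str) -> str:
    await asyncio.sleep(0)  # stand-in for real I/O work
    return path.upper()


async def process_all(files, max_concurrency: int = 4):
    sem = asyncio.Semaphore(max_concurrency)

    async def bounded(path):
        async with sem:  # at most max_concurrency tasks run at once
            return await process_file(path)

    # gather preserves input order regardless of completion order
    return await asyncio.gather(*(bounded(f) for f in files))


results = asyncio.run(process_all(["a.wav", "b.wav", "c.wav"]))
```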

## Error Handling

### Exception Hierarchy

The library provides a comprehensive exception hierarchy:

```python
from ai_assistant_lib import (
    AIAssistantError,       # Base exception
    RetryableError,         # Can be retried
    NonRetryableError,      # Should not be retried
    ServiceUnavailableError,
    RateLimitError,
    ValidationError,
)
```

### Error Recovery

Built-in error recovery patterns:

```python
try:
    result = await service.process()
except RetryableError as e:
    # Log, then re-raise so a surrounding @async_retry decorator can retry;
    # swallowing the exception here would prevent the retry.
    logger.warning(f"Retryable error: {e}")
    raise
except NonRetryableError as e:
    # Fatal error, don't retry
    logger.error(f"Fatal error: {e}")
    raise
```
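
The value of the retryable/non-retryable split is that callers branch on a single `isinstance` check rather than enumerating concrete error types. A sketch with locally defined stand-in classes; the real hierarchy lives in `ai_assistant_lib` and these `*Sketch` names are invented here:

```python
class AIAssistantErrorSketch(Exception):
    """Stand-in for the library's base exception."""


class RetryableErrorSketch(AIAssistantErrorSketch):
    """Transient failures worth retrying."""


class NonRetryableErrorSketch(AIAssistantErrorSketch):
    """Permanent failures; retrying would waste work."""


class RateLimitErrorSketch(RetryableErrorSketch):
    """Rate limits clear on their own, so they are retryable."""


class ValidationErrorSketch(NonRetryableErrorSketch):
    """Bad input stays bad, so it is not retryable."""


def should_retry(exc: Exception) -> bool:
    # One isinstance check covers every current and future retryable subclass
    return isinstance(exc, RetryableErrorSketch)
```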

## Monitoring and Metrics

### Health Checks

All services provide health status:

```python
health = service.get_health_status()
# {
#     "status": "healthy",
#     "is_healthy": True,
#     "uptime_seconds": 3600,
#     "error_count": 0
# }
```

### Performance Metrics

Track performance automatically:

```python
from ai_assistant_lib import MetricsCollector

metrics = MetricsCollector()
metrics.track("transcription_time", elapsed)
metrics.track("cache_hit_rate", hit_rate)

report = metrics.get_report()
```
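
Conceptually, the collector accumulates samples per metric name and aggregates them on report. A minimal sketch; the library's `MetricsCollector` API is assumed to be richer than this:

```python
from collections import defaultdict
from statistics import mean
from typing import Any, Dict, List


class MetricsCollectorSketch:
    """Illustrative sample tracker with per-metric aggregates."""

    def __init__(self) -> None:
        self._samples: Dict[str, List[float]] = defaultdict(list)

    def track(self, name: str, value: float) -> None:
        self._samples[name].append(value)

    def get_report(self) -> Dict[str, Dict[str, Any]]:
        # Aggregate each metric's samples into count/mean/max
        return {
            name: {"count": len(vals), "mean": mean(vals), "max": max(vals)}
            for name, vals in self._samples.items()
        }
```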

## Migration from YouTube Summarizer

### Pattern Mapping

| YouTube Summarizer Pattern | AI Assistant Library Equivalent |
|----------------------------|---------------------------------|
| Custom retry logic         | `@async_retry` decorator        |
| Manual cache management    | `CacheManager` class            |
| Database operations        | `BaseRepository`                |
| Service initialization     | `BaseService`                   |
| Error handling             | Exception hierarchy             |

### Code Migration Example

**Before (YouTube Summarizer):**

```python
import asyncio


class TranscriptService:
    def __init__(self):
        self.cache = {}

    async def get_transcript(self, video_id):
        if video_id in self.cache:
            return self.cache[video_id]

        # Hand-rolled retry loop with exponential backoff
        for attempt in range(3):
            try:
                result = await self.fetch_transcript(video_id)
                self.cache[video_id] = result
                return result
            except Exception:
                if attempt == 2:
                    raise
                await asyncio.sleep(2 ** attempt)
```

**After (With Library):**

```python
from ai_assistant_lib import BaseService, cached, async_retry


class TranscriptService(BaseService):
    @cached(ttl=3600)
    @async_retry(max_attempts=3)
    async def get_transcript(self, video_id):
        return await self.fetch_transcript(video_id)
```
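
Decorator order matters here: with `@cached` outermost, a cache hit short-circuits before the retry layer ever runs, and retries only happen on a miss. A toy demonstration with invented stand-in decorators (`cached_sketch` and `retry_sketch` are not the library's implementations):

```python
import asyncio
import functools


def cached_sketch(func):
    """Toy cache decorator: memoizes results by positional arguments."""
    store = {}

    @functools.wraps(func)
    async def wrapper(*args):
        if args not in store:
            store[args] = await func(*args)
        return store[args]
    return wrapper


def retry_sketch(max_attempts):
    """Toy retry decorator: re-invokes the function on exception."""
    def decorator(func):
        @functools.wraps(func)
        async def wrapper(*args):
            for attempt in range(max_attempts):
                try:
                    return await func(*args)
                except Exception:
                    if attempt == max_attempts - 1:
                        raise
        return wrapper
    return decorator


calls = []


@cached_sketch           # outermost: checked first, so hits skip the retry layer
@retry_sketch(max_attempts=3)
async def fetch(video_id):
    calls.append(video_id)
    if len(calls) < 2:
        raise RuntimeError("transient")
    return f"transcript:{video_id}"


first = asyncio.run(fetch("abc"))   # fails once, retried, then cached
second = asyncio.run(fetch("abc"))  # served from cache, no new call
```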

## Best Practices

### 1. Always Use Protocols

Define protocols for all services to enable easy swapping:

```python
from typing import Any, Protocol


class ProcessorProtocol(Protocol):
    async def process(self, data: Any) -> Any: ...
```

### 2. Leverage Type Hints

Use type hints for better IDE support:

```python
async def process_batch(
    self,
    files: List[Path],
    processor: ProcessorProtocol,
) -> Dict[str, Any]:
    ...
```

### 3. Configuration Over Code

Use configuration files instead of hardcoding:

```python
config = load_config("config.yaml")
service = MyService(config=config)
```

### 4. Test with Real Data

Use the library's support for real file testing:

```python
test_file = Path("tests/fixtures/audio/sample.wav")
result = await service.transcribe(test_file)
```

## Troubleshooting

### Common Issues

1. **Import Errors**
   - Ensure the symlink is created: `ln -s ../../lib lib`
   - Check that the Python path includes the library

2. **Type Errors**
   - The library requires Python 3.11+
   - Use proper type hints

3. **Async Errors**
   - Always use `async`/`await`
   - Don't mix sync and async code

### Debug Mode

Enable debug logging:

```python
import logging

logging.getLogger("ai_assistant_lib").setLevel(logging.DEBUG)
```

## Summary

The AI Assistant Library provides Trax with:

✅ **Production-tested components** - Used across multiple projects
✅ **Consistent patterns** - Same patterns everywhere
✅ **Built-in resilience** - Retry, circuit breaker, caching
✅ **Type safety** - Full typing support
✅ **Performance optimization** - Connection pooling, batch operations
✅ **Comprehensive testing** - Test utilities and fixtures

By leveraging this library, Trax can focus on its unique media processing capabilities while relying on proven infrastructure components.

---

For more information about the library, see:

- [Library Source](../../lib/)
- [Library Tests](../../lib/tests/)
- [Usage Examples](../examples/)