# YouTube Summarizer Python SDK
Official Python client library for the YouTube Summarizer Developer Platform. Extract transcripts, generate summaries, and integrate AI-powered video analysis into your applications.
[![PyPI version](https://badge.fury.io/py/youtube-summarizer-sdk.svg)](https://badge.fury.io/py/youtube-summarizer-sdk)
[![Python Support](https://img.shields.io/pypi/pyversions/youtube-summarizer-sdk.svg)](https://pypi.org/project/youtube-summarizer-sdk/)
[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)
## Features
- **Async/Await Support** - Built for modern Python applications
- **Dual Transcript Sources** - YouTube captions, Whisper AI, or both
- **Real-time Updates** - WebSocket support for progress tracking
- **Batch Processing** - Process multiple videos simultaneously
- **Quality Analysis** - Transcript quality scoring and comparison
- **MCP Integration** - Model Context Protocol support for AI development
- **Cost Estimation** - Processing time and cost predictions
- **Export Options** - JSON, CSV, Markdown, and PDF formats
## Installation
```bash
pip install youtube-summarizer-sdk
```
### Optional Dependencies
```bash
# Quote the extras so shells such as zsh do not expand the brackets
# For MCP (Model Context Protocol) support
pip install "youtube-summarizer-sdk[mcp]"
# For development
pip install "youtube-summarizer-sdk[dev]"
# Install all extras
pip install "youtube-summarizer-sdk[all]"
```
## Quick Start
```python
import asyncio
from youtube_summarizer_sdk import create_client, TranscriptRequest

async def main():
    # Initialize client with your API key
    client = create_client(api_key="ys_pro_your_api_key_here")
    async with client:
        # Extract transcript from YouTube video
        request = TranscriptRequest(
            video_url="https://youtube.com/watch?v=dQw4w9WgXcQ",
            transcript_source="youtube",
            include_quality_analysis=True
        )
        # Wait for completion (blocks until done)
        result = await client.extract_and_wait(request)
        print(f"Transcript: {result.transcript[:200]}...")
        print(f"Quality Score: {result.quality_score}")
        print(f"Processing Time: {result.processing_time_seconds}s")

asyncio.run(main())
```
## Core Features
### Transcript Extraction
```python
from youtube_summarizer_sdk import TranscriptRequest, TranscriptSource
# YouTube captions
request = TranscriptRequest(
    video_url="https://youtube.com/watch?v=VIDEO_ID",
    transcript_source=TranscriptSource.YOUTUBE
)
# Whisper AI transcription
request = TranscriptRequest(
    video_url="https://youtube.com/watch?v=VIDEO_ID",
    transcript_source=TranscriptSource.WHISPER,
    whisper_model_size="small"  # tiny, base, small, medium, large
)
# Both sources with comparison
request = TranscriptRequest(
    video_url="https://youtube.com/watch?v=VIDEO_ID",
    transcript_source=TranscriptSource.BOTH,
    include_quality_analysis=True
)
# Submit and wait for result
result = await client.extract_and_wait(request, timeout=300)
```
### Batch Processing
```python
from youtube_summarizer_sdk import BatchProcessingRequest
# Process multiple videos
batch_request = BatchProcessingRequest(
    video_urls=[
        "https://youtube.com/watch?v=VIDEO1",
        "https://youtube.com/watch?v=VIDEO2",
        "https://youtube.com/watch?v=VIDEO3"
    ],
    batch_name="My Video Collection",
    transcript_source="youtube",
    parallel_processing=True,
    max_concurrent_jobs=3
)
batch_job = await client.batch_process(batch_request)
print(f"Batch ID: {batch_job.batch_id}")
```
### Real-time Progress Tracking
```python
# Connect WebSocket for real-time updates
await client.connect_websocket()
# Submit job
job = await client.extract_transcript(request)
# Listen for updates
async for update in client.listen_for_updates():
    if update.data.get("job_id") == job.job_id:
        print(f"Progress: {update.data.get('progress', 0)}%")
        if update.event == "job.completed":
            result = await client.get_job_result(job.job_id)
            break
```
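A bare `async for` loop waits indefinitely if the connection stalls. The following is a minimal sketch of an idle timeout, assuming `listen_for_updates()` behaves like any other async iterator. `iter_with_timeout` and `fake_updates` are hypothetical names, not part of the SDK; the stand-in generator only lets the sketch run without a live connection.

```python
import asyncio

async def iter_with_timeout(source, idle_timeout):
    """Yield items from an async iterator; raise asyncio.TimeoutError
    if no item arrives within idle_timeout seconds."""
    iterator = source.__aiter__()
    while True:
        try:
            item = await asyncio.wait_for(iterator.__anext__(), idle_timeout)
        except StopAsyncIteration:
            return  # the source finished normally
        yield item

async def fake_updates():
    # Hypothetical stand-in for client.listen_for_updates()
    yield {"event": "job.progress", "progress": 50}
    yield {"event": "job.completed", "progress": 100}

async def demo():
    events = []
    async for update in iter_with_timeout(fake_updates(), idle_timeout=1.0):
        events.append(update["event"])
    return events

print(asyncio.run(demo()))
```

With the SDK, the same wrapper would be applied as `iter_with_timeout(client.listen_for_updates(), 30.0)`, with the `asyncio.TimeoutError` caught wherever a stalled job should be reported.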
### Processing Estimates
```python
# Get time and cost estimate
estimate = await client.get_processing_estimate(
    video_url="https://youtube.com/watch?v=VIDEO_ID",
    transcript_source="whisper"
)
print(f"Estimated time: {estimate.estimated_time_seconds}s")
print(f"Estimated cost: ${estimate.estimated_cost:.4f}")
```
### Data Export
```python
# Export data in various formats
export_data = await client.export_data(
    format="json",  # json, csv, markdown, pdf
    date_from="2024-01-01",
    date_to="2024-12-31"
)
print(export_data)
```
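`export_data` is listed in this README as returning a dictionary, so saving a JSON export to disk can be a plain dump. A minimal sketch: `save_export` is a hypothetical helper, and the sample payload is invented for illustration because the real export's shape is not documented here.

```python
import json
import tempfile
from pathlib import Path

def save_export(export_data, path):
    """Write an export dictionary to a UTF-8 JSON file and return its path."""
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("w", encoding="utf-8") as fh:
        # default=str keeps non-JSON values such as datetimes from raising
        json.dump(export_data, fh, ensure_ascii=False, indent=2, default=str)
    return path

# Hypothetical payload standing in for the result of client.export_data(...)
sample = {"format": "json", "items": [{"video_id": "VIDEO_ID", "title": "Example"}]}
with tempfile.TemporaryDirectory() as tmp:
    saved = save_export(sample, Path(tmp) / "exports" / "2024.json")
    print(saved.name, saved.stat().st_size > 0)
```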
## MCP (Model Context Protocol) Integration
The SDK includes MCP support for AI development environments like Claude Code:
```python
from youtube_summarizer_sdk import create_mcp_interface, MCPToolRequest
# Create MCP interface
mcp = create_mcp_interface(api_key="your_api_key")
# List available tools
tools = await mcp.list_tools()
# Execute MCP tool
request = MCPToolRequest(
    name="extract_transcript",
    arguments={
        "video_url": "https://youtube.com/watch?v=VIDEO_ID",
        "transcript_source": "youtube",
        "wait_for_completion": True
    }
)
result = await mcp.call_tool(request)
```
## Configuration
### Client Configuration
```python
from youtube_summarizer_sdk import SDKConfig, YouTubeSummarizerClient
config = SDKConfig(
    api_key="your_api_key",
    base_url="https://api.youtube-summarizer.com",
    timeout=60.0,
    max_retries=3,
    retry_delay=1.0,
    verify_ssl=True
)
client = YouTubeSummarizerClient(config)
```
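Hardcoding the API key is fine for a quick test, but deployed code usually reads it from the environment. A minimal sketch that builds keyword arguments for `SDKConfig` from environment variables. The variable names (`YOUTUBE_SUMMARIZER_API_KEY` and friends) and the `load_sdk_settings` helper are assumptions for illustration; the SDK itself is not known to read them.

```python
import os

def load_sdk_settings(env=os.environ):
    """Collect SDKConfig keyword arguments from environment variables."""
    api_key = env.get("YOUTUBE_SUMMARIZER_API_KEY")  # hypothetical variable name
    if not api_key:
        raise RuntimeError("Set YOUTUBE_SUMMARIZER_API_KEY before creating a client")
    return {
        "api_key": api_key,
        "base_url": env.get("YOUTUBE_SUMMARIZER_BASE_URL",
                            "https://api.youtube-summarizer.com"),
        "timeout": float(env.get("YOUTUBE_SUMMARIZER_TIMEOUT", "60.0")),
        "max_retries": int(env.get("YOUTUBE_SUMMARIZER_MAX_RETRIES", "3")),
    }

# Example with an explicit mapping instead of the real environment
settings = load_sdk_settings({"YOUTUBE_SUMMARIZER_API_KEY": "ys_pro_example"})
print(sorted(settings))
```

The resulting dictionary would be passed on as `SDKConfig(**load_sdk_settings())`.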
### WebSocket Configuration
```python
from youtube_summarizer_sdk import WebSocketConfig
ws_config = WebSocketConfig(
    url="wss://api.youtube-summarizer.com/ws",
    auto_reconnect=True,
    max_reconnect_attempts=5,
    heartbeat_interval=30.0
)
await client.connect_websocket(ws_config)
```
## API Reference
### Models
- **TranscriptRequest** - Video transcript extraction request
- **BatchProcessingRequest** - Batch video processing request
- **JobResponse** - Job creation and status response
- **TranscriptResult** - Single transcript extraction result
- **DualTranscriptResult** - Dual transcript comparison result
- **APIUsageStats** - Usage statistics and limits
- **ProcessingTimeEstimate** - Time and cost estimates
### Enums
- **TranscriptSource** - `youtube`, `whisper`, `both`
- **WhisperModelSize** - `tiny`, `base`, `small`, `medium`, `large`
- **ProcessingPriority** - `low`, `normal`, `high`, `urgent`
- **JobStatus** - `queued`, `processing`, `completed`, `failed`, `cancelled`
### Main Client Methods
```python
# Core API methods
await client.extract_transcript(request: TranscriptRequest) -> JobResponse
await client.batch_process(request: BatchProcessingRequest) -> BatchJobResponse
await client.get_job_status(job_id: str) -> JobResponse
await client.get_job_result(job_id: str) -> Union[TranscriptResult, DualTranscriptResult]
await client.cancel_job(job_id: str) -> Dict[str, Any]
# Utility methods
await client.get_processing_estimate(video_url: str) -> ProcessingTimeEstimate
await client.get_usage_stats() -> APIUsageStats
await client.search_summaries(query: str) -> Dict[str, Any]
await client.export_data(format: str = 'json') -> Dict[str, Any]
# Convenience methods
await client.extract_and_wait(request: TranscriptRequest, timeout: float = 300) -> Union[TranscriptResult, DualTranscriptResult]
await client.wait_for_job(job_id: str, timeout: float = 300) -> Union[TranscriptResult, DualTranscriptResult]
# WebSocket methods
await client.connect_websocket(config: Optional[WebSocketConfig] = None) -> bool
async for update in client.listen_for_updates(): # -> AsyncGenerator[WebhookPayload, None]
await client.disconnect_websocket()
```
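The convenience methods cover the common case. Where a custom loop over `get_job_status` is needed, it could look like the sketch below. This is not the SDK's implementation: `poll_until_terminal` is a hypothetical helper, and `get_status` is an adapter you would write to turn a `get_job_status` result into one of the `JobStatus` strings, since the fields of `JobResponse` are not documented here.

```python
import asyncio
import time

# Terminal values taken from the JobStatus enum listed above
TERMINAL_STATUSES = {"completed", "failed", "cancelled"}

async def poll_until_terminal(get_status, job_id, timeout=300.0, interval=2.0):
    """Call get_status(job_id) until it returns a terminal status or time runs out."""
    deadline = time.monotonic() + timeout
    while True:
        status = await get_status(job_id)
        if status in TERMINAL_STATUSES:
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(f"Job {job_id} still {status!r} after {timeout}s")
        await asyncio.sleep(interval)

async def _demo():
    # Stand-in status lookup that replays a fixed sequence instead of calling the API
    statuses = iter(["queued", "processing", "completed"])

    async def fake_get_status(job_id):
        return next(statuses)

    return await poll_until_terminal(fake_get_status, "job_123", interval=0.0)

print(asyncio.run(_demo()))
```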
## Error Handling
```python
from youtube_summarizer_sdk import (
    YouTubeSummarizerError, AuthenticationError, RateLimitError,
    ValidationError, APIError, JobTimeoutError
)
try:
    result = await client.extract_transcript(request)
except AuthenticationError:
    print("Invalid API key")
except RateLimitError as e:
    print(f"Rate limited. Remaining: {e.remaining}, Reset: {e.reset_time}")
except ValidationError as e:
    print(f"Validation failed: {e.validation_errors}")
except JobTimeoutError as e:
    print(f"Job {e.job_id} timed out after {e.timeout_seconds}s")
except YouTubeSummarizerError as e:
    print(f"SDK error: {e.message}")
```
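Printing a message is enough for a demo; long-running jobs usually retry after a rate limit instead. A minimal sketch of exponential backoff with jitter. It defines a local stand-in `RateLimitError` so it runs without the SDK installed (in real code you would import the SDK's class), and `with_backoff` is a hypothetical helper. Whether the client's own `max_retries` setting already covers rate limits is not stated here, so treat this as an optional extra layer.

```python
import asyncio
import random

class RateLimitError(Exception):
    """Local stand-in for youtube_summarizer_sdk.RateLimitError."""

async def with_backoff(make_call, max_attempts=4, base_delay=1.0,
                       max_delay=30.0, sleep=asyncio.sleep):
    """Await make_call(), retrying on RateLimitError with exponential backoff."""
    for attempt in range(1, max_attempts + 1):
        try:
            return await make_call()
        except RateLimitError:
            if attempt == max_attempts:
                raise  # out of attempts: let the caller handle it
            delay = min(max_delay, base_delay * 2 ** (attempt - 1))
            # Up to 10% jitter so parallel workers do not retry in lockstep
            await sleep(delay + random.uniform(0, delay / 10))

async def _demo():
    attempts = {"count": 0}

    async def flaky_call():
        # Fails twice, then succeeds, to exercise the retry path
        attempts["count"] += 1
        if attempts["count"] < 3:
            raise RateLimitError()
        return "ok"

    return await with_backoff(flaky_call, base_delay=0.01)

print(asyncio.run(_demo()))
```

With the SDK, the call would be wrapped as `await with_backoff(lambda: client.extract_transcript(request))`.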
## Examples
### Basic Usage
```python
import asyncio
from youtube_summarizer_sdk import create_client, TranscriptRequest

async def extract_transcript():
    client = create_client(api_key="your_api_key")
    async with client:
        request = TranscriptRequest(
            video_url="https://youtube.com/watch?v=dQw4w9WgXcQ"
        )
        result = await client.extract_and_wait(request)
        print(f"Transcript: {result.transcript}")
        return result

asyncio.run(extract_transcript())
```
### Dual Transcript Comparison
```python
import asyncio
from youtube_summarizer_sdk import create_client, TranscriptRequest

async def compare_transcripts():
    client = create_client(api_key="your_api_key")
    async with client:
        request = TranscriptRequest(
            video_url="https://youtube.com/watch?v=VIDEO_ID",
            transcript_source="both",  # Extract both YouTube and Whisper
            include_quality_analysis=True
        )
        result = await client.extract_and_wait(request)
        # Only dual results carry the comparison fields
        if hasattr(result, 'quality_comparison'):
            comparison = result.quality_comparison
            print(f"Similarity Score: {comparison.similarity_score}")
            print(f"Recommended Source: {comparison.recommendation}")
            print(f"YouTube Transcript: {result.youtube_transcript[:200]}...")
            print(f"Whisper Transcript: {result.whisper_transcript[:200]}...")

asyncio.run(compare_transcripts())
```
### Batch Processing with Progress
```python
import asyncio
from youtube_summarizer_sdk import create_client, BatchProcessingRequest

async def batch_process_with_progress():
    client = create_client(api_key="your_api_key")
    async with client:
        # Connect WebSocket for real-time updates
        await client.connect_websocket()
        # Submit batch job
        batch_request = BatchProcessingRequest(
            video_urls=[
                "https://youtube.com/watch?v=VIDEO1",
                "https://youtube.com/watch?v=VIDEO2"
            ],
            batch_name="Tutorial Series",
            parallel_processing=True
        )
        batch_job = await client.batch_process(batch_request)
        # Listen for progress updates
        async for update in client.listen_for_updates():
            if update.event == "batch.completed":
                print("Batch processing completed!")
                break
            elif update.event == "job.progress":
                print(f"Progress: {update.data}")

asyncio.run(batch_process_with_progress())
```
## Contributing
1. Clone the repository
2. Install development dependencies: `pip install -e ".[dev]"`
3. Run tests: `pytest`
4. Format code: `black youtube_summarizer_sdk/`
5. Type check: `mypy youtube_summarizer_sdk/`
## API Tiers & Rate Limits
| Tier | Requests/Minute | Requests/Day | Requests/Month |
|------|----------------|--------------|----------------|
| Free | 10 | 1,000 | 10,000 |
| Pro | 100 | 25,000 | 500,000 |
| Enterprise | 1,000 | 100,000 | 2,000,000 |
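Staying under the per-minute limit on the client side avoids most `RateLimitError` responses. A minimal sketch of a sliding-window throttle, using the Requests/Minute column from the table above. `MinuteRateLimiter` is a hypothetical helper and not part of the SDK; the server's own accounting may differ from this client-side estimate.

```python
import asyncio
import time
from collections import deque

# Requests/Minute values copied from the tier table
REQUESTS_PER_MINUTE = {"free": 10, "pro": 100, "enterprise": 1000}

class MinuteRateLimiter:
    """Allow at most max_calls acquisitions in any window of `period` seconds."""

    def __init__(self, max_calls, period=60.0, clock=time.monotonic, sleep=asyncio.sleep):
        self.max_calls = max_calls
        self.period = period
        self._clock = clock
        self._sleep = sleep
        self._stamps = deque()  # start times of calls inside the current window

    async def acquire(self):
        while True:
            now = self._clock()
            # Drop timestamps that have left the window
            while self._stamps and now - self._stamps[0] >= self.period:
                self._stamps.popleft()
            if len(self._stamps) < self.max_calls:
                self._stamps.append(now)
                return
            # Wait until the oldest call leaves the window, then re-check
            await self._sleep(self.period - (now - self._stamps[0]))

async def _demo():
    # Tiny window so the demo finishes quickly: 2 calls per 0.05 s
    limiter = MinuteRateLimiter(max_calls=2, period=0.05)
    for _ in range(5):
        await limiter.acquire()
    return "done"

print(asyncio.run(_demo()))
```

In real use you would create `MinuteRateLimiter(REQUESTS_PER_MINUTE["pro"])` once and `await limiter.acquire()` before each client call.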
## Support
- **Documentation**: https://docs.youtube-summarizer.com/python-sdk
- **API Reference**: https://api.youtube-summarizer.com/docs
- **Issues**: https://github.com/youtube-summarizer/python-sdk/issues
- **Email**: support@youtube-summarizer.com
## License
This SDK is licensed under the MIT License. See the [LICENSE](LICENSE) file for details.