# YouTube Summarizer Python SDK

Official Python client library for the YouTube Summarizer Developer Platform. Extract transcripts, generate summaries, and integrate AI-powered video analysis into your applications.

[![PyPI version](https://badge.fury.io/py/youtube-summarizer-sdk.svg)](https://badge.fury.io/py/youtube-summarizer-sdk)
[![Python Support](https://img.shields.io/pypi/pyversions/youtube-summarizer-sdk.svg)](https://pypi.org/project/youtube-summarizer-sdk/)
[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)

## Features

- **Async/Await Support** - Built for modern Python applications
- **Dual Transcript Sources** - YouTube captions, Whisper AI, or both
- **Real-time Updates** - WebSocket support for progress tracking
- **Batch Processing** - Process multiple videos simultaneously
- **Quality Analysis** - Transcript quality scoring and comparison
- **MCP Integration** - Model Context Protocol support for AI development
- **Cost Estimation** - Processing time and cost predictions
- **Export Options** - JSON, CSV, Markdown, and PDF formats

## Installation

```bash
pip install youtube-summarizer-sdk
```

### Optional Dependencies

```bash
# Quote the requirement so shells such as zsh do not expand the brackets

# For MCP (Model Context Protocol) support
pip install "youtube-summarizer-sdk[mcp]"

# For development
pip install "youtube-summarizer-sdk[dev]"

# Install all extras
pip install "youtube-summarizer-sdk[all]"
```

## Quick Start

```python
import asyncio
from youtube_summarizer_sdk import create_client, TranscriptRequest

async def main():
    # Initialize client with your API key
    client = create_client(api_key="ys_pro_your_api_key_here")

    async with client:
        # Extract transcript from YouTube video
        request = TranscriptRequest(
            video_url="https://youtube.com/watch?v=dQw4w9WgXcQ",
            transcript_source="youtube",
            include_quality_analysis=True
        )

        # Submit the job and wait for completion
        # (suspends this coroutine until the job is done)
        result = await client.extract_and_wait(request)

        print(f"Transcript: {result.transcript[:200]}...")
        print(f"Quality Score: {result.quality_score}")
        print(f"Processing Time: {result.processing_time_seconds}s")

asyncio.run(main())
```

## Core Features

The snippets in this section assume an open `client` inside an `async` function, as in the Quick Start.

### Transcript Extraction

```python
from youtube_summarizer_sdk import TranscriptRequest, TranscriptSource

# YouTube captions
request = TranscriptRequest(
    video_url="https://youtube.com/watch?v=VIDEO_ID",
    transcript_source=TranscriptSource.YOUTUBE
)

# Whisper AI transcription
request = TranscriptRequest(
    video_url="https://youtube.com/watch?v=VIDEO_ID",
    transcript_source=TranscriptSource.WHISPER,
    whisper_model_size="small"  # tiny, base, small, medium, large
)

# Both sources with comparison
request = TranscriptRequest(
    video_url="https://youtube.com/watch?v=VIDEO_ID",
    transcript_source=TranscriptSource.BOTH,
    include_quality_analysis=True
)

# Submit and wait for result
result = await client.extract_and_wait(request, timeout=300)
```

### Batch Processing

```python
from youtube_summarizer_sdk import BatchProcessingRequest

# Process multiple videos
batch_request = BatchProcessingRequest(
    video_urls=[
        "https://youtube.com/watch?v=VIDEO1",
        "https://youtube.com/watch?v=VIDEO2",
        "https://youtube.com/watch?v=VIDEO3"
    ],
    batch_name="My Video Collection",
    transcript_source="youtube",
    parallel_processing=True,
    max_concurrent_jobs=3
)

batch_job = await client.batch_process(batch_request)
print(f"Batch ID: {batch_job.batch_id}")
```

### Real-time Progress Tracking

```python
# Connect WebSocket for real-time updates
await client.connect_websocket()

# Submit job (`request` is a TranscriptRequest as shown above)
job = await client.extract_transcript(request)

# Listen for updates and stop once this job completes
async for update in client.listen_for_updates():
    if update.data.get("job_id") == job.job_id:
        print(f"Progress: {update.data.get('progress', 0)}%")

        if update.event == "job.completed":
            result = await client.get_job_result(job.job_id)
            break
```

### Processing Estimates

```python
# Get time and cost estimate
estimate = await client.get_processing_estimate(
    video_url="https://youtube.com/watch?v=VIDEO_ID",
    transcript_source="whisper"
)

print(f"Estimated time: {estimate.estimated_time_seconds}s")
print(f"Estimated cost: ${estimate.estimated_cost:.4f}")
```

### Data Export

```python
# Export data in various formats
export_data = await client.export_data(
    format="json",  # json, csv, markdown, pdf
    date_from="2024-01-01",
    date_to="2024-12-31"
)

print(export_data)
```

## MCP (Model Context Protocol) Integration

The SDK includes MCP support for AI development environments like Claude Code:

```python
from youtube_summarizer_sdk import create_mcp_interface, MCPToolRequest

# Create MCP interface
mcp = create_mcp_interface(api_key="your_api_key")

# List available tools
tools = await mcp.list_tools()

# Execute MCP tool
request = MCPToolRequest(
    name="extract_transcript",
    arguments={
        "video_url": "https://youtube.com/watch?v=VIDEO_ID",
        "transcript_source": "youtube",
        "wait_for_completion": True
    }
)

result = await mcp.call_tool(request)
```

## Configuration

### Client Configuration

```python
from youtube_summarizer_sdk import SDKConfig, YouTubeSummarizerClient

config = SDKConfig(
    api_key="your_api_key",
    base_url="https://api.youtube-summarizer.com",
    timeout=60.0,
    max_retries=3,
    retry_delay=1.0,
    verify_ssl=True
)

client = YouTubeSummarizerClient(config)
```

### WebSocket Configuration

```python
from youtube_summarizer_sdk import WebSocketConfig

ws_config = WebSocketConfig(
    url="wss://api.youtube-summarizer.com/ws",
    auto_reconnect=True,
    max_reconnect_attempts=5,
    heartbeat_interval=30.0
)

await client.connect_websocket(ws_config)
```

## API Reference

### Models

- **TranscriptRequest** - Video transcript extraction request
- **BatchProcessingRequest** - Batch video processing request
- **JobResponse** - Job creation and status response
- **TranscriptResult** - Single transcript extraction result
- **DualTranscriptResult** - Dual transcript comparison result
- **APIUsageStats** - Usage statistics and limits
- **ProcessingTimeEstimate** - Time and cost estimates

### Enums

- **TranscriptSource** - `youtube`, `whisper`, `both`
- **WhisperModelSize** - `tiny`, `base`, `small`, `medium`, `large`
- **ProcessingPriority** - `low`, `normal`, `high`, `urgent`
- **JobStatus** - `queued`, `processing`, `completed`, `failed`, `cancelled`

### Main Client Methods

The listing below shows call signatures with their return types. It is a reference summary, not executable code.

```python
# Core API methods
await client.extract_transcript(request: TranscriptRequest) -> JobResponse
await client.batch_process(request: BatchProcessingRequest) -> BatchJobResponse
await client.get_job_status(job_id: str) -> JobResponse
await client.get_job_result(job_id: str) -> Union[TranscriptResult, DualTranscriptResult]
await client.cancel_job(job_id: str) -> Dict[str, Any]

# Utility methods
await client.get_processing_estimate(video_url: str) -> ProcessingTimeEstimate
await client.get_usage_stats() -> APIUsageStats
await client.search_summaries(query: str) -> Dict[str, Any]
await client.export_data(format: str = 'json') -> Dict[str, Any]

# Convenience methods
await client.extract_and_wait(request: TranscriptRequest, timeout: float = 300) -> Union[TranscriptResult, DualTranscriptResult]
await client.wait_for_job(job_id: str, timeout: float = 300) -> Union[TranscriptResult, DualTranscriptResult]

# WebSocket methods
await client.connect_websocket(config: Optional[WebSocketConfig] = None) -> bool
async for update in client.listen_for_updates():  # -> AsyncGenerator[WebhookPayload, None]
await client.disconnect_websocket()
```

## Error Handling

```python
from youtube_summarizer_sdk import (
    YouTubeSummarizerError,
    AuthenticationError,
    RateLimitError,
    ValidationError,
    APIError,
    JobTimeoutError
)

# Catch the specific exceptions first; YouTubeSummarizerError is the fallback
try:
    result = await client.extract_transcript(request)
except AuthenticationError:
    print("Invalid API key")
except RateLimitError as e:
    print(f"Rate limited. Remaining: {e.remaining}, Reset: {e.reset_time}")
except ValidationError as e:
    print(f"Validation failed: {e.validation_errors}")
except JobTimeoutError as e:
    print(f"Job {e.job_id} timed out after {e.timeout_seconds}s")
except YouTubeSummarizerError as e:
    print(f"SDK error: {e.message}")
```

## Examples

### Basic Usage

```python
import asyncio
from youtube_summarizer_sdk import create_client, TranscriptRequest

async def extract_transcript():
    client = create_client(api_key="your_api_key")

    async with client:
        request = TranscriptRequest(
            video_url="https://youtube.com/watch?v=dQw4w9WgXcQ"
        )

        result = await client.extract_and_wait(request)
        print(f"Transcript: {result.transcript}")
        return result

asyncio.run(extract_transcript())
```

### Dual Transcript Comparison

```python
import asyncio
from youtube_summarizer_sdk import create_client, TranscriptRequest

async def compare_transcripts():
    client = create_client(api_key="your_api_key")

    async with client:
        request = TranscriptRequest(
            video_url="https://youtube.com/watch?v=VIDEO_ID",
            transcript_source="both",  # Extract both YouTube and Whisper
            include_quality_analysis=True
        )

        result = await client.extract_and_wait(request)

        if hasattr(result, 'quality_comparison'):
            comparison = result.quality_comparison
            print(f"Similarity Score: {comparison.similarity_score}")
            print(f"Recommended Source: {comparison.recommendation}")

        print(f"YouTube Transcript: {result.youtube_transcript[:200]}...")
        print(f"Whisper Transcript: {result.whisper_transcript[:200]}...")

asyncio.run(compare_transcripts())
```

### Batch Processing with Progress

```python
import asyncio
from youtube_summarizer_sdk import create_client, BatchProcessingRequest

async def batch_process_with_progress():
    client = create_client(api_key="your_api_key")

    async with client:
        # Connect WebSocket for real-time updates
        await client.connect_websocket()

        # Submit batch job
        batch_request = BatchProcessingRequest(
            video_urls=[
                "https://youtube.com/watch?v=VIDEO1",
                "https://youtube.com/watch?v=VIDEO2"
            ],
            batch_name="Tutorial Series",
            parallel_processing=True
        )

        batch_job = await client.batch_process(batch_request)

        # Listen for progress updates
        async for update in client.listen_for_updates():
            if update.event == "batch.completed":
                print("Batch processing completed!")
                break
            elif update.event == "job.progress":
                print(f"Progress: {update.data}")

asyncio.run(batch_process_with_progress())
```

## Contributing

1. Clone the repository
2. Install development dependencies: `pip install -e ".[dev]"`
3. Run tests: `pytest`
4. Format code: `black youtube_summarizer_sdk/`
5. Type check: `mypy youtube_summarizer_sdk/`

## API Tiers & Rate Limits

| Tier       | Requests/Minute | Requests/Day | Requests/Month |
|------------|-----------------|--------------|----------------|
| Free       | 10              | 1,000        | 10,000         |
| Pro        | 100             | 25,000       | 500,000        |
| Enterprise | 1,000           | 100,000      | 2,000,000      |

## Support

- **Documentation**: https://docs.youtube-summarizer.com/python-sdk
- **API Reference**: https://api.youtube-summarizer.com/docs
- **Issues**: https://github.com/youtube-summarizer/python-sdk/issues
- **Email**: support@youtube-summarizer.com

## License

This SDK is licensed under the MIT License. See the [LICENSE](LICENSE) file for details.
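
## Appendix: Pacing Requests Within a Tier

The three limits in the API Tiers & Rate Limits table apply at once, so for sustained traffic the tightest one determines how fast a client can go. The sketch below works through that arithmetic. It is an illustration only: `TIERS` and `min_interval_seconds` are hypothetical names, not part of the SDK, and it assumes a 30-day month, evenly spaced requests, and that each limit is enforced independently over its own window.

```python
# Hypothetical helper, NOT part of youtube_summarizer_sdk.
# Limits are copied from the "API Tiers & Rate Limits" table.
SECONDS_PER_MINUTE = 60
SECONDS_PER_DAY = 24 * 60 * 60
SECONDS_PER_MONTH = 30 * SECONDS_PER_DAY  # assumption: 30-day month

TIERS = {
    "free": {"minute": 10, "day": 1_000, "month": 10_000},
    "pro": {"minute": 100, "day": 25_000, "month": 500_000},
    "enterprise": {"minute": 1_000, "day": 100_000, "month": 2_000_000},
}


def min_interval_seconds(tier: str) -> float:
    """Return the smallest even spacing between requests that stays
    within the per-minute, per-day, and per-month limits of a tier."""
    limits = TIERS[tier]
    # Each limit implies a minimum spacing; the largest spacing is binding.
    return max(
        SECONDS_PER_MINUTE / limits["minute"],
        SECONDS_PER_DAY / limits["day"],
        SECONDS_PER_MONTH / limits["month"],
    )


if __name__ == "__main__":
    for name in TIERS:
        print(f"{name}: one request every {min_interval_seconds(name):.3f}s")
```

Under these assumptions the monthly quota is the binding constraint on every tier for sustained traffic (roughly one request every 259.2 s on Free, 5.184 s on Pro, and 1.296 s on Enterprise). The per-minute figure only bounds short bursts.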