YouTube Summarizer Python SDK

Official Python client library for the YouTube Summarizer Developer Platform. Extract transcripts, generate summaries, and integrate AI-powered video analysis into your applications.

Features

  • Async/Await Support - Built for modern Python applications
  • Dual Transcript Sources - YouTube captions, Whisper AI, or both
  • Real-time Updates - WebSocket support for progress tracking
  • Batch Processing - Process multiple videos simultaneously
  • Quality Analysis - Transcript quality scoring and comparison
  • MCP Integration - Model Context Protocol support for AI development
  • Cost Estimation - Processing time and cost predictions
  • Export Options - JSON, CSV, Markdown, and PDF formats

Installation

pip install youtube-summarizer-sdk

Optional Dependencies

# For MCP (Model Context Protocol) support
# (quotes keep shells such as zsh from treating the brackets as a glob)
pip install "youtube-summarizer-sdk[mcp]"

# For development
pip install "youtube-summarizer-sdk[dev]"

# Install all extras
pip install "youtube-summarizer-sdk[all]"

Quick Start

import asyncio
from youtube_summarizer_sdk import create_client, TranscriptRequest

async def main():
    # Initialize client with your API key
    client = create_client(api_key="ys_pro_your_api_key_here")
    
    async with client:
        # Extract transcript from YouTube video
        request = TranscriptRequest(
            video_url="https://youtube.com/watch?v=dQw4w9WgXcQ",
            transcript_source="youtube",
            include_quality_analysis=True
        )
        
        # Submit the job and wait for the result (suspends this coroutine until the job finishes)
        result = await client.extract_and_wait(request)
        
        print(f"Transcript: {result.transcript[:200]}...")
        print(f"Quality Score: {result.quality_score}")
        print(f"Processing Time: {result.processing_time_seconds}s")

asyncio.run(main())

Core Features

Transcript Extraction

from youtube_summarizer_sdk import TranscriptRequest, TranscriptSource

# YouTube captions
request = TranscriptRequest(
    video_url="https://youtube.com/watch?v=VIDEO_ID",
    transcript_source=TranscriptSource.YOUTUBE
)

# Whisper AI transcription  
request = TranscriptRequest(
    video_url="https://youtube.com/watch?v=VIDEO_ID",
    transcript_source=TranscriptSource.WHISPER,
    whisper_model_size="small"  # tiny, base, small, medium, large
)

# Both sources with comparison
request = TranscriptRequest(
    video_url="https://youtube.com/watch?v=VIDEO_ID", 
    transcript_source=TranscriptSource.BOTH,
    include_quality_analysis=True
)

# Submit and wait for result
result = await client.extract_and_wait(request, timeout=300)

Batch Processing

from youtube_summarizer_sdk import BatchProcessingRequest

# Process multiple videos
batch_request = BatchProcessingRequest(
    video_urls=[
        "https://youtube.com/watch?v=VIDEO1",
        "https://youtube.com/watch?v=VIDEO2",
        "https://youtube.com/watch?v=VIDEO3"
    ],
    batch_name="My Video Collection",
    transcript_source="youtube",
    parallel_processing=True,
    max_concurrent_jobs=3
)

batch_job = await client.batch_process(batch_request)
print(f"Batch ID: {batch_job.batch_id}")
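
The max_concurrent_jobs option caps how many videos are processed at once. How the service enforces that cap is not described here. If you submit individual jobs yourself instead of using a batch, a similar cap can be applied client-side with asyncio.Semaphore. This is a minimal sketch: run_bounded and worker are hypothetical names, and worker stands in for whatever coroutine you call per video (for example a wrapper around client.extract_and_wait).

```python
import asyncio
from typing import Awaitable, Callable, List, TypeVar

T = TypeVar("T")


async def run_bounded(
    items: List[str],
    worker: Callable[[str], Awaitable[T]],
    max_concurrent: int = 3,
) -> List[T]:
    """Run worker(item) for every item, with at most max_concurrent in flight."""
    semaphore = asyncio.Semaphore(max_concurrent)

    async def guarded(item: str) -> T:
        # The semaphore blocks here once max_concurrent workers are running.
        async with semaphore:
            return await worker(item)

    # gather() returns results in the same order as the input items.
    return await asyncio.gather(*(guarded(item) for item in items))
```

Results come back in input order, so they can be zipped with the original URL list.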

Real-time Progress Tracking

# Connect WebSocket for real-time updates
await client.connect_websocket()

# Submit job
job = await client.extract_transcript(request)

# Listen for updates
async for update in client.listen_for_updates():
    if update.data.get("job_id") == job.job_id:
        print(f"Progress: {update.data.get('progress', 0)}%")
        
        if update.event == "job.completed":
            result = await client.get_job_result(job.job_id)
            break
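
The loop above only exits on job.completed, so a job that fails would leave it waiting. One way to handle that is to wrap the update stream in a filter that stops on any terminal event. This is a sketch under assumptions: Update is a stand-in for the SDK's payload (only the event and data fields used above are modelled), updates_for_job is a hypothetical helper, and "job.failed" is an assumed event name that this README does not confirm.

```python
import asyncio
from dataclasses import dataclass, field
from typing import Any, AsyncIterator, Dict


@dataclass
class Update:
    """Stand-in for the SDK's update payload (assumed shape: event + data)."""
    event: str
    data: Dict[str, Any] = field(default_factory=dict)


async def updates_for_job(
    updates: AsyncIterator[Update], job_id: str
) -> AsyncIterator[Update]:
    """Yield only the updates for job_id and stop after a terminal event."""
    # "job.failed" is an assumption; check the event names your server sends.
    terminal_events = {"job.completed", "job.failed"}
    async for update in updates:
        if update.data.get("job_id") != job_id:
            continue  # update belongs to a different job
        yield update
        if update.event in terminal_events:
            return
```

With the real client this would be used as: async for update in updates_for_job(client.listen_for_updates(), job.job_id).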

Processing Estimates

# Get time and cost estimate
estimate = await client.get_processing_estimate(
    video_url="https://youtube.com/watch?v=VIDEO_ID",
    transcript_source="whisper"
)

print(f"Estimated time: {estimate.estimated_time_seconds}s")
print(f"Estimated cost: ${estimate.estimated_cost:.4f}")

Data Export

# Export data in various formats
export_data = await client.export_data(
    format="json",  # json, csv, markdown, pdf
    date_from="2024-01-01",
    date_to="2024-12-31"
)

print(export_data)

MCP (Model Context Protocol) Integration

The SDK includes MCP support for AI development environments like Claude Code:

from youtube_summarizer_sdk import create_mcp_interface, MCPToolRequest

# Create MCP interface
mcp = create_mcp_interface(api_key="your_api_key")

# List available tools
tools = await mcp.list_tools()

# Execute an MCP tool
request = MCPToolRequest(
    name="extract_transcript",
    arguments={
        "video_url": "https://youtube.com/watch?v=VIDEO_ID",
        "transcript_source": "youtube",
        "wait_for_completion": True
    }
)

result = await mcp.call_tool(request)

Configuration

Client Configuration

from youtube_summarizer_sdk import SDKConfig, YouTubeSummarizerClient

config = SDKConfig(
    api_key="your_api_key",
    base_url="https://api.youtube-summarizer.com",
    timeout=60.0,
    max_retries=3,
    retry_delay=1.0,
    verify_ssl=True
)

client = YouTubeSummarizerClient(config)
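
Hardcoding the API key is fine for a demo, but in deployed code it is usually read from the environment. The helper below is a sketch of that idea. The environment variable names and config_kwargs_from_env are hypothetical (the SDK does not define them), and the defaults simply mirror the values shown above.

```python
import os
from typing import Any, Dict, Mapping, Optional


def config_kwargs_from_env(env: Optional[Mapping[str, str]] = None) -> Dict[str, Any]:
    """Build keyword arguments for SDKConfig from environment variables.

    The variable names are illustrative, not part of the SDK.
    """
    env = os.environ if env is None else env
    api_key = env.get("YT_SUMMARIZER_API_KEY")
    if not api_key:
        raise RuntimeError("YT_SUMMARIZER_API_KEY is not set")
    return {
        "api_key": api_key,
        "base_url": env.get("YT_SUMMARIZER_BASE_URL", "https://api.youtube-summarizer.com"),
        "timeout": float(env.get("YT_SUMMARIZER_TIMEOUT", "60.0")),
        "max_retries": int(env.get("YT_SUMMARIZER_MAX_RETRIES", "3")),
    }
```

The result can then be passed on as SDKConfig(**config_kwargs_from_env()).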

WebSocket Configuration

from youtube_summarizer_sdk import WebSocketConfig

ws_config = WebSocketConfig(
    url="wss://api.youtube-summarizer.com/ws", 
    auto_reconnect=True,
    max_reconnect_attempts=5,
    heartbeat_interval=30.0
)

await client.connect_websocket(ws_config)
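
This README does not say how reconnect attempts are spaced when auto_reconnect is enabled. A common choice for this kind of setting is capped exponential backoff, sketched below purely as an illustration. backoff_delays and its base and cap parameters are hypothetical and may not match what the SDK actually does.

```python
from typing import List


def backoff_delays(max_attempts: int = 5, base: float = 1.0, cap: float = 30.0) -> List[float]:
    """Return the wait (in seconds) before each reconnect attempt.

    Each delay doubles the previous one, starting at base and never exceeding cap.
    """
    return [min(cap, base * (2 ** attempt)) for attempt in range(max_attempts)]


print(backoff_delays())  # [1.0, 2.0, 4.0, 8.0, 16.0]
```

With max_reconnect_attempts=5 and these defaults, the waits add up to 31 seconds before a client following this schedule would give up.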

API Reference

Models

  • TranscriptRequest - Video transcript extraction request
  • BatchProcessingRequest - Batch video processing request
  • JobResponse - Job creation and status response
  • TranscriptResult - Single transcript extraction result
  • DualTranscriptResult - Dual transcript comparison result
  • APIUsageStats - Usage statistics and limits
  • ProcessingTimeEstimate - Time and cost estimates

Enums

  • TranscriptSource - youtube, whisper, both
  • WhisperModelSize - tiny, base, small, medium, large
  • ProcessingPriority - low, normal, high, urgent
  • JobStatus - queued, processing, completed, failed, cancelled
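
Of the JobStatus values, completed, failed, and cancelled are end states, so polling or listening can stop once one of them is seen. The sketch below mirrors the enum locally only to stay self-contained; in real code import JobStatus from the SDK instead. The member names and the is_terminal helper are assumptions.

```python
from enum import Enum
from typing import Union


class JobStatus(str, Enum):
    """Local mirror of the SDK enum; values are taken from the list above."""
    QUEUED = "queued"
    PROCESSING = "processing"
    COMPLETED = "completed"
    FAILED = "failed"
    CANCELLED = "cancelled"


TERMINAL_STATUSES = frozenset(
    {JobStatus.COMPLETED, JobStatus.FAILED, JobStatus.CANCELLED}
)


def is_terminal(status: Union[str, JobStatus]) -> bool:
    """Return True if the job will not change state again."""
    # JobStatus(...) accepts either a member or its string value
    # and raises ValueError for anything else.
    return JobStatus(status) in TERMINAL_STATUSES
```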

Main Client Methods

# Core API methods
await client.extract_transcript(request: TranscriptRequest) -> JobResponse
await client.batch_process(request: BatchProcessingRequest) -> BatchJobResponse
await client.get_job_status(job_id: str) -> JobResponse
await client.get_job_result(job_id: str) -> Union[TranscriptResult, DualTranscriptResult]
await client.cancel_job(job_id: str) -> Dict[str, Any]

# Utility methods
await client.get_processing_estimate(video_url: str) -> ProcessingTimeEstimate
await client.get_usage_stats() -> APIUsageStats
await client.search_summaries(query: str) -> Dict[str, Any]
await client.export_data(format: str = 'json') -> Dict[str, Any]

# Convenience methods
await client.extract_and_wait(request: TranscriptRequest, timeout: float = 300) -> Union[TranscriptResult, DualTranscriptResult]
await client.wait_for_job(job_id: str, timeout: float = 300) -> Union[TranscriptResult, DualTranscriptResult]

# WebSocket methods
await client.connect_websocket(config: Optional[WebSocketConfig] = None) -> bool
async for update in client.listen_for_updates(): # -> AsyncGenerator[WebhookPayload, None]
await client.disconnect_websocket()
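
To make the timeout semantics of wait_for_job concrete, here is a sketch of a polling loop with the same shape. It is not the SDK's implementation, which may use WebSocket updates or a different interval. poll_until_done and get_status are hypothetical; get_status stands in for an adapter that calls client.get_job_status and returns the status as a string.

```python
import asyncio
import time
from typing import Awaitable, Callable


async def poll_until_done(
    get_status: Callable[[str], Awaitable[str]],
    job_id: str,
    timeout: float = 300.0,
    interval: float = 2.0,
) -> str:
    """Poll get_status(job_id) until a terminal status or until timeout elapses."""
    deadline = time.monotonic() + timeout
    while True:
        status = await get_status(job_id)
        if status in {"completed", "failed", "cancelled"}:
            return status
        # Give up if the next poll would happen after the deadline.
        if time.monotonic() + interval > deadline:
            raise TimeoutError(f"job {job_id} still {status!r} after {timeout}s")
        await asyncio.sleep(interval)
```

Using time.monotonic() rather than time.time() keeps the deadline stable if the system clock is adjusted while waiting.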

Error Handling

from youtube_summarizer_sdk import (
    YouTubeSummarizerError, AuthenticationError, RateLimitError,
    ValidationError, APIError, JobTimeoutError
)

try:
    # extract_and_wait waits for the job, so JobTimeoutError can occur here
    result = await client.extract_and_wait(request)
except AuthenticationError:
    print("Invalid API key")
except RateLimitError as e:
    print(f"Rate limited. Remaining: {e.remaining}, Reset: {e.reset_time}")
except ValidationError as e:
    print(f"Validation failed: {e.validation_errors}")
except JobTimeoutError as e:
    print(f"Job {e.job_id} timed out after {e.timeout_seconds}s")
except YouTubeSummarizerError as e:
    print(f"SDK error: {e.message}")
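
Rate-limit errors are usually worth retrying after a pause rather than just reporting. The sketch below shows one way to do that. RateLimited is a stand-in for the SDK's RateLimitError, and its retry_after field is an assumption: the real exception exposes remaining and reset_time, so in practice the wait would be derived from reset_time. call_with_rate_limit_retry is a hypothetical helper.

```python
import asyncio
from typing import Awaitable, Callable, TypeVar

T = TypeVar("T")


class RateLimited(Exception):
    """Stand-in for the SDK's RateLimitError; retry_after is an assumed field."""

    def __init__(self, retry_after: float):
        super().__init__(f"rate limited, retry in {retry_after}s")
        self.retry_after = retry_after


async def call_with_rate_limit_retry(
    call: Callable[[], Awaitable[T]], max_attempts: int = 3
) -> T:
    """Await call(), sleeping and retrying when it raises RateLimited."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return await call()
        except RateLimited as exc:
            if attempt == max_attempts:
                raise  # out of attempts: surface the error to the caller
            await asyncio.sleep(exc.retry_after)
    raise AssertionError("unreachable")
```

A call would look like: await call_with_rate_limit_retry(lambda: client.extract_transcript(request)).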

Examples

Basic Usage

import asyncio
from youtube_summarizer_sdk import create_client, TranscriptRequest

async def extract_transcript():
    client = create_client(api_key="your_api_key")
    
    async with client:
        request = TranscriptRequest(
            video_url="https://youtube.com/watch?v=dQw4w9WgXcQ"
        )
        
        result = await client.extract_and_wait(request)
        print(f"Transcript: {result.transcript}")
        return result

asyncio.run(extract_transcript())

Dual Transcript Comparison

import asyncio
from youtube_summarizer_sdk import create_client, TranscriptRequest

async def compare_transcripts():
    client = create_client(api_key="your_api_key")
    
    async with client:
        request = TranscriptRequest(
            video_url="https://youtube.com/watch?v=VIDEO_ID",
            transcript_source="both",  # Extract both YouTube and Whisper
            include_quality_analysis=True
        )
        
        result = await client.extract_and_wait(request)
        
        if hasattr(result, 'quality_comparison'):
            comparison = result.quality_comparison
            print(f"Similarity Score: {comparison.similarity_score}")
            print(f"Recommended Source: {comparison.recommendation}")
            print(f"YouTube Transcript: {result.youtube_transcript[:200]}...")
            print(f"Whisper Transcript: {result.whisper_transcript[:200]}...")

asyncio.run(compare_transcripts())

Batch Processing with Progress

import asyncio
from youtube_summarizer_sdk import create_client, BatchProcessingRequest

async def batch_process_with_progress():
    client = create_client(api_key="your_api_key")
    
    async with client:
        # Connect WebSocket for real-time updates
        await client.connect_websocket()
        
        # Submit batch job
        batch_request = BatchProcessingRequest(
            video_urls=[
                "https://youtube.com/watch?v=VIDEO1",
                "https://youtube.com/watch?v=VIDEO2"
            ],
            batch_name="Tutorial Series",
            parallel_processing=True
        )
        
        batch_job = await client.batch_process(batch_request)
        
        # Listen for progress updates
        async for update in client.listen_for_updates():
            if update.event == "batch.completed":
                print("Batch processing completed!")
                break
            elif update.event == "job.progress":
                print(f"Progress: {update.data}")

asyncio.run(batch_process_with_progress())

Contributing

  1. Clone the repository
  2. Install development dependencies: pip install -e ".[dev]"
  3. Run tests: pytest
  4. Format code: black youtube_summarizer_sdk/
  5. Type check: mypy youtube_summarizer_sdk/

API Tiers & Rate Limits

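| Tier       | Requests/Minute | Requests/Day | Requests/Month |
|------------|-----------------|--------------|----------------|
| Free       | 10              | 1,000        | 10,000         |
| Pro        | 100             | 25,000       | 500,000        |
| Enterprise | 1,000           | 100,000      | 2,000,000      |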
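
The three limits interact: a client running at the full per-minute rate reaches the daily cap well before the day ends. The arithmetic below works this out from the published numbers. TIER_LIMITS and the helper names are illustrative, not part of the SDK.

```python
# Limits copied from the tier table; keys and helper names are illustrative.
TIER_LIMITS = {
    "free": {"minute": 10, "day": 1_000, "month": 10_000},
    "pro": {"minute": 100, "day": 25_000, "month": 500_000},
    "enterprise": {"minute": 1_000, "day": 100_000, "month": 2_000_000},
}


def min_seconds_between_requests(tier: str) -> float:
    """Spacing that keeps a steady stream of requests under the per-minute limit."""
    return 60.0 / TIER_LIMITS[tier]["minute"]


def minutes_until_daily_cap(tier: str) -> float:
    """Minutes of sending at the full per-minute rate before the daily cap is hit."""
    limits = TIER_LIMITS[tier]
    return limits["day"] / limits["minute"]


def days_until_monthly_cap(tier: str) -> float:
    """Days of using the full daily allowance before the monthly cap is hit."""
    limits = TIER_LIMITS[tier]
    return limits["month"] / limits["day"]


print(min_seconds_between_requests("free"))  # 6.0
print(minutes_until_daily_cap("free"))       # 100.0
print(days_until_monthly_cap("free"))        # 10.0
```

On the Free tier, for example, one request every 6 seconds stays within the per-minute limit, but the daily allowance is gone after 100 minutes at that pace, and ten such days exhaust the month.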

License

This SDK is licensed under the MIT License. See the LICENSE file for details.