youtube-summarizer/.taskmaster/tasks/tasks.json

{
  "master": {
    "tasks": [
      {
        "id": 1,
        "title": "Setup Project Structure and Environment",
        "description": "Initialize the project repository with FastAPI backend and frontend structure, including environment setup, dependency management, and basic configuration.",
        "details": "1. Create project repository with appropriate .gitignore\n2. Set up FastAPI backend structure:\n   - main.py for application entry point\n   - app/ directory for application code\n   - routers/ for API endpoints\n   - models/ for data models\n   - services/ for business logic\n   - utils/ for helper functions\n3. Configure development environment:\n   - requirements.txt or pyproject.toml for Python dependencies\n   - Include FastAPI, uvicorn, youtube-transcript-api, and necessary AI SDK packages\n4. Set up basic frontend structure:\n   - HTML/CSS/JS or a modern framework\n   - Static assets directory\n   - Templates directory if using server-side rendering\n5. Configure environment variables for development\n6. Implement basic logging configuration\n7. Create README.md with setup instructions",
        "testStrategy": "1. Verify all dependencies install correctly\n2. Ensure development server starts without errors\n3. Confirm project structure follows best practices\n4. Test logging functionality\n5. Validate environment variable loading",
        "priority": "high",
        "dependencies": [],
        "status": "pending",
        "subtasks": []
      },
      {
        "id": 2,
        "title": "Implement YouTube Transcript Extraction",
        "description": "Create a service to extract transcripts from YouTube videos using the YouTube Transcript API with fallback mechanisms.",
        "details": "1. Implement a transcript extraction service:\n```python\nfrom youtube_transcript_api import YouTubeTranscriptApi\n\nclass TranscriptService:\n    def extract_transcript(self, video_id):\n        try:\n            transcript_list = YouTubeTranscriptApi.get_transcript(video_id)\n            return self._format_transcript(transcript_list)\n        except Exception as e:\n            # Log error and try fallback method\n            return self._fallback_extraction(video_id)\n            \n    def _format_transcript(self, transcript_list):\n        # Convert transcript list to formatted text\n        return '\\n'.join([item['text'] for item in transcript_list])\n        \n    def _fallback_extraction(self, video_id):\n        # Implement fallback using YouTube Data API\n        # This would require Google API credentials\n        pass\n        \n    def extract_video_id(self, url):\n        # Extract video ID from various YouTube URL formats\n        # Handle youtu.be, youtube.com/watch, youtube.com/v/, etc.\n        pass\n```\n2. Implement language detection and handling\n3. Add support for different YouTube URL formats\n4. Implement error handling for unavailable transcripts\n5. Add transcript caching to reduce API calls",
        "testStrategy": "1. Unit tests with mock responses for different YouTube URL formats\n2. Integration tests with actual YouTube videos (short test videos)\n3. Test error handling with invalid or unavailable videos\n4. Test language detection with multi-language videos\n5. Test fallback mechanism when primary extraction fails",
        "priority": "high",
        "dependencies": [
          1
        ],
        "status": "pending",
        "subtasks": []
      },
      {
        "id": 3,
        "title": "Develop AI Summary Generation Service",
        "description": "Create a service to generate summaries from video transcripts using AI models, starting with a single model implementation.",
        "details": "1. Implement a summary generation service:\n```python\nimport openai\n\nclass SummaryService:\n    def __init__(self, api_key):\n        self.api_key = api_key\n        openai.api_key = api_key\n        \n    async def generate_summary(self, transcript, length='medium'):\n        # Define prompt based on desired summary length\n        prompt = self._create_prompt(transcript, length)\n        \n        try:\n            response = await openai.ChatCompletion.acreate(\n                model=\"gpt-3.5-turbo\",\n                messages=[\n                    {\"role\": \"system\", \"content\": \"You are a helpful assistant that summarizes YouTube video transcripts.\"},\n                    {\"role\": \"user\", \"content\": prompt}\n                ],\n                max_tokens=1000,\n                temperature=0.5\n            )\n            return response.choices[0].message.content\n        except Exception as e:\n            # Log error and handle gracefully\n            raise\n            \n    def _create_prompt(self, transcript, length):\n        # Create appropriate prompt based on length\n        token_limit = 4000  # Adjust based on model\n        \n        # Truncate transcript if needed\n        if len(transcript) > token_limit:\n            transcript = transcript[:token_limit]\n            \n        if length == 'short':\n            return f\"Provide a concise summary of the following transcript in 3-5 bullet points:\\n\\n{transcript}\"\n        elif length == 'medium':\n            return f\"Summarize the following transcript with key points and main ideas:\\n\\n{transcript}\"\n        else:  # long\n            return f\"Provide a detailed summary of the following transcript with main ideas, key points, and important details:\\n\\n{transcript}\"\n```\n2. Implement token usage optimization\n3. Add error handling for API failures\n4. Implement context window management for long transcripts\n5. Add summary formatting for better readability",
        "testStrategy": "1. Unit tests with mock AI responses\n2. Integration tests with actual API calls (using short test transcripts)\n3. Test different summary lengths\n4. Test error handling and recovery\n5. Test with transcripts of varying lengths to verify context window management\n6. Measure token usage and optimization effectiveness",
        "priority": "high",
        "dependencies": [
          1
        ],
        "status": "pending",
        "subtasks": []
      },
      {
        "id": 4,
        "title": "Create Basic Frontend Interface",
        "description": "Develop a responsive web interface for URL input, summary display, and basic user interactions.",
        "details": "1. Create HTML structure for the application:\n   - Input form for YouTube URL\n   - Loading indicator\n   - Summary display area\n   - Error message section\n   - Copy-to-clipboard button\n2. Implement CSS for responsive design:\n   - Mobile-friendly layout\n   - Clean, intuitive design\n   - Loading animations\n   - Accessible color scheme\n3. Implement JavaScript functionality:\n```javascript\ndocument.addEventListener('DOMContentLoaded', () => {\n  const form = document.getElementById('summary-form');\n  const urlInput = document.getElementById('video-url');\n  const submitButton = document.getElementById('submit-button');\n  const summaryContainer = document.getElementById('summary-container');\n  const loadingIndicator = document.getElementById('loading-indicator');\n  const errorContainer = document.getElementById('error-container');\n  const copyButton = document.getElementById('copy-button');\n  \n  form.addEventListener('submit', async (e) => {\n    e.preventDefault();\n    const videoUrl = urlInput.value.trim();\n    \n    if (!videoUrl) {\n      showError('Please enter a YouTube URL');\n      return;\n    }\n    \n    try {\n      showLoading(true);\n      const response = await fetch('/api/summarize', {\n        method: 'POST',\n        headers: {\n          'Content-Type': 'application/json',\n        },\n        body: JSON.stringify({ url: videoUrl }),\n      });\n      \n      if (!response.ok) {\n        const errorData = await response.json();\n        throw new Error(errorData.detail || 'Failed to generate summary');\n      }\n      \n      const data = await response.json();\n      displaySummary(data.summary);\n    } catch (error) {\n      showError(error.message);\n    } finally {\n      showLoading(false);\n    }\n  });\n  \n  copyButton.addEventListener('click', () => {\n    const summaryText = summaryContainer.textContent;\n    navigator.clipboard.writeText(summaryText)\n      .then(() => {\n        copyButton.textContent = 'Copied!';\n        setTimeout(() => {\n          copyButton.textContent = 'Copy to Clipboard';\n        }, 2000);\n      })\n      .catch(err => {\n        showError('Failed to copy: ' + err);\n      });\n  });\n  \n  function showLoading(isLoading) {\n    loadingIndicator.style.display = isLoading ? 'block' : 'none';\n    submitButton.disabled = isLoading;\n  }\n  \n  function showError(message) {\n    errorContainer.textContent = message;\n    errorContainer.style.display = 'block';\n    setTimeout(() => {\n      errorContainer.style.display = 'none';\n    }, 5000);\n  }\n  \n  function displaySummary(summary) {\n    summaryContainer.innerHTML = '';\n    \n    // Format and display the summary\n    const formattedSummary = summary.replace(/\\n/g, '<br>');\n    summaryContainer.innerHTML = formattedSummary;\n    \n    // Show copy button\n    copyButton.style.display = 'block';\n  }\n});\n```\n4. Implement dark/light theme support\n5. Ensure accessibility compliance (WCAG 2.1)",
        "testStrategy": "1. Test responsive design across different screen sizes\n2. Verify form validation for URL input\n3. Test loading indicators and error messages\n4. Verify copy-to-clipboard functionality\n5. Test accessibility using screen readers and keyboard navigation\n6. Test dark/light theme switching\n7. Cross-browser testing (Chrome, Firefox, Safari, Edge)",
        "priority": "high",
        "dependencies": [
          1
        ],
        "status": "pending",
        "subtasks": []
      },
      {
        "id": 5,
        "title": "Implement FastAPI Backend Endpoints",
        "description": "Develop the API endpoints for the application, including URL validation, summary generation, and error handling.",
        "details": "1. Create main FastAPI application:\n```python\nfrom fastapi import FastAPI, HTTPException, Depends\nfrom pydantic import BaseModel, validator\nimport re\nfrom app.services.transcript_service import TranscriptService\nfrom app.services.summary_service import SummaryService\n\napp = FastAPI(title=\"YouTube Summarizer API\")\n\nclass VideoRequest(BaseModel):\n    url: str\n    \n    @validator('url')\n    def validate_youtube_url(cls, v):\n        youtube_regex = r'^(https?://)?(www\\.)?(youtube\\.com|youtu\\.?be)/.+$'\n        if not re.match(youtube_regex, v):\n            raise ValueError('Invalid YouTube URL')\n        return v\n\nclass SummaryResponse(BaseModel):\n    summary: str\n    video_id: str\n    title: str = None\n\n@app.post(\"/api/summarize\", response_model=SummaryResponse)\nasync def summarize_video(request: VideoRequest):\n    transcript_service = TranscriptService()\n    summary_service = SummaryService(api_key=\"YOUR_API_KEY\")  # Use environment variable in production\n    \n    try:\n        # Extract video ID\n        video_id = transcript_service.extract_video_id(request.url)\n        \n        # Get transcript\n        transcript = transcript_service.extract_transcript(video_id)\n        \n        if not transcript:\n            raise HTTPException(status_code=404, detail=\"Could not extract transcript from this video\")\n        \n        # Generate summary\n        summary = await summary_service.generate_summary(transcript)\n        \n        return SummaryResponse(\n            summary=summary,\n            video_id=video_id\n        )\n    except Exception as e:\n        # Log the error\n        raise HTTPException(status_code=500, detail=str(e))\n\n# Health check endpoint\n@app.get(\"/api/health\")\ndef health_check():\n    return {\"status\": \"ok\"}\n```\n2. Implement CORS middleware\n3. Add request validation\n4. Implement proper error handling and status codes\n5. Add logging for API requests\n6. Implement rate limiting",
        "testStrategy": "1. Unit tests for endpoint validation\n2. Integration tests for the complete API flow\n3. Test error handling with various error scenarios\n4. Test rate limiting functionality\n5. Load testing to verify performance under concurrent requests\n6. Test CORS configuration with different origins",
        "priority": "high",
        "dependencies": [
          1,
          2,
          3
        ],
        "status": "pending",
        "subtasks": []
      },
      {
        "id": 6,
        "title": "Implement Database and Caching System",
        "description": "Set up a database for storing summaries and implement a caching system to reduce API calls and improve performance.",
        "details": "1. Set up SQLite for development (PostgreSQL for production):\n```python\nfrom sqlalchemy import create_engine, Column, Integer, String, Text, DateTime\nfrom sqlalchemy.ext.declarative import declarative_base\nfrom sqlalchemy.orm import sessionmaker\nimport datetime\n\nDATABASE_URL = \"sqlite:///./youtube_summarizer.db\"  # Use PostgreSQL URL in production\nengine = create_engine(DATABASE_URL)\nSessionLocal = sessionmaker(autocommit=False, autoflush=False, bind=engine)\nBase = declarative_base()\n\nclass Summary(Base):\n    __tablename__ = \"summaries\"\n    \n    id = Column(Integer, primary_key=True, index=True)\n    video_id = Column(String, unique=True, index=True)\n    transcript = Column(Text)\n    summary = Column(Text)\n    created_at = Column(DateTime, default=datetime.datetime.utcnow)\n    updated_at = Column(DateTime, default=datetime.datetime.utcnow, onupdate=datetime.datetime.utcnow)\n    model_used = Column(String)\n    \nBase.metadata.create_all(bind=engine)\n```\n2. Implement caching service:\n```python\nclass CacheService:\n    def __init__(self):\n        self.db = SessionLocal()\n    \n    def get_cached_summary(self, video_id, max_age_hours=24):\n        \"\"\"Get cached summary if available and not expired\"\"\"\n        summary = self.db.query(Summary).filter(Summary.video_id == video_id).first()\n        \n        if not summary:\n            return None\n            \n        # Check if cache is expired\n        max_age = datetime.timedelta(hours=max_age_hours)\n        if datetime.datetime.utcnow() - summary.updated_at > max_age:\n            return None\n            \n        return summary.summary\n    \n    def cache_summary(self, video_id, transcript, summary, model_used):\n        \"\"\"Cache a new summary or update existing one\"\"\"\n        existing = self.db.query(Summary).filter(Summary.video_id == video_id).first()\n        \n        if existing:\n            existing.transcript = transcript\n            existing.summary = summary\n            existing.model_used = model_used\n            existing.updated_at = datetime.datetime.utcnow()\n        else:\n            new_summary = Summary(\n                video_id=video_id,\n                transcript=transcript,\n                summary=summary,\n                model_used=model_used\n            )\n            self.db.add(new_summary)\n            \n        self.db.commit()\n        \n    def close(self):\n        self.db.close()\n```\n3. Integrate caching with API endpoints\n4. Implement database migrations\n5. Add cache invalidation strategy\n6. Implement efficient query patterns",
        "testStrategy": "1. Unit tests for database models\n2. Integration tests for caching service\n3. Test cache hit/miss scenarios\n4. Test cache expiration\n5. Performance testing to measure cache effectiveness\n6. Test database migrations\n7. Test with concurrent requests to verify thread safety",
        "priority": "medium",
        "dependencies": [
          1,
          5
        ],
        "status": "pending",
        "subtasks": []
      },
      {
        "id": 7,
        "title": "Implement Multiple AI Model Support",
        "description": "Extend the summary service to support multiple AI models (OpenAI, Anthropic, DeepSeek) with model selection and fallback mechanisms.",
        "details": "1. Create a base AI model interface:\n```python\nfrom abc import ABC, abstractmethod\n\nclass AIModelInterface(ABC):\n    @abstractmethod\n    async def generate_summary(self, transcript, length):\n        pass\n        \n    @abstractmethod\n    def get_model_name(self):\n        pass\n        \n    @abstractmethod\n    def get_token_limit(self):\n        pass\n```\n\n2. Implement concrete model classes:\n```python\nclass OpenAIModel(AIModelInterface):\n    def __init__(self, api_key, model=\"gpt-3.5-turbo\"):\n        import openai\n        self.api_key = api_key\n        openai.api_key = api_key\n        self.model = model\n        \n    async def generate_summary(self, transcript, length='medium'):\n        import openai\n        prompt = self._create_prompt(transcript, length)\n        \n        try:\n            response = await openai.ChatCompletion.acreate(\n                model=self.model,\n                messages=[\n                    {\"role\": \"system\", \"content\": \"You are a helpful assistant that summarizes YouTube video transcripts.\"},\n                    {\"role\": \"user\", \"content\": prompt}\n                ],\n                max_tokens=1000,\n                temperature=0.5\n            )\n            return response.choices[0].message.content\n        except Exception as e:\n            # Log error\n            raise\n            \n    def get_model_name(self):\n        return f\"OpenAI {self.model}\"\n        \n    def get_token_limit(self):\n        if \"gpt-4\" in self.model:\n            return 8000\n        return 4000\n        \n    def _create_prompt(self, transcript, length):\n        # Similar to previous implementation\n        pass\n\nclass AnthropicModel(AIModelInterface):\n    def __init__(self, api_key, model=\"claude-2\"):\n        import anthropic\n        self.client = anthropic.Anthropic(api_key=api_key)\n        self.model = model\n        \n    async def generate_summary(self, transcript, length='medium'):\n        import anthropic\n        prompt = self._create_prompt(transcript, length)\n        \n        try:\n            response = await self.client.completions.create(\n                model=self.model,\n                prompt=f\"{anthropic.HUMAN_PROMPT} {prompt} {anthropic.AI_PROMPT}\",\n                max_tokens_to_sample=1000,\n                temperature=0.5\n            )\n            return response.completion\n        except Exception as e:\n            # Log error\n            raise\n            \n    # Implement other required methods\n\nclass DeepSeekModel(AIModelInterface):\n    # Similar implementation for DeepSeek\n    pass\n```\n\n3. Create a model factory and manager:\n```python\nclass AIModelFactory:\n    @staticmethod\n    def get_model(model_name, config):\n        if model_name == \"openai\":\n            return OpenAIModel(config[\"api_key\"], config.get(\"model\", \"gpt-3.5-turbo\"))\n        elif model_name == \"anthropic\":\n            return AnthropicModel(config[\"api_key\"], config.get(\"model\", \"claude-2\"))\n        elif model_name == \"deepseek\":\n            return DeepSeekModel(config[\"api_key\"], config.get(\"model\", \"deepseek-chat\"))\n        else:\n            raise ValueError(f\"Unsupported model: {model_name}\")\n\nclass AIModelManager:\n    def __init__(self, config):\n        self.config = config\n        self.models = {}\n        self.fallback_order = config.get(\"fallback_order\", [\"openai\", \"anthropic\", \"deepseek\"])\n        \n        # Initialize requested models\n        for model_name in self.fallback_order:\n            if model_name in config[\"models\"]:\n                self.models[model_name] = AIModelFactory.get_model(model_name, config[\"models\"][model_name])\n                \n    async def generate_summary(self, transcript, model_preference=None, length='medium'):\n        # Try preferred model first if specified\n        if model_preference and model_preference in self.models:\n            try:\n                return await self.models[model_preference].generate_summary(transcript, length)\n            except Exception as e:\n                # Log error and continue to fallbacks\n                pass\n                \n        # Try models in fallback order\n        for model_name in self.fallback_order:\n            if model_name in self.models:\n                try:\n                    return await self.models[model_name].generate_summary(transcript, length)\n                except Exception as e:\n                    # Log error and try next model\n                    continue\n                    \n        # If all models fail\n        raise Exception(\"All AI models failed to generate summary\")\n```\n\n4. Update API endpoints to support model selection\n5. Implement token usage optimization for each model\n6. Add configuration for API keys and model preferences",
        "testStrategy": "1. Unit tests for each model implementation\n2. Integration tests with actual API calls (using test API keys)\n3. Test fallback mechanisms by simulating failures\n4. Test model selection through API\n5. Test token limit handling for different models\n6. Benchmark performance and quality across different models\n7. Test error handling and recovery",
        "priority": "medium",
        "dependencies": [
          3,
          5
        ],
        "status": "pending",
        "subtasks": []
      },
      {
        "id": 8,
        "title": "Implement Summary Customization Options",
        "description": "Add features for users to customize summary length, style, focus, and generate chapter timestamps.",
        "details": "1. Extend the API request model to include customization options:\n```python\nclass VideoRequest(BaseModel):\n    url: str\n    model: str = \"openai\"  # Default model\n    length: str = \"medium\"  # Options: short, medium, long\n    style: str = \"standard\"  # Options: standard, bullet, detailed\n    focus: str = \"general\"  # Options: general, technical, educational\n    generate_timestamps: bool = False\n    \n    @validator('url')\n    def validate_youtube_url(cls, v):\n        # Validation as before\n        pass\n        \n    @validator('length')\n    def validate_length(cls, v):\n        if v not in [\"short\", \"medium\", \"long\"]:\n            raise ValueError('Length must be one of: short, medium, long')\n        return v\n        \n    @validator('style')\n    def validate_style(cls, v):\n        if v not in [\"standard\", \"bullet\", \"detailed\"]:\n            raise ValueError('Style must be one of: standard, bullet, detailed')\n        return v\n        \n    @validator('focus')\n    def validate_focus(cls, v):\n        if v not in [\"general\", \"technical\", \"educational\"]:\n            raise ValueError('Focus must be one of: general, technical, educational')\n        return v\n```\n\n2. Enhance prompt creation to incorporate customization:\n```python\ndef _create_prompt(self, transcript, length, style, focus):\n    # Base prompt with transcript\n    base_prompt = f\"Here is a transcript from a YouTube video:\\n\\n{transcript}\\n\\n\"\n    \n    # Length customization\n    if length == 'short':\n        length_prompt = \"Provide a very concise summary in 3-5 bullet points covering only the most important information.\"\n    elif length == 'medium':\n        length_prompt = \"Provide a balanced summary with key points and main ideas, moderate level of detail.\"\n    else:  # long\n        length_prompt = \"Provide a comprehensive summary with main ideas, key points, supporting details, and examples.\"\n    \n    # Style customization\n    if style == 'bullet':\n        style_prompt = \"Format the summary as bullet points for easy scanning.\"\n    elif style == 'detailed':\n        style_prompt = \"Format the summary as paragraphs with section headings for different topics.\"\n    else:  # standard\n        style_prompt = \"Format the summary in a clear, readable way with a mix of paragraphs and bullet points as appropriate.\"\n    \n    # Focus customization\n    if focus == 'technical':\n        focus_prompt = \"Focus on technical details, specifications, and processes mentioned in the video.\"\n    elif focus == 'educational':\n        focus_prompt = \"Focus on educational content, learning points, and key takeaways for students.\"\n    else:  # general\n        focus_prompt = \"Provide a general overview that would be useful to most viewers.\"\n    \n    return base_prompt + length_prompt + \" \" + style_prompt + \" \" + focus_prompt\n```\n\n3. Implement timestamp generation:\n```python\ndef generate_timestamps(self, transcript_items):\n    \"\"\"Generate chapter timestamps from transcript items\"\"\"\n    # transcript_items should be the raw transcript with timestamps\n    \n    # Group transcript by potential chapter breaks (long pauses, topic changes)\n    chapters = []\n    current_chapter = {\"start\": transcript_items[0][\"start\"], \"text\": []}\n    \n    for i, item in enumerate(transcript_items):\n        current_chapter[\"text\"].append(item[\"text\"])\n        \n        # Check for chapter break conditions\n        if i < len(transcript_items) - 1:\n            next_item = transcript_items[i+1]\n            time_gap = next_item[\"start\"] - (item[\"start\"] + item[\"duration\"])\n            \n            # If there's a significant pause or enough text accumulated\n            if time_gap > 3.0 or len(\" \".join(current_chapter[\"text\"])) > 500:\n                # Finalize current chapter\n                current_chapter[\"text\"] = \" \".join(current_chapter[\"text\"])\n                chapters.append(current_chapter)\n                \n                # Start new chapter\n                current_chapter = {\"start\": next_item[\"start\"], \"text\": []}\n    \n    # Add the last chapter\n    if current_chapter[\"text\"]:\n        current_chapter[\"text\"] = \" \".join(current_chapter[\"text\"])\n        chapters.append(current_chapter)\n    \n    # Generate titles for chapters using AI\n    return self._generate_chapter_titles(chapters)\n\nasync def _generate_chapter_titles(self, chapters):\n    \"\"\"Use AI to generate meaningful titles for each chapter\"\"\"\n    # Implementation depends on the AI model being used\n    # This would call the AI model to generate a title for each chapter's text\n    pass\n```\n\n4. Update the frontend to include customization options\n5. Add preview functionality for different summary styles\n6. Implement caching that considers customization parameters",
        "testStrategy": "1. Unit tests for validation of customization options\n2. Integration tests for different combinations of options\n3. Test timestamp generation with various video types\n4. Test UI elements for customization\n5. User testing to evaluate the usefulness of different customization options\n6. Test caching with different customization parameters\n7. Verify prompt generation for different combinations of options",
        "priority": "medium",
        "dependencies": [
          3,
          5,
          7
        ],
        "status": "pending",
        "subtasks": []
      },
      {
        "id": 9,
        "title": "Implement Export Functionality",
        "description": "Add features to export summaries in multiple formats (Markdown, PDF, TXT) and implement copy-to-clipboard functionality.",
        "details": "1. Create export service:\n```python\nclass ExportService:\n    def to_markdown(self, summary_data):\n        \"\"\"Convert summary to Markdown format\"\"\"\n        md = f\"# Summary of: {summary_data['title']}\\n\\n\"\n        \n        if 'video_id' in summary_data:\n            md += f\"[Watch Video](https://www.youtube.com/watch?v={summary_data['video_id']})\\n\\n\"\n            \n        md += f\"## Summary\\n\\n{summary_data['summary']}\\n\\n\"\n        \n        if 'timestamps' in summary_data and summary_data['timestamps']:\n            md += \"## Chapters\\n\\n\"\n            for chapter in summary_data['timestamps']:\n                time_str = self._format_time(chapter['start'])\n                md += f\"- [{time_str}]({self._create_timestamp_url(summary_data['video_id'], chapter['start'])}) {chapter['title']}\\n\"\n                \n        return md\n    \n    def to_txt(self, summary_data):\n        \"\"\"Convert summary to plain text format\"\"\"\n        txt = f\"Summary of: {summary_data['title']}\\n\\n\"\n        txt += f\"Video URL: https://www.youtube.com/watch?v={summary_data['video_id']}\\n\\n\"\n        txt += f\"SUMMARY:\\n\\n{summary_data['summary']}\\n\\n\"\n        \n        if 'timestamps' in summary_data and summary_data['timestamps']:\n            txt += \"CHAPTERS:\\n\\n\"\n            for chapter in summary_data['timestamps']:\n                time_str = self._format_time(chapter['start'])\n                txt += f\"{time_str} - {chapter['title']}\\n\"\n                \n        return txt\n    \n    def to_pdf(self, summary_data):\n        \"\"\"Convert summary to PDF format\"\"\"\n        from reportlab.lib.pagesizes import letter\n        from reportlab.platypus import SimpleDocTemplate, Paragraph, Spacer\n        from reportlab.lib.styles import getSampleStyleSheet\n        from io import BytesIO\n        \n        buffer = BytesIO()\n        doc = SimpleDocTemplate(buffer, pagesize=letter)\n        styles = getSampleStyleSheet()\n        \n        story = []\n        \n        # Title\n        story.append(Paragraph(f\"Summary of: {summary_data['title']}\", styles['Title']))\n        story.append(Spacer(1, 12))\n        \n        # Video URL\n        story.append(Paragraph(f\"Video URL: https://www.youtube.com/watch?v={summary_data['video_id']}\", styles['Normal']))\n        story.append(Spacer(1, 12))\n        \n        # Summary\n        story.append(Paragraph(\"SUMMARY:\", styles['Heading2']))\n        story.append(Spacer(1, 6))\n        \n        # Split summary by paragraphs\n        paragraphs = summary_data['summary'].split('\\n')\n        for p in paragraphs:\n            if p.strip():\n                story.append(Paragraph(p, styles['Normal']))\n                story.append(Spacer(1, 6))\n        \n        # Chapters if available\n        if 'timestamps' in summary_data and summary_data['timestamps']:\n            story.append(Spacer(1, 12))\n            story.append(Paragraph(\"CHAPTERS:\", styles['Heading2']))\n            story.append(Spacer(1, 6))\n            \n            for chapter in summary_data['timestamps']:\n                time_str = self._format_time(chapter['start'])\n                story.append(Paragraph(f\"{time_str} - {chapter['title']}\", styles['Normal']))\n        \n        doc.build(story)\n        pdf_data = buffer.getvalue()\n        buffer.close()\n        \n        return pdf_data\n    \n    def _format_time(self, seconds):\n        \"\"\"Format seconds to HH:MM:SS\"\"\"\n        m, s = divmod(int(seconds), 60)\n        h, m = divmod(m, 60)\n        return f\"{h:02d}:{m:02d}:{s:02d}\"\n    \n    def _create_timestamp_url(self, video_id, seconds):\n        \"\"\"Create YouTube URL with timestamp\"\"\"\n        return f\"https://www.youtube.com/watch?v={video_id}&t={int(seconds)}s\"\n```\n\n2. Add API endpoints for export:\n```python\n@app.get(\"/api/export/{format}/{video_id}\")\nasync def export_summary(format: str, video_id: str):\n    if format not in [\"markdown\", \"txt\", \"pdf\"]:\n        raise HTTPException(status_code=400, detail=\"Unsupported export format\")\n    \n    # Get summary from cache/database\n    cache_service = CacheService()\n    summary_record = cache_service.get_summary_by_video_id(video_id)\n    \n    if not summary_record:\n        raise HTTPException(status_code=404, detail=\"Summary not found\")\n    \n    export_service = ExportService()\n    \n    summary_data = {\n        \"title\": summary_record.title or \"YouTube Video\",\n        \"video_id\": video_id,\n        \"summary\": summary_record.summary,\n    }\n    \n    # Add timestamps if available\n    if summary_record.timestamps:\n        summary_data[\"timestamps\"] = summary_record.timestamps\n    \n    if format == \"markdown\":\n        content = export_service.to_markdown(summary_data)\n        return Response(content=content, media_type=\"text/markdown\")\n    elif format == \"txt\":\n        content = export_service.to_txt(summary_data)\n        return Response(content=content, media_type=\"text/plain\")\n    elif format == \"pdf\":\n        content = export_service.to_pdf(summary_data)\n        return Response(content=content, media_type=\"application/pdf\")\n```\n\n3. Implement frontend for export options\n4. Add copy-to-clipboard functionality\n5. Implement download handling for exported files\n6. Add preview functionality for different export formats",
        "testStrategy": "1. Unit tests for each export format\n2. Test export with various summary content (long, short, with/without timestamps)\n3. Test PDF generation with different content types\n4. Test copy-to-clipboard functionality across browsers\n5. Test download functionality for different file types\n6. Verify formatting in exported files matches expectations\n7. Test with special characters and different languages",
        "priority": "medium",
        "dependencies": [
          5,
          6,
          8
        ],
        "status": "pending",
        "subtasks": []
      },
      {
        "id": 10,
        "title": "Implement Summary History and User Preferences",
        "description": "Create functionality to store and retrieve summary history and implement user preferences for default settings.",
        "details": "1. Extend database models for history and preferences:\n```python\nclass UserPreference(Base):\n    __tablename__ = \"user_preferences\"\n    \n    id = Column(Integer, primary_key=True, index=True)\n    user_id = Column(String, unique=True, index=True)  # Could be IP or session ID for anonymous users\n    default_model = Column(String, default=\"openai\")\n    default_length = Column(String, default=\"medium\")\n    default_style = Column(String, default=\"standard\")\n    default_focus = Column(String, default=\"general\")\n    theme = Column(String, default=\"light\")\n    created_at = Column(DateTime, default=datetime.datetime.utcnow)\n    updated_at = Column(DateTime, default=datetime.datetime.utcnow, onupdate=datetime.datetime.utcnow)\n\nclass SummaryHistory(Base):\n    __tablename__ = \"summary_history\"\n    \n    id = Column(Integer, primary_key=True, index=True)\n    user_id = Column(String, index=True)  # Could be IP or session ID for anonymous users\n    video_id = Column(String, index=True)\n    title = Column(String)\n    timestamp = Column(DateTime, default=datetime.datetime.utcnow)\n    settings_used = Column(String)  # JSON string of settings\n```\n\n2. Implement history service:\n```python\nimport json\n\nclass HistoryService:\n    def __init__(self):\n        self.db = SessionLocal()\n    \n    def add_to_history(self, user_id, video_id, title, settings):\n        \"\"\"Add a summary to user's history\"\"\"\n        history_item = SummaryHistory(\n            user_id=user_id,\n            video_id=video_id,\n            title=title,\n            settings_used=json.dumps(settings)\n        )\n        \n        self.db.add(history_item)\n        self.db.commit()\n        \n        return history_item.id\n    \n    def get_user_history(self, user_id, limit=10, offset=0):\n        \"\"\"Get user's summary history\"\"\"\n        history = self.db.query(SummaryHistory)\\\n            .filter(SummaryHistory.user_id == user_id)\\\n            .order_by(SummaryHistory.timestamp.desc())\\\n            .offset(offset)\\\n            .limit(limit)\\\n            .all()\n            \n        return [\n            {\n                \"id\": item.id,\n                \"video_id\": item.video_id,\n                \"title\": item.title,\n                \"timestamp\": item.timestamp.isoformat(),\n                \"settings\": json.loads(item.settings_used)\n            }\n            for item in history\n        ]\n    \n    def clear_history(self, user_id):\n        \"\"\"Clear user's history\"\"\"\n        self.db.query(SummaryHistory)\\\n            .filter(SummaryHistory.user_id == user_id)\\\n            .delete()\n        self.db.commit()\n    \n    def close(self):\n        self.db.close()\n```\n\n3. Implement preferences service:\n```python\nclass PreferencesService:\n    def __init__(self):\n        self.db = SessionLocal()\n    \n    def get_user_preferences(self, user_id):\n        \"\"\"Get user preferences or create default if not exists\"\"\"\n        prefs = self.db.query(UserPreference)\\\n            .filter(UserPreference.user_id == user_id)\\\n            .first()\n            \n        if not prefs:\n            # Create default preferences\n            prefs = UserPreference(user_id=user_id)\n            self.db.add(prefs)\n            self.db.commit()\n            \n        return {\n            \"default_model\": prefs.default_model,\n            \"default_length\": prefs.default_length,\n            \"default_style\": prefs.default_style,\n            \"default_focus\": prefs.default_focus,\n            \"theme\": prefs.theme\n        }\n    \n    def update_preferences(self, user_id, preferences):\n        \"\"\"Update user preferences\"\"\"\n        prefs = self.db.query(UserPreference)\\\n            .filter(UserPreference.user_id == user_id)\\\n            .first()\n            \n        if not prefs:\n            prefs = UserPreference(user_id=user_id)\n            self.db.add(prefs)\n            \n        # Update fields\n        if \"default_model\" in preferences:\n            prefs.default_model = preferences[\"default_model\"]\n        if \"default_length\" in preferences:\n            prefs.default_length = preferences[\"default_length\"]\n        if \"default_style\" in preferences:\n            prefs.default_style = preferences[\"default_style\"]\n        if \"default_focus\" in preferences:\n            prefs.default_focus = preferences[\"default_focus\"]\n        if \"theme\" in preferences:\n            prefs.theme = preferences[\"theme\"]\n            \n        self.db.commit()\n        \n        return self.get_user_preferences(user_id)\n    \n    def close(self):\n        self.db.close()\n```\n\n4. Add API endpoints for history and preferences\n5. Implement frontend for viewing history\n6. Add preferences UI in settings page\n7. Implement session management for anonymous users",
        "testStrategy": "1. Unit tests for history and preferences services\n2. Test history retrieval with pagination\n3. Test preferences persistence\n4. Test anonymous user session handling\n5. Test history clearing functionality\n6. Test UI for history display\n7. Test preferences application to new summaries\n8. Test with multiple concurrent users",
        "priority": "medium",
        "dependencies": [
          5,
          6
        ],
        "status": "pending",
        "subtasks": []
      },
      {
        "id": 11,
        "title": "Implement Security Features and Rate Limiting",
        "description": "Add security features including API key management, rate limiting, input sanitization, and CORS configuration.",
        "details": "1. Implement API key management:\n```python\nfrom fastapi import Security, Depends, HTTPException\nfrom fastapi.security.api_key import APIKeyHeader\nfrom starlette.status import HTTP_403_FORBIDDEN\nimport os\nfrom datetime import datetime, timedelta\n\n# For internal API endpoints that need protection\nAPI_KEY_NAME = \"X-API-Key\"\nAPI_KEY = os.getenv(\"API_KEY\", \"\")\n\napi_key_header = APIKeyHeader(name=API_KEY_NAME, auto_error=False)\n\nasync def get_api_key(api_key_header: str = Security(api_key_header)):\n    if not API_KEY:\n        # No API key set, so no protection\n        return True\n        \n    if api_key_header == API_KEY:\n        return True\n    else:\n        raise HTTPException(\n            status_code=HTTP_403_FORBIDDEN, detail=\"Invalid API Key\"\n        )\n```\n\n2. Implement rate limiting:\n```python\nfrom fastapi import Request\nimport time\nfrom collections import defaultdict\n\nclass RateLimiter:\n    def __init__(self, requests_per_minute=30):\n        self.requests_per_minute = requests_per_minute\n        self.requests = defaultdict(list)  # IP -> list of timestamps\n        \n    async def check(self, request: Request):\n        client_ip = request.client.host\n        now = time.time()\n        \n        # Remove timestamps older than 1 minute\n        self.requests[client_ip] = [ts for ts in self.requests[client_ip] if now - ts < 60]\n        \n        # Check if rate limit exceeded\n        if len(self.requests[client_ip]) >= self.requests_per_minute:\n            return False\n            \n        # Add current request timestamp\n        self.requests[client_ip].append(now)\n        return True\n\nrate_limiter = RateLimiter()\n\n@app.middleware(\"http\")\nasync def rate_limiting_middleware(request: Request, call_next):\n    # Skip rate limiting for static files\n    if request.url.path.startswith(\"/static\"):\n        return await call_next(request)\n        \n    # Check rate limit\n    if not await rate_limiter.check(request):\n        return JSONResponse(\n            status_code=429,\n            content={\"detail\": \"Rate limit exceeded. Please try again later.\"}\n        )\n        \n    return await call_next(request)\n```\n\n3. Configure CORS:\n```python\nfrom fastapi.middleware.cors import CORSMiddleware\n\norigins = [\n    \"http://localhost\",\n    \"http://localhost:8000\",\n    \"http://localhost:3000\",  # For frontend development\n    \"https://yoursummarizerapp.com\",  # Production domain\n]\n\napp.add_middleware(\n    CORSMiddleware,\n    allow_origins=origins,\n    allow_credentials=True,\n    allow_methods=[\"*\"],\n    allow_headers=[\"*\"],\n)\n```\n\n4. Implement input sanitization:\n```python\nimport re\nfrom html import escape\n\ndef sanitize_input(text):\n    \"\"\"Sanitize user input to prevent XSS\"\"\"\n    if not text:\n        return \"\"\n        \n    # HTML escape\n    text = escape(text)\n    \n    # Remove potentially dangerous patterns\n    text = re.sub(r'javascript:', '', text, flags=re.IGNORECASE)\n    text = re.sub(r'data:', '', text, flags=re.IGNORECASE)\n    \n    return text\n\n# Use in API endpoints\n@app.post(\"/api/summarize\")\nasync def summarize_video(request: VideoRequest):\n    # Sanitize URL\n    sanitized_url = sanitize_input(request.url)\n    # Continue with processing\n```\n\n5. Implement secure headers middleware:\n```python\n@app.middleware(\"http\")\nasync def security_headers_middleware(request: Request, call_next):\n    response = await call_next(request)\n    \n    # Add security headers\n    response.headers[\"X-Content-Type-Options\"] = \"nosniff\"\n    response.headers[\"X-Frame-Options\"] = \"DENY\"\n    response.headers[\"X-XSS-Protection\"] = \"1; mode=block\"\n    response.headers[\"Strict-Transport-Security\"] = \"max-age=31536000; includeSubDomains\"\n    response.headers[\"Content-Security-Policy\"] = \"default-src 'self'; script-src 'self'; style-src 'self'; img-src 'self' data: https://i.ytimg.com; connect-src 'self' https://api.openai.com https://api.anthropic.com;\"\n    \n    return response\n```\n\n6. Implement SQL injection prevention in database queries\n7. Add request logging for security monitoring",
        "testStrategy": "1. Test rate limiting with concurrent requests\n2. Test API key authentication\n3. Test CORS with requests from different origins\n4. Test input sanitization with malicious inputs\n5. Test security headers in responses\n6. Perform security scanning with tools like OWASP ZAP\n7. Test SQL injection prevention\n8. Test XSS prevention with various attack vectors",
        "priority": "high",
        "dependencies": [
          5,
          6
        ],
        "status": "pending",
        "subtasks": []
      },
      {
        "id": 12,
        "title": "Implement Comprehensive Error Handling and Logging",
        "description": "Develop a robust error handling system with user-friendly messages and comprehensive logging for debugging and monitoring.",
        "details": "1. Set up logging configuration:\n```python\nimport logging\nimport sys\nfrom pathlib import Path\n\ndef setup_logging(log_level=logging.INFO):\n    # Create logs directory if it doesn't exist\n    log_dir = Path(\"logs\")\n    log_dir.mkdir(exist_ok=True)\n    \n    # Configure root logger\n    logging.basicConfig(\n        level=log_level,\n        format=\"%(asctime)s - %(name)s - %(levelname)s - %(message)s\",\n        handlers=[\n            logging.StreamHandler(sys.stdout),\n            logging.FileHandler(log_dir / \"app.log\"),\n        ]\n    )\n    \n    # Configure specific loggers\n    loggers = [\n        \"uvicorn\",\n        \"uvicorn.error\",\n        \"uvicorn.access\",\n        \"fastapi\",\n        \"app\",  # Our application logger\n    ]\n    \n    for logger_name in loggers:\n        logger = logging.getLogger(logger_name)\n        logger.setLevel(log_level)\n    \n    return logging.getLogger(\"app\")\n\n# Create application logger\nlogger = setup_logging()\n```\n\n2. Implement error handling middleware:\n```python\nfrom fastapi import Request, status\nfrom fastapi.responses import JSONResponse\nimport traceback\nimport uuid\n\nclass AppException(Exception):\n    \"\"\"Base exception for application-specific errors\"\"\"\n    def __init__(self, status_code: int, detail: str, error_code: str = None):\n        self.status_code = status_code\n        self.detail = detail\n        self.error_code = error_code or \"UNKNOWN_ERROR\"\n\n@app.exception_handler(AppException)\nasync def app_exception_handler(request: Request, exc: AppException):\n    return JSONResponse(\n        status_code=exc.status_code,\n        content={\n            \"detail\": exc.detail,\n            \"error_code\": exc.error_code,\n        }\n    )\n\n@app.exception_handler(Exception)\nasync def unhandled_exception_handler(request: Request, exc: Exception):\n    # Generate unique error ID for tracking\n    error_id = str(uuid.uuid4())\n    \n    # Log the error with traceback\n    logger.error(\n        f\"Unhandled exception: {str(exc)}\\nError ID: {error_id}\\nPath: {request.url.path}\",\n        exc_info=True\n    )\n    \n    # Return user-friendly error\n    return JSONResponse(\n        status_code=status.HTTP_500_INTERNAL_SERVER_ERROR,\n        content={\n            \"detail\": \"An unexpected error occurred. Our team has been notified.\",\n            \"error_id\": error_id,\n        }\n    )\n```\n\n3. Create specific error types:\n```python\nclass YouTubeError(AppException):\n    \"\"\"Errors related to YouTube operations\"\"\"\n    def __init__(self, detail: str, status_code: int = status.HTTP_400_BAD_REQUEST):\n        super().__init__(status_code, detail, \"YOUTUBE_ERROR\")\n\nclass TranscriptError(AppException):\n    \"\"\"Errors related to transcript extraction\"\"\"\n    def __init__(self, detail: str, status_code: int = status.HTTP_400_BAD_REQUEST):\n        super().__init__(status_code, detail, \"TRANSCRIPT_ERROR\")\n\nclass AIModelError(AppException):\n    \"\"\"Errors related to AI model operations\"\"\"\n    def __init__(self, detail: str, status_code: int = status.HTTP_500_INTERNAL_SERVER_ERROR):\n        super().__init__(status_code, detail, \"AI_MODEL_ERROR\")\n\nclass RateLimitError(AppException):\n    \"\"\"Rate limit exceeded errors\"\"\"\n    def __init__(self, detail: str = \"Rate limit exceeded. Please try again later.\"):\n        super().__init__(status.HTTP_429_TOO_MANY_REQUESTS, detail, \"RATE_LIMIT_ERROR\")\n```\n\n4. Implement structured logging:\n```python\ndef log_api_request(request_id, user_id, endpoint, params, duration_ms, status_code):\n    \"\"\"Log API request details\"\"\"\n    logger.info(\n        \"API Request\",\n        extra={\n            \"request_id\": request_id,\n            \"user_id\": user_id,\n            \"endpoint\": endpoint,\n            \"params\": params,\n            \"duration_ms\": duration_ms,\n            \"status_code\": status_code,\n        }\n    )\n\ndef log_summary_generation(request_id, video_id, model, duration_ms, token_count, success):\n    \"\"\"Log summary generation details\"\"\"\n    logger.info(\n        \"Summary Generation\",\n        extra={\n            \"request_id\": request_id,\n            \"video_id\": video_id,\n            \"model\": model,\n            \"duration_ms\": duration_ms,\n            \"token_count\": token_count,\n            \"success\": success,\n        }\n    )\n\ndef log_error(error_type, message, details=None):\n    \"\"\"Log application errors\"\"\"\n    logger.error(\n        f\"{error_type}: {message}\",\n        extra={\n            \"error_type\": error_type,\n            \"details\": details,\n        }\n    )\n```\n\n5. Implement request ID tracking:\n```python\n@app.middleware(\"http\")\nasync def request_middleware(request: Request, call_next):\n    # Generate request ID\n    request_id = str(uuid.uuid4())\n    request.state.request_id = request_id\n    \n    # Get user ID (could be from session, IP, etc.)\n    user_id = request.client.host  # Using IP as user ID for now\n    request.state.user_id = user_id\n    \n    # Measure request duration\n    start_time = time.time()\n    \n    # Process request\n    try:\n        response = await call_next(request)\n        \n        # Calculate duration\n        duration_ms = round((time.time() - start_time) * 1000)\n        \n        # Log request\n        log_api_request(\n            request_id=request_id,\n            user_id=user_id,\n            endpoint=request.url.path,\n            params=dict(request.query_params),\n            duration_ms=duration_ms,\n            status_code=response.status_code\n        )\n        \n        # Add request ID to response headers\n        response.headers[\"X-Request-ID\"] = request_id\n        \n        return response\n    except Exception as e:\n        # Let the exception handlers deal with it\n        raise\n```\n\n6. Implement user-friendly error messages in the frontend\n7. Add monitoring for error rates and performance metrics",
        "testStrategy": "1. Test error handling for various error scenarios\n2. Verify log output format and content\n3. Test request ID propagation through the system\n4. Test custom exception handling\n5. Verify user-friendly error messages in the UI\n6. Test logging performance impact\n7. Verify error tracking and correlation between frontend and backend errors\n8. Test log rotation and management",
        "priority": "high",
        "dependencies": [
          1,
          5
        ],
        "status": "pending",
        "subtasks": []
      }
    ],
    "metadata": {
      "created": "2025-08-25T02:19:35.583Z",
      "updated": "2025-08-25T02:19:35.583Z",
      "description": "Tasks for master context"
    }
  }
}