Story 4.3: Multi-video Analysis with Multi-Agent System

Story Overview

Story ID: 4.3
Epic: 4 - Advanced Intelligence & Developer Platform
Title: Multi-video Analysis with Multi-Agent System
Status: 📋 READY FOR IMPLEMENTATION
Priority: High

Goal: Implement playlist and multi-video analysis using a multi-agent AI system: three perspective agents (Technical, Business, User Experience) plus a synthesis agent that merges their output into a single comprehensive summary.

Value Proposition: Provide multi-faceted analysis of YouTube playlists and channels by reusing the existing AI ecosystem infrastructure to generate insights from several distinct AI perspectives.

Dependencies:

  • Story 4.2 (API Endpoints & Developer SDK) - Complete
  • Existing AI ecosystem at /src/agents/ecosystem/
  • ChromaDB integration patterns from /tests/framework-comparison/

Estimated Effort: 40 hours (includes the multi-agent system; see the task breakdown under Implementation Tasks)

Technical Requirements

Core Features

1. Multi-Agent Summarization System

  • Three Perspective Agents: Technical Analyst, Business Analyst, User Experience Analyst
  • Synthesis Agent: Combines the perspectives into a unified, comprehensive summary
  • AI Ecosystem Integration: Leverages existing /src/agents/ecosystem/ infrastructure
  • Agent Orchestration: Uses BaseAgent and AgentOrchestrator patterns
  • State Management: LangGraph-based workflow coordination

2. Playlist Processing

  • Playlist URL Parser: Extract video IDs from YouTube playlist URLs
  • Batch Video Processing: Process all videos in playlist sequentially
  • Cross-Video Analysis: Identify themes, patterns, and progression across videos
  • Series Tracking: Detect content evolution and learning paths
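
As a rough illustration of the playlist URL parser described above, the sketch below pulls the playlist ID out of the `list` query parameter using only the standard library. The function name `extract_playlist_id` and the accepted host set are assumptions made for this example; they are not part of the existing codebase, and checking that a playlist is actually accessible would still require a YouTube Data API call.

```python
from typing import Optional
from urllib.parse import parse_qs, urlparse

# Hosts accepted by this sketch (an assumption; extend as needed).
YOUTUBE_HOSTS = {
    "youtube.com",
    "www.youtube.com",
    "m.youtube.com",
    "music.youtube.com",
    "youtu.be",
}


def extract_playlist_id(url: str) -> Optional[str]:
    """Return the value of the `list` query parameter, or None if absent."""
    parsed = urlparse(url.strip())
    if parsed.scheme not in ("http", "https"):
        return None
    if (parsed.hostname or "") not in YOUTUBE_HOSTS:
        return None
    # parse_qs drops blank values, so a bare "list=" yields no entry.
    values = parse_qs(parsed.query).get("list", [])
    return values[0] if values else None
```

This handles both dedicated playlist URLs (`/playlist?list=...`) and watch URLs that carry a `list` parameter alongside `v`.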

3. Multi-Video Intelligence

  • Theme Analysis: Identify common topics across multiple videos
  • Content Evolution: Track how topics develop across series
  • Channel Analysis: Comprehensive channel content analysis
  • Trend Detection: Identify patterns in content creation and topics
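
One simple way to picture theme analysis across videos is to count how many videos mention each theme. The sketch below assumes each video already has a list of theme labels (for example, produced by an agent); the function name `common_themes` and the `min_videos` threshold are hypothetical, and plain label counting is a stand-in for whatever similarity method the implementation ends up using.

```python
from collections import Counter
from typing import Dict, List


def common_themes(video_themes: Dict[str, List[str]], min_videos: int = 2) -> List[str]:
    """Return themes that occur in at least `min_videos` videos, most frequent first."""
    counts: Counter = Counter()
    for themes in video_themes.values():
        # Normalise and de-duplicate so a theme counts once per video.
        counts.update({t.strip().lower() for t in themes if t.strip()})
    # Sort by descending frequency, then alphabetically for stable output.
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [theme for theme, n in ranked if n >= min_videos]
```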

4. Enhanced Export Features

  • Multi-Agent Summaries: Export includes all three perspective summaries plus synthesis
  • Cross-Video Insights: Aggregated insights from playlist analysis
  • Comparison Reports: Side-by-side analysis of videos in series
  • Bulk Export: Export all playlist summaries with agent analysis

Technical Architecture

Multi-Agent System Components

# Agent perspective definitions
class PerspectiveAgent:
    TECHNICAL = "technical"     # Focus on technical concepts, implementation, tools
    BUSINESS = "business"       # Focus on business value, ROI, market implications
    USER_EXPERIENCE = "user"    # Focus on user journey, usability, accessibility
    SYNTHESIS = "synthesis"     # Combines all perspectives into unified view

# Agent orchestration (BaseAgent comes from /src/agents/ecosystem/)
class PlaylistAnalysisOrchestrator(BaseAgent):
    def __init__(self):
        # One agent per perspective, plus the synthesis agent
        self.technical_agent = TechnicalAnalysisAgent()
        self.business_agent = BusinessAnalysisAgent()
        self.ux_agent = UserExperienceAgent()
        self.synthesis_agent = SynthesisAgent()

    async def analyze_playlist(self, playlist_url: str) -> PlaylistAnalysisResult:
        # Extract videos and process with multiple agents
        # Orchestrate agent communication and synthesis
        raise NotImplementedError  # Implemented in Tasks 4.3.1 and 4.3.2

Database Schema Extensions

-- Multi-agent analysis storage
CREATE TABLE agent_summaries (
    id UUID PRIMARY KEY,
    summary_id UUID REFERENCES summaries(id),
    agent_type VARCHAR(20), -- 'technical', 'business', 'user', 'synthesis'
    agent_summary TEXT,
    key_insights JSONB,
    confidence_score FLOAT,
    processing_time_seconds FLOAT,
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);

-- Playlist analysis
CREATE TABLE playlists (
    id UUID PRIMARY KEY,
    playlist_id VARCHAR(50),
    playlist_url TEXT,
    title VARCHAR(500),
    channel_name VARCHAR(200),
    video_count INTEGER,
    total_duration INTEGER,
    analyzed_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);

-- Cross-video analysis
CREATE TABLE playlist_analysis (
    id UUID PRIMARY KEY,
    playlist_id UUID REFERENCES playlists(id),
    themes JSONB,
    content_progression JSONB,
    key_insights JSONB,
    agent_perspectives JSONB,
    synthesis_summary TEXT,
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);

Implementation Tasks

Task 4.3.1: Multi-Agent System Integration (12 hours)

Subtasks:

  1. Agent Interface Setup (3 hours)

    • Import and configure BaseAgent from /src/agents/ecosystem/
    • Set up AgentOrchestrator for workflow management
    • Configure DeepSeek integration for all agents (no Anthropic)
    • Test agent communication and state management
  2. Perspective Agent Implementation (6 hours)

    • Create TechnicalAnalysisAgent with technical focus prompts
    • Create BusinessAnalysisAgent with business value prompts
    • Create UserExperienceAgent with UX/accessibility prompts
    • Implement agent-specific prompt templates and analysis patterns
  3. Synthesis Agent Development (3 hours)

    • Create SynthesisAgent to combine multiple perspectives
    • Implement perspective weighting and conflict resolution
    • Design unified summary generation logic
    • Test multi-agent coordination and output quality
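
The perspective weighting mentioned for the synthesis agent could, under simple assumptions, look like the sketch below: each insight is scored by the sum of (agent weight × agent confidence) over the agents that reported it, and the highest-scoring insights are kept. The `Perspective` dataclass, the `rank_insights` function, and the weighting scheme are all hypothetical illustrations, not the planned SynthesisAgent logic; real conflict resolution would likely need an LLM pass on top of this.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Perspective:
    agent_type: str            # "technical", "business", or "user"
    key_insights: List[str]
    confidence_score: float    # assumed to lie in [0, 1]


def rank_insights(
    perspectives: List[Perspective],
    weights: Optional[Dict[str, float]] = None,
    top_n: int = 5,
) -> List[str]:
    """Rank insights by summed weight * confidence across agents."""
    weights = weights or {}
    scores: Dict[str, float] = {}
    for p in perspectives:
        weight = weights.get(p.agent_type, 1.0)  # unlisted agents weigh 1.0
        for insight in p.key_insights:
            key = insight.strip()
            if key:
                # Insights reported by several agents accumulate score.
                scores[key] = scores.get(key, 0.0) + weight * p.confidence_score
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return [insight for insight, _ in ranked[:top_n]]
```

Insights are matched by exact text here; grouping near-duplicates would need a similarity step.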

Task 4.3.2: Playlist Processing Engine (16 hours)

Subtasks:

  1. Playlist URL Parser (4 hours)

    • Extract playlist ID from various YouTube playlist URL formats
    • Validate playlist accessibility and permissions
    • Handle private/unlisted playlist edge cases
    • Integrate with YouTube Data API for metadata
  2. Video Discovery Service (4 hours)

    • Fetch all video IDs from playlist using YouTube Data API
    • Extract video metadata (title, duration, upload date)
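    • Handle pagination for large playlists (the YouTube Data API returns at most 50 playlist items per page, so playlists of 100+ videos span several pages)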
    • Implement error handling for deleted/private videos
  3. Batch Processing Pipeline (6 hours)

    • Process videos sequentially with progress tracking
    • Integrate with existing SummaryPipeline for individual videos
    • Implement multi-agent analysis for each video
    • Handle failures and retry logic for individual videos
  4. Cross-Video Analysis Engine (2 hours)

    • Identify common themes across multiple videos
    • Track content progression and evolution
    • Generate playlist-level insights and patterns
    • Create aggregated analysis from agent perspectives
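
The sequential batch pipeline with progress tracking and per-video retries can be sketched as follows. `process_playlist`, `process_video`, and `on_progress` are hypothetical names; in the real pipeline `process_video` would wrap the existing SummaryPipeline plus the multi-agent analysis, and retries would normally wait (with backoff) between attempts, which is omitted here for brevity.

```python
from typing import Callable, Dict, List, Optional, Tuple


def process_playlist(
    video_ids: List[str],
    process_video: Callable[[str], str],
    max_attempts: int = 3,
    on_progress: Optional[Callable[[int, int], None]] = None,
) -> Tuple[Dict[str, str], List[str]]:
    """Process videos one by one; return (results, ids that failed every attempt)."""
    results: Dict[str, str] = {}
    failed: List[str] = []
    total = len(video_ids)
    for index, video_id in enumerate(video_ids, start=1):
        for attempt in range(1, max_attempts + 1):
            try:
                results[video_id] = process_video(video_id)
                break  # success, stop retrying this video
            except Exception:
                if attempt == max_attempts:
                    # Give up on this video but keep the playlist job going.
                    failed.append(video_id)
        if on_progress:
            on_progress(index, total)
    return results, failed
```

A failed video is recorded and skipped, so one deleted or private video does not abort the whole playlist job.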

Task 4.3.3: API Endpoints and Integration (8 hours)

Subtasks:

  1. Multi-Agent API Endpoints (4 hours)

    • POST /api/analysis/multi-agent/{video_id} - Analyze single video with all agents
    • GET /api/analysis/agent-perspectives/{summary_id} - Get all agent perspectives
    • POST /api/analysis/playlist - Start playlist analysis
    • GET /api/analysis/playlist/{job_id}/status - Monitor playlist processing
  2. Frontend Integration (3 hours)

    • Create MultiAgentAnalysisView component
    • Add playlist URL input and validation
    • Display agent perspectives in tabbed interface
    • Show synthesis summary with highlighting from each perspective
  3. Background Job Management (1 hour)

    • Integrate playlist processing with existing job system
    • Add WebSocket progress updates for multi-video processing
    • Implement job cancellation for long-running playlist analysis
    • Add monitoring and statistics endpoints

Task 4.3.4: Enhanced Export and Reporting (4 hours)

Subtasks:

  1. Multi-Agent Export Formats (2 hours)

    • Enhanced markdown with agent perspective sections
    • JSON export with structured agent analysis data
    • CSV export for playlist comparison and analysis
    • Add agent analysis to existing PDF export
  2. Playlist Reports (2 hours)

    • Generate comprehensive playlist analysis reports
    • Include cross-video insights and theme analysis
    • Create comparison tables for video progression
    • Add executive summary with key playlist insights
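
To make the enhanced markdown export concrete, here is a minimal sketch that renders the synthesis followed by one section per agent perspective. The function `export_markdown`, the section titles, and the ordering are assumptions for illustration; the existing export code may structure its output differently.

```python
from typing import Dict

# Display titles per agent type (hypothetical wording).
SECTION_TITLES = {
    "technical": "Technical Perspective",
    "business": "Business Perspective",
    "user": "User Experience Perspective",
}


def export_markdown(video_title: str, perspectives: Dict[str, str], synthesis: str) -> str:
    """Render a synthesis plus the available agent perspectives as markdown."""
    lines = [f"# {video_title}", "", "## Synthesis", "", synthesis.strip(), ""]
    for agent_type in ("technical", "business", "user"):
        summary = perspectives.get(agent_type)
        if summary is None:
            continue  # skip agents that were not requested or that failed
        lines += [f"## {SECTION_TITLES[agent_type]}", "", summary.strip(), ""]
    return "\n".join(lines).rstrip() + "\n"
```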

Data Models

Multi-Agent Analysis Models

from pydantic import BaseModel
from typing import List, Dict, Optional, Any
from enum import Enum

class AgentType(str, Enum):
    TECHNICAL = "technical"
    BUSINESS = "business"  
    USER_EXPERIENCE = "user"
    SYNTHESIS = "synthesis"

class AgentPerspective(BaseModel):
    agent_type: AgentType
    summary: str
    key_insights: List[str]
    confidence_score: float
    focus_areas: List[str]
    recommendations: List[str]

class MultiAgentAnalysisResult(BaseModel):
    video_id: str
    perspectives: List[AgentPerspective]
    synthesis_summary: str
    unified_insights: List[str]
    processing_time_seconds: float
    
class PlaylistAnalysisRequest(BaseModel):
    playlist_url: str
    include_cross_video_analysis: bool = True
    agent_types: List[AgentType] = [AgentType.TECHNICAL, AgentType.BUSINESS, AgentType.USER_EXPERIENCE]
    
class PlaylistAnalysisResult(BaseModel):
    playlist_id: str
    video_summaries: List[MultiAgentAnalysisResult]
    cross_video_analysis: Dict[str, Any]
    playlist_insights: List[str]
    themes: List[str]
    content_progression: Dict[str, Any]

Testing Strategy

Unit Tests

  • Agent Integration Tests: Test BaseAgent and orchestrator integration
  • Perspective Analysis Tests: Validate each agent's analysis quality
  • Synthesis Logic Tests: Test perspective combination and conflict resolution
  • Playlist Parser Tests: Validate URL parsing and video discovery
  • Cross-Video Analysis Tests: Test theme detection and progression analysis

Integration Tests

  • Multi-Agent API Tests: Test all new API endpoints
  • Playlist Processing Tests: End-to-end playlist analysis workflow
  • Database Integration Tests: Validate agent_summaries and playlist schema
  • Export Format Tests: Test enhanced export with agent perspectives

Performance Tests

  • Large Playlist Tests: Test with 50+ video playlists
  • Concurrent Analysis Tests: Multiple playlists processing simultaneously
  • Agent Response Time Tests: Measure individual agent processing times
  • Memory Usage Tests: Monitor resource usage during large batch processing

API Specification

Multi-Agent Analysis Endpoints

/api/analysis/multi-agent/{video_id}:
  post:
    summary: Analyze video with multiple agent perspectives
    parameters:
      - name: video_id
        in: path
        required: true
        schema:
          type: string
    requestBody:
      required: true
      content:
        application/json:
          schema:
            type: object
            properties:
              agent_types:
                type: array
                items:
                  type: string
                  enum: [technical, business, user]
              include_synthesis:
                type: boolean
                default: true
    responses:
      '200':
        description: Multi-agent analysis result for the video
        content:
          application/json:
            schema:
              $ref: '#/components/schemas/MultiAgentAnalysisResult'

/api/analysis/playlist:
  post:
    summary: Start playlist analysis with multi-agent system
    requestBody:
      required: true
      content:
        application/json:
          schema:
            $ref: '#/components/schemas/PlaylistAnalysisRequest'
    responses:
      '202':
        description: Playlist analysis job accepted for background processing
        content:
          application/json:
            schema:
              type: object
              properties:
                job_id:
                  type: string
                status:
                  type: string
                  enum: [pending, processing]
                estimated_completion_time:
                  type: string
                  format: date-time

Success Criteria

Functional Requirements

  • Multi-agent system analyzes videos with 3 different perspectives (Technical, Business, UX)
  • Synthesis agent combines the perspectives into a unified, comprehensive summary
  • Playlist URL parser extracts and validates playlist information
  • Batch processing handles playlists of 20+ videos
  • Cross-video analysis identifies themes and content progression
  • Enhanced export includes all agent perspectives and synthesis
  • API endpoints support multi-agent analysis and playlist processing

Quality Requirements

  • Agent perspectives show distinct focus areas and insights
  • Synthesis summary effectively combines perspectives without redundancy
  • Processing time under 45 seconds per video for multi-agent analysis
  • Cross-video analysis provides meaningful insights for playlists
  • Export formats maintain readability with multi-agent content
  • Error handling gracefully manages failed videos in playlists

Performance Requirements

  • Handle playlists up to 100 videos
  • Process 3 agent perspectives + synthesis in under 60 seconds per video
  • Support concurrent playlist processing (2-3 simultaneous jobs)
  • Memory usage remains stable during large playlist processing
  • WebSocket progress updates every 10 seconds during processing
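
These limits imply a worst-case job length that is worth working out: with sequential processing, 100 videos × 60 seconds = 6,000 seconds, or 100 minutes, and one WebSocket update every 10 seconds means roughly 600 progress messages over that job. The sketch below turns the same arithmetic into a value for the `estimated_completion_time` field; the function name and the default per-video budget are assumptions for illustration.

```python
from datetime import datetime, timedelta


def estimate_completion(
    video_count: int,
    started_at: datetime,
    seconds_per_video: float = 60.0,
) -> datetime:
    """Worst-case completion time for a sequentially processed playlist."""
    if video_count < 0:
        raise ValueError("video_count must be non-negative")
    return started_at + timedelta(seconds=video_count * seconds_per_video)
```

Calling `.isoformat()` on the result gives a string suitable for a `date-time` field.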

Implementation Notes

AI Ecosystem Integration

  • Use existing BaseAgent class from /src/agents/ecosystem/core/base_agent.py
  • Leverage AgentOrchestrator from /src/agents/ecosystem/orchestration/orchestrator.py
  • Follow LangGraph patterns from /tests/framework-comparison/test_langgraph_chromadb.py
  • Implement DeepSeek integration for all agents (avoid Anthropic per user requirements)

Agent Perspective Design

  • Technical Agent: Focus on implementation details, tools, technical concepts, architecture
  • Business Agent: Analyze ROI, market implications, business value, competitive analysis
  • UX Agent: Evaluate user journey, accessibility, usability, user experience patterns
  • Synthesis Agent: Combine insights, resolve conflicts, create unified narrative

Cross-Video Analysis Patterns

  • Theme extraction using semantic similarity across video summaries
  • Content progression tracking through timestamp and topic analysis
  • Series detection through title patterns and content similarity
  • Channel analysis through upload patterns and topic evolution
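
Series detection through title patterns might start from something like the sketch below, which groups titles that share a prefix followed by a marker such as "Part 3" or "#3". The regular expression, the marker words, and the `detect_series` function are assumptions for this example; the content-similarity half of series detection is not covered here.

```python
import re
from collections import defaultdict
from typing import Dict, List

# Matches "Part 2", "Episode 10", "Ep. 3", "Lesson 4", or "#5" (case-insensitive).
EPISODE_RE = re.compile(r"(?:\b(?:part|episode|ep|lesson)\.?\s*|#)(\d+)", re.IGNORECASE)


def detect_series(titles: List[str]) -> Dict[str, List[str]]:
    """Group titles into series keyed by the text before the episode marker."""
    series = defaultdict(list)
    for title in titles:
        match = EPISODE_RE.search(title)
        if not match:
            continue
        # Series name: text before the marker, minus trailing separators.
        name = title[: match.start()].strip(" -:|([").lower()
        if name:
            series[name].append((int(match.group(1)), title))
    # Keep groups with at least two episodes, ordered by episode number.
    return {
        name: [title for _, title in sorted(items)]
        for name, items in series.items()
        if len(items) >= 2
    }
```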

Performance Considerations

  • Parallel agent processing where possible (Technical, Business, UX can run concurrently)
  • Intelligent caching of agent analyses for repeated playlist processing
  • Batch database operations for playlist video processing
  • WebSocket connection management for long-running playlist jobs
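
Since the three perspective agents do not depend on each other, they can run concurrently and feed the synthesis step once all have finished. The sketch below shows that shape with `asyncio.gather`; the agents are modelled as plain async callables, and `run_perspectives` and `analyze` are hypothetical names rather than the BaseAgent or AgentOrchestrator API.

```python
import asyncio
from typing import Awaitable, Callable, Dict, Tuple

AgentFn = Callable[[str], Awaitable[str]]
SynthFn = Callable[[Dict[str, str]], Awaitable[str]]


async def run_perspectives(transcript: str, agents: Dict[str, AgentFn]) -> Dict[str, str]:
    """Run all perspective agents concurrently; drop the ones that raise."""
    names = list(agents)
    outputs = await asyncio.gather(
        *(agents[name](transcript) for name in names),
        return_exceptions=True,  # one failing agent must not cancel the others
    )
    return {
        name: out
        for name, out in zip(names, outputs)
        if not isinstance(out, BaseException)
    }


async def analyze(
    transcript: str, agents: Dict[str, AgentFn], synthesize: SynthFn
) -> Tuple[Dict[str, str], str]:
    """Perspectives in parallel first, then a single synthesis step."""
    perspectives = await run_perspectives(transcript, agents)
    synthesis = await synthesize(perspectives)
    return perspectives, synthesis
```

With this layout the per-video latency is roughly the slowest perspective agent plus the synthesis call, rather than the sum of all four.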

Risk Mitigation

High Risk: Agent Analysis Quality

  • Risk: Agents provide similar or redundant perspectives
  • Mitigation: Distinct prompt engineering with clear focus areas, quality scoring, and validation
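
One inexpensive validation for this risk is to measure word overlap between the agents' summaries and flag pairs that look too similar. The sketch below uses Jaccard similarity on word sets as a stand-in for a proper semantic comparison; `redundant_pairs` and the 0.6 threshold are assumptions chosen for illustration, not tuned values.

```python
import re
from itertools import combinations
from typing import Dict, List, Set, Tuple


def _tokens(text: str) -> Set[str]:
    """Lowercased alphanumeric word set."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard(a: str, b: str) -> float:
    """Jaccard similarity of the two texts' word sets (0.0 if both are empty)."""
    ta, tb = _tokens(a), _tokens(b)
    if not ta and not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


def redundant_pairs(summaries: Dict[str, str], threshold: float = 0.6) -> List[Tuple[str, str]]:
    """Return agent-type pairs whose summaries overlap at or above `threshold`."""
    return [
        (x, y)
        for x, y in combinations(sorted(summaries), 2)
        if jaccard(summaries[x], summaries[y]) >= threshold
    ]
```

A flagged pair could trigger a re-run with a stricter prompt or simply lower the quality score for that video.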

Medium Risk: Processing Time

  • Risk: Multi-agent analysis too slow for user experience
  • Mitigation: Parallel agent processing, progress indicators, background processing

Medium Risk: Large Playlist Performance

  • Risk: Memory/resource exhaustion with 50+ video playlists
  • Mitigation: Batch processing limits, resource monitoring, job queuing system

Story Owner: Development Team
Architecture Reference: BMad Method Epic-Story Structure
Implementation Status: Ready for Development
Last Updated: 2025-08-27