# Clean-Tracks Audio Censorship System - Product Requirements Document

## Executive Summary
Clean-Tracks is an intelligent audio processing tool that automatically detects and censors explicit content in audio files. It features both a user-friendly web interface and powerful command-line tools, leveraging OpenAI Whisper for accurate speech recognition.

## Project Goals
- Build a reliable audio censorship system that processes each file in under 30% of its playback duration (for example, a 10-minute file in under 3 minutes)
- Create an intuitive web UI that allows users to process audio with minimal technical knowledge
- Provide customizable word lists that users can manage and control
- Support multiple audio formats (MP3, WAV, FLAC, M4A, OGG)
- Achieve 95%+ accuracy in explicit word detection
- Ensure WCAG AA accessibility compliance

## Target Users
1. Content Creators - Need to quickly clean podcast/video audio for wider distribution
2. Educators - Make educational content appropriate for classroom use
3. Parents - Clean music and media for family consumption
4. Media Professionals - Prepare content for broadcast standards

## Core Features

### 1. Audio Processing Engine
- Integrate OpenAI Whisper for speech-to-text with word-level timestamps
- Support multiple Whisper model sizes (tiny, base, small, medium, large)
- Process audio files up to 500 MB
- Handle long-form content with intelligent chunking
- Preserve audio quality during processing
- Support GPU acceleration for faster processing
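
One possible reading of the chunking requirement above is to split a long recording into fixed-length windows that overlap slightly, so that a word straddling a boundary is fully contained in at least one window. The sketch below is an illustration under that assumption; the name `plan_chunks` and the 30-second and 2-second defaults are hypothetical and not part of these requirements.

```python
from typing import List, Tuple


def plan_chunks(duration_s: float, chunk_s: float = 30.0,
                overlap_s: float = 2.0) -> List[Tuple[float, float]]:
    """Return (start, end) windows in seconds that cover the whole recording."""
    if chunk_s <= overlap_s:
        raise ValueError("chunk_s must be larger than overlap_s")
    chunks = []
    start = 0.0
    while start < duration_s:
        end = min(start + chunk_s, duration_s)
        chunks.append((start, end))
        if end >= duration_s:
            break
        # Step back by the overlap so boundary words appear whole in a window.
        start = end - overlap_s
    return chunks
```

Detections that fall inside an overlap region would be reported by two adjacent windows, so a real implementation would also need to de-duplicate them by timestamp.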

### 2. Word Detection System
- Implement exact matching, plus fuzzy matching based on Levenshtein distance
- Add phonetic matching for similar-sounding words
- Context-aware detection to reduce false positives
- Confidence scoring for each detection
- Support for multiple languages (initially English)
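
The exact and fuzzy matching items above could be sketched roughly as follows. This is a minimal illustration, not the project's detector: `match_word`, the `max_distance` threshold, and the confidence formula (one minus the edit distance divided by the longer word's length) are all assumptions made for the example.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance: minimum insertions, deletions, and substitutions."""
    if len(a) < len(b):
        a, b = b, a
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def match_word(word: str, blocked: set, max_distance: int = 1):
    """Return (blocked_word, confidence) for the best match, or None."""
    token = word.lower().strip(".,!?;:\"'")
    best = None
    for candidate in blocked:
        distance = levenshtein(token, candidate)
        if distance > max_distance:
            continue
        # Hypothetical scoring: 1.0 for an exact match, lower for near misses.
        confidence = 1.0 - distance / max(len(token), len(candidate), 1)
        if best is None or confidence > best[1]:
            best = (candidate, confidence)
    return best
```

With a block list of `{"darn"}`, the harmless word "dark" also matches at distance 1. That kind of false positive is what the context-aware detection and confidence scoring items are meant to filter out.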

### 3. Censorship Methods
- Silence insertion (mute detected words)
- Beep tone generation (customizable frequency)
- White noise replacement
- Fade in/out transitions for smooth audio
- Reversible censorship with undo capability
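
As a rough illustration of the silence, beep, and fade items above, the sketch below works on a plain list of float samples in the range -1.0 to 1.0. It uses only the standard library and is not tied to PyDub or FFmpeg; the function names, the 1000 Hz default, and the 10 ms fade length are assumptions chosen for the example.

```python
import math
from typing import List


def beep(num_samples: int, sample_rate: int = 44100,
         frequency_hz: float = 1000.0, amplitude: float = 0.5,
         fade_samples: int = 0) -> List[float]:
    """Generate a sine tone with linear fade-in and fade-out to avoid clicks."""
    fade = min(fade_samples, num_samples // 2)
    tone = []
    for n in range(num_samples):
        value = amplitude * math.sin(2 * math.pi * frequency_hz * n / sample_rate)
        if fade and n < fade:
            value *= n / fade
        elif fade and n >= num_samples - fade:
            value *= (num_samples - 1 - n) / fade
        tone.append(value)
    return tone


def censor(samples: List[float], start_s: float, end_s: float,
           sample_rate: int = 44100, method: str = "silence") -> List[float]:
    """Return a copy of samples with [start_s, end_s) muted or beeped."""
    start = max(0, int(start_s * sample_rate))
    end = min(len(samples), int(end_s * sample_rate))
    if end <= start:
        return list(samples)
    if method == "silence":
        patch = [0.0] * (end - start)
    elif method == "beep":
        # Hypothetical default: fade over roughly 10 ms at each edge.
        patch = beep(end - start, sample_rate, fade_samples=sample_rate // 100)
    else:
        raise ValueError(f"unknown method: {method}")
    return samples[:start] + patch + samples[end:]
```

Because `censor` returns a new list and leaves its input untouched, the original audio stays available, which is one simple way to support the undo requirement.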

### 4. Word List Management
- User-controlled explicit word lists
- Severity levels (mild, moderate, severe)
- Category organization (profanity, slurs, custom)
- Import/export functionality (CSV, JSON)
- Shareable community word lists
- Default word lists for common use cases
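
A word list entry with the severity levels and categories listed above, plus CSV and JSON import/export, might look like the following sketch. The `WordEntry` class, its field names, and its defaults are hypothetical; the actual schema is not defined in this document.

```python
import csv
import io
import json
from dataclasses import dataclass, asdict
from typing import List

SEVERITIES = ("mild", "moderate", "severe")
CATEGORIES = ("profanity", "slurs", "custom")


@dataclass(frozen=True)
class WordEntry:
    word: str
    severity: str = "moderate"
    category: str = "custom"

    def __post_init__(self):
        # Reject values outside the levels and categories named in the PRD.
        if self.severity not in SEVERITIES:
            raise ValueError(f"invalid severity: {self.severity}")
        if self.category not in CATEGORIES:
            raise ValueError(f"invalid category: {self.category}")


def to_json(entries: List[WordEntry]) -> str:
    return json.dumps([asdict(e) for e in entries], indent=2)


def from_json(text: str) -> List[WordEntry]:
    return [WordEntry(**item) for item in json.loads(text)]


def to_csv(entries: List[WordEntry]) -> str:
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=["word", "severity", "category"])
    writer.writeheader()
    for entry in entries:
        writer.writerow(asdict(entry))
    return buffer.getvalue()


def from_csv(text: str) -> List[WordEntry]:
    return [WordEntry(**row) for row in csv.DictReader(io.StringIO(text))]
```

Validating on construction means an imported community list with an unknown severity or category fails loudly at import time instead of silently producing entries the detector cannot rank.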

### 5. Web User Interface
- Drag-and-drop file upload with Dropzone.js
- Real-time processing progress with WebSocket updates
- Audio waveform visualization showing detected words
- Interactive word list management interface
- Mobile-responsive design with touch optimization
- First-time user onboarding with tutorial
- Before/after preview player

### 6. Command-Line Interface
- Process single files: `clean-tracks process <file>`
- Batch processing: `clean-tracks batch <pattern>`
- Word list management: `clean-tracks words <command>`
- Configuration: `clean-tracks config <setting>`
- Server mode: `clean-tracks server`
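
The command layout above maps naturally onto subcommands. The following is a minimal sketch using the standard library's `argparse`; the document does not say which CLI library the project uses, and the argument names (`file`, `pattern`, `words_command`, `setting`) and help strings are assumptions.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser mirroring the five subcommands listed in the PRD."""
    parser = argparse.ArgumentParser(prog="clean-tracks")
    sub = parser.add_subparsers(dest="command", required=True)

    process = sub.add_parser("process", help="censor a single audio file")
    process.add_argument("file")

    batch = sub.add_parser("batch", help="censor every file matching a pattern")
    batch.add_argument("pattern")

    words = sub.add_parser("words", help="manage word lists")
    words.add_argument("words_command")

    config = sub.add_parser("config", help="read or change a setting")
    config.add_argument("setting")

    sub.add_parser("server", help="start the web server")
    return parser
```

For example, `build_parser().parse_args(["process", "song.mp3"])` yields a namespace whose `command` is `"process"` and whose `file` is `"song.mp3"`.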

### 7. Batch Processing
- Queue management for multiple files
- Priority system for processing order
- Parallel processing capability
- Progress tracking for all jobs
- Automatic retry on failures
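
The queue, priority, and retry items above can be illustrated with a small in-memory priority queue. This sketch is sequential and deliberately leaves out parallelism and persistence; the `JobQueue` class, the convention that lower numbers run first, and the `max_retries` default are assumptions. The optional Celery setup mentioned under Infrastructure would replace something like this in practice.

```python
import heapq
import itertools
from typing import Callable, Dict, List, Tuple


class JobQueue:
    """Run jobs in priority order (lower number first), retrying failures."""

    def __init__(self, max_retries: int = 2):
        self.max_retries = max_retries
        self._heap: List[Tuple[int, int, str, int]] = []
        self._counter = itertools.count()  # tie-breaker keeps FIFO order

    def add(self, path: str, priority: int = 5) -> None:
        heapq.heappush(self._heap, (priority, next(self._counter), path, 0))

    def run(self, worker: Callable[[str], None]) -> Dict[str, str]:
        """Process every queued job and return {path: "done" | "failed"}."""
        status: Dict[str, str] = {}
        while self._heap:
            priority, _, path, attempts = heapq.heappop(self._heap)
            try:
                worker(path)
                status[path] = "done"
            except Exception:
                if attempts < self.max_retries:
                    # Re-queue at the same priority with one more attempt used.
                    heapq.heappush(
                        self._heap,
                        (priority, next(self._counter), path, attempts + 1))
                else:
                    status[path] = "failed"
        return status
```

The returned status dictionary is the simplest form of progress tracking; a web UI would more likely stream per-job updates as they happen.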

### 8. Performance & Optimization
- Result caching to avoid reprocessing
- GPU acceleration support
- Lazy loading for web interface
- CDN for static assets
- Code splitting for faster initial load
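
For the result caching item above, one common approach is to key the cache on the file's content plus the processing settings, so that the same audio with a different model or word list is treated as a new job. The sketch below assumes that approach; `cache_key`, `ResultCache`, and the settings keys are hypothetical, and an in-memory dictionary stands in for Redis or a database table.

```python
import hashlib
import json


def cache_key(audio_bytes: bytes, settings: dict) -> str:
    """Derive a stable key from file content plus processing settings."""
    digest = hashlib.sha256()
    digest.update(audio_bytes)
    # sort_keys makes the key independent of dictionary ordering.
    digest.update(json.dumps(settings, sort_keys=True).encode("utf-8"))
    return digest.hexdigest()


class ResultCache:
    """In-memory cache that computes a result only on the first request."""

    def __init__(self):
        self._store = {}
        self.hits = 0

    def get_or_compute(self, audio_bytes: bytes, settings: dict, compute):
        key = cache_key(audio_bytes, settings)
        if key in self._store:
            self.hits += 1
        else:
            self._store[key] = compute(audio_bytes, settings)
        return self._store[key]
```

Hashing the content instead of the filename means a renamed copy of an already processed file still hits the cache.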

## Technical Requirements

### Backend Stack
- Python 3.11+ for core processing
- Flask web framework
- SQLAlchemy for database ORM
- OpenAI Whisper for transcription
- PyDub and FFmpeg for audio manipulation
- Librosa for audio analysis
- Socket.IO for real-time updates

### Frontend Stack
- HTML5 with semantic markup
- Bootstrap 5 for responsive design
- Vanilla JavaScript (ES6+)
- WaveSurfer.js for waveform visualization
- DataTables.js for word list management
- Dropzone.js for file uploads

### Infrastructure
- SQLite database for word lists and history
- Redis for caching (optional)
- Celery for background jobs (optional)
- Docker for containerization
- GitHub Actions for CI/CD

## User Experience Requirements

### Onboarding Flow
1. Welcome modal with 3-step visual guide
2. Sample audio file for testing
3. Interactive tooltip tour
4. Quick start guide

### Key User Journeys
1. Upload → Process → Review → Download (< 3 minutes)
2. Manage Word Lists → Add/Edit/Remove words
3. Configure Settings → Choose censorship style
4. Batch Process → Queue multiple files

### Design Principles
- Simplicity first - complex features hidden by default
- Visual feedback for all actions
- Clear error messages with solutions
- Progressive disclosure of advanced features
- Mobile-first responsive design

## Success Metrics
- Processing speed: < 30% of audio duration
- Detection accuracy: > 95%
- User task completion: > 90%
- Time to first success: < 3 minutes
- System uptime: 99.9%
- Page load time: < 2 seconds
- User satisfaction: > 4.5/5 stars
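
The processing speed metric above is a ratio of wall-clock processing time to audio duration. A benchmark could check it with a helper like the one below; the function name and the strict less-than comparison are assumptions based on the "< 30%" wording.

```python
def meets_speed_target(processing_s: float, audio_s: float,
                       max_ratio: float = 0.30) -> bool:
    """True when processing time is under max_ratio of the audio duration."""
    if audio_s <= 0:
        raise ValueError("audio_s must be positive")
    return processing_s / audio_s < max_ratio
```

Under this reading, a 10-minute (600-second) file must finish in under 180 seconds: 170 seconds passes, while exactly 180 seconds does not.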

## Constraints & Limitations
- Initial release is English-only
- Maximum file size: 500 MB
- Requires FFmpeg installation
- Internet connection needed for first model download
- 4 GB RAM minimum, 8 GB recommended

## Security & Privacy
- All processing done locally (no cloud uploads)
- Optional incognito mode
- One-click data clearing
- No user tracking without consent
- Secure API with rate limiting
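
The rate limiting item above does not name an algorithm. A token bucket is one common choice, sketched here as an assumption; the `TokenBucket` class and its parameters are hypothetical, and a Flask deployment might rely on an existing extension instead.

```python
import time


class TokenBucket:
    """Allow bursts of up to `capacity` requests, refilled at `rate` per second."""

    def __init__(self, rate: float, capacity: int, clock=time.monotonic):
        self.rate = rate
        self.capacity = capacity
        self._clock = clock  # injectable so tests can control time
        self._tokens = float(capacity)
        self._last = clock()

    def allow(self) -> bool:
        """Consume one token if available and report whether the request may run."""
        now = self._clock()
        elapsed = now - self._last
        self._last = now
        self._tokens = min(self.capacity, self._tokens + elapsed * self.rate)
        if self._tokens >= 1.0:
            self._tokens -= 1.0
            return True
        return False
```

In an API, one bucket per client (for example, per IP address or API key) would be kept, and a request would be rejected with HTTP 429 when `allow()` returns `False`.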

## Testing Requirements
- Unit tests for all components (pytest)
- Integration tests for API endpoints
- End-to-end tests with Playwright
- Visual regression testing
- Accessibility testing (WCAG AA)
- Performance benchmarking
- Cross-browser testing

## Documentation Needs
- User guide with screenshots
- API documentation with examples
- CLI command reference
- Word list management guide
- Deployment instructions
- Contributing guidelines

## Future Enhancements (Post-MVP)
- Multi-language support
- Cloud processing option
- Mobile native apps
- Real-time streaming audio processing
- AI-powered context understanding
- Collaborative word list editing
- Advanced analytics dashboard
- Plugin system for custom processors

## Project Timeline
- Week 1: Foundation and setup
- Week 2: Core engine development
- Week 3: Word list management
- Week 4: Web UI implementation
- Week 5: Advanced features
- Week 6: CLI and API
- Week 7: Testing and deployment

## Risks & Mitigation
- Whisper accuracy: Provide multiple model options
- Processing speed: Implement GPU support and caching
- Large file handling: Use chunking and streaming
- Browser compatibility: Progressive enhancement approach
- User adoption: Focus on intuitive UX and onboarding