Initial commit - Mixcloud RSS Generator

This commit is contained in:
S 2025-08-14 00:43:59 -04:00
commit d7d82c4211
26 changed files with 1633 additions and 0 deletions

66
CHANGELOG.md Normal file
View File

@@ -0,0 +1,66 @@
# Changelog - Mixcloud RSS Generator
All notable changes to the Mixcloud RSS Generator component will be documented in this file.
The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
## [Unreleased]
### To Do
- Add support for playlists and categories
- Implement feed pagination for users with many shows
- Add configurable cache expiration
- Support for custom feed metadata
## [0.3.0] - 2025-08-04
### Added
- Generated specialized RSS feeds for Revolutionary African Perspectives (RAP) show
- Created filtered feeds for specific date ranges (July 21 episode)
- Added precise show filtering capabilities
- Support for WRFG radio show feeds
### Changed
- Enhanced feed generation scripts for better show filtering
- Improved caching mechanism for faster feed updates
## [0.2.0] - 2025-07-15
### Added
- Web interface for RSS feed generation
- RESTful API endpoints for programmatic access
- Built-in caching system for improved performance
- Docker support with dedicated Dockerfile
- HTML template for web interface
### Changed
- Restructured code into modular components (src directory)
- Improved error handling for invalid Mixcloud URLs
- Enhanced feed metadata with proper iTunes tags
### Fixed
- Audio URL extraction for newer Mixcloud API changes
- Character encoding issues in show descriptions
## [0.1.0] - 2025-06-01
### Added
- Initial release with core functionality
- Command-line interface for RSS feed generation
- Support for converting Mixcloud user shows to RSS
- Basic feed generation with episode metadata
- Compatible with major podcast apps
- Configurable episode limits
### Technical Details
- Python-based implementation
- Uses Mixcloud's public API
- Generates standard RSS 2.0 feeds with podcast extensions
---
## Integration with Personal AI Assistant
This component is used by the main Personal AI Assistant project for:
- Monitoring podcast RSS feeds for new episodes
- Providing audio sources for the transcription pipeline
- Enabling podcast app compatibility for processed shows
For main project changes, see the [parent changelog](../CHANGELOG.md).

228
CLAUDE.md Normal file
View File

@@ -0,0 +1,228 @@
# CLAUDE.md - Mixcloud RSS Generator
This file provides guidance to Claude Code when working with the Mixcloud RSS Generator component of the Personal AI Assistant project.
## Relationship to Main Project
- Part of the Personal AI Assistant ecosystem
- See main [CLAUDE.md](../CLAUDE.md) for general project guidelines
- Update [CHANGELOG.md](./CHANGELOG.md) when making changes to this component
## Understanding Component History
**New to this component?** Review [CHANGELOG.md](./CHANGELOG.md) to understand:
- Evolution from v0.1.0 CLI tool to v0.3.0 with specialized feeds
- API changes and adaptations
- Integration timeline with main project
## Project Overview
A backend-only CLI tool that converts Mixcloud user shows into RSS feeds compatible with podcast apps and feed readers. Uses shared content syndication services from the main AI Assistant project for reusability and consistency.
**Architecture Change (v1.0)**: Refactored from Flask web app to backend-only CLI using shared services.
## Technology Stack
- **Language**: Python 3.8+
- **Architecture**: Backend-only CLI with shared services
- **Key Libraries**:
- `requests` - HTTP requests to Mixcloud (via shared services)
- `xml.etree.ElementTree` - RSS/XML generation (via shared services)
- **Shared Services**: Content syndication components from `shared/services/content_syndication/`
- **Removed**: Flask, BeautifulSoup4 (moved to shared services)
- **Caching**: File-based caching in `./cache` directory
## Quick Start
### Backend CLI Usage
```bash
# REQUIRED FIRST STEP - Activate virtual environment (when using local development)
source venv/bin/activate
# Set PYTHONPATH for shared services
export PYTHONPATH=/path/to/my-ai-projects:$PYTHONPATH
# Generate RSS feed for a Mixcloud user
python src/cli.py WRFG
# Save to file
python src/cli.py WRFG -o feed.xml
# Limit number of episodes
python src/cli.py WRFG -l 50
# Advanced filtering
python src/cli.py WRFG --keywords "rap,public affairs" --limit 100
python src/cli.py WRFG --rap-only --limit 200
```
### Legacy Command Line (Deprecated)
```bash
# Still available for compatibility
python src/mixcloud_rss.py username -o feed.xml
```
## Architecture Notes
### New Backend-Only Architecture
1. **CLI Interface**: `src/cli.py` - New primary interface with advanced filtering
2. **Shared Services**: Uses `shared/services/content_syndication/` components:
- `ContentSyndicationService` - Main orchestration
- `MixcloudAPIClient` - API interactions with caching
- `RSSFeedGenerator` - RSS 2.0 generation with iTunes extensions
- `FeedFilterService` - Advanced filtering (dates, keywords, tags)
### RSS Feed Generation Flow
1. CLI parses arguments and builds filters
2. ContentSyndicationService orchestrates the process
3. MixcloudAPIClient fetches user data with caching
4. FeedFilterService applies filtering criteria
5. RSSFeedGenerator creates RSS 2.0 compliant XML
6. Results output to file or stdout
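As a toy sketch of this flow (hypothetical simplified stand-ins, not the real shared-service classes), the orchestration reduces to fetch → filter → render:

```python
from dataclasses import dataclass

@dataclass
class Show:
    title: str
    url: str

def fetch_shows(username: str):
    # Stand-in for MixcloudAPIClient: the real client calls the Mixcloud API with caching
    return [
        Show("RAP - Revolutionary African Perspectives", "https://example.invalid/1"),
        Show("Jazz Flavors", "https://example.invalid/2"),
    ]

def apply_filters(shows, keywords):
    # Stand-in for FeedFilterService: keep shows whose title contains any keyword
    return [s for s in shows if any(k.lower() in s.title.lower() for k in keywords)]

def to_rss(shows):
    # Stand-in for RSSFeedGenerator: emit a minimal RSS 2.0 skeleton
    items = "".join(
        f"<item><title>{s.title}</title><link>{s.url}</link></item>" for s in shows
    )
    return ('<?xml version="1.0" encoding="UTF-8"?>'
            f'<rss version="2.0"><channel>{items}</channel></rss>')

# ContentSyndicationService's role: wire the three steps together
feed = to_rss(apply_filters(fetch_shows("WRFG"), ["rap"]))
```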
### Key Files
- `src/cli.py` - **NEW** Backend CLI interface (primary)
- `src/mixcloud_rss.py` - Legacy RSS generation logic (deprecated)
- `generate_*.py` - Specialized feed generators (work with legacy code)
- `cache/` - Cached API responses (gitignored)
- **Archived**: `archived_projects/mixcloud-ui/` - Former Flask web interface
### Caching Strategy
- Default TTL: 3600 seconds (1 hour)
- Cache key: MD5 hash of request parameters
- Stored as JSON files in `./cache` directory
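A sketch of how such a cache key can be derived (hypothetical helper; the shared-service implementation may differ in detail):

```python
import hashlib
import json

def cache_key(params: dict) -> str:
    # Serialize with sorted keys so the hash is stable across dict orderings
    canonical = json.dumps(params, sort_keys=True)
    return hashlib.md5(canonical.encode("utf-8")).hexdigest()

# Identical parameter sets map to the same cache file
key = cache_key({"username": "WRFG", "limit": 50})
cache_path = f"./cache/{key}.json"
```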
## Development Commands
### Testing Feed Generation
```bash
# REQUIRED FIRST STEP - Activate virtual environment (when using local development)
source venv/bin/activate
export PYTHONPATH=/path/to/my-ai-projects:$PYTHONPATH
# Test with new CLI
python src/cli.py WRFG --validate # Quick user validation
python src/cli.py WRFG --user-info # User information
python src/cli.py WRFG --verbose # Verbose RSS generation
# Test filtering
python src/cli.py WRFG --rap-only --limit 100
python src/cli.py WRFG --keywords "interview" --output test.xml
# Validate RSS output
python -m xml.dom.minidom test.xml  # Pretty-print; fails if the XML is not well-formed
# Legacy testing (still works)
python src/mixcloud_rss.py WRFG
python generate_rap_feed.py
```
### Cache Management
```bash
# Clear cache
rm -rf cache/*.json
# View cached data
ls -la cache/
```
## Common Issues and Solutions
### Mixcloud API Changes
**Problem**: Feed generation fails with extraction errors
**Solution**:
- Check if Mixcloud's HTML structure changed
- Update the BeautifulSoup selectors in `extract_shows_from_html()` (now part of the shared services)
- Look for new API endpoints in the browser's network tab
### Audio URL Extraction
**Problem**: "Could not extract audio URL" errors
**Solution**:
- Mixcloud often changes their audio URL format
- Check `extract_audio_url()` method
- May need to update regex patterns or API calls
### RSS Feed Validation
**Problem**: Podcast apps reject the feed
**Solution**:
- Ensure all required RSS elements are present
- Check iTunes podcast extensions
- Validate dates are in RFC822 format
- Use online RSS validators
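Python's standard library can both emit and check RFC 822 dates, for example:

```python
from datetime import datetime, timezone
from email.utils import format_datetime, parsedate_to_datetime

# <pubDate> values must be RFC 822 dates
dt = datetime(2025, 7, 21, 12, 0, 0, tzinfo=timezone.utc)
pub_date = format_datetime(dt)
print(pub_date)  # Mon, 21 Jul 2025 12:00:00 +0000

# Round-trip parse when checking dates in an existing feed
assert parsedate_to_datetime(pub_date) == dt
```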
### Character Encoding
**Problem**: Special characters appear garbled
**Solution**:
- Ensure UTF-8 encoding throughout
- Use `.encode('utf-8')` when writing files
- Set XML encoding declaration
## Integration with Main Project
### Usage in Personal AI Assistant
1. **Podcast Monitoring**: RSS feeds enable the podcast processing pipeline
2. **Episode Detection**: New episodes detected via RSS polling
3. **Audio Source**: Provides URLs for audio downloading
### Specialized Feeds
The `generate_*.py` scripts create filtered feeds for specific shows:
- `generate_rap_feed.py` - Revolutionary African Perspectives shows
- `generate_july21_feed.py` - Specific date filtering
## Testing
### Manual Testing
```bash
# REQUIRED FIRST STEP - Activate virtual environment (when using local development)
source venv/bin/activate
# Test basic functionality
python src/mixcloud_rss.py WRFG -o test.xml
head -20 test.xml  # Check output
# Test legacy web interface (now archived in archived_projects/mixcloud-ui/)
python src/web_app.py
# Visit http://localhost:5000 and test form submission
```
### Feed Validation
- Use https://validator.w3.org/feed/ for RSS validation
- Test in actual podcast apps (Apple Podcasts, Overcast)
- Check all links are accessible
## Deployment Considerations
### Docker Support
```bash
# Build image
docker build -t mixcloud-rss .
# Run container
docker run -p 5000:5000 mixcloud-rss
```
### Environment Variables
- No required environment variables
- Optional: `CACHE_DIR`, `CACHE_TTL` for customization
## Troubleshooting
### Debug Mode
```python
# Add to scripts for verbose output
import logging
logging.basicConfig(level=logging.DEBUG)
```
### Common Error Messages
- **"No shows found"**: Check if username is correct or if Mixcloud is accessible
- **"Cache directory not writable"**: Ensure `./cache` exists and has write permissions
- **"Invalid XML"**: Check for unescaped special characters in show data
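For the unescaped-character case, passing titles and descriptions through `xml.sax.saxutils.escape` before building the feed is usually enough:

```python
from xml.sax.saxutils import escape

title = 'Jazz & Blues <Live> Special'
safe_title = escape(title)  # escapes &, <, > for XML text nodes
print(safe_title)  # Jazz &amp; Blues &lt;Live&gt; Special
```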
### Performance
- Cache is crucial for performance
- Consider implementing Redis cache for production
- Batch requests when generating multiple feeds
## Best Practices
1. Always validate generated RSS feeds
2. Test with multiple podcast apps
3. Monitor Mixcloud for API/structure changes
4. Keep cache directory clean in development
5. Log errors for debugging production issues

23
Dockerfile Normal file
View File

@@ -0,0 +1,23 @@
FROM python:3.11-slim
WORKDIR /app
# Install dependencies
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
# Copy application code
COPY src/ ./src/
# Create cache directory
RUN mkdir -p /app/cache
# Expose port
EXPOSE 5000
# Set environment variables
ENV FLASK_APP=src/web_app.py
ENV PYTHONUNBUFFERED=1
# Run the web server
CMD ["python", "src/web_app.py"]

214
README.md Normal file
View File

@@ -0,0 +1,214 @@
# Mixcloud RSS Generator (Backend CLI)
Convert Mixcloud shows into RSS feeds using a lightweight command-line interface that leverages shared content syndication services.
## Features
- 🎵 Convert any Mixcloud user's shows into an RSS feed
- 📱 Compatible with podcast apps (Apple Podcasts, Overcast, etc.)
- 🚀 Fast with built-in caching
- 🔧 Backend-only CLI tool (no web interface)
- 📡 Advanced filtering options (keywords, dates, tags)
- ♻️ Uses shared services for reusability across projects
## Architecture
This project has been refactored to use shared services:
- **Backend Services**: Located in `shared/services/content_syndication/`
- **CLI Interface**: `src/cli.py` provides command-line access
- **Legacy Components**: Web UI archived in `archived_projects/mixcloud-ui/`
## Installation
```bash
# Navigate to the mixcloud-rss-generator directory
cd mixcloud-rss-generator
# Install dependencies
pip install -r requirements.txt
# Ensure shared services are accessible
export PYTHONPATH=/path/to/my-ai-projects:$PYTHONPATH
```
## Usage
### Basic Usage
Generate RSS feed from Mixcloud user:
```bash
# Basic RSS generation
python src/cli.py WRFG
# From full Mixcloud URL
python src/cli.py --url https://www.mixcloud.com/NTSRadio/
# Save to file with custom limit
python src/cli.py WRFG --limit 50 --output feed.xml
```
### Advanced Filtering
```bash
# Filter by keywords in title
python src/cli.py WRFG --keywords "rap,public affairs" --limit 100
# Filter by date range
python src/cli.py WRFG --date-range 2024-01-01 2024-12-31
# Filter by specific dates
python src/cli.py WRFG --specific-dates "July 21,Aug 15,2024-09-01"
# Revolutionary African Perspectives only (convenience filter)
python src/cli.py WRFG --rap-only --limit 100
# Combine multiple filters
python src/cli.py WRFG --keywords "interview" --tags "house,techno" --limit 30
```
### Utility Operations
```bash
# Validate user without generating feed
python src/cli.py WRFG --validate
# Get user information
python src/cli.py WRFG --user-info
# Verbose output for debugging
python src/cli.py WRFG --verbose
```
### Integration with Podcast Apps
1. Generate RSS feed and save to a publicly accessible location
2. Use the file path or served URL in your podcast app:
- **Apple Podcasts**: File → Add Show by URL
- **Overcast**: Plus button → Add URL
- **Pocket Casts**: Search → Enter URL
- **Castro**: Library → Sources → Plus → Add Podcast by URL
## Shared Services Architecture
The RSS generation now uses modular services from `shared/services/content_syndication/`:
- **ContentSyndicationService**: Main orchestration service
- **MixcloudAPIClient**: Handles Mixcloud API interactions with caching
- **RSSFeedGenerator**: Creates RSS 2.0 compliant feeds
- **FeedFilterService**: Advanced content filtering capabilities
## Configuration
### Cache Settings
```bash
# Custom cache directory and TTL
python src/cli.py WRFG --cache-dir ./custom-cache --cache-ttl 7200
```
### Environment Variables
- `PYTHONPATH`: Must include parent project directory for shared imports
- No other environment variables required for basic operation
## Examples
### Generate Filtered Feed
```python
# In Python script
from shared.services.content_syndication import ContentSyndicationService
# Initialize service
service = ContentSyndicationService(cache_dir="./cache", cache_ttl=3600)
# Generate with filters
filters = {"keywords": "rap,public affairs", "start_date": "2024-01-01"}
rss_feed = service.generate_mixcloud_rss("WRFG", limit=50, filters=filters)
```
### Integration with Main AI Project
```python
# Use in podcast processing pipeline
from shared.services.content_syndication import ContentSyndicationService
service = ContentSyndicationService()
rss_feed = service.generate_rap_feed("WRFG", limit=100) # Convenience method
# Save for podcast processing
with open("data/feeds/wrfg_rap.xml", "w") as f:
f.write(rss_feed)
```
## Migration Notes
### From Web Interface
The web interface (`web_app.py` and `templates/`) has been archived to `archived_projects/mixcloud-ui/`. Key changes:
- **Before**: `python src/web_app.py` (Flask web server)
- **After**: `python src/cli.py [options]` (CLI tool)
### From Legacy Script
The original `mixcloud_rss.py` remains for compatibility but new usage should prefer the CLI:
- **Legacy**: `python src/mixcloud_rss.py username`
- **New**: `python src/cli.py username`
## Troubleshooting
### Import Errors
```bash
# Ensure PYTHONPATH includes project root
export PYTHONPATH=/path/to/my-ai-projects:$PYTHONPATH
python src/cli.py WRFG --validate
```
### Cache Issues
```bash
# Clear cache if experiencing stale data
rm -rf cache/*.json
python src/cli.py WRFG --verbose
```
### API Errors
- **User not found**: Check username spelling and profile visibility
- **No shows**: User might have private shows or no content
- **Rate limiting**: Wait between requests or increase cache TTL
## Advanced Usage
### Specialized Feeds
The CLI includes convenience options for specialized content:
```bash
# Revolutionary African Perspectives shows only
python src/cli.py WRFG --rap-only --limit 200
# Recent interviews only
python src/cli.py WRFG --keywords "interview" --date-range 2024-01-01 $(date +%Y-%m-%d)
```
### Batch Processing
```bash
# Process multiple users
for user in WRFG NTSRadio ResidentAdvisor; do
python src/cli.py $user --output "feeds/${user}.xml" --limit 50
done
```
## Integration with AI Assistant
This RSS generator integrates with the Personal AI Assistant project for:
- **Podcast Processing**: RSS feeds enable episode detection
- **Audio Analysis**: Provides metadata for audio processing
- **Content Monitoring**: Automated feed checking for new episodes
## License
MIT License - part of the Personal AI Assistant ecosystem.

54
WRFG_filtered_July21.xml Normal file

File diff suppressed because one or more lines are too long

14
WRFG_rap_only.xml Normal file

File diff suppressed because one or more lines are too long

View File

@@ -0,0 +1 @@
{"key": "/WRFG/", "url": "https://www.mixcloud.com/WRFG/", "name": "WRFG Atlanta", "username": "WRFG", "pictures": {"small": "https://thumbnailer.mixcloud.com/unsafe/25x25/profile/f/4/3/2/f015-1494-4464-8f0c-9c5efa4ef91d", "thumbnail": "https://thumbnailer.mixcloud.com/unsafe/50x50/profile/f/4/3/2/f015-1494-4464-8f0c-9c5efa4ef91d", "medium_mobile": "https://thumbnailer.mixcloud.com/unsafe/80x80/profile/f/4/3/2/f015-1494-4464-8f0c-9c5efa4ef91d", "medium": "https://thumbnailer.mixcloud.com/unsafe/100x100/profile/f/4/3/2/f015-1494-4464-8f0c-9c5efa4ef91d", "large": "https://thumbnailer.mixcloud.com/unsafe/300x300/profile/f/4/3/2/f015-1494-4464-8f0c-9c5efa4ef91d", "320wx320h": "https://thumbnailer.mixcloud.com/unsafe/320x320/profile/f/4/3/2/f015-1494-4464-8f0c-9c5efa4ef91d", "extra_large": "https://thumbnailer.mixcloud.com/unsafe/600x600/profile/f/4/3/2/f015-1494-4464-8f0c-9c5efa4ef91d", "640wx640h": "https://thumbnailer.mixcloud.com/unsafe/640x640/profile/f/4/3/2/f015-1494-4464-8f0c-9c5efa4ef91d"}, "biog": "Founded in 1973 in Atlanta, GA, Radio Free Georgia is a non-profit, non-commercial, independent, community radio station. It broadcasts on 89.3 FM and is licensed at 100,000 watts. \n\nWRFG is committed to bringing progressive news and handpicked independent music to the metro Atlanta area via FM and the world via our internet stream. \n\nLearn more: https://wrfg.org/\nInstagram: https://www.instagram.com/wrfgatlanta/\nFacebook: https://www.facebook.com/wrfgatl89.3fm", "created_time": "2019-05-11T19:08:22Z", "updated_time": "2019-05-11T19:08:22Z", "follower_count": 673, "following_count": 27, "cloudcast_count": 17838, "favorite_count": 7, "listen_count": 0, "is_pro": true, "is_premium": false, "city": "Atlanta", "country": "United States", "cover_pictures": {"835wx120h": "https://thumbnailer.mixcloud.com/unsafe/835x120/profile_cover/0/1/5/2/323b-581d-4ca3-be95-e5ddb0e22789", "1113wx160h": "https://thumbnailer.mixcloud.com/unsafe/1113x160/profile_cover/0/1/5/2/323b-581d-4ca3-be95-e5ddb0e22789", "1670wx240h": "https://thumbnailer.mixcloud.com/unsafe/1670x240/profile_cover/0/1/5/2/323b-581d-4ca3-be95-e5ddb0e22789"}, "picture_primary_color": "000000"}

19
docker-compose.yml Normal file
View File

@@ -0,0 +1,19 @@
version: '3.8'

services:
  mixcloud-rss:
    build: .
    container_name: mixcloud-rss-generator
    ports:
      - "5000:5000"
    volumes:
      - ./cache:/app/cache
    environment:
      - SECRET_KEY=${SECRET_KEY:-your-secret-key-here}
    restart: unless-stopped
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:5000/health"]
      interval: 30s
      timeout: 10s
      retries: 3
      start_period: 40s

63
generate_july21_feed.py Normal file
View File

@@ -0,0 +1,63 @@
#!/usr/bin/env python3
"""
Generate RSS feed for specific dates (e.g., July 21 show)
"""
from src.mixcloud_rss import MixcloudRSSGenerator


def generate_filtered_feed(username, specific_dates):
    """Generate RSS feed filtered by specific dates."""
    # Create generator
    generator = MixcloudRSSGenerator()

    # Set up filters
    filters = {
        'specific_dates': specific_dates
    }

    # Generate feed
    print(f"Generating RSS feed for {username} filtered by dates: {specific_dates}")
    rss_feed = generator.generate_rss_from_username(username, limit=50, filters=filters)

    if rss_feed:
        # Save to file
        filename = f"{username}_filtered_{specific_dates.replace(',', '_').replace(' ', '')}.xml"
        with open(filename, 'w', encoding='utf-8') as f:
            f.write(rss_feed)
        print(f"✅ RSS feed saved to: {filename}")

        # Also print the RSS URL for the web server
        print(f"\n📡 RSS URL for web server:")
        print(f"http://localhost:5000/rss/{username}?limit=50&specific_dates={specific_dates}")

        # Count episodes
        import xml.etree.ElementTree as ET
        root = ET.fromstring(rss_feed)
        items = root.findall('.//item')
        print(f"\n📊 Found {len(items)} episodes matching the filter")

        # Show episode details
        if items:
            print("\n📅 Matching episodes:")
            for item in items:
                title = item.find('title').text
                pub_date = item.find('pubDate').text
                print(f"  - {title} ({pub_date})")
    else:
        print("❌ Error: Could not generate RSS feed")


if __name__ == "__main__":
    import sys

    if len(sys.argv) < 2:
        print("Usage: python generate_july21_feed.py <username> [dates]")
        print("Example: python generate_july21_feed.py djusername 'July 21'")
        print("Example: python generate_july21_feed.py djusername 'July 21, August 15'")
        sys.exit(1)

    username = sys.argv[1]
    dates = sys.argv[2] if len(sys.argv) > 2 else "July 21"
    generate_filtered_feed(username, dates)

74
generate_rap_feed.py Normal file
View File

@@ -0,0 +1,74 @@
#!/usr/bin/env python3
"""
Generate RSS feed for Public Affairs RAP (Revolutionary African Perspectives) shows
"""
from src.mixcloud_rss import MixcloudRSSGenerator
import xml.etree.ElementTree as ET
from datetime import datetime


def generate_rap_feed(username="WRFG"):
    """Generate RSS feed filtered for RAP shows."""
    # Create generator
    generator = MixcloudRSSGenerator()

    # Set up filters for "Public Affairs" in the title
    # This should catch variations like "afrikan" vs "african"
    filters = {
        'keywords': 'public affairs'
    }

    # Generate feed with a higher limit to catch all shows
    print(f"Generating RSS feed for {username} filtered by 'Public Affairs' shows...")
    rss_feed = generator.generate_rss_from_username(username, limit=100, filters=filters)

    if rss_feed:
        # Save to file
        filename = f"{username}_public_affairs_rap.xml"
        with open(filename, 'w', encoding='utf-8') as f:
            f.write(rss_feed)
        print(f"✅ RSS feed saved to: {filename}")

        # Also print the RSS URL for the web server
        print(f"\n📡 RSS URL for web server:")
        print(f"http://localhost:5000/rss/{username}?limit=100&keywords=public%20affairs")

        # Parse and show episodes
        root = ET.fromstring(rss_feed)
        items = root.findall('.//item')
        print(f"\n📊 Found {len(items)} 'Public Affairs' episodes")

        # Show episode details
        if items:
            print("\n📅 Public Affairs RAP episodes:")
            for item in items:
                title = item.find('title').text
                pub_date_str = item.find('pubDate').text
                link = item.find('link').text

                # Parse date for better display
                try:
                    pub_date = datetime.strptime(pub_date_str, "%a, %d %b %Y %H:%M:%S %z")
                    date_display = pub_date.strftime("%B %d, %Y")
                except ValueError:
                    date_display = pub_date_str

                print(f"\n  📻 {title}")
                print(f"     Date: {date_display}")
                print(f"     URL: {link}")

                # Check if it's the July 21 show
                if "21 july" in title.lower() or "july 21" in title.lower():
                    print(f"     ⭐ This is the July 21 show!")

        # Generate specific URL for your podcast system
        print(f"\n🎯 For your podcast processing system, use this RSS URL:")
        print(f"http://localhost:5000/rss/WRFG?limit=100&keywords=public%20affairs")
    else:
        print("❌ Error: Could not generate RSS feed")


if __name__ == "__main__":
    generate_rap_feed()

85
generate_rap_only_feed.py Normal file
View File

@@ -0,0 +1,85 @@
#!/usr/bin/env python3
"""
Generate RSS feed for ONLY the RAP (Revolutionary African/Afrikan Perspectives) shows
"""
from src.mixcloud_rss import MixcloudRSSGenerator
import xml.etree.ElementTree as ET
from datetime import datetime


def generate_rap_only_feed(username="WRFG"):
    """Generate RSS feed filtered for ONLY RAP shows."""
    # Create generator
    generator = MixcloudRSSGenerator()

    # Set up filters for "RAP" in the title
    # This will catch both "African" and "Afrikan" variations
    filters = {
        'keywords': 'RAP'  # This will match "RAP - Revolutionary African/Afrikan Perspectives"
    }

    # Generate feed with a higher limit to catch all shows
    print(f"Generating RSS feed for {username} filtered by RAP shows only...")
    rss_feed = generator.generate_rss_from_username(username, limit=200, filters=filters)

    if rss_feed:
        # Save to file
        filename = f"{username}_rap_only.xml"
        with open(filename, 'w', encoding='utf-8') as f:
            f.write(rss_feed)
        print(f"✅ RSS feed saved to: {filename}")

        # Also print the RSS URL for the web server
        print(f"\n📡 RSS URL for web server:")
        print(f"http://localhost:5000/rss/{username}?limit=200&keywords=RAP")

        # Parse and show episodes
        root = ET.fromstring(rss_feed)
        items = root.findall('.//item')
        print(f"\n📊 Found {len(items)} RAP episodes")

        # Show episode details
        if items:
            print("\n📅 Revolutionary African/Afrikan Perspectives episodes:")
            for item in items:
                title = item.find('title').text
                pub_date_str = item.find('pubDate').text
                link = item.find('link').text
                description = item.find('description').text if item.find('description') is not None else ""

                # Parse date for better display
                try:
                    pub_date = datetime.strptime(pub_date_str, "%a, %d %b %Y %H:%M:%S %z")
                    date_display = pub_date.strftime("%B %d, %Y")
                except ValueError:
                    date_display = pub_date_str

                print(f"\n  📻 {title}")
                print(f"     Date: {date_display}")
                print(f"     URL: {link}")

                # Check if it's the July 21 show
                if "21 july" in title.lower() or "july 21" in title.lower():
                    print(f"     ⭐ This is the July 21 show!")

                # Check for African vs Afrikan spelling
                if "afrikan" in title.lower():
                    print(f"     📝 Note: Uses 'Afrikan' spelling")
                elif "african" in title.lower():
                    print(f"     📝 Note: Uses 'African' spelling")

        # Generate specific URL for your podcast system
        print(f"\n🎯 For your podcast processing system, use this RSS URL:")
        print(f"http://localhost:5000/rss/WRFG?limit=200&keywords=RAP")

        # Also create a direct link to the July 21 episode
        print(f"\n🔗 Direct link to July 21 RAP show:")
        print(f"https://www.mixcloud.com/WRFG/public-affairs-rap-revolutionary-african-perspectives-21-july-2025/")
    else:
        print("❌ Error: Could not generate RSS feed")


if __name__ == "__main__":
    generate_rap_only_feed()

View File

@@ -0,0 +1,90 @@
#!/usr/bin/env python3
"""
Generate RSS feed for ONLY the Revolutionary African/Afrikan Perspectives shows
Using multiple keywords to be more precise
"""
from src.mixcloud_rss import MixcloudRSSGenerator
import xml.etree.ElementTree as ET
from datetime import datetime


def generate_rap_precise_feed(username="WRFG"):
    """Generate RSS feed filtered for ONLY Revolutionary African/Afrikan Perspectives shows."""
    # Create generator
    generator = MixcloudRSSGenerator()

    # Set up filters - use "revolutionary" as it's unique to these shows
    filters = {
        'keywords': 'revolutionary'  # This should only match the RAP shows
    }

    # Generate feed
    print(f"Generating RSS feed for {username} filtered by Revolutionary African/Afrikan Perspectives shows...")
    rss_feed = generator.generate_rss_from_username(username, limit=200, filters=filters)

    if rss_feed:
        # Save to file
        filename = f"{username}_revolutionary_african_perspectives.xml"
        with open(filename, 'w', encoding='utf-8') as f:
            f.write(rss_feed)
        print(f"✅ RSS feed saved to: {filename}")

        # RSS URLs for different filtering options
        print(f"\n📡 RSS URLs for web server:")
        print(f"Option 1 (by 'revolutionary'): http://localhost:5000/rss/{username}?limit=200&keywords=revolutionary")
        print(f"Option 2 (by 'public affairs' + 'revolutionary'): http://localhost:5000/rss/{username}?limit=200&keywords=public%20affairs,revolutionary")

        # Parse and show episodes
        root = ET.fromstring(rss_feed)
        items = root.findall('.//item')
        print(f"\n📊 Found {len(items)} Revolutionary African/Afrikan Perspectives episodes")

        # Show episode details
        if items:
            print("\n📅 All Revolutionary African/Afrikan Perspectives episodes:")
            july_21_found = False
            july_21_url = None
            for item in items:
                title = item.find('title').text
                pub_date_str = item.find('pubDate').text
                link = item.find('link').text

                # Parse date for better display
                try:
                    pub_date = datetime.strptime(pub_date_str, "%a, %d %b %Y %H:%M:%S %z")
                    date_display = pub_date.strftime("%B %d, %Y")
                    show_date = pub_date.strftime("%Y-%m-%d")
                except ValueError:
                    date_display = pub_date_str
                    show_date = ""

                print(f"\n  📻 {title}")
                print(f"     Date: {date_display}")
                print(f"     URL: {link}")

                # Check if it's the July 21 show
                if "21 july" in title.lower() or "july 21" in title.lower() or "2025-07-21" in show_date:
                    print(f"     ⭐ This is the July 21, 2025 show!")
                    july_21_found = True
                    july_21_url = link

            if july_21_found:
                print(f"\n✨ JULY 21 SHOW FOUND!")
                print(f"Direct URL: {july_21_url}")
                print(f"\nTo analyze this specific show in your podcast system:")
                print(f"1. Use the RSS feed URL above")
                print(f"2. Or process this specific episode URL directly")

        # Summary
        print(f"\n📈 Summary:")
        print(f"- Total RAP episodes found: {len(items)}")
        print(f"- These are weekly shows featuring Revolutionary African/Afrikan Perspectives")
        print(f"- The feed includes both 'African' and 'Afrikan' spelling variations")
    else:
        print("❌ Error: Could not generate RSS feed")


if __name__ == "__main__":
    generate_rap_precise_feed()

68
mixcloud-rss.log Normal file
View File

@@ -0,0 +1,68 @@
* Serving Flask app 'web_app'
* Debug mode: on
INFO:werkzeug:WARNING: This is a development server. Do not use it in a production deployment. Use a production WSGI server instead.
* Running on all addresses (0.0.0.0)
* Running on http://127.0.0.1:5000
* Running on http://192.168.68.59:5000
INFO:werkzeug:Press CTRL+C to quit
INFO:werkzeug: * Restarting with stat
WARNING:werkzeug: * Debugger is active!
INFO:werkzeug: * Debugger PIN: 785-868-005
INFO:werkzeug:127.0.0.1 - - [26/Jul/2025 22:53:22] "GET /health HTTP/1.1" 200 -
INFO:werkzeug:127.0.0.1 - - [26/Jul/2025 22:55:20] "GET / HTTP/1.1" 200 -
INFO:werkzeug:127.0.0.1 - - [26/Jul/2025 22:55:21] "GET /favicon.ico HTTP/1.1" 404 -
INFO:werkzeug: * Detected change in '/var/home/enias/Claude/MyProject/personal-ai-assistant/mixcloud-rss-generator/src/mixcloud_rss.py', reloading
INFO:werkzeug: * Restarting with stat
WARNING:werkzeug: * Debugger is active!
INFO:werkzeug: * Debugger PIN: 785-868-005
INFO:werkzeug: * Detected change in '/var/home/enias/Claude/MyProject/personal-ai-assistant/mixcloud-rss-generator/src/mixcloud_rss.py', reloading
INFO:werkzeug: * Restarting with stat
WARNING:werkzeug: * Debugger is active!
INFO:werkzeug: * Debugger PIN: 785-868-005
INFO:werkzeug: * Detected change in '/var/home/enias/Claude/MyProject/personal-ai-assistant/mixcloud-rss-generator/src/web_app.py', reloading
INFO:werkzeug: * Restarting with stat
WARNING:werkzeug: * Debugger is active!
INFO:werkzeug: * Debugger PIN: 785-868-005
INFO:werkzeug: * Detected change in '/var/home/enias/Claude/MyProject/personal-ai-assistant/mixcloud-rss-generator/src/web_app.py', reloading
INFO:werkzeug: * Restarting with stat
WARNING:werkzeug: * Debugger is active!
INFO:werkzeug: * Debugger PIN: 785-868-005
INFO:werkzeug:127.0.0.1 - - [26/Jul/2025 23:00:37] "GET / HTTP/1.1" 200 -
INFO:werkzeug:127.0.0.1 - - [26/Jul/2025 23:00:51] "POST /generate HTTP/1.1" 200 -
INFO:werkzeug:127.0.0.1 - - [26/Jul/2025 23:01:03] "POST /api/validate HTTP/1.1" 200 -
ERROR:__main__:Error generating RSS: can't compare offset-naive and offset-aware datetimes
INFO:werkzeug:127.0.0.1 - - [26/Jul/2025 23:01:21] "POST /generate HTTP/1.1" 500 -
ERROR:__main__:Error generating RSS: can't compare offset-naive and offset-aware datetimes
INFO:werkzeug:127.0.0.1 - - [26/Jul/2025 23:01:30] "POST /generate HTTP/1.1" 500 -
INFO:werkzeug:127.0.0.1 - - [26/Jul/2025 23:01:40] "GET / HTTP/1.1" 200 -
ERROR:__main__:Error generating RSS: can't compare offset-naive and offset-aware datetimes
INFO:werkzeug:127.0.0.1 - - [26/Jul/2025 23:01:43] "POST /generate HTTP/1.1" 500 -
INFO:werkzeug:127.0.0.1 - - [26/Jul/2025 23:01:53] "POST /generate HTTP/1.1" 200 -
INFO:werkzeug:127.0.0.1 - - [26/Jul/2025 23:07:42] "GET /rss/WRFG?limit=200&keywords=revolutionary HTTP/1.1" 200 -
INFO:werkzeug:127.0.0.1 - - [26/Jul/2025 23:15:57] "GET /rss/WRFG?limit=200&keywords=revolutionary HTTP/1.1" 200 -
INFO:werkzeug:127.0.0.1 - - [26/Jul/2025 23:19:38] "GET /rss/WRFG?limit=200&keywords=revolutionary HTTP/1.1" 200 -
INFO:werkzeug:127.0.0.1 - - [26/Jul/2025 23:29:23] "GET /rss/WRFG?limit=200&keywords=revolutionary HTTP/1.1" 200 -
INFO:werkzeug: * Detected change in '/var/home/enias/.local/lib/python3.10/site-packages/nvidia/__init__.py', reloading
INFO:werkzeug: * Restarting with stat
WARNING:werkzeug: * Debugger is active!
INFO:werkzeug: * Debugger PIN: 785-868-005
INFO:werkzeug: * Detected change in '/var/home/enias/.local/lib/python3.10/site-packages/nvidia/__init__.py', reloading
INFO:werkzeug: * Restarting with stat
WARNING:werkzeug: * Debugger is active!
INFO:werkzeug: * Debugger PIN: 785-868-005
INFO:werkzeug: * Detected change in '/var/home/enias/.local/lib/python3.10/site-packages/nvidia/__init__.py', reloading
INFO:werkzeug: * Restarting with stat
WARNING:werkzeug: * Debugger is active!
INFO:werkzeug: * Debugger PIN: 785-868-005
INFO:werkzeug: * Detected change in '/var/home/enias/.local/lib/python3.10/site-packages/nvidia/__init__.py', reloading
INFO:werkzeug: * Restarting with stat
WARNING:werkzeug: * Debugger is active!
INFO:werkzeug: * Debugger PIN: 785-868-005
INFO:werkzeug: * Detected change in '/var/home/enias/.local/lib/python3.10/site-packages/nvidia/__init__.py', reloading
INFO:werkzeug: * Restarting with stat
WARNING:werkzeug: * Debugger is active!
INFO:werkzeug: * Debugger PIN: 785-868-005
INFO:werkzeug: * Detected change in '/var/home/enias/.local/lib/python3.10/site-packages/nvidia/__init__.py', reloading
INFO:werkzeug: * Restarting with stat
WARNING:werkzeug: * Debugger is active!
INFO:werkzeug: * Debugger PIN: 785-868-005
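The repeated 500 responses in the log above trace back to `TypeError: can't compare offset-naive and offset-aware datetimes`. A minimal reproduction, with the usual fix of attaching a timezone to the naive value before comparing (dates below are illustrative):

```python
from datetime import datetime, timezone

aware = datetime.fromisoformat("2025-07-21T18:00:00+00:00")  # offset-aware
naive = datetime(2025, 7, 21)                                # offset-naive

# Comparing them directly raises TypeError
try:
    naive < aware
    comparable = True
except TypeError:
    comparable = False

# Fix: make the naive datetime timezone-aware first
fixed = naive.replace(tzinfo=timezone.utc)
print(comparable, fixed < aware)  # False True
```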

requirements.txt
requests>=2.31.0
beautifulsoup4>=4.12.0
lxml>=4.9.0

Binary file not shown.

src/cli.py
#!/usr/bin/env python3
"""
Backend-only CLI for Mixcloud RSS Generation
Uses shared content syndication services for RSS generation.
Replaces web_app.py and legacy mixcloud_rss.py dependencies.
"""
import argparse
import os
import sys
# Add parent directories to path for shared imports
sys.path.insert(0, os.path.join(os.path.dirname(__file__), '../../'))
from shared.services.content_syndication import (
ContentSyndicationService,
FeedFilterService
)
def main():
"""Command-line interface for backend Mixcloud RSS generation."""
parser = argparse.ArgumentParser(
description="Generate RSS feeds from Mixcloud users (Backend CLI)",
formatter_class=argparse.RawDescriptionHelpFormatter,
epilog="""
Examples:
%(prog)s WRFG # Basic RSS for WRFG user
%(prog)s --url https://mixcloud.com/NTSRadio/ # From full URL
%(prog)s WRFG --limit 50 --output feed.xml # Save 50 episodes to file
%(prog)s WRFG --keywords "rap,public affairs" # Filter by keywords
%(prog)s WRFG --rap-only # RAP shows only
%(prog)s WRFG --date-range 2024-01-01 2024-12-31 # Date filtering
"""
)
# Input options
input_group = parser.add_mutually_exclusive_group(required=True)
input_group.add_argument("username", nargs='?', help="Mixcloud username")
input_group.add_argument("--url", help="Mixcloud URL")
# Output options
parser.add_argument("-o", "--output", help="Output file path (default: stdout)")
parser.add_argument("-l", "--limit", type=int, default=20,
help="Number of episodes to include (default: 20)")
# Caching options
parser.add_argument("--cache-dir", default="./cache",
help="Cache directory path (default: ./cache)")
parser.add_argument("--cache-ttl", type=int, default=3600,
help="Cache TTL in seconds (default: 3600)")
# Filtering options
filter_group = parser.add_argument_group("Filtering Options")
filter_group.add_argument("--keywords",
help="Filter by keywords in title (comma-separated)")
filter_group.add_argument("--tags",
help="Filter by tags (comma-separated)")
filter_group.add_argument("--date-range", nargs=2, metavar=('START', 'END'),
help="Filter by date range (YYYY-MM-DD format)")
filter_group.add_argument("--specific-dates",
help="Filter by specific dates (comma-separated)")
# Convenience options
convenience_group = parser.add_argument_group("Convenience Options")
convenience_group.add_argument("--rap-only", action='store_true',
help="Filter for Revolutionary African Perspectives shows only")
# Utility options
parser.add_argument("--validate", action='store_true',
help="Validate user without generating feed")
parser.add_argument("--user-info", action='store_true',
help="Show user information only")
parser.add_argument("--verbose", "-v", action='store_true',
help="Verbose output")
args = parser.parse_args()
# Determine username
username = args.username
if args.url:
# Extract the username directly; no need to generate a throwaway feed first
syndication_service = ContentSyndicationService(args.cache_dir, args.cache_ttl)
try:
username = syndication_service.mixcloud_client.extract_username_from_url(args.url)
if not username:
print(f"Error: Could not extract username from URL: {args.url}", file=sys.stderr)
return 1
except Exception as e:
print(f"Error: {e}", file=sys.stderr)
return 1
if not username:
print("Error: No username provided", file=sys.stderr)
return 1
# Initialize content syndication service
syndication_service = ContentSyndicationService(args.cache_dir, args.cache_ttl)
# Handle validation only
if args.validate:
result = syndication_service.validate_mixcloud_user(username)
if result['valid']:
print(f"✅ Valid user: {result['username']} ({result['name']}) - {result['show_count']} shows")
return 0
else:
print(f"❌ Invalid user: {result['message']}")
return 1
# Handle user info only
if args.user_info:
user_data = syndication_service.get_mixcloud_user_info(username)
if user_data:
print(f"User: {user_data.get('name', username)}")
print(f"Username: {username}")
print(f"Bio: {user_data.get('biog', 'N/A')}")
print(f"Shows: {user_data.get('cloudcast_count', 0)}")
print(f"Profile: https://www.mixcloud.com/{username}/")
return 0
else:
print(f"Error: User '{username}' not found", file=sys.stderr)
return 1
# Build filters
filters = {}
if args.rap_only:
filters = FeedFilterService.create_rap_filter()
if args.verbose:
print("Applied RAP filter", file=sys.stderr)
if args.keywords:
filters['keywords'] = args.keywords
if args.verbose:
print(f"Filter: keywords = {args.keywords}", file=sys.stderr)
if args.tags:
filters['tags'] = args.tags
if args.verbose:
print(f"Filter: tags = {args.tags}", file=sys.stderr)
if args.date_range:
filters['start_date'] = args.date_range[0]
filters['end_date'] = args.date_range[1]
if args.verbose:
print(f"Filter: date range = {args.date_range[0]} to {args.date_range[1]}", file=sys.stderr)
if args.specific_dates:
filters['specific_dates'] = args.specific_dates
if args.verbose:
print(f"Filter: specific dates = {args.specific_dates}", file=sys.stderr)
# Generate RSS feed
try:
if args.verbose:
print(f"Generating RSS for user: {username}", file=sys.stderr)
print(f"Limit: {args.limit} episodes", file=sys.stderr)
if filters:
print(f"Filters applied: {list(filters.keys())}", file=sys.stderr)
rss_feed = syndication_service.generate_mixcloud_rss(username, args.limit, filters)
if rss_feed:
if args.output:
with open(args.output, "w", encoding="utf-8") as f:
f.write(rss_feed)
print(f"RSS feed saved to: {args.output}", file=sys.stderr)
else:
print(rss_feed)
return 0
else:
print(f"Error: Could not generate RSS feed for user '{username}'", file=sys.stderr)
return 1
except Exception as e:
print(f"Error: {e}", file=sys.stderr)
if args.verbose:
import traceback
traceback.print_exc()
return 1
if __name__ == "__main__":
sys.exit(main())
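cli.py accepts either a positional username or a `--url` flag through a required mutually exclusive group, a slightly unusual argparse pattern worth isolating. A stripped-down sketch of just that piece:

```python
import argparse

# Mirrors the cli.py input handling: positional username OR --url, not both
parser = argparse.ArgumentParser()
group = parser.add_mutually_exclusive_group(required=True)
group.add_argument("username", nargs="?")  # optional positional is allowed in the group
group.add_argument("--url")

args = parser.parse_args(["WRFG"])
print(args.username, args.url)  # WRFG None

args = parser.parse_args(["--url", "https://mixcloud.com/NTSRadio/"])
print(args.username, args.url)
```

The `nargs="?"` is what lets the positional coexist with the flag inside the group; without it, argparse rejects the configuration at parser-build time.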

src/mixcloud_rss.py
#!/usr/bin/env python3
"""
Mixcloud to RSS Feed Generator
Converts Mixcloud user pages or show pages into RSS feeds that can be consumed
by podcast apps or feed readers.
"""
import json
import re
import xml.etree.ElementTree as ET
from datetime import datetime, timezone
from typing import Dict, List, Optional
from urllib.parse import urlencode
import hashlib
import os
import requests
class MixcloudRSSGenerator:
"""Generate RSS feeds from Mixcloud pages."""
def __init__(self, cache_dir: str = "./cache", cache_ttl: int = 3600):
"""
Initialize the Mixcloud RSS Generator.
Args:
cache_dir: Directory for caching API responses
cache_ttl: Cache time-to-live in seconds (default: 1 hour)
"""
self.cache_dir = cache_dir
self.cache_ttl = cache_ttl
self.api_base = "https://api.mixcloud.com"
self.base_url = "https://www.mixcloud.com"
# Create cache directory if it doesn't exist
os.makedirs(cache_dir, exist_ok=True)
def _get_cache_path(self, url: str) -> str:
"""Generate cache file path for a URL."""
url_hash = hashlib.md5(url.encode()).hexdigest()
return os.path.join(self.cache_dir, f"{url_hash}.json")
def _get_cached_data(self, url: str) -> Optional[Dict]:
"""Get cached data if available and not expired."""
cache_path = self._get_cache_path(url)
if os.path.exists(cache_path):
# Check if cache is still valid
cache_age = datetime.now().timestamp() - os.path.getmtime(cache_path)
if cache_age < self.cache_ttl:
with open(cache_path, 'r') as f:
return json.load(f)
return None
def _save_to_cache(self, url: str, data: Dict) -> None:
"""Save data to cache."""
cache_path = self._get_cache_path(url)
with open(cache_path, 'w') as f:
json.dump(data, f)
def _fetch_mixcloud_data(self, api_url: str) -> Optional[Dict]:
"""Fetch data from Mixcloud API with caching."""
# Check cache first
cached_data = self._get_cached_data(api_url)
if cached_data:
return cached_data
try:
response = requests.get(api_url, timeout=10)
response.raise_for_status()
data = response.json()
# Save to cache
self._save_to_cache(api_url, data)
return data
except Exception as e:
print(f"Error fetching Mixcloud data: {e}")
return None
def _extract_username_from_url(self, url: str) -> Optional[str]:
"""Extract username from Mixcloud URL."""
# Handle various Mixcloud URL formats
patterns = [
r'mixcloud\.com/([^/]+)/?$',
r'mixcloud\.com/([^/]+)/(?:uploads|favorites|listens)?/?$',
r'mixcloud\.com/([^/]+)/[^/]+/?$', # Specific show
]
for pattern in patterns:
match = re.search(pattern, url)
if match:
return match.group(1)
return None
def _format_duration(self, seconds: int) -> str:
"""Format duration in seconds to HH:MM:SS."""
hours = seconds // 3600
minutes = (seconds % 3600) // 60
secs = seconds % 60
if hours > 0:
return f"{hours:02d}:{minutes:02d}:{secs:02d}"
else:
return f"{minutes:02d}:{secs:02d}"
def _filter_shows(self, shows: List[Dict], filters: Optional[Dict] = None) -> List[Dict]:
"""Filter shows based on criteria."""
if not filters:
return shows
filtered_shows = shows
# Filter by date range
if filters.get('start_date'):
start_date = datetime.fromisoformat(filters['start_date'].replace('Z', '+00:00'))
filtered_shows = [
show for show in filtered_shows
if datetime.fromisoformat(show['created_time'].replace('Z', '+00:00')) >= start_date
]
if filters.get('end_date'):
end_date = datetime.fromisoformat(filters['end_date'].replace('Z', '+00:00'))
filtered_shows = [
show for show in filtered_shows
if datetime.fromisoformat(show['created_time'].replace('Z', '+00:00')) <= end_date
]
# Filter by keywords in title
if filters.get('keywords'):
keywords = filters['keywords'].lower().split(',')
filtered_shows = [
show for show in filtered_shows
if any(keyword.strip() in show.get('name', '').lower() for keyword in keywords)
]
# Filter by tags
if filters.get('tags'):
filter_tags = [tag.strip().lower() for tag in filters['tags'].split(',')]
filtered_shows = [
show for show in filtered_shows
if any(
tag['name'].lower() in filter_tags
for tag in show.get('tags', [])
)
]
# Filter by specific dates (e.g., "July 21")
if filters.get('specific_dates'):
dates = filters['specific_dates'].split(',')
filtered_shows = [
show for show in filtered_shows
if self._matches_date(show['created_time'], dates)
]
return filtered_shows
def _matches_date(self, created_time: str, dates: List[str]) -> bool:
"""Check if created_time matches any of the specified dates."""
show_date = datetime.fromisoformat(created_time.replace('Z', '+00:00'))
for date_str in dates:
date_str = date_str.strip().lower()
# Handle various date formats
# "July 21" or "Jul 21"
if any(month in date_str for month in ['january', 'february', 'march', 'april', 'may', 'june',
'july', 'august', 'september', 'october', 'november', 'december',
'jan', 'feb', 'mar', 'apr', 'may', 'jun',
'jul', 'aug', 'sep', 'oct', 'nov', 'dec']):
try:
# Parse month and day
parsed_date = datetime.strptime(f"{date_str} {show_date.year}", "%B %d %Y")
if show_date.date() == parsed_date.date():
return True
except ValueError:
try:
parsed_date = datetime.strptime(f"{date_str} {show_date.year}", "%b %d %Y")
if show_date.date() == parsed_date.date():
return True
except ValueError:
pass
# "2024-07-21" format
elif '-' in date_str:
try:
parsed_date = datetime.fromisoformat(date_str)
if show_date.date() == parsed_date.date():
return True
except ValueError:
pass
# "07/21/2024" or "7/21/2024" format
elif '/' in date_str:
for fmt in ['%m/%d/%Y', '%m/%d/%y', '%d/%m/%Y', '%d/%m/%y']:
try:
parsed_date = datetime.strptime(date_str, fmt)
if show_date.date() == parsed_date.date():
return True
except ValueError:
pass
return False
def _build_rss_feed(self, user_data: Dict, shows: List[Dict]) -> str:
"""Build RSS XML feed from user data and shows."""
# Create root RSS element
rss = ET.Element("rss", version="2.0", attrib={
"xmlns:itunes": "http://www.itunes.com/dtds/podcast-1.0.dtd",
"xmlns:content": "http://purl.org/rss/1.0/modules/content/"
})
channel = ET.SubElement(rss, "channel")
# Channel metadata
ET.SubElement(channel, "title").text = user_data.get("name", "Mixcloud Feed")
ET.SubElement(channel, "link").text = f"{self.base_url}{user_data.get('key', '')}"
ET.SubElement(channel, "description").text = user_data.get("biog", "Mixcloud podcast feed")
ET.SubElement(channel, "language").text = "en-us"
ET.SubElement(channel, "lastBuildDate").text = datetime.now().astimezone().strftime("%a, %d %b %Y %H:%M:%S %z")
# iTunes podcast metadata
ET.SubElement(channel, "itunes:author").text = user_data.get("name", "")
ET.SubElement(channel, "itunes:summary").text = user_data.get("biog", "")
if user_data.get("pictures", {}).get("large"):
image = ET.SubElement(channel, "itunes:image")
image.set("href", user_data["pictures"]["large"])
# Add each show as an item
for show in shows:
item = ET.SubElement(channel, "item")
# Basic item elements
ET.SubElement(item, "title").text = show.get("name", "")
ET.SubElement(item, "link").text = f"{self.base_url}{show.get('key', '')}"
# Description with tags
description = show.get("description", "")
if show.get("tags"):
tags = ", ".join([tag["name"] for tag in show["tags"]])
description += f"\n\nTags: {tags}"
ET.SubElement(item, "description").text = description
# Publication date
created_time = show.get("created_time")
if created_time:
pub_date = datetime.fromisoformat(created_time.replace("Z", "+00:00"))
ET.SubElement(item, "pubDate").text = pub_date.strftime("%a, %d %b %Y %H:%M:%S %z")
# GUID
ET.SubElement(item, "guid", isPermaLink="true").text = f"{self.base_url}{show.get('key', '')}"
# Audio enclosure (if audio URL is available)
audio_url = show.get("audio_url") or f"{self.base_url}{show.get('key', '')}"
enclosure = ET.SubElement(item, "enclosure")
enclosure.set("url", audio_url)
enclosure.set("type", "audio/mpeg")
enclosure.set("length", str(show.get("audio_length", 0)))
# iTunes elements
ET.SubElement(item, "itunes:author").text = user_data.get("name", "")
ET.SubElement(item, "itunes:summary").text = description
ET.SubElement(item, "itunes:duration").text = self._format_duration(show.get("audio_length", 0))
if show.get("pictures", {}).get("large"):
ET.SubElement(item, "itunes:image").set("href", show["pictures"]["large"])
# Convert to string
return '<?xml version="1.0" encoding="UTF-8"?>\n' + ET.tostring(rss, encoding="unicode")
def get_user_shows(self, username: str, limit: int = 20) -> Optional[Dict]:
"""Get list of shows for a Mixcloud user."""
# Fetch user data
user_api_url = f"{self.api_base}/{username}/"
user_data = self._fetch_mixcloud_data(user_api_url)
if not user_data:
return None
# Fetch user's cloudcasts (shows)
shows_api_url = f"{self.api_base}/{username}/cloudcasts/"
params = {"limit": limit}
all_shows = []
url = f"{shows_api_url}?{urlencode(params)}"
while len(all_shows) < limit:
shows_data = self._fetch_mixcloud_data(url)
if not shows_data or "data" not in shows_data:
break
all_shows.extend(shows_data["data"])
# The "next" URL from the API already carries its own query parameters
next_url = shows_data.get("paging", {}).get("next")
if not next_url:
break
url = next_url
return {"user": user_data, "shows": all_shows[:limit]}
def generate_rss_from_url(self, mixcloud_url: str, limit: int = 20, filters: Optional[Dict] = None) -> Optional[str]:
"""Generate RSS feed from a Mixcloud URL with optional filters."""
username = self._extract_username_from_url(mixcloud_url)
if not username:
raise ValueError(f"Could not extract username from URL: {mixcloud_url}")
data = self.get_user_shows(username, limit * 2) # Get more shows to filter from
if not data:
return None
# Apply filters
filtered_shows = self._filter_shows(data["shows"], filters)[:limit]
return self._build_rss_feed(data["user"], filtered_shows)
def generate_rss_from_username(self, username: str, limit: int = 20, filters: Optional[Dict] = None) -> Optional[str]:
"""Generate RSS feed from a Mixcloud username with optional filters."""
data = self.get_user_shows(username, limit * 2) # Get more shows to filter from
if not data:
return None
# Apply filters
filtered_shows = self._filter_shows(data["shows"], filters)[:limit]
return self._build_rss_feed(data["user"], filtered_shows)
def main():
"""Command-line interface for the Mixcloud RSS generator."""
import argparse
parser = argparse.ArgumentParser(description="Generate RSS feeds from Mixcloud pages")
parser.add_argument("input", help="Mixcloud URL or username")
parser.add_argument("-l", "--limit", type=int, default=20, help="Number of episodes to include (default: 20)")
parser.add_argument("-o", "--output", help="Output file path (default: stdout)")
parser.add_argument("-c", "--cache-dir", default="./cache", help="Cache directory path")
parser.add_argument("-t", "--cache-ttl", type=int, default=3600, help="Cache TTL in seconds (default: 3600)")
args = parser.parse_args()
# Create generator
generator = MixcloudRSSGenerator(cache_dir=args.cache_dir, cache_ttl=args.cache_ttl)
# Determine if input is URL or username
if "mixcloud.com" in args.input:
rss_feed = generator.generate_rss_from_url(args.input, args.limit)
else:
rss_feed = generator.generate_rss_from_username(args.input, args.limit)
if rss_feed:
if args.output:
with open(args.output, "w", encoding="utf-8") as f:
f.write(rss_feed)
print(f"RSS feed saved to: {args.output}")
else:
print(rss_feed)
else:
print("Error: Could not generate RSS feed", file=sys.stderr)
return 1
return 0
if __name__ == "__main__":
exit(main())
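The keyword filtering inside `_filter_shows` can be sketched standalone, without the network-dependent parts (show titles below are illustrative):

```python
# Keep shows whose title contains any of the comma-separated keywords,
# matching case-insensitively -- the same logic _filter_shows applies
shows = [
    {"name": "Revolutionary African Perspectives 21 July 2025"},
    {"name": "Morning Jazz Hour"},
]
keywords = "revolutionary,afrikan"

wanted = [k.strip() for k in keywords.lower().split(",")]
filtered = [
    show for show in shows
    if any(k in show.get("name", "").lower() for k in wanted)
]

print([s["name"] for s in filtered])
```

Because each keyword is a plain substring test, "revolutionary" matches regardless of surrounding words; this is also why the feed catches both the "African" and "Afrikan" spellings when both are listed.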