---
description: UTC timestamp handling patterns for consistent timezone management for src/**/* and other relevant directories
alwaysApply: false
---

# UTC Timestamps Rule

## Core Principles

- **UTC Only**: Always use UTC for all timestamps
- **Timezone Awareness**: Make timezone explicit in all datetime objects
- **Standard Formats**: Use ISO 8601 for APIs, YYYYMMDD_HHMMSS for filenames
- **No Manual Construction**: Generate timestamps with proper functions
- **Consistent Patterns**: Use the same timestamp approach across all services

## Implementation Patterns

### UTC Timestamp Generation

```python
# ✅ DO: Generate timestamps with UTC timezone
from datetime import datetime, timezone

# Python - Standard approach
def get_current_timestamp() -> datetime:
    """Get current timestamp with UTC timezone."""
    return datetime.now(timezone.utc)

# For performance timing, prefer datetime over time.time()
def measure_performance():
    start_time = datetime.now(timezone.utc)
    # ... operation ...
    elapsed = (datetime.now(timezone.utc) - start_time).total_seconds()
    return elapsed
```

### Database Timestamps

```python
# ✅ DO: Store timestamps in UTC in the database
from datetime import datetime, timezone
from uuid import uuid4

from sqlalchemy import Column, DateTime
from sqlalchemy.dialects.postgresql import UUID  # Assumes the PostgreSQL UUID type
from sqlalchemy.orm import declarative_base
from sqlalchemy.sql import func

Base = declarative_base()  # Or import the project's existing declarative base

class MediaFile(Base):
    __tablename__ = "media_files"

    id = Column(UUID(as_uuid=True), primary_key=True, default=uuid4)
    # Use timezone=True to ensure timezone awareness
    created_at = Column(DateTime(timezone=True), server_default=func.now())
    updated_at = Column(DateTime(timezone=True), onupdate=func.now())

    # For manual updates, use UTC
    def update_timestamp(self):
        self.updated_at = datetime.now(timezone.utc)
```

### API Response Formatting

```python
# ✅ DO: Use ISO 8601 format for API responses
def format_timestamp_for_api(dt: datetime) -> str:
    """Format a UTC-aware datetime as an ISO 8601 string with a Z suffix."""
    # isoformat() renders UTC as "+00:00"; swap it for the "Z" form shown below
    return dt.isoformat().replace("+00:00", "Z")

# Example API response
{
    "id": "123",
    "name": "Example",
    "created_at": "2025-01-15T10:30:45.123456Z",  # ISO 8601 format with Z for UTC
    "completed_at": "2025-01-15T10:35:12.789012Z"
}
```

### Filename Formatting

```python
# ✅ DO: Use YYYYMMDD_HHMMSS format for filenames
def generate_filename(prefix: str) -> str:
    """Generate filename with timestamp."""
    timestamp = datetime.now(timezone.utc)
    formatted = timestamp.strftime("%Y%m%d_%H%M%S")
    return f"{prefix}_{formatted}.wav"

# Example: "recording_20250115_103045.wav"
# Example: "research_20250115_143022.md"
```

### Service-Specific Patterns

#### Transcription Service

```python
# ✅ DO: Use UTC for all transcription timestamps
class TranscriptionService:
    def complete_transcription(self, result):
        # Capture the time once so every field reports the same instant
        now = datetime.now(timezone.utc)
        return {
            "text": result.text,
            "completed_at": now.isoformat(),
            "timestamp": now,
            "merged_at": now.isoformat()
        }
```

#### Performance Monitoring

```python
# ✅ DO: Use datetime for performance metrics
class PerformanceMonitor:
    def record_metric(self, operation: str):
        # Capture the time once so both fields report the same instant
        now = datetime.now(timezone.utc)
        return {
            "operation": operation,
            "timestamp": now.isoformat(),
            "start_time": now
        }

    def measure_elapsed(self, start_time: datetime) -> float:
        return (datetime.now(timezone.utc) - start_time).total_seconds()
```

#### Research and Export

```python
# ✅ DO: Consistent timestamp formatting for exports
def export_research_data(data: dict) -> dict:
    # Capture the time once so both fields report the same instant
    now = datetime.now(timezone.utc).isoformat()
    return {
        **data,
        "timestamp": now,
        "generated_at": now
    }

def generate_export_filename(prefix: str, extension: str) -> str:
    timestamp = datetime.now(timezone.utc).strftime("%Y%m%d_%H%M%S")
    return f"{prefix}_{timestamp}.{extension}"
```

### Anti-Patterns

#### ❌ DON'T: Use naive datetime objects

```python
# Wrong! Missing timezone
timestamp = datetime.now()  # Local time with no tzinfo attached
completed_at = datetime.now().isoformat()  # No offset in the output, so the timezone is ambiguous
```

#### ❌ DON'T: Use deprecated datetime.utcnow()

```python
# Wrong! Deprecated method (since Python 3.12) that returns a naive datetime
profile.updated_at = datetime.utcnow()  # Use datetime.now(timezone.utc) instead
```

#### ❌ DON'T: Mix time.time() and datetime for timing

```python
# Wrong! Inconsistent timing approach
start_time = time.time()
# ... operation ...
elapsed = time.time() - start_time

# Better: Use datetime consistently
start_time = datetime.now(timezone.utc)
# ... operation ...
elapsed = (datetime.now(timezone.utc) - start_time).total_seconds()
```

#### ❌ DON'T: Inconsistent filename formats

```python
# Good: compact YYYYMMDD_HHMMSS
file_name = f"research_{datetime.now(timezone.utc).strftime('%Y%m%d_%H%M%S')}.md"
# Wrong! Dashes, a space, and colons in the filename
file_name = f"data_{datetime.now(timezone.utc).strftime('%Y-%m-%d %H:%M:%S')}.json"

# Example: "research_20250115_143022.md" (correct)
# Example: "data_2025-01-15 14:30:22.json" (incorrect)
```

#### ❌ DON'T: Store timestamps without timezone info

```python
# Wrong! Missing timezone=True
created_at = Column(DateTime, server_default=func.now())
```

## Migration Guidelines

### For Existing Code

1. **Replace `datetime.now()`** with `datetime.now(timezone.utc)`
2. **Replace `datetime.utcnow()`** with `datetime.now(timezone.utc)`
3. **Standardize filename formats** to `YYYYMMDD_HHMMSS`
4. **Use datetime for performance timing** instead of `time.time()`
5. **Ensure all database columns** use `DateTime(timezone=True)`

### Priority Files to Fix

Based on analysis, prioritize these files:

- `src/services/transcription_service.py` - Multiple naive datetime usages
- `src/services/local_transcription_service.py` - Naive datetime
- `src/repositories/speaker_profile_repository.py` - Uses deprecated utcnow()
- `src/base/batch_processor.py` - Uses deprecated utcnow()

### Periodic Cleanup Process (2025+)

#### Quarterly Timestamp Audit

Perform a comprehensive audit every 3 months to identify and fix timestamp inconsistencies:

```python
# ✅ DO: Create a timestamp audit script
import re
from pathlib import Path

def audit_timestamps(project_root: Path):
    """Audit project for timestamp inconsistencies."""
    issues = []

    # Patterns to check for
    patterns = {
        'naive_datetime': r'datetime\.now\(\)',
        'deprecated_utcnow': r'datetime\.utcnow\(\)',
        # \b keeps datetime.time() from matching
        'time_dot_time': r'\btime\.time\(\)',
        # Any strftime format with %Y and %H that is not exactly %Y%m%d_%H%M%S
        'inconsistent_filename': r'strftime\([\'"](?!%Y%m%d_%H%M%S[\'"])[^\'"]*%Y[^\'"]*%H[^\'"]*[\'"]\)'
    }

    for py_file in project_root.rglob('*.py'):
        content = py_file.read_text(encoding='utf-8')
        for pattern_name, pattern in patterns.items():
            if re.search(pattern, content):
                issues.append(f"{py_file}: {pattern_name}")

    return issues
```

#### Automated Cleanup Scripts

Create automated scripts to fix common timestamp issues:

```python
# ✅ DO: Automated timestamp cleanup
import re
from pathlib import Path

def fix_naive_datetime(file_path: Path):
    """Replace naive datetime.now() with UTC-aware version."""
    content = file_path.read_text(encoding='utf-8')

    # Replace datetime.now() with datetime.now(timezone.utc)
    fixed_content = re.sub(
        r'datetime\.now\(\)',
        'datetime.now(timezone.utc)',
        content
    )

    # Replace datetime.utcnow() with datetime.now(timezone.utc)
    fixed_content = re.sub(
        r'datetime\.utcnow\(\)',
        'datetime.now(timezone.utc)',
        fixed_content
    )

    if fixed_content != content:
        # The rewritten calls need `timezone`; extend the most common import form.
        # Any other import style must be fixed by hand.
        fixed_content = re.sub(
            r'^from datetime import datetime$',
            'from datetime import datetime, timezone',
            fixed_content,
            flags=re.MULTILINE
        )
        file_path.write_text(fixed_content, encoding='utf-8')
        return True
    return False

def standardize_filename_formats(file_path: Path):
    """Standardize filename timestamp formats to YYYYMMDD_HHMMSS."""
    original = file_path.read_text(encoding='utf-8')
    content = original

    # Fix inconsistent filename formats; \g<1> reuses the original quote character
    patterns = [
        # e.g. '%Y-%m-%d %H:%M:%S' or '%Y-%m-%d_%H-%M-%S'
        (r'strftime\(([\'"])%Y-%m-%d[ _T]%H[:-]%M[:-]%S\1\)',
         r'strftime(\g<1>%Y%m%d_%H%M%S\g<1>)'),
        # Date-only '%Y-%m-%d'; review the diff, because display formats match too
        (r'strftime\(([\'"])%Y-%m-%d\1\)',
         r'strftime(\g<1>%Y%m%d\g<1>)')
    ]

    for pattern, replacement in patterns:
        content = re.sub(pattern, replacement, content)

    if content != original:
        file_path.write_text(content, encoding='utf-8')
        return True
    return False
```

#### Cleanup Checklist (2025)

- [ ] **Q1 2025**: Audit all transcription services for naive datetime usage
- [ ] **Q2 2025**: Standardize all filename timestamp formats
- [ ] **Q3 2025**: Replace all `time.time()` usage with datetime objects
- [ ] **Q4 2025**: Verify all database migrations use `timezone=True`
- [ ] **Ongoing**: Fix timestamp issues in new code during code reviews

#### Legacy File Detection

Identify files with potentially problematic timestamp formats:

```python
# ✅ DO: Detect legacy timestamp patterns
import re
from pathlib import Path

def detect_legacy_timestamps(project_root: Path):
    """Detect files with legacy timestamp patterns."""
    legacy_files = []

    # Patterns that suggest legacy timestamp usage
    legacy_patterns = [
        r'datetime\.now\(\)',     # Naive datetime
        r'datetime\.utcnow\(\)',  # Deprecated method
        # Custom date formats: any %Y format other than the two standard forms
        r'strftime\([\'"](?!%Y%m%d(?:_%H%M%S)?[\'"])[^\'"]*%Y[^\'"]*[\'"]\)',
        r'\btime\.time\(\)',      # Unix timestamps instead of datetime
    ]

    for py_file in project_root.rglob('*.py'):
        content = py_file.read_text(encoding='utf-8')
        for pattern in legacy_patterns:
            if re.search(pattern, content):
                legacy_files.append(str(py_file))
                break

    return legacy_files
```

## Testing Timestamps

```python
# ✅ DO: Test timestamp generation
import re
from datetime import timezone

def test_timestamp_generation():
    timestamp = get_current_timestamp()
    assert timestamp.tzinfo is not None
    assert timestamp.tzinfo == timezone.utc

def test_filename_formatting():
    filename = generate_filename("test")
    # fullmatch rejects trailing characters after ".wav"
    assert re.fullmatch(r"test_\d{8}_\d{6}\.wav", filename)
    # Example: "test_20250115_143022.wav"
```

Always generate timestamps using `datetime.now(timezone.utc)`. Never use `datetime.now()` or `datetime.utcnow()`, and never hardcode or manually construct timestamps. For API responses, use ISO 8601 format. For filenames, use `YYYYMMDD_HHMMSS` format. All database timestamps must be stored in UTC with `timezone=True`.
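
The rules above assume every datetime is already UTC-aware. For values that arrive from elsewhere (user input, third-party libraries), a small normalization helper can enforce that before formatting. The following is a minimal sketch, not the project's actual code: the names `utc_now`, `ensure_utc`, `to_api_string`, and `to_filename_stamp` are hypothetical, and rejecting naive datetimes with a `ValueError` is an assumed policy.

```python
from datetime import datetime, timedelta, timezone


def utc_now() -> datetime:
    """Return the current time as a timezone-aware UTC datetime."""
    return datetime.now(timezone.utc)


def ensure_utc(dt: datetime) -> datetime:
    """Convert an aware datetime to UTC; reject naive ones (assumed policy)."""
    if dt.tzinfo is None or dt.utcoffset() is None:
        raise ValueError("naive datetime: attach a timezone before use")
    return dt.astimezone(timezone.utc)


def to_api_string(dt: datetime) -> str:
    """Format as ISO 8601 with a trailing 'Z' instead of '+00:00'."""
    return ensure_utc(dt).isoformat().replace("+00:00", "Z")


def to_filename_stamp(dt: datetime) -> str:
    """Format as YYYYMMDD_HHMMSS for use in filenames."""
    return ensure_utc(dt).strftime("%Y%m%d_%H%M%S")


# Usage: a timestamp given in UTC+02:00 is normalized to UTC in both formats.
local = datetime(2025, 1, 15, 12, 30, 45, tzinfo=timezone(timedelta(hours=2)))
print(to_api_string(local))      # 2025-01-15T10:30:45Z
print(to_filename_stamp(local))  # 20250115_103045
```

Routing both output formats through `ensure_utc` keeps the timezone check in one place, so a naive datetime fails loudly instead of being silently interpreted as local time.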