---
description: UTC timestamp handling patterns for consistent timezone management in src/**/* and other relevant directories
alwaysApply: false
---
# UTC Timestamps Rule
## Core Principles
- **UTC Only**: Always use UTC for all timestamps
- **Timezone Awareness**: Make timezone explicit in all datetime objects
- **Standard Formats**: Use ISO 8601 for APIs, YYYYMMDD_HHMMSS for filenames
- **No Manual Construction**: Generate timestamps with proper functions
- **Consistent Patterns**: Use the same timestamp approach across all services
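The timezone-awareness principle can be enforced wherever datetimes enter the system. The following is a minimal sketch, not existing project code: `ensure_utc` is a hypothetical helper name, and rejecting naive values (rather than guessing their zone) is an assumption about the desired policy.

```python
from datetime import datetime, timedelta, timezone


def ensure_utc(dt: datetime) -> datetime:
    """Return dt converted to UTC; reject naive datetimes (hypothetical helper)."""
    # A datetime is naive when tzinfo is missing or reports no offset
    if dt.tzinfo is None or dt.utcoffset() is None:
        raise ValueError("naive datetime: attach a timezone before converting")
    return dt.astimezone(timezone.utc)


# Usage: an aware value in UTC+2 is converted to the same instant in UTC
local = datetime(2025, 1, 15, 12, 30, 45, tzinfo=timezone(timedelta(hours=2)))
as_utc = ensure_utc(local)
```

Raising on naive input keeps the ambiguity visible at the boundary instead of silently assuming a zone.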
## Implementation Patterns
### UTC Timestamp Generation
```python
# ✅ DO: Generate timestamps with UTC timezone
from datetime import datetime, timezone


# Python - Standard approach
def get_current_timestamp() -> datetime:
    """Get current timestamp with UTC timezone."""
    return datetime.now(timezone.utc)


# For performance timing, prefer datetime over time.time()
def measure_performance() -> float:
    start_time = datetime.now(timezone.utc)
    # ... operation ...
    elapsed = (datetime.now(timezone.utc) - start_time).total_seconds()
    return elapsed
```
### Database Timestamps
```python
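# ✅ DO: Store timestamps in UTC in the database
from datetime import datetime, timezone
from uuid import uuid4

from sqlalchemy import Column, DateTime
from sqlalchemy.dialects.postgresql import UUID  # assumes a PostgreSQL backend
from sqlalchemy.orm import declarative_base
from sqlalchemy.sql import func

Base = declarative_base()  # stand-in for the project's shared declarative base


class MediaFile(Base):
    __tablename__ = "media_files"
    id = Column(UUID(as_uuid=True), primary_key=True, default=uuid4)
    # Use timezone=True to ensure timezone awareness
    created_at = Column(DateTime(timezone=True), server_default=func.now())
    updated_at = Column(DateTime(timezone=True), onupdate=func.now())

    # For manual updates, use UTC
    def update_timestamp(self):
        self.updated_at = datetime.now(timezone.utc)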
```
### API Response Formatting
```python
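# ✅ DO: Use ISO 8601 format for API responses
from datetime import datetime, timezone


def format_timestamp_for_api(dt: datetime) -> str:
    """Format an aware datetime as an ISO 8601 string with a Z suffix."""
    # isoformat() renders UTC as "+00:00"; replace it with "Z" to match the response below
    return dt.astimezone(timezone.utc).isoformat().replace("+00:00", "Z")


# Example API response
{
    "id": "123",
    "name": "Example",
    "created_at": "2025-01-15T10:30:45.123456Z",  # ISO 8601 format with Z for UTC
    "completed_at": "2025-01-15T10:35:12.789012Z"
}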
```
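Clients and services also need to read these strings back. The sketch below shows one way to parse them into aware UTC datetimes; `parse_api_timestamp` is a hypothetical helper, and rejecting strings without an offset is an assumed policy.

```python
from datetime import datetime, timezone


def parse_api_timestamp(value: str) -> datetime:
    """Parse an ISO 8601 string into an aware UTC datetime (hypothetical helper)."""
    # datetime.fromisoformat() only accepts a trailing "Z" from Python 3.11 on,
    # so normalize it to an explicit offset first
    if value.endswith("Z"):
        value = value[:-1] + "+00:00"
    parsed = datetime.fromisoformat(value)
    if parsed.tzinfo is None:
        raise ValueError("timestamp has no UTC offset")
    return parsed.astimezone(timezone.utc)


created_at = parse_api_timestamp("2025-01-15T10:30:45.123456Z")
```

Normalizing the `Z` suffix by hand keeps the helper usable on Python versions older than 3.11.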
### Filename Formatting
```python
# ✅ DO: Use YYYYMMDD_HHMMSS format for filenames
def generate_filename(prefix: str) -> str:
    """Generate filename with timestamp."""
    timestamp = datetime.now(timezone.utc)
    formatted = timestamp.strftime("%Y%m%d_%H%M%S")
    return f"{prefix}_{formatted}.wav"


# Example: "recording_20250115_103045.wav"
# Example: "research_20250115_143022.md"
```
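Because the filename format carries no timezone marker, code that reads the timestamp back has to reattach UTC explicitly. A minimal sketch, assuming filenames follow the `prefix_YYYYMMDD_HHMMSS.ext` shape shown above; `parse_filename_timestamp` is a hypothetical helper name.

```python
import re
from datetime import datetime, timezone
from typing import Optional

# Matches the trailing "_YYYYMMDD_HHMMSS.<ext>" part of a filename
_STAMP = re.compile(r"_(\d{8}_\d{6})\.[^.]+$")


def parse_filename_timestamp(filename: str) -> Optional[datetime]:
    """Recover the UTC timestamp embedded in a filename, or None if absent."""
    match = _STAMP.search(filename)
    if match is None:
        return None
    # strptime() returns a naive value; the rule says it was written in UTC
    naive = datetime.strptime(match.group(1), "%Y%m%d_%H%M%S")
    return naive.replace(tzinfo=timezone.utc)


recorded_at = parse_filename_timestamp("recording_20250115_103045.wav")
```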
### Service-Specific Patterns
#### Transcription Service
```python
# ✅ DO: Use UTC for all transcription timestamps
class TranscriptionService:
    def complete_transcription(self, result):
        # Capture the time once so every field reports the same instant
        now = datetime.now(timezone.utc)
        return {
            "text": result.text,
            "completed_at": now.isoformat(),
            "timestamp": now,
            "merged_at": now.isoformat()
        }
```
#### Performance Monitoring
```python
# ✅ DO: Use datetime for performance metrics
class PerformanceMonitor:
    def record_metric(self, operation: str):
        return {
            "operation": operation,
            "timestamp": datetime.now(timezone.utc).isoformat(),
            "start_time": datetime.now(timezone.utc)
        }

    def measure_elapsed(self, start_time: datetime) -> float:
        return (datetime.now(timezone.utc) - start_time).total_seconds()
```
#### Research and Export
```python
# ✅ DO: Consistent timestamp formatting for exports
def export_research_data(data: dict) -> dict:
    return {
        **data,
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "generated_at": datetime.now(timezone.utc).isoformat()
    }


def generate_export_filename(prefix: str, extension: str) -> str:
    timestamp = datetime.now(timezone.utc).strftime("%Y%m%d_%H%M%S")
    return f"{prefix}_{timestamp}.{extension}"
```
### Anti-Patterns
#### ❌ DON'T: Use naive datetime objects
```python
# Wrong! Missing timezone
timestamp = datetime.now() # Uses local timezone
completed_at = datetime.now().isoformat() # Inconsistent timezone
```
#### ❌ DON'T: Use deprecated datetime.utcnow()
```python
# Wrong! Deprecated method
profile.updated_at = datetime.utcnow() # Use datetime.now(timezone.utc) instead
```
#### ❌ DON'T: Mix time.time() and datetime for timing
```python
# Wrong! Inconsistent timing approach
start_time = time.time()
# ... operation ...
elapsed = time.time() - start_time
# Better: Use datetime consistently
start_time = datetime.now(timezone.utc)
# ... operation ...
elapsed = (datetime.now(timezone.utc) - start_time).total_seconds()
```
#### ❌ DON'T: Inconsistent filename formats
```python
# Wrong! Inconsistent formatting
file_name = f"research_{datetime.now(timezone.utc).strftime('%Y%m%d_%H%M%S')}.md"  # Good
file_name = f"data_{datetime.now(timezone.utc).strftime('%Y-%m-%d %H:%M:%S')}.json"  # Wrong format
# Example: "research_20250115_143022.md" (correct)
# Example: "data_2025-01-15 14:30:22.json" (incorrect)
```
## Migration Guidelines
### For Existing Code
1. **Replace `datetime.now()`** with `datetime.now(timezone.utc)`
2. **Replace `datetime.utcnow()`** with `datetime.now(timezone.utc)`
3. **Standardize filename formats** to `YYYYMMDD_HHMMSS`
4. **Use datetime for performance timing** instead of `time.time()`
5. **Ensure all database columns** use `DateTime(timezone=True)`
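Step 2 has a data-side counterpart: values already produced by `datetime.utcnow()` are naive but hold UTC wall-clock time, so they need a timezone attached, not a conversion. A minimal sketch under that assumption; `attach_utc` is a hypothetical helper name.

```python
from datetime import datetime, timezone


def attach_utc(dt: datetime) -> datetime:
    """Mark a naive datetime known to be UTC as aware; convert aware ones."""
    if dt.tzinfo is None:
        # replace() relabels without shifting the clock time;
        # astimezone() on a naive value would assume local time instead
        return dt.replace(tzinfo=timezone.utc)
    return dt.astimezone(timezone.utc)


legacy = datetime(2025, 1, 15, 10, 30, 45)  # e.g. a value written by utcnow()
migrated = attach_utc(legacy)
```

This is only safe for values known to have been written in UTC; naive values from `datetime.now()` hold local time and need their original zone instead.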
### Priority Files to Fix
Based on analysis, prioritize these files:
- `src/services/transcription_service.py` - Multiple naive datetime usages
- `src/services/local_transcription_service.py` - Naive datetime
- `src/repositories/speaker_profile_repository.py` - Uses deprecated utcnow()
- `src/base/batch_processor.py` - Uses deprecated utcnow()
### Periodic Cleanup Process (2025+)
#### Quarterly Timestamp Audit
Perform a comprehensive audit every 3 months to identify and fix timestamp inconsistencies:
```python
# ✅ DO: Create a timestamp audit script
import re
from pathlib import Path


def audit_timestamps(project_root: Path):
    """Audit project for timestamp inconsistencies."""
    issues = []
    # Patterns to check for
    patterns = {
        'naive_datetime': r'datetime\.now\(\)',
        'deprecated_utcnow': r'datetime\.utcnow\(\)',
        'time_dot_time': r'time\.time\(\)',
        # Date-and-time strftime formats other than the standard %Y%m%d_%H%M%S
        'inconsistent_filename': r'strftime\(([\'"])(?!%Y%m%d_%H%M%S\1)[^\'"]*%Y[^\'"]*%S[^\'"]*\1\)'
    }
    for py_file in project_root.rglob('*.py'):
        content = py_file.read_text(encoding='utf-8')
        for pattern_name, pattern in patterns.items():
            if re.search(pattern, content):
                issues.append(f"{py_file}: {pattern_name}")
    return issues
```
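An audit is easier to act on when it reports line numbers as well as file names. The following is a rough sketch of a line-level variant; `CHECKS`, `find_timestamp_issues`, and `report` are hypothetical names, and only two of the checks are shown.

```python
import re
from pathlib import Path
from typing import List, Tuple

# Subset of the audit checks, compiled once
CHECKS = {
    "naive_datetime": re.compile(r"datetime\.now\(\)"),
    "deprecated_utcnow": re.compile(r"datetime\.utcnow\(\)"),
}


def find_timestamp_issues(source: str) -> List[Tuple[int, str]]:
    """Return (line_number, check_name) pairs for one file's text."""
    findings = []
    for number, line in enumerate(source.splitlines(), start=1):
        for name, pattern in CHECKS.items():
            if pattern.search(line):
                findings.append((number, name))
    return findings


def report(project_root: Path) -> List[str]:
    """Format findings for every Python file as 'path:line: check'."""
    lines = []
    for py_file in sorted(project_root.rglob("*.py")):
        text = py_file.read_text(encoding="utf-8")
        for number, name in find_timestamp_issues(text):
            lines.append(f"{py_file}:{number}: {name}")
    return lines


sample = "a = datetime.now()\nb = datetime.now(timezone.utc)\nc = datetime.utcnow()\n"
findings = find_timestamp_issues(sample)
```

The `path:line: check` output shape matches what most editors and CI logs can turn into clickable locations.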
#### Automated Cleanup Scripts
Create automated scripts to fix common timestamp issues:
```python
# ✅ DO: Automated timestamp cleanup
import re
from pathlib import Path


def fix_naive_datetime(file_path: Path) -> bool:
    """Replace naive datetime.now() with UTC-aware version."""
    content = file_path.read_text(encoding='utf-8')
    # Replace datetime.now() with datetime.now(timezone.utc)
    fixed_content = re.sub(
        r'datetime\.now\(\)',
        'datetime.now(timezone.utc)',
        content
    )
    # Replace datetime.utcnow() with datetime.now(timezone.utc)
    fixed_content = re.sub(
        r'datetime\.utcnow\(\)',
        'datetime.now(timezone.utc)',
        fixed_content
    )
    # Note: this does not add `from datetime import timezone`; check imports afterwards
    if fixed_content != content:
        file_path.write_text(fixed_content, encoding='utf-8')
        return True
    return False


def standardize_filename_formats(file_path: Path) -> bool:
    """Standardize filename timestamp formats to YYYYMMDD_HHMMSS."""
    original = file_path.read_text(encoding='utf-8')
    # Fix inconsistent filename formats; review the diff, since this also
    # rewrites matching strftime() calls that format text for display
    patterns = [
        # Date and time, e.g. "%Y-%m-%d %H:%M:%S" or "%Y-%m-%d_%H:%M:%S"
        (r'strftime\([\'"]%Y-%m-%d[ _T]%H:%M:%S[\'"]\)',
         'strftime("%Y%m%d_%H%M%S")'),
        # Date only, e.g. "%Y-%m-%d"
        (r'strftime\([\'"]%Y-%m-%d[\'"]\)',
         'strftime("%Y%m%d")')
    ]
    content = original
    for pattern, replacement in patterns:
        content = re.sub(pattern, replacement, content)
    if content != original:
        file_path.write_text(content, encoding='utf-8')
        return True
    return False
```
#### Cleanup Checklist (2025)
- [ ] **Q1 2025**: Audit all transcription services for naive datetime usage
- [ ] **Q2 2025**: Standardize all filename timestamp formats
- [ ] **Q3 2025**: Replace all `time.time()` usage with datetime objects
- [ ] **Q4 2025**: Verify all database migrations use `timezone=True`
- [ ] **Ongoing**: Fix timestamp issues in new code during code reviews
#### Legacy File Detection
Identify files with potentially problematic timestamp formats:
```python
# ✅ DO: Detect legacy timestamp patterns
import re
from pathlib import Path


def detect_legacy_timestamps(project_root: Path):
    """Detect files with legacy timestamp patterns."""
    # Check for patterns that suggest legacy timestamp usage
    legacy_patterns = [
        r'datetime\.now\(\)',  # Naive datetime
        r'datetime\.utcnow\(\)',  # Deprecated method
        # Custom date formats (anything other than %Y%m%d or %Y%m%d_%H%M%S)
        r'strftime\(([\'"])(?!%Y%m%d(?:_%H%M%S)?\1)[^\'"]*%Y[^\'"]*%m[^\'"]*%d[^\'"]*\1\)',
        r'time\.time\(\)',  # Unix timestamps instead of datetime
    ]
    legacy_files = []
    for py_file in project_root.rglob('*.py'):
        content = py_file.read_text(encoding='utf-8')
        for pattern in legacy_patterns:
            if re.search(pattern, content):
                legacy_files.append(str(py_file))
                break
    return legacy_files
```
## Testing Timestamps
```python
# ✅ DO: Test timestamp generation
import re


def test_timestamp_generation():
    timestamp = get_current_timestamp()
    assert timestamp.tzinfo is not None
    assert timestamp.tzinfo == timezone.utc


def test_filename_formatting():
    filename = generate_filename("test")
    assert re.fullmatch(r"test_\d{8}_\d{6}\.wav", filename)
    # Example: "test_20250115_143022.wav"
```
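The tests above can only check the shape of the output, because the current time changes on every run. One option is to let callers pass the time in, which makes exact assertions possible. This is a sketch of that idea: `generate_filename_at` and its `now` parameter are hypothetical and not part of the existing `generate_filename`.

```python
from datetime import datetime, timezone
from typing import Optional


def generate_filename_at(prefix: str, now: Optional[datetime] = None) -> str:
    """Variant of generate_filename that accepts a fixed, aware time (hypothetical)."""
    timestamp = now if now is not None else datetime.now(timezone.utc)
    # Convert in case the caller passed an aware value in another zone
    formatted = timestamp.astimezone(timezone.utc).strftime("%Y%m%d_%H%M%S")
    return f"{prefix}_{formatted}.wav"


def test_filename_is_deterministic():
    fixed = datetime(2025, 1, 15, 14, 30, 22, tzinfo=timezone.utc)
    assert generate_filename_at("test", now=fixed) == "test_20250115_143022.wav"
```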
```python
# ❌ DON'T: Store timestamps without timezone info
created_at = Column(DateTime, server_default=func.now())  # Wrong! Missing timezone=True
```
Always generate timestamps using `datetime.now(timezone.utc)`. Never use `datetime.now()` or `datetime.utcnow()`, and never hardcode or manually construct timestamps. For API responses, use ISO 8601 format. For filenames, use `YYYYMMDD_HHMMSS` format. All database timestamps must be stored in UTC with `timezone=True`.