---
description: UTC timestamp handling patterns for consistent timezone management for src/**/* and other relevant directories
alwaysApply: false
---
# UTC Timestamps Rule

## Core Principles
- **UTC Only**: Always use UTC for all timestamps
- **Timezone Awareness**: Make the timezone explicit in all datetime objects
- **Standard Formats**: Use ISO 8601 for APIs and `YYYYMMDD_HHMMSS` for filenames
- **No Manual Construction**: Generate timestamps with datetime functions; never hardcode or hand-assemble them
- **Consistent Patterns**: Use the same timestamp approach across all services

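To make the "Timezone Awareness" principle concrete, the following is a minimal sketch of a guard that normalizes aware datetimes to UTC and rejects naive ones. The helper name `to_utc` is hypothetical and not part of any existing module in this project; it only illustrates one way to enforce the principle at a boundary.

```python
from datetime import datetime, timedelta, timezone


def to_utc(dt: datetime) -> datetime:
    """Return dt converted to UTC; reject naive datetimes instead of guessing."""
    # A datetime is naive if tzinfo is missing or yields no UTC offset.
    if dt.tzinfo is None or dt.utcoffset() is None:
        raise ValueError("naive datetime: attach a timezone before calling to_utc()")
    return dt.astimezone(timezone.utc)


# Usage: 12:30 at UTC+02:00 is 10:30 UTC.
local = datetime(2025, 1, 15, 12, 30, tzinfo=timezone(timedelta(hours=2)))
utc_value = to_utc(local)
```

Raising on naive input, rather than silently assuming UTC or local time, is a design choice that surfaces the ambiguity where the value enters the system.
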
## Implementation Patterns

### UTC Timestamp Generation
```python
# ✅ DO: Generate timestamps with UTC timezone
from datetime import datetime, timezone

# Python - Standard approach
def get_current_timestamp() -> datetime:
    """Get current timestamp with UTC timezone."""
    return datetime.now(timezone.utc)

# For performance timing, prefer datetime over time.time()
def measure_performance():
    start_time = datetime.now(timezone.utc)
    # ... operation ...
    elapsed = (datetime.now(timezone.utc) - start_time).total_seconds()
    return elapsed
```

### Database Timestamps
```python
# ✅ DO: Store timestamps in UTC in the database
from datetime import datetime, timezone
from uuid import uuid4

from sqlalchemy import Column, DateTime
from sqlalchemy.dialects.postgresql import UUID  # Assumption: PostgreSQL backend
from sqlalchemy.sql import func

# Base is the project's declarative base (defined elsewhere)
class MediaFile(Base):
    __tablename__ = "media_files"

    id = Column(UUID(as_uuid=True), primary_key=True, default=uuid4)
    # Use timezone=True to ensure timezone awareness
    created_at = Column(DateTime(timezone=True), server_default=func.now())
    updated_at = Column(DateTime(timezone=True), onupdate=func.now())

    # For manual updates, use UTC
    def update_timestamp(self):
        self.updated_at = datetime.now(timezone.utc)
```

### API Response Formatting
```python
# ✅ DO: Use ISO 8601 format for API responses
def format_timestamp_for_api(dt: datetime) -> str:
    """Format a timezone-aware datetime as an ISO 8601 UTC string."""
    # isoformat() renders UTC as "+00:00"; emit "Z" to match the example below
    return dt.astimezone(timezone.utc).isoformat().replace("+00:00", "Z")

# Example API response
{
    "id": "123",
    "name": "Example",
    "created_at": "2025-01-15T10:30:45.123456Z",  # ISO 8601 format with Z for UTC
    "completed_at": "2025-01-15T10:35:12.789012Z"
}
```

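Code that consumes these API strings has to turn them back into UTC-aware datetimes. The sketch below shows one way to do that with the standard library; `parse_api_timestamp` is a hypothetical helper name, and the `Z` handling is there because `datetime.fromisoformat()` only accepts a trailing `Z` from Python 3.11 onward.

```python
from datetime import datetime, timezone


def parse_api_timestamp(value: str) -> datetime:
    """Parse an ISO 8601 string into a UTC-aware datetime."""
    # Normalize a trailing "Z" to an explicit offset for older Python versions.
    if value.endswith("Z"):
        value = value[:-1] + "+00:00"
    parsed = datetime.fromisoformat(value)
    if parsed.tzinfo is None:
        raise ValueError(f"timestamp has no UTC offset: {value!r}")
    return parsed.astimezone(timezone.utc)


# Usage: both inputs describe instants that are returned in UTC.
from_z = parse_api_timestamp("2025-01-15T10:30:45.123456Z")
from_offset = parse_api_timestamp("2025-01-15T12:30:45+02:00")
```
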
### Filename Formatting
```python
# ✅ DO: Use YYYYMMDD_HHMMSS format for filenames
def generate_filename(prefix: str) -> str:
    """Generate a .wav filename with a UTC timestamp."""
    timestamp = datetime.now(timezone.utc)
    formatted = timestamp.strftime("%Y%m%d_%H%M%S")
    return f"{prefix}_{formatted}.wav"

# Example: "recording_20250115_103045.wav"
# The same pattern applies to other extensions, e.g. "research_20250115_143022.md"
```

### Service-Specific Patterns

#### Transcription Service
```python
# ✅ DO: Use UTC for all transcription timestamps
class TranscriptionService:
    def complete_transcription(self, result):
        return {
            "text": result.text,
            "completed_at": datetime.now(timezone.utc).isoformat(),
            "timestamp": datetime.now(timezone.utc),
            "merged_at": datetime.now(timezone.utc).isoformat()
        }
```

#### Performance Monitoring
```python
# ✅ DO: Use datetime for performance metrics
class PerformanceMonitor:
    def record_metric(self, operation: str):
        return {
            "operation": operation,
            "timestamp": datetime.now(timezone.utc).isoformat(),
            "start_time": datetime.now(timezone.utc)
        }

    def measure_elapsed(self, start_time: datetime) -> float:
        return (datetime.now(timezone.utc) - start_time).total_seconds()
```

#### Research and Export
```python
# ✅ DO: Consistent timestamp formatting for exports
def export_research_data(data: dict) -> dict:
    return {
        **data,
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "generated_at": datetime.now(timezone.utc).isoformat()
    }

def generate_export_filename(prefix: str, extension: str) -> str:
    timestamp = datetime.now(timezone.utc).strftime("%Y%m%d_%H%M%S")
    return f"{prefix}_{timestamp}.{extension}"
```

### Anti-Patterns

#### ❌ DON'T: Use naive datetime objects
```python
# Wrong! Missing timezone
timestamp = datetime.now()  # Uses local timezone
completed_at = datetime.now().isoformat()  # Inconsistent timezone
```

#### ❌ DON'T: Use deprecated datetime.utcnow()
```python
# Wrong! Deprecated since Python 3.12, and it returns a naive datetime
profile.updated_at = datetime.utcnow()  # Use datetime.now(timezone.utc) instead
```

#### ❌ DON'T: Mix time.time() and datetime for timing
```python
# Wrong! Inconsistent timing approach
start_time = time.time()
# ... operation ...
elapsed = time.time() - start_time

# Better: Use datetime consistently
start_time = datetime.now(timezone.utc)
# ... operation ...
elapsed = (datetime.now(timezone.utc) - start_time).total_seconds()
```

#### ❌ DON'T: Inconsistent filename formats
```python
# Wrong! Inconsistent formatting
file_name = f"research_{datetime.now(timezone.utc).strftime('%Y%m%d_%H%M%S')}.md"  # Good
file_name = f"data_{datetime.now(timezone.utc).strftime('%Y-%m-%d %H:%M:%S')}.json"  # Wrong format
# Example: "research_20250115_143022.md" (correct)
# Example: "data_2025-01-15 14:30:22.json" (incorrect)
```

## Migration Guidelines

### For Existing Code
1. **Replace `datetime.now()`** with `datetime.now(timezone.utc)`
2. **Replace `datetime.utcnow()`** with `datetime.now(timezone.utc)`
3. **Standardize filename formats** to `YYYYMMDD_HHMMSS`
4. **Use datetime for performance timing** instead of `time.time()`
5. **Ensure all database columns** use `DateTime(timezone=True)`

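Steps 1 and 2 change how new values are created, but values written earlier by `datetime.utcnow()` may still be naive when they are read back. As a rough sketch, such values can be labeled as UTC without shifting them; `mark_as_utc` is a hypothetical helper, and it is only safe for values known to hold UTC wall time. Values produced by a naive `datetime.now()` hold local time and need the originating timezone instead.

```python
from datetime import datetime, timezone


def mark_as_utc(naive_value: datetime) -> datetime:
    """Attach UTC to a naive datetime that is already known to hold UTC wall time."""
    if naive_value.tzinfo is not None:
        raise ValueError("value is already timezone-aware")
    # replace() only labels the value; it does not shift the clock reading.
    return naive_value.replace(tzinfo=timezone.utc)


# Usage: a legacy value such as one written by datetime.utcnow().
legacy = datetime(2025, 1, 15, 10, 30, 45)
migrated = mark_as_utc(legacy)
```
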
### Priority Files to Fix
Based on analysis, prioritize these files:
- `src/services/transcription_service.py` - Multiple naive datetime usages
- `src/services/local_transcription_service.py` - Naive datetime
- `src/repositories/speaker_profile_repository.py` - Uses deprecated utcnow()
- `src/base/batch_processor.py` - Uses deprecated utcnow()

### Periodic Cleanup Process (2025+)

#### Quarterly Timestamp Audit
Perform a comprehensive audit every 3 months to identify and fix timestamp inconsistencies:

```python
# ✅ DO: Create a timestamp audit script
import re
from pathlib import Path

def audit_timestamps(project_root: Path):
    """Audit project for timestamp inconsistencies."""
    issues = []

    # Patterns to check for
    patterns = {
        'naive_datetime': r'datetime\.now\(\)',
        'deprecated_utcnow': r'datetime\.utcnow\(\)',
        'time_dot_time': r'time\.time\(\)',
        # Date-and-time formats other than the standard "%Y%m%d_%H%M%S"
        'inconsistent_filename': r'strftime\([\'"](?!%Y%m%d_%H%M%S[\'"])[^\'"]*%Y[^\'"]*%H[^\'"]*[\'"]\)'
    }

    for py_file in project_root.rglob('*.py'):
        content = py_file.read_text()
        for pattern_name, pattern in patterns.items():
            if re.search(pattern, content):
                issues.append(f"{py_file}: {pattern_name}")

    return issues
```

#### Automated Cleanup Scripts
Create automated scripts to fix common timestamp issues:

```python
# ✅ DO: Automated timestamp cleanup
import re
from pathlib import Path

def fix_naive_datetime(file_path: Path):
    """Replace naive datetime.now() with UTC-aware version."""
    content = file_path.read_text()

    # Replace datetime.now() with datetime.now(timezone.utc)
    fixed_content = re.sub(
        r'datetime\.now\(\)',
        'datetime.now(timezone.utc)',
        content
    )

    # Replace datetime.utcnow() with datetime.now(timezone.utc)
    fixed_content = re.sub(
        r'datetime\.utcnow\(\)',
        'datetime.now(timezone.utc)',
        fixed_content
    )

    # Note: this does not add `from datetime import timezone`;
    # check the imports of every file it changes.
    if fixed_content != content:
        file_path.write_text(fixed_content)
        return True
    return False

def standardize_filename_formats(file_path: Path):
    """Standardize timestamp formats to YYYYMMDD_HHMMSS.

    Rewrites every matching strftime call, not only filename builders,
    so review the resulting diff.
    """
    original = file_path.read_text()
    content = original

    # Fix inconsistent filename formats; \g<1> keeps the original quote character
    patterns = [
        # Date and time, e.g. "%Y-%m-%d %H:%M:%S" or "%Y-%m-%d_%H:%M:%S"
        (r'strftime\(([\'"])%Y-%m-%d[ _T]%H:%M:%S\1\)',
         r'strftime(\g<1>%Y%m%d_%H%M%S\g<1>)'),
        # Date only, e.g. "%Y-%m-%d"; anchored so a time component is never dropped
        (r'strftime\(([\'"])%Y-%m-%d\1\)',
         r'strftime(\g<1>%Y%m%d\g<1>)')
    ]

    for pattern, replacement in patterns:
        content = re.sub(pattern, replacement, content)

    if content != original:
        file_path.write_text(content)
        return True
    return False
```

#### Cleanup Checklist (2025)
- [ ] **Q1 2025**: Audit all transcription services for naive datetime usage
- [ ] **Q2 2025**: Standardize all filename timestamp formats
- [ ] **Q3 2025**: Replace all `time.time()` usage with datetime objects
- [ ] **Q4 2025**: Verify all database migrations use `timezone=True`
- [ ] **Ongoing**: Fix timestamp issues in new code during code reviews

#### Legacy File Detection
Identify files with potentially problematic timestamp formats:

```python
# ✅ DO: Detect legacy timestamp patterns
import re
from pathlib import Path

def detect_legacy_timestamps(project_root: Path):
    """Detect files with legacy timestamp patterns."""
    legacy_files = []

    for py_file in project_root.rglob('*.py'):
        content = py_file.read_text()

        # Check for patterns that suggest legacy timestamp usage
        legacy_patterns = [
            r'datetime\.now\(\)',  # Naive datetime
            r'datetime\.utcnow\(\)',  # Deprecated method
            r'strftime\([\'"][^Y]*%Y[^m]*%m[^d]*%d[^\'"]*[\'"]\)',  # Custom date formats
            r'time\.time\(\)',  # Unix timestamps instead of datetime
        ]

        for pattern in legacy_patterns:
            if re.search(pattern, content):
                legacy_files.append(str(py_file))
                break

    return legacy_files
```

## Testing Timestamps

```python
# ✅ DO: Test timestamp generation
import re
from datetime import timezone

def test_timestamp_generation():
    timestamp = get_current_timestamp()
    assert timestamp.tzinfo is not None
    assert timestamp.tzinfo == timezone.utc

def test_filename_formatting():
    filename = generate_filename("test")
    assert re.fullmatch(r"test_\d{8}_\d{6}\.wav", filename)
    # Example: "test_20250115_143022.wav"
```

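The tests above check only the shape of the output, because the current time changes on every run. One possible way to assert exact values is to pass the clock in as a parameter. The sketch below uses a hypothetical variant of `generate_filename` with an injectable `now` callable; that parameter and the `utc_now` helper are assumptions for illustration, not the signature used elsewhere in this rule.

```python
from datetime import datetime, timezone
from typing import Callable


def utc_now() -> datetime:
    """Default clock: the current time as a UTC-aware datetime."""
    return datetime.now(timezone.utc)


def generate_filename(prefix: str, now: Callable[[], datetime] = utc_now) -> str:
    """Build a .wav filename from the prefix and the clock's current UTC time."""
    formatted = now().strftime("%Y%m%d_%H%M%S")
    return f"{prefix}_{formatted}.wav"


def test_filename_uses_exact_timestamp():
    # A fixed clock makes the expected filename fully deterministic.
    fixed = datetime(2025, 1, 15, 14, 30, 22, tzinfo=timezone.utc)
    assert generate_filename("test", now=lambda: fixed) == "test_20250115_143022.wav"
```
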
```python
# ❌ DON'T: Store timestamps without timezone info
created_at = Column(DateTime, server_default=func.now())  # Wrong! Missing timezone=True
```

Always generate timestamps using `datetime.now(timezone.utc)`. Never use `datetime.now()` or `datetime.utcnow()`, and never hardcode or manually construct timestamps. For API responses, use ISO 8601 format. For filenames, use `YYYYMMDD_HHMMSS` format. All database timestamps must be stored in UTC with `timezone=True`.