trax/tests/.cursor/rules/real-file-testing.mdc

---
description: Real file testing strategy for audio processing reliability and edge-case coverage
globs: tests/**/*,tests/fixtures/**/*
alwaysApply: false
---
# Real File Testing Rule
## Core Principles
- **Real Data Testing**: Use actual audio files instead of mocks
- **Edge Case Coverage**: Include diverse audio samples to catch issues
- **Complete Processing**: Test the full processing pipeline
- **Standard Test Fixtures**: Maintain a consistent set of test files (a fixture-availability check is sketched after this list)
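One way to keep the fixture set consistent is a session-scoped check in `conftest.py`. This is a minimal sketch, assuming the `fixtures/audio` layout used below; the `verify_fixture_set` name and skip-on-missing behavior are illustrative choices, not an existing helper.
```python
# tests/conftest.py (sketch) — skip the run early if a standard fixture is missing
import pytest
from pathlib import Path

REQUIRED_FIXTURES = [
    "sample_5s.wav", "sample_30s.mp3", "sample_2m.mp4",
    "sample_noisy.wav", "sample_multi.wav", "sample_tech.mp3",
]


@pytest.fixture(scope="session", autouse=True)
def verify_fixture_set():
    """Skip the suite when the standard audio fixture set is incomplete."""
    fixtures_dir = Path(__file__).parent / "fixtures" / "audio"
    missing = [name for name in REQUIRED_FIXTURES if not (fixtures_dir / name).exists()]
    if missing:
        pytest.skip(f"Missing audio fixtures: {', '.join(missing)}")
```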
## Implementation Patterns
### Test Fixture Setup
```python
# ✅ DO: Set up real audio file fixtures
# tests/conftest.py
import pytest
from pathlib import Path


@pytest.fixture
def sample_audio_files():
    """Provide real audio files for testing."""
    fixtures_dir = Path(__file__).parent / "fixtures" / "audio"
    return {
        "short": fixtures_dir / "sample_5s.wav",
        "medium": fixtures_dir / "sample_30s.mp3",
        "long": fixtures_dir / "sample_2m.mp4",
        "noisy": fixtures_dir / "sample_noisy.wav",
        "multi_speaker": fixtures_dir / "sample_multi.wav",
        "technical": fixtures_dir / "sample_tech.mp3",
    }
```
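The tests in this rule also assume a `transcription_service` fixture. A minimal sketch of one possible definition follows, assuming the project exposes a `TranscriptionService` class; the `trax.transcription` import path is illustrative.
```python
# tests/conftest.py (sketch) — real, non-mocked service under test
import pytest

from trax.transcription import TranscriptionService  # illustrative import path


@pytest.fixture
def transcription_service():
    """Provide a real transcription service instance (no mocks)."""
    return TranscriptionService()
```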
### Real File Testing
```python
# ✅ DO: Test with real audio files
# tests/test_transcription_service.py
import pytest


@pytest.mark.asyncio  # requires pytest-asyncio (or asyncio_mode = "auto")
async def test_transcription_accuracy(sample_audio_files, transcription_service):
    """Test transcription with real audio files."""
    # Use a real file, not a mock
    result = await transcription_service.transcribe_file(
        sample_audio_files["short"]
    )
    # Verify actual results
    assert result.accuracy >= 0.95  # 95% accuracy requirement
    assert len(result.segments) > 0
    assert result.processing_time < 30.0  # Performance requirement
```
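The same pattern can be parametrized over the fixture keys so every container format (wav, mp3, mp4) is exercised. A sketch, assuming the `sample_audio_files` and `transcription_service` fixtures above; the test name is illustrative.
```python
# ✅ DO (optional): Parametrize over formats to exercise wav, mp3, and mp4 decoding
import pytest


@pytest.mark.asyncio
@pytest.mark.parametrize("sample_key", ["short", "medium", "long"])
async def test_transcription_all_formats(sample_key, sample_audio_files, transcription_service):
    """Every supported container format should transcribe without errors."""
    result = await transcription_service.transcribe_file(sample_audio_files[sample_key])
    assert len(result.segments) > 0
```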
### Edge Case Testing
```python
# ✅ DO: Test edge cases with specialized files
@pytest.mark.asyncio
async def test_noisy_audio_handling(sample_audio_files, transcription_service):
    """Test handling of noisy audio."""
    result = await transcription_service.transcribe_file(
        sample_audio_files["noisy"]
    )
    # Verify noise handling capabilities
    assert result.accuracy >= 0.85  # Lower threshold for noisy audio
    assert result.confidence_scores is not None


@pytest.mark.asyncio
async def test_multi_speaker_detection(sample_audio_files, transcription_service):
    """Test multi-speaker detection."""
    result = await transcription_service.transcribe_file(
        sample_audio_files["multi_speaker"],
        config={"diarization": True},
    )
    # Verify speaker detection
    assert len(result.speakers) >= 2
    assert all("speaker" in segment for segment in result.segments)
```
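The technical sample (`sample_tech.mp3`) is part of the standard fixture set but is not exercised above. A hedged sketch of one way to use it; the expected terms and the segment `"text"` key are assumptions about the result shape, not confirmed API.
```python
# ✅ DO (sketch): Exercise the technical-vocabulary sample as well
import pytest


@pytest.mark.asyncio
async def test_technical_vocabulary(sample_audio_files, transcription_service):
    """Domain-specific terms should survive transcription."""
    result = await transcription_service.transcribe_file(sample_audio_files["technical"])
    # Assumes segments expose a "text" field; adjust to the real result shape
    transcript = " ".join(segment["text"] for segment in result.segments).lower()
    # expected_terms is illustrative — use terms actually spoken in sample_tech.mp3
    expected_terms = ["api", "database"]
    assert any(term in transcript for term in expected_terms)
```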
### Anti-Patterns
```python
# ❌ DON'T: Mock audio processing
from unittest.mock import patch


@patch("whisper.load_model")
def test_transcription_mock(mock_whisper):
    # This won't catch real audio processing issues
    mock_whisper.return_value.transcribe.return_value = {
        "text": "Mocked transcription result"
    }
    service = TranscriptionService()
    result = service.transcribe_file("dummy_path.wav")
    # Only testing the mock, not real processing
    assert "Mocked transcription" in result.text


# ❌ DON'T: Use synthetic or generated audio
def test_with_synthetic_audio():
    # Generating synthetic audio misses real-world issues
    synthetic_audio = generate_sine_wave(440, duration=5)
    # A pure tone never hits the noise, codecs, or speaker overlap real recordings contain
```
When writing tests, ALWAYS use real audio files instead of mocks; real files catch edge cases that mocks cannot. Keep the standard fixture set available: sample_5s.wav, sample_30s.mp3, sample_2m.mp4, sample_noisy.wav, sample_multi.wav, and sample_tech.mp3. Exercise the full processing pipeline against these files to ensure reliability.
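Because real-file tests are slower than mocked ones, it can help to tag them for selective runs. A sketch, assuming a custom `integration` marker registered in the project's pytest configuration; the test itself is illustrative.
```python
# ✅ DO (optional): Mark real-file tests so quick runs can deselect them
import pytest


@pytest.mark.integration  # register the "integration" marker in pytest.ini or pyproject.toml
@pytest.mark.asyncio
async def test_full_pipeline(sample_audio_files, transcription_service):
    """End-to-end run over the longest fixture, kept out of fast unit-test runs."""
    result = await transcription_service.transcribe_file(sample_audio_files["long"])
    assert len(result.segments) > 0
```
Quick local runs can then use `pytest -m "not integration"`, while CI runs the full suite with real fixtures.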