---
description: Real file testing strategy for audio processing reliability and edge cases for tests/**/* and tests/fixtures/**/*
alwaysApply: false
---
# Real File Testing Rule

## Core Principles

- **Real Data Testing**: Use actual audio files instead of mocks
- **Edge Case Coverage**: Include diverse audio samples to catch issues
- **Complete Processing**: Test the full processing pipeline
- **Standard Test Fixtures**: Maintain a consistent set of test files

## Implementation Patterns

### Test Fixture Setup
```python
# ✅ DO: Set up real audio file fixtures
# tests/conftest.py
import pytest
from pathlib import Path

@pytest.fixture
def sample_audio_files():
    """Provide real audio files for testing."""
    fixtures_dir = Path(__file__).parent / "fixtures" / "audio"
    return {
        "short": fixtures_dir / "sample_5s.wav",
        "medium": fixtures_dir / "sample_30s.mp3",
        "long": fixtures_dir / "sample_2m.mp4",
        "noisy": fixtures_dir / "sample_noisy.wav",
        "multi_speaker": fixtures_dir / "sample_multi.wav",
        "technical": fixtures_dir / "sample_tech.mp3",
    }
```

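A fixture dictionary like the one above only returns paths; it does not confirm that the files are actually present. The following is a minimal sketch of a pre-flight check, assuming it is acceptable to fail fast when a fixture is missing or empty. The helper name `find_unusable_fixtures` is hypothetical and not part of the original rule.

```python
from pathlib import Path


def find_unusable_fixtures(files: dict[str, Path]) -> dict[str, str]:
    """Map each fixture key to a problem description, if it has one.

    Hypothetical helper (an assumption, not from the original rule).
    Only checks that each file exists and is non-empty; it does not
    validate the audio content itself.
    """
    problems = {}
    for name, path in files.items():
        if not path.is_file():
            problems[name] = "missing"
        elif path.stat().st_size == 0:
            problems[name] = "empty"
    return problems
```

One possible use is to call this inside `sample_audio_files` and call `pytest.fail(...)` with the returned mapping when it is non-empty, so a missing fixture surfaces as one clear error rather than many unrelated test failures.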
### Real File Testing
```python
# ✅ DO: Test with real audio files
# tests/test_transcription_service.py
import pytest

# Assumes pytest-asyncio is installed; without it (or asyncio_mode = auto)
# pytest will not await async test functions.
# `transcription_service` is assumed to be a fixture defined elsewhere.
@pytest.mark.asyncio
async def test_transcription_accuracy(sample_audio_files, transcription_service):
    """Test transcription with real audio files."""
    # Use real file
    result = await transcription_service.transcribe_file(
        sample_audio_files["short"]
    )

    # Verify actual results
    assert result.accuracy >= 0.95  # 95% accuracy requirement
    assert len(result.segments) > 0
    assert result.processing_time < 30.0  # Performance requirement (seconds)
```

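The test above asserts on `result.accuracy`, but the rule does not say how that value is computed. One common way to score a transcript is word error rate (WER) against a known reference transcript stored alongside the fixture. The sketch below is an assumption about how such a score could be derived, not a description of the service's actual implementation; `word_error_rate` and `word_accuracy` are hypothetical names.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the reference word count.

    Hypothetical helper: lowercases and splits on whitespace only,
    with no punctuation or number normalization.
    """
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript must not be empty")

    # Levenshtein distance over words, keeping one row at a time.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion
                curr[j - 1] + 1,     # insertion
                prev[j - 1] + cost,  # substitution or match
            ))
        prev = curr
    return prev[-1] / len(ref)


def word_accuracy(reference: str, hypothesis: str) -> float:
    """1 - WER, clamped at 0 because WER can exceed 1."""
    return max(0.0, 1.0 - word_error_rate(reference, hypothesis))
```

With a scorer along these lines, a threshold such as `>= 0.95` has a concrete meaning: at most one word-level error per twenty reference words.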
### Edge Case Testing
```python
# ✅ DO: Test edge cases with specialized files
import pytest

# Assumes pytest-asyncio is installed so async tests are awaited.
@pytest.mark.asyncio
async def test_noisy_audio_handling(sample_audio_files, transcription_service):
    """Test handling of noisy audio."""
    result = await transcription_service.transcribe_file(
        sample_audio_files["noisy"]
    )

    # Verify noise handling capabilities
    assert result.accuracy >= 0.85  # Lower threshold for noisy audio
    # The result is accessed by attribute elsewhere, so check the attribute
    # rather than using `in` (which requires the result to be a container).
    assert hasattr(result, "confidence_scores")

@pytest.mark.asyncio
async def test_multi_speaker_detection(sample_audio_files, transcription_service):
    """Test multi-speaker detection."""
    result = await transcription_service.transcribe_file(
        sample_audio_files["multi_speaker"],
        config={"diarization": True}
    )

    # Verify speaker detection
    assert len(result.speakers) >= 2
    # Assumes each segment is dict-like with a "speaker" key
    assert all("speaker" in segment for segment in result.segments)
```

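The tests above use two different accuracy thresholds: 0.95 for the short clean sample and 0.85 for the noisy one. Keeping these in one table can make the expectations easier to audit. This is a minimal sketch under that assumption; `MIN_ACCURACY` and `meets_accuracy_threshold` are hypothetical names, and only the two thresholds stated above are included.

```python
# Minimum accuracy per fixture key. Only these two values appear in the
# tests above; other fixtures intentionally have no entry here.
MIN_ACCURACY = {
    "short": 0.95,
    "noisy": 0.85,
}


def meets_accuracy_threshold(fixture_key: str, accuracy: float) -> bool:
    """Return True if `accuracy` satisfies the threshold for the fixture.

    Hypothetical helper: raises KeyError for fixtures without a defined
    threshold instead of silently applying a default.
    """
    if fixture_key not in MIN_ACCURACY:
        raise KeyError(f"no accuracy threshold defined for fixture {fixture_key!r}")
    return accuracy >= MIN_ACCURACY[fixture_key]
```

Raising on an unknown key is a deliberate choice here: a new fixture then forces an explicit decision about its threshold.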
### Anti-Patterns
```python
from unittest.mock import patch

# ❌ DON'T: Mock audio processing
@patch("whisper.load_model")
def test_transcription_mock(mock_whisper):
    # This won't catch real audio processing issues
    mock_whisper.return_value.transcribe.return_value = {
        "text": "Mocked transcription result"
    }

    service = TranscriptionService()
    result = service.transcribe_file("dummy_path.wav")

    # Only testing the mock, not real processing
    assert "Mocked transcription" in result.text

# ❌ DON'T: Use synthetic or generated audio
def test_with_synthetic_audio():
    # Generating synthetic audio misses real-world issues
    synthetic_audio = generate_sine_wave(440, duration=5)
    # This won't catch real-world audio issues
```

When writing tests, ALWAYS use real audio files instead of mocks; real files catch edge cases that mocks miss. Keep these test fixtures available: `sample_5s.wav`, `sample_30s.mp3`, `sample_2m.mp4`, `sample_noisy.wav`, `sample_multi.wav`, `sample_tech.mp3`. Run each test through the actual processing pipeline to ensure reliability.