--- description: Real file testing strategy for audio processing reliability and edge cases for tests/**/* and tests/fixtures/**/* alwaysApply: false --- # Real File Testing Rule ## Core Principles - **Real Data Testing**: Use actual audio files instead of mocks - **Edge Case Coverage**: Include diverse audio samples to catch issues - **Complete Processing**: Test the full processing pipeline - **Standard Test Fixtures**: Maintain a consistent set of test files ## Implementation Patterns ### Test Fixture Setup ```python # ✅ DO: Set up real audio file fixtures # tests/conftest.py import pytest from pathlib import Path @pytest.fixture def sample_audio_files(): """Provide real audio files for testing.""" fixtures_dir = Path(__file__).parent / "fixtures" / "audio" return { "short": fixtures_dir / "sample_5s.wav", "medium": fixtures_dir / "sample_30s.mp3", "long": fixtures_dir / "sample_2m.mp4", "noisy": fixtures_dir / "sample_noisy.wav", "multi_speaker": fixtures_dir / "sample_multi.wav", "technical": fixtures_dir / "sample_tech.mp3", } ``` ### Real File Testing ```python # ✅ DO: Test with real audio files # tests/test_transcription_service.py async def test_transcription_accuracy(sample_audio_files, transcription_service): """Test transcription with real audio files.""" # Use real file result = await transcription_service.transcribe_file( sample_audio_files["short"] ) # Verify actual results assert result.accuracy >= 0.95 # 95% accuracy requirement assert len(result.segments) > 0 assert result.processing_time < 30.0 # Performance requirement ``` ### Edge Case Testing ```python # ✅ DO: Test edge cases with specialized files async def test_noisy_audio_handling(sample_audio_files, transcription_service): """Test handling of noisy audio.""" result = await transcription_service.transcribe_file( sample_audio_files["noisy"] ) # Verify noise handling capabilities assert result.accuracy >= 0.85 # Lower threshold for noisy audio assert "confidence_scores" in result async def test_multi_speaker_detection(sample_audio_files, transcription_service): """Test multi-speaker detection.""" result = await transcription_service.transcribe_file( sample_audio_files["multi_speaker"], config={"diarization": True} ) # Verify speaker detection assert len(result.speakers) >= 2 assert all("speaker" in segment for segment in result.segments) ``` ### Anti-Patterns ```python # ❌ DON'T: Mock audio processing @patch("whisper.load_model") def test_transcription_mock(mock_whisper): # This won't catch real audio processing issues mock_whisper.return_value.transcribe.return_value = { "text": "Mocked transcription result" } service = TranscriptionService() result = service.transcribe_file("dummy_path.wav") # Only testing the mock, not real processing assert "Mocked transcription" in result.text # ❌ DON'T: Use synthetic or generated audio def test_with_synthetic_audio(): # Generating synthetic audio misses real-world issues synthetic_audio = generate_sine_wave(440, duration=5) # This won't catch real-world audio issues ``` When writing tests, ALWAYS use real audio files instead of mocks. Real files catch edge cases that mocks miss. Include test fixtures: sample_5s.wav, sample_30s.mp3, sample_2m.mp4, sample_noisy.wav, sample_multi.wav, sample_tech.mp3. Test with actual processing to ensure reliability.