trax/tests/.cursor/rules/real-file-testing.mdc

---
description: Real file testing strategy for audio processing reliability and edge-case coverage
globs: tests/**/*,tests/fixtures/**/*
alwaysApply: false
---
# Real File Testing Rule
## Core Principles
- **Real Data Testing**: Use actual audio files instead of mocks
- **Edge Case Coverage**: Include diverse audio samples to catch issues
- **Complete Processing**: Test the full processing pipeline
- **Standard Test Fixtures**: Maintain a consistent set of test files (a fixture-availability check is sketched after this list)
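One way to keep the fixture set consistent is a session-scoped check in `conftest.py`. This is a minimal sketch, assuming the `fixtures/audio` layout used below; the `verify_fixture_set` name and skip-on-missing behavior are illustrative choices, not an existing helper.
```python
# tests/conftest.py (sketch) — skip the run early if a standard fixture is missing
import pytest
from pathlib import Path

REQUIRED_FIXTURES = [
    "sample_5s.wav", "sample_30s.mp3", "sample_2m.mp4",
    "sample_noisy.wav", "sample_multi.wav", "sample_tech.mp3",
]


@pytest.fixture(scope="session", autouse=True)
def verify_fixture_set():
    """Skip the suite when the standard audio fixture set is incomplete."""
    fixtures_dir = Path(__file__).parent / "fixtures" / "audio"
    missing = [name for name in REQUIRED_FIXTURES if not (fixtures_dir / name).exists()]
    if missing:
        pytest.skip(f"Missing audio fixtures: {', '.join(missing)}")
```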
## Implementation Patterns
### Test Fixture Setup
```python
# ✅ DO: Set up real audio file fixtures
# tests/conftest.py
import pytest
from pathlib import Path


@pytest.fixture
def sample_audio_files():
    """Provide real audio files for testing."""
    fixtures_dir = Path(__file__).parent / "fixtures" / "audio"
    return {
        "short": fixtures_dir / "sample_5s.wav",
        "medium": fixtures_dir / "sample_30s.mp3",
        "long": fixtures_dir / "sample_2m.mp4",
        "noisy": fixtures_dir / "sample_noisy.wav",
        "multi_speaker": fixtures_dir / "sample_multi.wav",
        "technical": fixtures_dir / "sample_tech.mp3",
    }
```
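The tests in this rule also assume a `transcription_service` fixture. A minimal sketch of one possible definition follows, assuming the project exposes a `TranscriptionService` class; the `trax.transcription` import path is illustrative.
```python
# tests/conftest.py (sketch) — real, non-mocked service under test
import pytest

from trax.transcription import TranscriptionService  # illustrative import path


@pytest.fixture
def transcription_service():
    """Provide a real transcription service instance (no mocks)."""
    return TranscriptionService()
```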
### Real File Testing
```python
# ✅ DO: Test with real audio files
# tests/test_transcription_service.py
import pytest


@pytest.mark.asyncio  # requires pytest-asyncio (or asyncio_mode = "auto")
async def test_transcription_accuracy(sample_audio_files, transcription_service):
    """Test transcription with real audio files."""
    # Use a real file, not a mock
    result = await transcription_service.transcribe_file(
        sample_audio_files["short"]
    )
    # Verify actual results
    assert result.accuracy >= 0.95  # 95% accuracy requirement
    assert len(result.segments) > 0
    assert result.processing_time < 30.0  # Performance requirement
```
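The same pattern can be parametrized over the fixture keys so every container format (wav, mp3, mp4) is exercised. A sketch, assuming the `sample_audio_files` and `transcription_service` fixtures above; the test name is illustrative.
```python
# ✅ DO (optional): Parametrize over formats to exercise wav, mp3, and mp4 decoding
import pytest


@pytest.mark.asyncio
@pytest.mark.parametrize("sample_key", ["short", "medium", "long"])
async def test_transcription_all_formats(sample_key, sample_audio_files, transcription_service):
    """Every supported container format should transcribe without errors."""
    result = await transcription_service.transcribe_file(sample_audio_files[sample_key])
    assert len(result.segments) > 0
```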
### Edge Case Testing
```python
# ✅ DO: Test edge cases with specialized files
@pytest.mark.asyncio
async def test_noisy_audio_handling(sample_audio_files, transcription_service):
    """Test handling of noisy audio."""
    result = await transcription_service.transcribe_file(
        sample_audio_files["noisy"]
    )
    # Verify noise handling capabilities
    assert result.accuracy >= 0.85  # Lower threshold for noisy audio
    assert result.confidence_scores is not None


@pytest.mark.asyncio
async def test_multi_speaker_detection(sample_audio_files, transcription_service):
    """Test multi-speaker detection."""
    result = await transcription_service.transcribe_file(
        sample_audio_files["multi_speaker"],
        config={"diarization": True},
    )
    # Verify speaker detection
    assert len(result.speakers) >= 2
    assert all("speaker" in segment for segment in result.segments)
```
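The technical sample (`sample_tech.mp3`) is part of the standard fixture set but is not exercised above. A hedged sketch of one way to use it; the expected terms and the segment `"text"` key are assumptions about the result shape, not confirmed API.
```python
# ✅ DO (sketch): Exercise the technical-vocabulary sample as well
import pytest


@pytest.mark.asyncio
async def test_technical_vocabulary(sample_audio_files, transcription_service):
    """Domain-specific terms should survive transcription."""
    result = await transcription_service.transcribe_file(sample_audio_files["technical"])
    # Assumes segments expose a "text" field; adjust to the real result shape
    transcript = " ".join(segment["text"] for segment in result.segments).lower()
    # expected_terms is illustrative — use terms actually spoken in sample_tech.mp3
    expected_terms = ["api", "database"]
    assert any(term in transcript for term in expected_terms)
```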
### Anti-Patterns
```python
# ❌ DON'T: Mock audio processing
from unittest.mock import patch


@patch("whisper.load_model")
def test_transcription_mock(mock_whisper):
    # This won't catch real audio processing issues
    mock_whisper.return_value.transcribe.return_value = {
        "text": "Mocked transcription result"
    }
    service = TranscriptionService()
    result = service.transcribe_file("dummy_path.wav")
    # Only testing the mock, not real processing
    assert "Mocked transcription" in result.text


# ❌ DON'T: Use synthetic or generated audio
def test_with_synthetic_audio():
    # Generating synthetic audio misses real-world issues
    synthetic_audio = generate_sine_wave(440, duration=5)
    # A pure tone never hits the noise, codecs, or speaker overlap real recordings contain
```
When writing tests, ALWAYS use real audio files instead of mocks; real files catch edge cases that mocks cannot. Keep the standard fixture set available: sample_5s.wav, sample_30s.mp3, sample_2m.mp4, sample_noisy.wav, sample_multi.wav, and sample_tech.mp3. Exercise the full processing pipeline against these files to ensure reliability.
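Because real-file tests are slower than mocked ones, it can help to tag them for selective runs. A sketch, assuming a custom `integration` marker registered in the project's pytest configuration; the test itself is illustrative.
```python
# ✅ DO (optional): Mark real-file tests so quick runs can deselect them
import pytest


@pytest.mark.integration  # register the "integration" marker in pytest.ini or pyproject.toml
@pytest.mark.asyncio
async def test_full_pipeline(sample_audio_files, transcription_service):
    """End-to-end run over the longest fixture, kept out of fast unit-test runs."""
    result = await transcription_service.transcribe_file(sample_audio_files["long"])
    assert len(result.segments) > 0
```
Quick local runs can then use `pytest -m "not integration"`, while CI runs the full suite with real fixtures.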