
fix(tts): reject unknown voices in /v1/audio/tts/qwen3 with 404 #12

Open
ymote wants to merge 1 commit into main from fix/reject-unknown-voice

ymote commented Apr 24, 2026

Summary

  • POST /v1/audio/tts/qwen3 previously accepted any voice string and let the TTS backend silently fall back to a default. Callers whose requested voice was never registered got 200 OK + audio in the wrong voice, with no indication that a fallback had happened.
  • This PR validates the voice field against a registry of preset speakers + custom voices (from ~/.OminiX/models/voices.json) and returns 404 with a structured error body when the voice is unknown. An empty, missing, or "default" voice still resolves to the documented default (vivian).
  • Logs the rejection at INFO with the requesting IP; /v1/voices listing and the TTS synthesis engine are untouched.
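For the rejection log, the commit message below gives the precedence for the client address: x-forwarded-for, then x-real-ip, then the peer socket. A minimal std-only sketch of that lookup follows. The function name `client_ip` and the plain `HashMap` standing in for the framework's header map (assumed to hold lowercase header names) are hypothetical, not the PR's code.

```rust
use std::collections::HashMap;
use std::net::SocketAddr;

/// Hypothetical helper: best-effort client address for the INFO log line.
/// `headers` stands in for the real framework header map (lowercase keys assumed).
fn client_ip(headers: &HashMap<String, String>, peer: Option<SocketAddr>) -> String {
    // x-forwarded-for may carry a chain "client, proxy1, proxy2";
    // use the first non-empty entry, which is the originating client.
    if let Some(xff) = headers.get("x-forwarded-for") {
        if let Some(first) = xff.split(',').map(str::trim).find(|s| !s.is_empty()) {
            return first.to_string();
        }
    }
    // Next, a single address set by a reverse proxy.
    if let Some(real) = headers.get("x-real-ip") {
        let real = real.trim();
        if !real.is_empty() {
            return real.to_string();
        }
    }
    // Finally, the TCP peer (drop the port for logging).
    peer.map(|addr| addr.ip().to_string())
        .unwrap_or_else(|| "unknown".to_string())
}
```

The first x-forwarded-for entry is client-supplied and can be spoofed unless a trusted proxy sets it, which is acceptable for an informational log but not for access control.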

Observed symptom

On octos mini2, ominix-api reports {"voices":[]} (no custom voices registered). fm_tts calls with voice=yangmi return 200 OK with a valid WAV, but in vivian's voice rather than yangmi's. The LLM layer sees success and tells the user the clip is ready, so the wrong voice reaches the end user. After this change fm_tts will start receiving 404 errors for unknown voices, which is the point: the octos layers above will surface them.

Response shape (404)

{
  "error": "voice_not_found",
  "message": "Voice 'yangmi' is not registered. Available: vivian, serena, ...",
  "requested_voice": "yangmi",
  "available_voices": ["vivian", "serena", "ryan", "..."]
}
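The PR names a `VoiceNotFoundError` type and a serialization test for it, but its definition is not shown here. A real handler would most likely serialize it with a JSON library; the dependency-free sketch below only illustrates how the body above is assembled. The struct layout, the `message`/`to_json` methods and the `json_str` helper are all hypothetical.

```rust
/// Hypothetical shape of the 404 body; field names follow the response above.
struct VoiceNotFoundError {
    requested_voice: String,
    available_voices: Vec<String>,
}

/// Minimal JSON string encoder: quotes, backslashes and control characters.
fn json_str(s: &str) -> String {
    let mut out = String::with_capacity(s.len() + 2);
    out.push('"');
    for c in s.chars() {
        match c {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            '\n' => out.push_str("\\n"),
            '\r' => out.push_str("\\r"),
            '\t' => out.push_str("\\t"),
            c if (c as u32) < 0x20 => out.push_str(&format!("\\u{:04x}", c as u32)),
            c => out.push(c),
        }
    }
    out.push('"');
    out
}

impl VoiceNotFoundError {
    /// Human-readable message in the format shown in the example body.
    fn message(&self) -> String {
        format!(
            "Voice '{}' is not registered. Available: {}",
            self.requested_voice,
            self.available_voices.join(", ")
        )
    }

    /// Compact JSON with a fixed `error` code plus the three variable fields.
    fn to_json(&self) -> String {
        let voices: Vec<String> = self.available_voices.iter().map(|v| json_str(v)).collect();
        format!(
            "{{\"error\":\"voice_not_found\",\"message\":{},\"requested_voice\":{},\"available_voices\":[{}]}}",
            json_str(&self.message()),
            json_str(&self.requested_voice),
            voices.join(",")
        )
    }
}
```

Keeping `error` as a fixed machine-readable code while `message` stays free-form lets callers such as fm_tts branch on `voice_not_found` without parsing prose.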

Backward compatibility

No client migration is required on octos's side. Clients that omit the voice field continue to work (empty -> default voice). Only clients that currently request a voice the server does not expose will see a new 404 instead of a wrong-voice 200, which is the intended behavior.
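The compatibility guarantee depends on normalizing the requested voice before the registry check, so the default path never reaches the 404 branch. A minimal sketch of that step, assuming exact (case-sensitive) matching of "default"; `normalize_voice` and `DEFAULT_VOICE` are hypothetical names:

```rust
/// Documented default speaker (per the PR description).
const DEFAULT_VOICE: &str = "vivian";

/// Hypothetical normalization: a missing field, an empty string, or the
/// literal "default" all map to the default voice; anything else passes
/// through unchanged and is then checked against the registry.
fn normalize_voice(requested: Option<&str>) -> &str {
    match requested {
        None | Some("") | Some("default") => DEFAULT_VOICE,
        Some(v) => v,
    }
}
```

With this ordering, `normalize_voice(Some("yangmi"))` still yields "yangmi", which then fails the registry lookup and produces the 404.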

Test plan

  • cargo test — 22 tests pass (13 new, including 4 handler-level integration tests via salvo's TestClient)
    • 404 response with correct body shape for unknown voice
    • 200 for registered preset (vivian)
    • 200 for missing voice field (default path preserved)
    • 200 for custom voice alias lookup
  • cargo clippy --tests --bin ominix-api — no new warnings
  • cargo build — clean
  • Deploy to octos mini2 and verify POST /v1/audio/tts/qwen3 {voice:"yangmi"} now returns 404

The `tts_qwen3` handler previously accepted any `voice` string and let
the TTS backend silently fall back to a default, so callers whose
requested voice was never registered (e.g. fm_tts on octos mini2
calling with `voice=yangmi` while the server has no custom voices
loaded) would receive `200 OK` + audio in the wrong voice. The LLM
layer could not detect the mismatch and relayed the wrong voice back
to the user.

Changes:

- New `voice_registry` module: collects the set of valid voices from
  the built-in Qwen3-TTS preset speakers plus any custom voices
  declared in `~/.OminiX/models/voices.json` (overridable via
  `OMINIX_VOICES_JSON`). Supports aliases.
- `tts_qwen3` now normalizes the requested voice (preserving the
  existing empty/"default" -> "vivian" fallback) and checks it
  against the registry before scheduling synthesis. Unknown voices
  return `404 Not Found` with a JSON body containing
  `error: "voice_not_found"`, the requested voice, and the list of
  available voices.
- Logs the rejection at INFO with the requesting IP (x-forwarded-for,
  x-real-ip, or peer socket).
- `/v1/voices` listing endpoint is untouched; the synthesis engine is
  untouched.
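The registry described above can be sketched with the standard library alone. `VoiceRegistry` is the PR's name, but every method below is hypothetical, the voices.json schema is not shown so parsing is omitted, and only three of the presets (vivian, serena, ryan, taken from the example 404 body) are used. Case-sensitive matching and preset-first listing order are assumptions; the PR tests both behaviors without stating which way they go.

```rust
use std::collections::HashMap;
use std::env;
use std::path::PathBuf;

/// Hypothetical sketch of the PR's `voice_registry` module.
struct VoiceRegistry {
    /// Canonical names in insertion order: presets first, then custom voices.
    voices: Vec<String>,
    /// Alias -> canonical voice name.
    aliases: HashMap<String, String>,
}

impl VoiceRegistry {
    /// Seeds the registry with built-in preset speakers.
    fn new(presets: &[&str]) -> Self {
        VoiceRegistry {
            voices: presets.iter().map(|p| p.to_string()).collect(),
            aliases: HashMap::new(),
        }
    }

    /// Registers one custom voice (e.g. an entry from voices.json) and its aliases.
    fn add_custom(&mut self, name: &str, aliases: &[&str]) {
        if !self.voices.iter().any(|v| v == name) {
            self.voices.push(name.to_string());
        }
        for alias in aliases {
            self.aliases.insert(alias.to_string(), name.to_string());
        }
    }

    /// Canonical name for a voice or alias, or None if unknown (exact match).
    fn resolve(&self, voice: &str) -> Option<&str> {
        self.voices
            .iter()
            .find(|v| v.as_str() == voice)
            .map(|v| v.as_str())
            .or_else(|| self.aliases.get(voice).map(|v| v.as_str()))
    }

    /// Names for the `available_voices` field of the 404 body.
    fn available(&self) -> Vec<&str> {
        self.voices.iter().map(|v| v.as_str()).collect()
    }
}

/// voices.json location: $OMINIX_VOICES_JSON if set, else
/// $HOME/.OminiX/models/voices.json (assumes a Unix-like HOME).
fn voices_json_path() -> Option<PathBuf> {
    if let Some(p) = env::var_os("OMINIX_VOICES_JSON") {
        return Some(PathBuf::from(p));
    }
    env::var_os("HOME").map(|home| PathBuf::from(home).join(".OminiX/models/voices.json"))
}
```

In the handler, the normalized voice would go through `resolve`; `Some(name)` proceeds to synthesis with the canonical name, while `None` builds the 404 body from `available()`.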

Tests (+13):

- 8 unit tests for `VoiceRegistry` covering presets, custom voices,
  aliases, case sensitivity, missing/malformed JSON, and listing
  order.
- 4 handler-level integration tests via salvo's `TestClient`:
  unknown voice -> 404 with contract-shaped body, registered preset
  -> 200, missing voice field -> 200 (default path preserved),
  custom alias -> 200.
- 1 serialization sanity test for `VoiceNotFoundError`.