fix(tts): reject unknown voices in /v1/audio/tts/qwen3 with 404 #12
Open
ymote wants to merge 1 commit into
The `tts_qwen3` handler previously accepted any `voice` string and let the TTS backend silently fall back to a default. As a result, callers whose requested voice was never registered (e.g. fm_tts on octos mini2 calling with `voice=yangmi` while the server has no custom voices loaded) would receive `200 OK` plus audio in the wrong voice. The LLM layer could not detect the mismatch and relayed the wrong voice back to the user.

Changes:

- New `voice_registry` module: collects the set of valid voices from the built-in Qwen3-TTS preset speakers plus any custom voices declared in `~/.OminiX/models/voices.json` (overridable via `OMINIX_VOICES_JSON`). Supports aliases.
- `tts_qwen3` now normalizes the requested voice (preserving the existing empty/"default" -> "vivian" fallback) and checks it against the registry before scheduling synthesis. Unknown voices return `404 Not Found` with a JSON body containing `error: "voice_not_found"`, the requested voice, and the list of available voices.
- Logs the rejection at INFO level with the requesting IP (`x-forwarded-for`, `x-real-ip`, or the peer socket address).
- The `/v1/voices` listing endpoint and the synthesis engine are untouched.

Tests (+13):

- 8 unit tests for `VoiceRegistry` covering presets, custom voices, aliases, case sensitivity, missing/malformed JSON, and listing order.
- 4 handler-level integration tests via salvo's `TestClient`: unknown voice -> 404 with a contract-shaped body, registered preset -> 200, missing voice field -> 200 (default path preserved), custom alias -> 200.
- 1 serialization sanity test for `VoiceNotFoundError`.
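The registry described in the first bullet can be sketched in std-only Rust as follows. This is an illustration under stated assumptions, not the PR's `voice_registry` code: the preset list is cut down to the three names the PR mentions, the hypothetical `CustomVoice` struct stands in for whatever `voices.json` deserializes to (JSON parsing is omitted), listing presets before custom voices is an assumed ordering, and lookups are treated as case-sensitive because the PR tests case sensitivity without saying which way it goes.

```rust
use std::collections::HashMap;
use std::path::PathBuf;

/// Assumed subset of the built-in Qwen3-TTS preset speakers: only these three
/// names appear in the PR text, and the real list is longer.
const PRESET_SPEAKERS: &[&str] = &["vivian", "serena", "ryan"];

/// Hypothetical custom-voice entry after voices.json has been parsed.
/// Field names are assumptions, not the file's actual schema.
struct CustomVoice {
    name: String,
    aliases: Vec<String>,
}

/// Pick the voices.json location: an OMINIX_VOICES_JSON override wins,
/// otherwise fall back to $HOME/.OminiX/models/voices.json.
/// The env values are passed in so the function stays pure and testable.
fn voices_json_path(override_path: Option<String>, home: Option<String>) -> Option<PathBuf> {
    match override_path {
        Some(p) => Some(PathBuf::from(p)),
        None => home.map(|h| PathBuf::from(h).join(".OminiX/models/voices.json")),
    }
}

struct VoiceRegistry {
    /// Canonical names in (assumed) listing order: presets, then custom voices.
    names: Vec<String>,
    /// Alias -> canonical name.
    aliases: HashMap<String, String>,
}

impl VoiceRegistry {
    fn new(custom: Vec<CustomVoice>) -> Self {
        let mut names: Vec<String> = PRESET_SPEAKERS.iter().map(|s| s.to_string()).collect();
        let mut aliases = HashMap::new();
        for voice in custom {
            for alias in voice.aliases {
                aliases.insert(alias, voice.name.clone());
            }
            // Skip duplicates so a voice never appears twice in the listing.
            if !names.contains(&voice.name) {
                names.push(voice.name);
            }
        }
        VoiceRegistry { names, aliases }
    }

    /// Case-sensitive lookup: canonical names first, then aliases.
    /// Returns the canonical name the synthesis step should use.
    fn resolve(&self, requested: &str) -> Option<&str> {
        if let Some(name) = self.names.iter().find(|n| n.as_str() == requested) {
            return Some(name.as_str());
        }
        self.aliases.get(requested).map(String::as_str)
    }

    /// Names that would be reported as `available_voices` in a 404 body.
    fn available(&self) -> &[String] {
        &self.names
    }
}
```

In a server, the path would be resolved with `voices_json_path(std::env::var("OMINIX_VOICES_JSON").ok(), std::env::var("HOME").ok())`. Keeping aliases in a separate map means the listing only ever shows canonical names, while a request for an alias still resolves to the voice it points at.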
Summary
`POST /v1/audio/tts/qwen3` previously accepted any `voice` string and let the TTS backend silently fall back to a default. Callers whose requested voice was never registered would get `200 OK` plus audio in the wrong voice, with no way to tell. The handler now validates the `voice` field against a registry of preset speakers plus custom voices (from `~/.OminiX/models/voices.json`) and returns `404` with a structured error body when the voice is unknown. An empty, missing, or `"default"` voice still resolves to the documented default (`vivian`). The `/v1/voices` listing and the TTS synthesis engine are untouched.

Observed symptom
On octos mini2, `ominix-api` reports `{"voices":[]}` (no custom voices registered). fm_tts calls with `voice=yangmi` return `200 OK` with a valid WAV, but in `vivian`'s voice, not `yangmi`'s. The LLM layer sees success and tells the user the clip is ready, so the wrong voice is surfaced to the end user. After this change, fm_tts will start receiving 404 errors for unknown voices, which is the point: the octos layers above will surface them.

Response shape (404)

    {
      "error": "voice_not_found",
      "message": "Voice 'yangmi' is not registered. Available: vivian, serena, ...",
      "requested_voice": "yangmi",
      "available_voices": ["vivian", "serena", "ryan", "..."]
    }

Backward compatibility
No client migration is required on octos's side. Clients that omit the `voice` field continue to work (empty -> default voice). Only clients that currently request a voice the server does not expose will see a new `404` instead of a wrong-voice `200`, which is the intended behavior.

Test plan
- `cargo test`: 22 tests pass (13 new, including 4 handler-level integration tests via salvo's `TestClient`)
- Missing or empty `voice` still returns `200` with the default voice (`vivian`)
- `cargo clippy --tests --bin ominix-api`: no new warnings
- `cargo build`: clean
- `POST /v1/audio/tts/qwen3 {voice:"yangmi"}` now returns `404`
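The test-plan expectations (a missing voice falls back to `vivian`; `voice: "yangmi"` gets a `404` with the documented body) can be checked against a std-only sketch of the request path. All names here (`normalize_voice`, `validate_voice`, `VoiceNotFound`, `voice_not_found_body`, `status_code`) are hypothetical; a plain slice of names stands in for the registry (aliases omitted), and rather than assuming how the PR serializes `VoiceNotFoundError`, the sketch escapes and formats the JSON by hand in the field order shown under "Response shape (404)".

```rust
/// Default applied to a missing, empty, or "default" voice, per the PR.
const DEFAULT_VOICE: &str = "vivian";

/// Hypothetical stand-in for the PR's `VoiceNotFoundError`.
#[derive(Debug, PartialEq)]
struct VoiceNotFound {
    requested_voice: String,
    available_voices: Vec<String>,
}

/// Map the raw `voice` field to the name that gets looked up.
fn normalize_voice(raw: Option<&str>) -> &str {
    match raw {
        None | Some("") | Some("default") => DEFAULT_VOICE,
        Some(v) => v,
    }
}

/// Validate before synthesis is scheduled; `known` is in listing order.
fn validate_voice(raw: Option<&str>, known: &[&str]) -> Result<String, VoiceNotFound> {
    let voice = normalize_voice(raw);
    if known.contains(&voice) {
        Ok(voice.to_string())
    } else {
        Err(VoiceNotFound {
            requested_voice: voice.to_string(),
            available_voices: known.iter().map(|s| s.to_string()).collect(),
        })
    }
}

/// Escape a string for embedding inside a JSON string literal.
fn json_escape(s: &str) -> String {
    let mut out = String::with_capacity(s.len());
    for c in s.chars() {
        match c {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            '\n' => out.push_str("\\n"),
            '\r' => out.push_str("\\r"),
            '\t' => out.push_str("\\t"),
            // Remaining control characters must be \u-escaped in JSON.
            c if (c as u32) < 0x20 => out.push_str(&format!("\\u{:04x}", c as u32)),
            c => out.push(c),
        }
    }
    out
}

/// Render the 404 body with the four documented fields.
fn voice_not_found_body(err: &VoiceNotFound) -> String {
    let message = format!(
        "Voice '{}' is not registered. Available: {}",
        err.requested_voice,
        err.available_voices.join(", ")
    );
    let voices: Vec<String> = err
        .available_voices
        .iter()
        .map(|v| format!("\"{}\"", json_escape(v)))
        .collect();
    format!(
        "{{\"error\":\"voice_not_found\",\"message\":\"{}\",\"requested_voice\":\"{}\",\"available_voices\":[{}]}}",
        json_escape(&message),
        json_escape(&err.requested_voice),
        voices.join(",")
    )
}

/// HTTP status the handler would send for this outcome (audio body omitted).
fn status_code(result: &Result<String, VoiceNotFound>) -> u16 {
    if result.is_ok() { 200 } else { 404 }
}
```

The PR does not say whether the real handler trims whitespace or matches `"default"` case-insensitively; this sketch does neither.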