Session handoff 2026-04-30: CUDA wins shipped, Ascend saga narrowed #13
Open
ymote wants to merge 3 commits into
The `tts_qwen3` handler previously accepted any `voice` string and let the TTS backend silently fall back to a default, so callers whose requested voice was never registered (e.g. fm_tts on octos mini2 calling with `voice=yangmi` while the server has no custom voices loaded) would receive `200 OK` + audio in the wrong voice. The LLM layer could not detect the mismatch and relayed the wrong voice back to the user.

Changes:
- New `voice_registry` module: collects the set of valid voices from the built-in Qwen3-TTS preset speakers plus any custom voices declared in `~/.OminiX/models/voices.json` (overridable via `OMINIX_VOICES_JSON`). Supports aliases.
- `tts_qwen3` now normalizes the requested voice (preserving the existing empty/"default" -> "vivian" fallback) and checks it against the registry before scheduling synthesis. Unknown voices return `404 Not Found` with a JSON body containing `error: "voice_not_found"`, the requested voice, and the list of available voices.
- Logs the rejection at INFO with requesting IP (x-forwarded-for, x-real-ip, or peer socket).
- `/v1/voices` listing endpoint is untouched; the synthesis engine is untouched.

Tests (+13):
- 8 unit tests for `VoiceRegistry` covering presets, custom voices, aliases, case sensitivity, missing/malformed JSON, and listing order.
- 4 handler-level integration tests via salvo's `TestClient`: unknown voice -> 404 with contract-shaped body, registered preset -> 200, missing voice field -> 200 (default path preserved), custom alias -> 200.
- 1 serialization sanity test for `VoiceNotFoundError`.
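The validation flow described in this commit can be sketched roughly as follows. This is a language-neutral illustration in Python, not the handler's actual code: the `voices.json` schema, the `requested` and `available` field names in the 404 body, the single-entry preset list, case-sensitive matching, and the helper names `check_voice` and `client_ip` are all assumptions made for the example. Only the `voice_not_found` error code, the "vivian" default, and the header precedence come from the description above.

```python
import json
from dataclasses import dataclass, field

# Assumption: only "vivian" is named in the commit message; the real
# Qwen3-TTS preset list is longer and is not reproduced here.
PRESET_VOICES = ("vivian",)
DEFAULT_VOICE = "vivian"


@dataclass
class VoiceRegistry:
    """Ordered list of valid voice names plus alias -> canonical mappings."""
    voices: list = field(default_factory=list)
    aliases: dict = field(default_factory=dict)

    @classmethod
    def from_json(cls, text, presets=PRESET_VOICES):
        reg = cls(voices=list(presets))
        try:
            data = json.loads(text) if text else {}
        except json.JSONDecodeError:
            data = {}  # malformed file: keep presets only
        # Assumed schema: {"voices": {"<name>": {"aliases": ["<alias>", ...]}}}
        custom = data.get("voices", {}) if isinstance(data, dict) else {}
        if not isinstance(custom, dict):
            custom = {}
        for name, spec in custom.items():
            if name not in reg.voices:
                reg.voices.append(name)
            if isinstance(spec, dict):
                for alias in spec.get("aliases", []):
                    reg.aliases[alias] = name
        return reg

    def resolve(self, name):
        """Return the canonical voice name, or None if it is unknown."""
        if name in self.voices:
            return name
        return self.aliases.get(name)  # case-sensitive by assumption


def normalize(requested):
    """Preserve the empty/"default" -> "vivian" fallback."""
    if not requested or requested == "default":
        return DEFAULT_VOICE
    return requested


def check_voice(registry, requested):
    """Return (status, body) before any synthesis is scheduled."""
    voice = normalize(requested)
    canonical = registry.resolve(voice)
    if canonical is None:
        return 404, {
            "error": "voice_not_found",
            "requested": voice,               # field name is an assumption
            "available": list(registry.voices),  # field name is an assumption
        }
    return 200, {"voice": canonical}


def client_ip(headers, peer_addr):
    """Pick the IP to log: x-forwarded-for, then x-real-ip, then peer socket."""
    forwarded = headers.get("x-forwarded-for", "")
    if forwarded.strip():
        # Assumption: the first comma-separated entry is the original client.
        return forwarded.split(",")[0].strip()
    return headers.get("x-real-ip") or peer_addr
```

With a registry built from presets only, `check_voice(registry, "yangmi")` yields a 404 whose body lists the available voices, which is what lets an upstream caller detect the mismatch instead of receiving audio in the wrong voice.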
…ga narrowed

Session-handoff doc (SESSION_HANDOFF_2026-04-30.md) captures:
- Three CUDA wallclock wins (today), all committed to OminiX-CUDA origin/main at 3fa53afd1:
  - #182 +7% TTS env-flip
  - #187 +5.85% CUDA QIE norm fusion
  - #199 -11.2% CUDA TTS stochastic via parallel top-K + on-device sampling chain
- Three silent bug fixes caught by perf agents:
  - cudaMemcpyAsync race in P1 pattern
  - rep_penalty race in P2/predictor for duplicate tokens
  - CFG mask not indexed per-batch (silent wrong attention with negative prompts on CUDA QIE)
- Ascend QIE saga narrowed from "anywhere in 60-block DiT" to "block-0 attention substep at REAL inputs" via three closed hypothesis classes (F16 matmul saturation, sigma/Euler chain, distributed-compounding cast noise). Existing dump infrastructure on ac03 supports the next bisect.
- #99 conditioner.hpp pad fix landed on ac01 (commit 61c8e2f, build clean, runtime test pending QIE weights).

Side artifacts also tracked: audit + exploration reports from today's work (ascend_native_engine_audit, cuda_*_perf_exploration, qie_*_findings, qie_clamp_disambiguation), Q4_0 dequant + matmul F32 oracle script, and the per-block bisect analysis script. Plus the .mcp.json codex MCP config.

Bundles for resuming on a fresh machine (NOT in git, on Mac local):
  /Users/yuechen/home/ac03_saga_2026-04-30.bundle (147 MB, ac03 main HEAD 3daae48)
  /Users/yuechen/home/ac01_99_pad_fix.bundle (146 MB, #99 fix)
  /Users/yuechen/home/qie-saga-5.5.65-snapshot.bundle (146 MB, prior backup)

For next agent: read SESSION_HANDOFF_2026-04-30.md end-to-end before dispatching anything. The bisect chain has narrowed dramatically; the next concrete step is the block-0 substep bisect with codex-corrected scope (start at 02_img_mod_out + chunks, not just 08_Q/K/V).
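As a rough illustration of what a substep bisect over dumped tensors might look like, the sketch below walks an ordered list of stage dumps and reports the first stage where a candidate run diverges from a reference. It is not the per-block bisect analysis script tracked in this commit: the function names, the cosine-similarity metric, the 0.999 threshold, and the toy tensor values are all assumptions. Only the stage names `02_img_mod_out` and `08_Q` are taken from the text above.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length flat float sequences."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        # Treat two all-zero tensors as matching, anything else as a mismatch.
        return 1.0 if norm_a == norm_b else 0.0
    return dot / (norm_a * norm_b)


def first_divergent_stage(reference, candidate, threshold=0.999):
    """Return (stage_name, similarity) for the first stage below threshold.

    Both arguments are lists of (stage_name, flat_values) in execution
    order. Returns None when every stage stays at or above the threshold.
    """
    for (ref_name, ref_vals), (cand_name, cand_vals) in zip(reference, candidate):
        if ref_name != cand_name:
            raise ValueError(f"stage order mismatch: {ref_name} vs {cand_name}")
        if len(ref_vals) != len(cand_vals):
            raise ValueError(f"shape mismatch at {ref_name}")
        similarity = cosine_similarity(ref_vals, cand_vals)
        if similarity < threshold:
            return ref_name, similarity
    return None


# Toy values for illustration only; real dumps would be loaded from disk.
reference = [("02_img_mod_out", [1.0, 2.0, 3.0]), ("08_Q", [1.0, 0.0, 0.0])]
candidate = [("02_img_mod_out", [1.0, 2.0, 3.0]), ("08_Q", [0.0, 1.0, 0.0])]
result = first_divergent_stage(reference, candidate)
```

Starting the stage list at `02_img_mod_out` rather than at `08_Q` matches the corrected scope noted above: a divergence that already exists upstream would otherwise be misattributed to the Q/K/V projections.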
For another agent on a fresh machine without ~/.ssh/config aliases, SESSION_HANDOFF_2026-04-30.md now includes:
- Hostname, port, user, key path for each of ac01/ac02/ac03 + zgx-5b44/zgx-3675
- Direct ssh command snippets
- ~/.ssh/config aliases to copy
- Smoke test loop to verify all boxes reachable
- Note that keys (.pem for Ascend, ~/.ssh/id_ed25519 for CUDA) are NOT in the repo for security; ask user (Yue Chen) if missing on the new machine
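A reachability check of the kind listed above could look roughly like this. It is a sketch, not the loop from the handoff doc: it assumes the five names resolve as `~/.ssh/config` aliases with keys already in place, and the helper names `ssh_command`, `run_ssh`, and `smoke_test` are made up for the example. `BatchMode=yes` and `ConnectTimeout` are standard OpenSSH client options; `BatchMode` makes ssh fail instead of prompting for a password when a key is missing.

```python
import subprocess

# Host aliases as named in the commit message; assumed to be defined in
# ~/.ssh/config on the machine running this check.
HOSTS = ["ac01", "ac02", "ac03", "zgx-5b44", "zgx-3675"]


def ssh_command(host, timeout_s=5):
    """Build a non-interactive ssh command that runs `true` on the host."""
    return [
        "ssh",
        "-o", "BatchMode=yes",
        "-o", f"ConnectTimeout={timeout_s}",
        host,
        "true",
    ]


def run_ssh(cmd):
    """Run the command and return its exit code (255 on local failure)."""
    try:
        completed = subprocess.run(
            cmd,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=30,
        )
        return completed.returncode
    except (OSError, subprocess.TimeoutExpired):
        return 255


def smoke_test(hosts=HOSTS, runner=run_ssh):
    """Return {host: reachable} for every host, using the given runner."""
    return {host: runner(ssh_command(host)) == 0 for host in hosts}


# Usage (performs real ssh connections):
#   for host, ok in smoke_test().items():
#       print(f"{host}: {'ok' if ok else 'UNREACHABLE'}")
```

The `runner` parameter exists so the loop can be exercised without network access; any host reported unreachable most likely points at a missing key or alias, which per the note above is something to ask the user about rather than work around.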
Summary
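- Three CUDA wallclock wins committed to OminiX-CUDA origin/main at 3fa53afd1.
- #99 conditioner.hpp pad fix landed on ac01 (commit 61c8e2f, build clean, runtime test pending).
- Side artifacts: audit reports, perf explorations, F32 oracle findings, dispatch decision-tree docs, codex MCP config (.mcp.json), per-block bisect analysis script.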
Bundles for fresh-machine resume (NOT in git, on Mac local at /Users/yuechen/home/):
- ac03_saga_2026-04-30.bundle (147 MB): ac03 main HEAD 3daae48
- ac01_99_pad_fix.bundle (146 MB): #99 fix
- qie-saga-5.5.65-snapshot.bundle (146 MB): prior backup

Test plan
- SESSION_HANDOFF_2026-04-30.md reads cleanly and has the actual decisions / artifact paths a next agent needs
- .mcp.json codex MCP config loads in Claude Code (other agents can use codex-as-MCP)