feat(huddle): sentence-at-a-time voice-mode guidelines for lower TTS latency by tlongwell-block · Pull Request #996 · block/buzz

tlongwell-block · 2026-06-11T23:21:06Z

What

Rewrites the kind:48106 voice-mode guidelines (posted to the ephemeral huddle channel at huddle start) to instruct agents to reply one sentence per message, sending the first sentence the moment it's formed.

Why

Today the dominant term in mic-stop → first-audio is waiting for the agent's complete reply: TTS can't start until the full kind:9 lands, so the entire LLM generation (2–10s) sits in the silent gap.

The desktop already speaks each agent kind:9 as it arrives — queued (depth 8), in order, deduped by event ID (useTtsSubscription → speak_agent_message). So an agent that posts its first sentence immediately, then the rest as separate messages, cuts time-to-first-audio from "full reply generated" to "first sentence generated". This is the prompt-level equivalent of token streaming, with zero harness or transport changes — the deliberate alternative to SSE streaming in buzz-agent, which we're keeping minimal.
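The queueing behaviour described above (depth 8, in order, deduped by event ID) can be sketched roughly as follows. This is a minimal illustration, not the desktop's actual code: `SpeechQueue`, `AgentMessage`, and `MAX_DEPTH` are hypothetical names, and rejecting new messages when the queue is full is an assumption, since the overflow policy isn't stated here.

```rust
use std::collections::{HashSet, VecDeque};

/// Assumed queue depth, taken from the "depth 8" figure above.
const MAX_DEPTH: usize = 8;

/// Hypothetical stand-in for an agent kind:9 message awaiting speech.
#[derive(Debug, Clone, PartialEq)]
struct AgentMessage {
    event_id: String,
    text: String,
}

/// Ordered, bounded, deduplicating queue of messages to speak.
#[derive(Default)]
struct SpeechQueue {
    pending: VecDeque<AgentMessage>,
    seen: HashSet<String>,
}

impl SpeechQueue {
    /// Returns true if the message was queued, false if it was a duplicate
    /// or the queue was full. Rejecting on overflow is an assumption.
    fn enqueue(&mut self, msg: AgentMessage) -> bool {
        if self.seen.contains(&msg.event_id) || self.pending.len() >= MAX_DEPTH {
            return false;
        }
        self.seen.insert(msg.event_id.clone());
        self.pending.push_back(msg);
        true
    }

    /// Pops the next message in arrival order.
    fn next(&mut self) -> Option<AgentMessage> {
        self.pending.pop_front()
    }
}
```

Because each sentence arrives as its own event with its own ID, a queue of this shape can start speaking the first sentence while later ones are still being generated.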

Guideline changes:

- sentence-per-message delivery, first sentence ASAP
- say one short sentence before running a tool, so tool turns aren't silent gaps
- a new human message mid-reply = interruption: drop unsent sentences instead of finishing the stale tail
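The guidelines are prompt text, not code, but the behaviour they ask for can be sketched as a loop. This is an illustration under stated assumptions: `deliver_sentences`, `send`, and `interrupted` are hypothetical names, with `send` standing in for posting one message and `interrupted` for "a new human message has arrived". A real agent would produce sentences incrementally rather than from a prepared slice.

```rust
/// Sends sentences one per message, checking for an interruption before each
/// send. Returns how many sentences were actually sent.
///
/// `send` and `interrupted` are hypothetical hooks, not part of any real API.
fn deliver_sentences<S, I>(sentences: &[&str], mut send: S, mut interrupted: I) -> usize
where
    S: FnMut(&str),
    I: FnMut() -> bool,
{
    let mut sent = 0;
    for &sentence in sentences {
        if interrupted() {
            // Drop the unsent tail instead of finishing a stale reply.
            break;
        }
        send(sentence);
        sent += 1;
    }
    sent
}
```

Checking for an interruption before every send, rather than once per reply, is what keeps the stale tail from being spoken.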

Risk

Prompt-only; degrades gracefully. If a model ignores the instruction and posts a multi-sentence message, TTS already splits sentences internally and plays it normally — we only lose the head start on that turn. No error mode.
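To illustrate the fallback path, here is a deliberately naive sentence splitter. The actual TTS splitter is not shown in this PR, so `split_sentences` and its boundary rule (terminal punctuation followed by whitespace or end of input) are assumptions made only for the sketch.

```rust
/// Splits text at '.', '!' or '?' when followed by whitespace or end of input.
/// Naive by design: abbreviations such as "e.g. this" would be split as well.
fn split_sentences(text: &str) -> Vec<String> {
    let mut sentences = Vec::new();
    let mut current = String::new();
    let mut chars = text.chars().peekable();
    while let Some(c) = chars.next() {
        current.push(c);
        let at_boundary = matches!(c, '.' | '!' | '?')
            && chars.peek().map_or(true, |next| next.is_whitespace());
        if at_boundary {
            let trimmed = current.trim();
            if !trimmed.is_empty() {
                sentences.push(trimmed.to_string());
            }
            current.clear();
        }
    }
    // Keep any trailing text that lacks closing punctuation.
    let tail = current.trim();
    if !tail.is_empty() {
        sentences.push(tail.to_string());
    }
    sentences
}
```

With a splitter like this on the playback side, a multi-sentence message still plays sentence by sentence; what is lost is only the earlier start that separate messages would have given.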

Verification

- cargo test (desktop-tauri, full): 518/518 ✓
- cargo clippy -D warnings ✓
- Pre-push hooks (desktop/mobile/rust tests + clippy) ✓

Follow-ups (separate PRs)

- Latency instrumentation: per-stage timestamps mic-stop → first-audio
- Barge-in gate: after interrupt, mute the TTS subscription until the user's next transcript posts (otherwise the agent's late sentences re-trigger TTS)
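The barge-in gate follow-up amounts to a two-state switch. A minimal sketch, assuming hypothetical names (`BargeInGate`, `on_interrupt`, `on_user_transcript`, `should_speak`) and that the gate is consulted before each agent message is handed to TTS:

```rust
/// Tracks whether agent speech should be suppressed after an interruption.
#[derive(Debug, Default)]
struct BargeInGate {
    muted: bool,
}

impl BargeInGate {
    /// The user interrupted: stop speaking late-arriving agent sentences.
    fn on_interrupt(&mut self) {
        self.muted = true;
    }

    /// The user's next transcript has posted: agent replies are fresh again.
    fn on_user_transcript(&mut self) {
        self.muted = false;
    }

    /// Whether an incoming agent message should be passed to TTS.
    fn should_speak(&self) -> bool {
        !self.muted
    }
}
```

The gate starts open, closes on interrupt, and reopens only when the user's transcript posts, so sentences the agent sent before noticing the interruption are not spoken.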

Context: discussed in #sprout-conversational-agents — replaces the Concierge-destination approach from #910 (closed) with huddle-first latency work.

…latency

The desktop already speaks each agent kind:9 as it arrives (queued, in order, deduped), so an agent that posts its first sentence immediately and the rest as separate messages cuts time-to-first-audio from "full reply generated" to "first sentence generated". This is the prompt-level equivalent of token streaming, with no harness or transport changes.

Rewrites the kind:48106 voice-mode guidelines to:

- instruct sentence-per-message delivery, first sentence ASAP
- say one short sentence before running a tool, so tool turns are not silent gaps
- treat a new human message mid-reply as an interruption: drop unsent sentences instead of finishing the stale tail

Prompt-only change; non-compliance degrades gracefully (TTS already splits multi-sentence messages internally, so we only lose the head start).

Co-authored-by: Tyler Longwell <tlongwell@squareup.com>
Signed-off-by: Tyler Longwell <tlongwell@squareup.com>

…voice guidelines

Per review: a model could read "send each sentence as its own message" as "compose the full reply, then split it", or fail to connect "message" to a `buzz messages send` tool call. Spell both out: reply immediately without composing the full reply first, and one sentence per separate `buzz messages send` call.

Co-authored-by: Tyler Longwell <tlongwell@squareup.com>
Signed-off-by: Tyler Longwell <tlongwell@squareup.com>

Co-authored-by: npub1mprnacetjua2xx3p5eddmhxyk6wv929ymm5py8kd2xfxurxahspqqlgyta <d8473ee32b973aa31a21a65adddcc4b69cc2a8a4dee8121ecd51926e0cddbc02@sprout-oss.stage.blox.sqprod.co>
Signed-off-by: npub1mprnacetjua2xx3p5eddmhxyk6wv929ymm5py8kd2xfxurxahspqqlgyta <d8473ee32b973aa31a21a65adddcc4b69cc2a8a4dee8121ecd51926e0cddbc02@sprout-oss.stage.blox.sqprod.co>

* origin/main: (35 commits)
  feat(huddle): sentence-at-a-time voice-mode guidelines for lower TTS latency (#996)
  Shard desktop Playwright CI jobs (#992)
  chore(release): release version 0.3.18 (#995)
  Video Player Improvements (#993)
  Improve first-run welcome setup (#970)
  fix(release): use legacy updater key secret (#991)
  Replace built-in personas with Fizz (#987)
  docs(buzz-acp): rewrite Communication Patterns for mention accuracy and threading clarity (#982)
  chore(justfile): build git-credential-nostr in dev and staging recipes (#980)
  Fix Buzz command migration for saved agents (#979)
  fix(desktop): resolve effective model and prompt from persona in display path (#972)
  docs: clean up remaining Buzz references (#977)
  chore(release): release version 0.3.17 (#976)
  fix(onboarding): skip onboarding when relay already has a profile (#973)
  docs: finish Buzz rename cleanup (#974)
  fix(desktop): let channel members bypass mention agent gate (#965)
  Rename desktop app to Buzz (#960)
  feat(desktop): open profile panel from MembersSidebar rows (#962)
  feat(desktop): per-event notification sounds and alert controls (#968)
  fix(desktop): make header chrome zoom-correct and tidy split-pane (#941)
  ...

# Conflicts:
# crates/buzz-agent/README.md
# crates/buzz-agent/src/config.rs

…session-new

* origin/main:
  fix(huddle): Pocket TTS quality overhaul — reference parity + cross-message pipelining (#997)
  Add manual ACP session rotation command (#932)
  fix(desktop): heal stale persona_team_dir paths in release builds (#1003)
  ci(docker): publish public ghcr.io/block/buzz image (native multi-arch) (#986)
  fix(buzz-agent): cap tool-result text at 50 KiB with middle elision (#952)
  feat(huddle): sentence-at-a-time voice-mode guidelines for lower TTS latency (#996)
  Shard desktop Playwright CI jobs (#992)
  chore(release): release version 0.3.18 (#995)
  Video Player Improvements (#993)
  Improve first-run welcome setup (#970)
  fix(release): use legacy updater key secret (#991)

Co-authored-by: Will Pfleger <pfleger.will@gmail.com>
Signed-off-by: Will Pfleger <pfleger.will@gmail.com>

# Conflicts:
# crates/buzz-acp/src/lib.rs
# crates/buzz-agent/src/config.rs

…tate

* origin/main:
  Add relay disconnect UX: friendly errors, reconnect, cached identity (#1004)
  feat(agents): add active turn indicators to Agents Menu (#1005)
  ci: add fork guards to docker, release, and auto-tag workflows (#1007)
  docs(nip-rs): add optional thread read context scheme (#1006)
  fix(huddle): Pocket TTS quality overhaul — reference parity + cross-message pipelining (#997)
  Add manual ACP session rotation command (#932)
  fix(desktop): heal stale persona_team_dir paths in release builds (#1003)
  ci(docker): publish public ghcr.io/block/buzz image (native multi-arch) (#986)
  fix(buzz-agent): cap tool-result text at 50 KiB with middle elision (#952)
  feat(huddle): sentence-at-a-time voice-mode guidelines for lower TTS latency (#996)
  Shard desktop Playwright CI jobs (#992)
  chore(release): release version 0.3.18 (#995)
  Video Player Improvements (#993)
  Improve first-run welcome setup (#970)
  fix(release): use legacy updater key secret (#991)
  Replace built-in personas with Fizz (#987)

npub1qyvc0c5kl4gqv2fd97fsk46tu378sqgy35vc83rvgfwne90sel7s0ed67d and others added 2 commits June 11, 2026 19:18

tlongwell-block merged commit 2846a96 into main Jun 11, 2026
23 checks passed

tlongwell-block deleted the voice-mode-sentence-streaming branch June 11, 2026 23:58

tlongwell-block mentioned this pull request Jun 12, 2026

fix(huddle): Pocket TTS quality overhaul — reference parity + cross-message pipelining #997

Merged

wpfleger96 mentioned this pull request Jun 12, 2026

chore(release): release version 0.3.19 #1014

Merged

tlongwell-block merged 2 commits into main from voice-mode-sentence-streaming
