
feat(huddle): sentence-at-a-time voice-mode guidelines for lower TTS latency #996

Merged
tlongwell-block merged 2 commits into main from voice-mode-sentence-streaming
Jun 11, 2026


@tlongwell-block

Collaborator

What

Rewrites the kind:48106 voice-mode guidelines (posted to the ephemeral huddle channel at huddle start) to instruct agents to reply one sentence per message, sending the first sentence the moment it's formed.

Why

Today the dominant term in mic-stop → first-audio latency is waiting for the agent's complete reply: TTS can't start until the full kind:9 lands, so the entire LLM generation (2–10s) sits in the silent gap.

The desktop already speaks each agent kind:9 as it arrives: queued (depth 8), in order, deduped by event ID (useTtsSubscription → speak_agent_message). So an agent that posts its first sentence immediately, then the rest as separate messages, cuts time-to-first-audio from "full reply generated" to "first sentence generated". This is the prompt-level equivalent of token streaming, with zero harness or transport changes. It is the deliberate alternative to SSE streaming in buzz-agent, which we're keeping minimal.
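
For readers unfamiliar with the desktop side, the queueing behaviour described above (bounded depth, in-order playback, dedup by event ID) can be sketched roughly as follows. This is an illustrative model only, not the code behind speak_agent_message: `SpeechQueue`, `push`, and `pop_next` are hypothetical names, and dropping the newcomer when the queue is full is an assumption, since the PR does not state the overflow policy.

```rust
use std::collections::{HashSet, VecDeque};

/// Hypothetical model of a bounded, deduplicating, in-order speech queue.
pub struct SpeechQueue {
    pending: VecDeque<(String, String)>, // (event_id, text), oldest first
    seen: HashSet<String>,               // event IDs already accepted
    depth: usize,                        // maximum number of pending messages
}

impl SpeechQueue {
    pub fn new(depth: usize) -> Self {
        Self { pending: VecDeque::new(), seen: HashSet::new(), depth }
    }

    /// Queues a message for speech; returns false if it was not queued.
    pub fn push(&mut self, event_id: &str, text: &str) -> bool {
        if self.seen.contains(event_id) {
            return false; // duplicate delivery of the same event
        }
        if self.pending.len() >= self.depth {
            return false; // assumption: a full queue rejects the newcomer
        }
        self.seen.insert(event_id.to_string());
        self.pending.push_back((event_id.to_string(), text.to_string()));
        true
    }

    /// Returns the next message to speak, in arrival order.
    pub fn pop_next(&mut self) -> Option<String> {
        self.pending.pop_front().map(|(_, text)| text)
    }
}
```

With `SpeechQueue::new(8)`, the first sentence an agent posts becomes available to `pop_next` as soon as it arrives, regardless of how many sentences follow.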

Guideline changes:

  • sentence-per-message delivery, first sentence ASAP
  • say one short sentence before running a tool, so tool turns aren't silent gaps
  • a new human message mid-reply = interruption: drop unsent sentences instead of finishing the stale tail
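
The delivery and interruption rules above are instructions to a model, not code, but the behaviour they ask for can be sketched as a small loop. This is a hedged illustration: `send_until_interrupted` is a hypothetical helper, `send` stands in for one message send per sentence, and `interrupted` stands in for "a new human message has arrived".

```rust
/// Sends sentences one per message, checking for an interruption before each.
/// Returns how many sentences were actually sent.
pub fn send_until_interrupted<S, I>(
    sentences: &[&str],
    mut send: S,
    mut interrupted: I,
) -> usize
where
    S: FnMut(&str),
    I: FnMut() -> bool,
{
    let mut sent = 0;
    for &sentence in sentences {
        if interrupted() {
            break; // drop the unsent tail instead of finishing a stale reply
        }
        send(sentence); // one sentence per message
        sent += 1;
    }
    sent
}
```

The check runs before every send, so a sentence that has already gone out is never retracted; only the remainder is dropped.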

Risk

Prompt-only; degrades gracefully. If a model ignores the instruction and posts a multi-sentence message, TTS already splits sentences internally and plays it normally — we only lose the head start on that turn. No error mode.
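
To make the fallback concrete, here is a minimal sketch of the kind of sentence splitting a TTS pipeline might apply to a multi-sentence message. It is not the desktop's actual splitter: `split_sentences` is a hypothetical function, and the rule (break after `.`, `!` or `?` when followed by whitespace or end of input) is a deliberately naive assumption that will mis-split abbreviations such as "e.g. this".

```rust
/// Naive sentence splitter: breaks after '.', '!' or '?' when the next
/// character is whitespace or the input ends.
pub fn split_sentences(text: &str) -> Vec<String> {
    let mut sentences = Vec::new();
    let mut current = String::new();
    let mut chars = text.chars().peekable();
    while let Some(c) = chars.next() {
        current.push(c);
        let at_boundary = matches!(c, '.' | '!' | '?')
            && chars.peek().map_or(true, |next| next.is_whitespace());
        if at_boundary {
            let trimmed = current.trim();
            if !trimmed.is_empty() {
                sentences.push(trimmed.to_string());
            }
            current.clear();
        }
    }
    // Keep any trailing text that lacks terminal punctuation.
    let tail = current.trim();
    if !tail.is_empty() {
        sentences.push(tail.to_string());
    }
    sentences
}
```

Either way the listener hears the same sentences; the difference is only that a single multi-sentence message cannot start playing until the whole message has been generated and posted.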

Verification

  • cargo test (desktop-tauri, full): 518/518 ✓
  • cargo clippy -D warnings
  • Pre-push hooks (desktop/mobile/rust tests + clippy) ✓

Follow-ups (separate PRs)

  • Latency instrumentation: per-stage timestamps mic-stop → first-audio
  • Barge-in gate: after interrupt, mute the TTS subscription until the user's next transcript posts (otherwise the agent's late sentences re-trigger TTS)
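
As a rough sketch of the barge-in gate follow-up, the state involved is a single flag: set on interrupt, cleared when the user's next transcript posts. `BargeInGate` and its method names are hypothetical, and this PR does not implement it.

```rust
/// Hypothetical gate that suppresses TTS between an interrupt and the
/// user's next posted transcript.
#[derive(Default)]
pub struct BargeInGate {
    muted: bool,
}

impl BargeInGate {
    /// Called when the user interrupts the agent mid-reply.
    pub fn on_interrupt(&mut self) {
        self.muted = true;
    }

    /// Called when the user's next transcript has been posted.
    pub fn on_user_transcript(&mut self) {
        self.muted = false;
    }

    /// Whether an incoming agent message should be spoken right now.
    pub fn should_speak(&self) -> bool {
        !self.muted
    }
}
```

Late sentences from the interrupted reply arrive while `should_speak` is false and are skipped, so they no longer re-trigger TTS.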

Context: discussed in #sprout-conversational-agents — replaces the Concierge-destination approach from #910 (closed) with huddle-first latency work.

npub1qyvc0c5kl4gqv2fd97fsk46tu378sqgy35vc83rvgfwne90sel7s0ed67d and others added 2 commits June 11, 2026 19:18
…latency

The desktop already speaks each agent kind:9 as it arrives (queued, in
order, deduped), so an agent that posts its first sentence immediately
and the rest as separate messages cuts time-to-first-audio from "full
reply generated" to "first sentence generated" — the prompt-level
equivalent of token streaming, with no harness or transport changes.

Rewrites the kind:48106 voice-mode guidelines to:
- instruct sentence-per-message delivery, first sentence ASAP
- say one short sentence before running a tool, so tool turns are not
  silent gaps
- treat a new human message mid-reply as an interruption: drop unsent
  sentences instead of finishing the stale tail

Prompt-only change; non-compliance degrades gracefully (TTS already
splits multi-sentence messages internally — we only lose the head start).

Co-authored-by: Tyler Longwell <tlongwell@squareup.com>
Signed-off-by: Tyler Longwell <tlongwell@squareup.com>
…voice guidelines

Per review: a model could read "send each sentence as its own message"
as "compose the full reply, then split it", or fail to connect
"message" to a `buzz messages send` tool call. Spell both out: reply
immediately without composing the full reply first, and one sentence
per separate `buzz messages send` call.

Co-authored-by: Tyler Longwell <tlongwell@squareup.com>
Signed-off-by: Tyler Longwell <tlongwell@squareup.com>
@tlongwell-block tlongwell-block merged commit 2846a96 into main Jun 11, 2026
23 checks passed
@tlongwell-block tlongwell-block deleted the voice-mode-sentence-streaming branch June 11, 2026 23:58
tlongwell-block pushed a commit that referenced this pull request Jun 12, 2026
Co-authored-by: npub1mprnacetjua2xx3p5eddmhxyk6wv929ymm5py8kd2xfxurxahspqqlgyta <d8473ee32b973aa31a21a65adddcc4b69cc2a8a4dee8121ecd51926e0cddbc02@sprout-oss.stage.blox.sqprod.co>
Signed-off-by: npub1mprnacetjua2xx3p5eddmhxyk6wv929ymm5py8kd2xfxurxahspqqlgyta <d8473ee32b973aa31a21a65adddcc4b69cc2a8a4dee8121ecd51926e0cddbc02@sprout-oss.stage.blox.sqprod.co>

* origin/main: (35 commits)
  feat(huddle): sentence-at-a-time voice-mode guidelines for lower TTS latency (#996)
  Shard desktop Playwright CI jobs (#992)
  chore(release): release version 0.3.18 (#995)
  Video Player Improvements  (#993)
  Improve first-run welcome setup (#970)
  fix(release): use legacy updater key secret (#991)
  Replace built-in personas with Fizz (#987)
  docs(buzz-acp): rewrite Communication Patterns for mention accuracy and threading clarity (#982)
  chore(justfile): build git-credential-nostr in dev and staging recipes (#980)
  Fix Buzz command migration for saved agents (#979)
  fix(desktop): resolve effective model and prompt from persona in display path (#972)
  docs: clean up remaining Buzz references (#977)
  chore(release): release version 0.3.17 (#976)
  fix(onboarding): skip onboarding when relay already has a profile (#973)
  docs: finish Buzz rename cleanup (#974)
  fix(desktop): let channel members bypass mention agent gate (#965)
  Rename desktop app to Buzz (#960)
  feat(desktop): open profile panel from MembersSidebar rows (#962)
  feat(desktop): per-event notification sounds and alert controls (#968)
  fix(desktop): make header chrome zoom-correct and tidy split-pane (#941)
  ...

# Conflicts:
#	crates/buzz-agent/README.md
#	crates/buzz-agent/src/config.rs
wpfleger96 pushed a commit that referenced this pull request Jun 12, 2026
…session-new

* origin/main:
  fix(huddle): Pocket TTS quality overhaul — reference parity + cross-message pipelining (#997)
  Add manual ACP session rotation command (#932)
  fix(desktop): heal stale persona_team_dir paths in release builds (#1003)
  ci(docker): publish public ghcr.io/block/buzz image (native multi-arch) (#986)
  fix(buzz-agent): cap tool-result text at 50 KiB with middle elision (#952)
  feat(huddle): sentence-at-a-time voice-mode guidelines for lower TTS latency (#996)
  Shard desktop Playwright CI jobs (#992)
  chore(release): release version 0.3.18 (#995)
  Video Player Improvements  (#993)
  Improve first-run welcome setup (#970)
  fix(release): use legacy updater key secret (#991)

Co-authored-by: Will Pfleger <pfleger.will@gmail.com>
Signed-off-by: Will Pfleger <pfleger.will@gmail.com>

# Conflicts:
#	crates/buzz-acp/src/lib.rs
#	crates/buzz-agent/src/config.rs
tellaho added a commit that referenced this pull request Jun 12, 2026
…tate

* origin/main:
  Add relay disconnect UX: friendly errors, reconnect, cached identity (#1004)
  feat(agents): add active turn indicators to Agents Menu (#1005)
  ci: add fork guards to docker, release, and auto-tag workflows (#1007)
  docs(nip-rs): add optional thread read context scheme (#1006)
  fix(huddle): Pocket TTS quality overhaul — reference parity + cross-message pipelining (#997)
  Add manual ACP session rotation command (#932)
  fix(desktop): heal stale persona_team_dir paths in release builds (#1003)
  ci(docker): publish public ghcr.io/block/buzz image (native multi-arch) (#986)
  fix(buzz-agent): cap tool-result text at 50 KiB with middle elision (#952)
  feat(huddle): sentence-at-a-time voice-mode guidelines for lower TTS latency (#996)
  Shard desktop Playwright CI jobs (#992)
  chore(release): release version 0.3.18 (#995)
  Video Player Improvements  (#993)
  Improve first-run welcome setup (#970)
  fix(release): use legacy updater key secret (#991)
  Replace built-in personas with Fizz (#987)