Problem
Voice input has two isolation issues:
1. No session isolation — voice is global across all sessions
activeVoice is a module-level variable in packages/opencode/src/cli/cmd/tui/component/prompt/index.tsx:91:
// Module-level voice state: survives component remounts and route changes
let activeVoice: {
  handle: Voice.StreamingHandle
  ...
}
When multiple sessions are open, they all share this single global voice handle. Speaking into the mic in one session writes text into all sessions' input boxes, because the appendText/setText references are overwritten by whichever prompt component mounted last.
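One possible direction, sketched below, is to key the voice state by session instead of holding a single module-level value. This is only an illustration: `StreamingHandle` here is a hypothetical stand-in for `Voice.StreamingHandle` (whose real shape is not shown in this issue), and `voiceBySession`, `startVoice`, `stopVoice`, and `routeTranscript` are made-up names, not existing functions in the codebase.

```typescript
// Hypothetical stand-in for Voice.StreamingHandle; the real type may differ.
interface StreamingHandle {
  stop(): void
}

// Per-session voice state: the handle plus the callback of that session's prompt.
interface SessionVoice {
  handle: StreamingHandle
  appendText: (text: string) => void
}

// Still module-level, so it survives remounts, but keyed by session ID.
const voiceBySession = new Map<string, SessionVoice>()

function startVoice(sessionID: string, voice: SessionVoice): void {
  // Stop a previous handle for this session only; other sessions are untouched.
  voiceBySession.get(sessionID)?.handle.stop()
  voiceBySession.set(sessionID, voice)
}

function stopVoice(sessionID: string): void {
  voiceBySession.get(sessionID)?.handle.stop()
  voiceBySession.delete(sessionID)
}

// Deliver transcribed text to exactly one session; returns false if that
// session has no active voice instance.
function routeTranscript(sessionID: string, text: string): boolean {
  const voice = voiceBySession.get(sessionID)
  if (!voice) return false
  voice.appendText(text)
  return true
}
```

With this shape, a prompt component that remounts would re-register its callbacks under its own session ID rather than overwriting a shared reference, so enabling or disabling voice in one session leaves the others alone.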
2. No audio source isolation — external sounds are captured
The recorder (packages/opencode/src/cli/cmd/tui/util/voice.ts:22-31) uses arecord or sox to capture from the default audio input device. There is no distinction between:
- The user's voice (intended input)
- System audio from playing videos, music, or other applications
The VAD (vad.ts) detects speech activity from whatever audio comes through, so any audio source with speech-like characteristics gets transcribed and written to the input box.
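As a rough sketch of how the recorder could tell a microphone from a loopback of system audio on PulseAudio: `pactl list short sources` prints one tab-separated line per source, and monitor sources (which mirror an output sink) have names ending in `.monitor`. The helper names below (`parsePactlSources`, `pickMicrophone`) are hypothetical, and this assumes the unwanted audio arrives through a monitor source. If the speakers are simply being picked up acoustically by the mic, source selection alone would not help.

```typescript
interface PulseSource {
  index: number
  name: string
  isMonitor: boolean
}

// Parse the tab-separated output of `pactl list short sources`.
// Columns: index, name, driver, sample spec, state.
function parsePactlSources(output: string): PulseSource[] {
  return output
    .split("\n")
    .map((line) => line.trim())
    .filter((line) => line.length > 0)
    .map((line) => {
      const [index, name] = line.split("\t")
      const sourceName = name ?? ""
      return {
        index: Number(index),
        name: sourceName,
        // Monitor sources mirror a sink's output, i.e. system/desktop audio.
        isMonitor: sourceName.endsWith(".monitor"),
      }
    })
}

// Return the name of the first non-monitor source, or undefined if none exists.
function pickMicrophone(sources: PulseSource[]): string | undefined {
  return sources.find((s) => !s.isMonitor && s.name.length > 0)?.name
}
```

The chosen source name could then be passed explicitly to the capture command instead of relying on the default input device. How that is wired into `voice.ts` depends on the recorder's actual invocation, which is not shown here.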
Expected behavior
- Each session should have its own independent voice instance — enabling/disabling voice in one session should not affect others.
- Voice should capture only from the microphone input, not from system/desktop audio output.
Environment
- MiMoCode CLI TUI
- Linux (WSL2) with PulseAudio
- Multiple sessions open simultaneously