Skip to content

fix(app + tui): clear stale running session indicators#17593

Closed
nexxeln wants to merge 5 commits into
devfrom
fix/stale-running-session-ui
Closed

fix(app + tui): clear stale running session indicators#17593
nexxeln wants to merge 5 commits into
devfrom
fix/stale-running-session-ui

Conversation

@nexxeln
Copy link
Copy Markdown
Member

@nexxeln nexxeln commented Mar 15, 2026

problem

  • desktop could stay active from two paths: stale bootstrap session status entries and old incomplete assistant messages in history
  • tui could also keep later messages queued from an old incomplete assistant

fix

  • bootstrap now reconciles session status so stale busy entries are cleared
  • desktop now uses live session status for running state and only uses the trailing assistant to pick the active turn
  • backend now finalizes assistant messages when prompt setup fails

tui and desktop

testing

  • bun test src/context/global-sync/session-cache.test.ts src/context/global-sync/event-reducer.test.ts
  • bun typecheck

Use live session status for the running UI and finalize assistant messages when prompt setup fails so crashed runs do not leave sessions stuck as active.
@nexxeln nexxeln requested a review from adamdotdevin as a code owner March 15, 2026 12:10
Inline the app-side running checks and make the tui only treat the trailing assistant as pending so old crashed messages do not keep sessions active or queued.
@nexxeln nexxeln changed the title fix(app): clear stale running session indicators fix(app + tui): clear stale running session indicators Mar 15, 2026
Comment on lines 141 to +144
const pending = createMemo(() => {
return messages().findLast((x) => x.role === "assistant" && !x.time.completed)?.id
const last = messages().findLast((x) => x.role === "assistant")
if (!last || last.time.completed) return
return last.id
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can u explain this?

rmk40 added a commit to rmk40/opencode that referenced this pull request Apr 26, 2026
Document the user-reported symptom (stuck 'Thinking' shimmer and stuck
stop button on the input bar long after work finishes), the
investigation, and the proposed three-commit fix sequence.

Root cause: client-side derivations in message-timeline.tsx (added by
our commit 17), session.tsx::busy(), sidebar-items.tsx::SessionItem,
and session-turn.tsx all use 'message.findLast(incomplete assistant)'
as a working-state signal. That signal is permanently TRUE for any
session with a previously-crashed assistant message in DB —
regardless of current activity. session_status is the correct source
of truth and is reliably idle (verified by code-walk: processor.ts
halt path, run-state.ts cancel path, runner.ts onIdle).

Empirical confirmation: 12 incomplete assistant messages persist in
the live aai.db. The most recent is in the session the user reported
the bug from, persisted at 12:21:23 today.

Plan: three commits.

1. Trust session_status only on the client. Drop message.findLast
   working-state derivations in message-timeline.tsx, session.tsx,
   sidebar-items.tsx; tighten session-turn.tsx::pending to
   last-msg-only. This alone fixes the user-visible symptom.

2. Server-side: wrap the prompt-setup phase (lines 1430-1528 of
   prompt.ts) in an Effect tap-error that finalizes the in-flight
   assistant message before re-throwing. Stops new residue from
   accumulating.

3. Optional one-shot DB sweep on boot to finalize old residue.

Plus's prior fix (Dax's 6c9b2c3) is already in current via upstream
commits 51d8219 and the equivalent run-state.cancel logic. Plus's
fix targets the same area but is server-status-only; it does not
address the message-list-derivation bug, which is what's biting now.

Upstream PR anomalyco#17593 (open) does address it but predates the Effect
migration; we lift the ideas, not the diff.
rmk40 added a commit to rmk40/opencode that referenced this pull request Apr 26, 2026
Stuck 'Thinking' shimmer on the timeline progress glide. Our commit
17 (128a2c9) added `!!pending() || sessionStatus().type !== "idle"`
to the working memo, intending it as a self-heal. It's actually a
permanent false positive: pending() reads the DB-persisted message
list, and any assistant message that ever crashed mid-stream stays
incomplete (time.completed = undefined) forever. The result is that
sessions which previously crashed show 'working' forever even after
session_status correctly reports idle.

session_status reliably goes to idle on:
- processor.ts halt path (status.set idle on assistant error)
- run-state.ts cancel path (status.set idle when no runner exists)
- runner.ts onIdle (fires on any fiber exit: success/failure/interrupt)

Trust status only. The pending() memo is still used for
activeMessageID derivation (commit 17's last-msg-only change is
preserved there).

Empirical: 12 incomplete assistant messages persist in opencode-aai.db,
including one in the actively-debugged session created today. This
commit eliminates the user-visible symptom for the timeline.

Companion commits will apply the same trust-status-only pattern to
session.tsx::busy(), sidebar-items.tsx::SessionItem.isWorking, and
tighten session-turn.tsx::pending. Each is intentionally split so it
can be reverted independently if upstream PR anomalyco#17593 lands.

Refs upstream PR anomalyco#17593.
rmk40 added a commit to rmk40/opencode that referenced this pull request Apr 26, 2026
Same root cause as the message-timeline fix in 3e7acb8. The busy()
helper used to OR session_status against `some(incomplete assistant)`
on the message list. Sessions with previously-crashed assistant
messages stay 'busy' forever per that derivation, even when no work
is running.

Trust status only. Aligns with the helper inlined in upstream PR
anomalyco#17593's second commit (6bfce60).

Refs upstream PR anomalyco#17593.
rmk40 added a commit to rmk40/opencode that referenced this pull request Apr 26, 2026
Same root cause as 3e7acb8 (timeline) and 91ae391 (session.busy).
The sidebar's per-session 'working' badge was OR-ing session_status
against `findLast(incomplete assistant)`, leaving stale-residue
sessions perpetually showing the working indicator in the sidebar
list.

Trust status only. Aligns with upstream PR anomalyco#17593.

Refs upstream PR anomalyco#17593.
rmk40 added a commit to rmk40/opencode that referenced this pull request Apr 26, 2026
session-turn.tsx::pending used findLast over the full message list,
treating any older assistant message without time.completed as the
pending one. Combined with active() and pendingUser() derivations,
that meant the wrong turn could show as 'active' (drove the
'Thinking' shimmer in session-turn) for sessions with DB residue
from a previously-crashed stream.

Tighten to last-message-only: only the final message in the list
can legitimately be the pending assistant message. Matches the
pattern in upstream PR anomalyco#17593's commit 060f482.

Combined with 3e7acb8, 91ae391, and 347095d, this closes the
client-side stuck-Thinking / stuck-busy bug end-to-end:

- timeline working derivation: trusts status (3e7acb8)
- session.tsx::busy() helper: trusts status (91ae391)
- sidebar SessionItem.isWorking: trusts status (347095d)
- session-turn.tsx::pending: last-message-only (this commit)

The persistent DB residue (12 incomplete assistant messages in the
live aai.db) is now ignored by all UI working-state derivations.

Server-side fix to stop NEW residue from accumulating is a separate
follow-up (see docs/plans/stuck-session-status.md commit 2).

Refs upstream PR anomalyco#17593.
rmk40 added a commit to rmk40/opencode that referenced this pull request Apr 27, 2026
The aai.1 release shipped four UI commits that made working-state
derivations trust session_status only and tightened pending detection
to last-message-only (3e7acb8, 91ae391, 347095d, ef4e181).
That correctly silenced stuck Thinking from historical DB residue.

User reports stuck Thinking still appears on individual turns. SQL
probe of opencode-aai.db at the report time found 13 incomplete
assistant messages (time.completed undefined) — including TWO
created in the running aai.1 binary, in different sessions, both
without 'error' set (so neither went through processor.halt).

Diagnosis: prompt-setup-phase throws between sessions.updateMessage
(prompt.ts:1425) and the Effect.ensuring(processor.cleanup) scope
inside handle.process (begins at line 1487) leave the persisted
assistant message permanently incomplete in DB. processor.cleanup
DOES set time.completed at processor.ts:519, but its scope only
covers the inner stream-drain block — anything that throws BEFORE
handle.process is yielded never reaches that finalizer. Specifically:

  - processor.create's snapshot.track() at processor.ts:112 (git op)
  - resolveTools at prompt.ts:1436
  - plugin.trigger experimental.chat.messages.transform at line 1476
  - Effect.all([sys.skills, sys.environment, instruction.system,
    MessageV2.toModelMessagesEffect]) at line 1478

Each of these can throw with msg already in DB and time.completed
undefined. The last-message-only pending derivation introduced in
ef4e181 then renders that turn as 'Thinking' permanently.

Fix: hoist sessions.updateMessage(msg) and processor.create({...})
into an outer Effect.gen and apply Effect.onError to the whole
thing. The finalizer:

  - Returns immediately if msg.time.completed is already set
    (idempotence — processor.cleanup may have run first when
    handle.process was reached and then errored).
  - Otherwise sets msg.time.completed = Date.now() and persists via
    sessions.updateMessage(msg).pipe(Effect.catchCause(() => Effect.void)).

Why Effect.onError vs alternatives:

  - tapErrorCause is functionally equivalent for this purpose.
    onError matches the codebase's existing finalizer idiom (see
    runner.ts:82 Effect.onExit) more closely.
  - onExit fires on every exit including success, which would
    generate redundant updateMessage event publishes on every
    successful turn (the idempotence guard prevents the DB-side
    double-write but not the event publish).
  - Effect v4 finalizers (onError, ensuring, addFinalizer) all run
    uninterruptibly by default, so the cleanup body cannot be
    interrupted mid-write.

Why Effect.catchCause(() => Effect.void) instead of Effect.ignore:

  - sessions.updateMessage wraps Effect.sync(SyncEvent.run(...)).
    SyncEvent.run can throw synchronously on aggregate-missing or
    stale-version conditions, surfacing as a Die cause (defect)
    rather than a typed Fail. Effect.ignore in Effect v4 only
    converts typed Fail causes; defect causes can re-propagate and
    mask the original prompt-setup error. catchCause swallows ALL
    non-success Causes including defects, which is what we need to
    keep the cleanup truly infallible.

Why instruction.clear stays inside the inner gen:

  - instruction.clear(handle.message.id) requires handle which
    only exists after processor.create resolves. Moving it to the
    outer scope would lose access to handle. Pre-change behavior
    was the same: if processor.create threw, instruction.clear
    never ran (handle was undefined). New structure preserves that.

Why runLoop only (not handleSubtask or shellImpl):

  - Empirical evidence: the 13 residue messages are all in runLoop
    territory (from agents 'build' and 'plan', tool/text/step-start
    parts only, never subtask parts). handleSubtask at line 528
    and shellImpl at line 721 already have their own finalize
    paths via Effect.ensuring and explicit time.completed sets.
    No residue traceable to those paths.
  - If subsequent residue traceable to handleSubtask or shellImpl
    appears, a follow-up commit applies the same pattern there.

Why no boot-time DB sweep for the 13 historical messages:

  - With the parent plan's UI commits, those 13 messages are only
    visible as 'Thinking' on their respective last-message turns
    in their respective sessions — they no longer leak into other
    derivations. Cosmetic debt, not active spam. Bulk DB mutation
    of user state without explicit opt-in violates the project's
    principle of not silently changing data the user can see.

Plan: docs/plans/stuck-session-status-followup.md (added in this
commit). Dual-reviewed by code-review-opus and code-review-gpt5
both at plan stage and after implementation. Plan-stage review
caught two blockers (scope too narrow excluding processor.create;
Effect.ignore insufficient for defect Causes) and several issues
(verification realism, upstream-PR linkage, plan section naming);
all addressed before implementation. Implementation-stage review
returned 0 blockers, 0 issues, 3 optional nits (no action needed).

bun test from packages/opencode passes (2166 pass, 0 fail).
bun typecheck clean.

Refs upstream PR anomalyco#17593 (open) — same diagnosis
('backend now finalizes assistant messages when prompt setup
fails'). Predates the Effect migration in this codebase, so we
lift the idea, not the diff.
@rekram1-node
Copy link
Copy Markdown
Collaborator

Automated PR Cleanup

Thank you for contributing to opencode.

Due to the high volume of PRs from users and AI agents, we periodically close older PRs using automated criteria so maintainers can focus review time on the most active and community-supported contributions.

This PR was closed because it matched the following cleanup criteria:

  • The PR was created more than 1 month ago
  • The PR had fewer than 2 positive reactions
  • Positive reactions are counted as thumbs-up, heart, celebration, or rocket reactions on the PR

PRs created within the last month are not affected by this cleanup.

If you believe this PR was closed incorrectly, or if you are still actively working on it, please leave a comment explaining why it should be reopened. A maintainer can review and reopen it if appropriate.

Thanks again for taking the time to contribute.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants