
[pull] main from openai:main #58

Open: pull[bot] wants to merge 3521 commits into kontext-security:main from openai:main



pull[bot] commented Mar 12, 2026


See Commits and Changes for more details.


Created by pull[bot] (v2.0.0-alpha.4)


pull[bot] locked and limited conversation to collaborators Mar 12, 2026

pull[bot] added the ⤵️ pull and merge-conflict (Sync PR has merge conflicts) labels Mar 12, 2026
owenlin0 and others added 27 commits June 22, 2026 09:05
## Description

Restore `thread_source` in `x-codex-turn-metadata`.

Inadvertently removed `thread_source` from `x-codex-turn-metadata` in
#27122 - didn't realize it was a
top-level thread app-server API field, not passed in
`responsesapi_client_metadata`.

This also reserves the key so `responsesapi_client_metadata` cannot
override it.
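The ordering that makes a key "reserved" can be sketched as follows. This is a minimal illustration, assuming turn metadata is a string map. `build_turn_metadata` and `RESERVED_KEYS` are hypothetical names, and the value `"app_server"` is only an example. Client entries are copied first, and any reserved key the client supplies is dropped. Codex's own value is inserted last, so it always wins.

```rust
use std::collections::BTreeMap;

/// Keys that Codex sets itself. The PR reserves `thread_source`; modelling
/// the reserved set as a list is an assumption of this sketch.
const RESERVED_KEYS: &[&str] = &["thread_source"];

/// Hypothetical helper: copies client metadata, drops reserved keys the
/// client tried to set, then inserts Codex's own value so it cannot be overridden.
fn build_turn_metadata(
    client_metadata: &BTreeMap<String, String>,
    thread_source: Option<&str>,
) -> BTreeMap<String, String> {
    let mut metadata: BTreeMap<String, String> = client_metadata
        .iter()
        .filter(|(key, _)| !RESERVED_KEYS.contains(&key.as_str()))
        .map(|(key, value)| (key.clone(), value.clone()))
        .collect();
    if let Some(source) = thread_source {
        metadata.insert("thread_source".to_string(), source.to_string());
    }
    metadata
}
```

If the thread has no source, a client-supplied `thread_source` is still dropped rather than passed through.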
## Why

The local SQLite log sink currently enables TRACE for every target. This
persists high-volume dependency logs bridged through `target=log` and
duplicates OpenTelemetry mirror events in `codex_otel.log_only` and
`codex_otel.trace_safe`.

These records rapidly consume the per-partition log budget and cause
unnecessary SQLite insert-and-prune churn.

## What changed

- Keep TRACE persistence for other targets.
- Exclude bridged `target=log` events from the SQLite sink.
- Exclude the two `codex_otel` mirror targets from the SQLite sink.
- Share the same filter between app-server and TUI.

Remote OpenTelemetry export and metrics are unchanged.
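The shared filter reduces to a target predicate that both app-server and the TUI can call. In this std-only sketch, `persist_to_sqlite` is a hypothetical name, and exact string matching on the event target is an assumption. The real filter presumably plugs into the `tracing` subscriber stack, which is not shown here.

```rust
/// Targets excluded from the local SQLite log sink (names taken from the PR
/// description; exact matching is this sketch's assumption).
const EXCLUDED_SQLITE_TARGETS: &[&str] = &[
    "log",                   // events bridged from the `log` crate
    "codex_otel.log_only",   // OpenTelemetry mirror events
    "codex_otel.trace_safe", // OpenTelemetry mirror events
];

/// Returns true when an event with this target should be written to SQLite.
/// Every other target keeps TRACE-level persistence.
fn persist_to_sqlite(target: &str) -> bool {
    !EXCLUDED_SQLITE_TARGETS.contains(&target)
}
```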
## What

- make Fjord's centralized response-item image preparation unconditional
for new and resumed history
- have local user images and `view_image` outputs always defer decoding
and resizing to that path
- retain `resize_all_images` as an ignored, removed compatibility key
for released clients
- delete the flag-off producer paths and obsolete policy-specific tests

## Why

Centralized preparation is now the intended image path. Keeping the
runtime feature checks also kept two image-processing implementations
alive and allowed client config to select the legacy behavior.

This is a clean replacement for #28975, rebuilt from the latest `main`.

## How

`prepare_response_items` now runs whenever items enter history and
whenever persisted history is reconstructed. Producers emit deferred
image data, so malformed images become the existing model-visible
placeholder instead of failing the session at the producer.
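The following sketch shows what deferring decoding buys: producers hand over raw bytes, and a single central pass either prepares each image or substitutes a placeholder. Everything here is hypothetical. `ImageContent`, `prepare_items`, the PNG-only decoder, and the placeholder text are illustrative stand-ins for the real `prepare_response_items` path, and resizing is omitted.

```rust
/// Hypothetical stand-in for response-item image content.
#[derive(Debug, PartialEq)]
enum ImageContent {
    /// Raw bytes from a producer; decoding and resizing are deferred.
    Deferred(Vec<u8>),
    /// Image that passed central preparation.
    Prepared { width: u32, height: u32 },
    /// Model-visible placeholder for an image that could not be decoded.
    Placeholder(String),
}

const PNG_SIGNATURE: [u8; 8] = [0x89, b'P', b'N', b'G', 0x0D, 0x0A, 0x1A, 0x0A];

/// Minimal PNG header check: signature, then the IHDR chunk's width/height.
fn decode_png_dimensions(bytes: &[u8]) -> Option<(u32, u32)> {
    if bytes.len() < 24 || bytes[..8] != PNG_SIGNATURE || &bytes[12..16] != b"IHDR" {
        return None;
    }
    let width = u32::from_be_bytes([bytes[16], bytes[17], bytes[18], bytes[19]]);
    let height = u32::from_be_bytes([bytes[20], bytes[21], bytes[22], bytes[23]]);
    Some((width, height))
}

/// Central preparation: malformed images become a placeholder instead of an error.
fn prepare_image(content: ImageContent) -> ImageContent {
    match content {
        ImageContent::Deferred(bytes) => match decode_png_dimensions(&bytes) {
            Some((width, height)) => ImageContent::Prepared { width, height },
            None => ImageContent::Placeholder("[image could not be decoded]".to_string()),
        },
        other => other,
    }
}

/// Runs on items entering history and on reconstructed history alike.
fn prepare_items(items: Vec<ImageContent>) -> Vec<ImageContent> {
    items.into_iter().map(prepare_image).collect()
}
```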

## Test plan

- `just fmt`
- `just fix -p codex-core -p codex-features`
- `just test -p codex-features` — 52 passed
- focused affected `codex-core` set — 20 passed
- `just test -p codex-core handle_accepts_explicit_high_detail` — 1
passed
- full `just test -p codex-core` attempt — 2,723 passed; 88 unrelated
environment failures from read-only `~/.codex` SQLite state and
unavailable integration helper binaries
The custom Windows argument-comment-lint job was temporarily moved to
`windows-2022` in #28940 after hermetic LLVM source extraction failed on
the newer runner. This takes the upstream extraction fix so the job can
return to the intended custom runner.

This upgrades `llvm` to `0.7.9` and `rules_cc` to `0.2.18`, refreshes
the module lock, rebases the remaining Windows and custom libc++
patches, drops the obsolete symlink-extraction workaround, and restores
the `windows-x64` runner configuration.

Validation:

- Verified all LLVM patches apply cleanly against the `0.7.9` source.
- Built `@llvm-project//compiler-rt:clang_rt.builtins.static`.
This PR moves construction of `PluginTelemetryMetadata` from loader and
model helpers into `PluginsManager`, which already owns installed plugin
state and will eventually perform remote identity enrichment. The
metadata type remains in `codex-plugin`, and serialized analytics events
remain unchanged.

## Before

```mermaid
flowchart LR
    subgraph Events["Analytics event paths"]
        direction TB
        Lifecycle["Local install / uninstall"]
        Config["Enable / disable"]
        Remote["Remote install"]
        Used["Plugin used"]
    end

    subgraph Construction["Metadata construction"]
        direction TB
        Loader["Loader telemetry helpers"]
        Summary["PluginCapabilitySummary::telemetry_metadata"]
        Override["Caller adds remote_plugin_id"]
    end

    Metadata["PluginTelemetryMetadata"]

    Lifecycle --> Loader
    Config --> Loader
    Remote --> Loader
    Loader -->|"local events"| Metadata
    Loader -->|"remote install"| Override
    Override --> Metadata
    Used --> Summary
    Summary --> Metadata
```

Telemetry metadata was constructed through loader helpers, a
capability-summary method, and a remote-install call-site override.

## After

```mermaid
flowchart LR
    subgraph Events["Analytics event paths"]
        direction TB
        Lifecycle["Local install / uninstall"]
        Config["Enable / disable"]
        Remote["Remote install"]
        Used["Plugin used"]
    end

    Manager["PluginsManager — single construction owner"]
    Metadata["PluginTelemetryMetadata"]

    Lifecycle --> Manager
    Config --> Manager
    Remote -->|"authoritative remote ID"| Manager
    Used -->|"capability summary"| Manager
    Manager --> Metadata
```

Every analytics path delegates metadata construction to
`PluginsManager`. Remote install still supplies its authoritative
backend ID explicitly.

## What Changes

- Make loader code return a focused plugin capability summary instead of
constructing analytics metadata.
- Centralize immutable plugin telemetry metadata construction in
`PluginsManager`.
- Route local install/uninstall, remote install, enable/disable, and
plugin-used emitters through the manager.
- Preserve the current serialized analytics contract exactly.

Normal metadata still has no remote override. Remote install continues
to provide its authoritative backend ID explicitly, so the existing
serializer continues reporting that ID through `plugin_id`.
Snapshot-based enrichment is intentionally deferred to the final PR.
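A simplified sketch of the single-owner shape follows. `TelemetryMetadata`, `telemetry_metadata`, and `remote_install_telemetry_metadata` are hypothetical. The real `PluginTelemetryMetadata` lives in `codex-plugin` and carries more fields. Normal paths build metadata with no remote override. Remote install passes its backend ID explicitly, and the reported `plugin_id` prefers that ID.

```rust
/// Hypothetical, reduced analytics metadata.
#[derive(Debug, Clone, PartialEq)]
struct TelemetryMetadata {
    plugin_id: String,
    remote_plugin_id: Option<String>,
}

impl TelemetryMetadata {
    /// ID serialized as `plugin_id`: an explicit remote backend ID wins.
    fn reported_plugin_id(&self) -> &str {
        self.remote_plugin_id.as_deref().unwrap_or(self.plugin_id.as_str())
    }
}

/// Sketch of the manager as the only place metadata is constructed.
struct PluginsManager;

impl PluginsManager {
    /// Local install/uninstall, enable/disable, and plugin-used paths.
    fn telemetry_metadata(&self, plugin_id: &str) -> TelemetryMetadata {
        TelemetryMetadata { plugin_id: plugin_id.to_string(), remote_plugin_id: None }
    }

    /// Remote install supplies its authoritative backend ID explicitly.
    fn remote_install_telemetry_metadata(&self, plugin_id: &str, remote_id: &str) -> TelemetryMetadata {
        TelemetryMetadata {
            plugin_id: plugin_id.to_string(),
            remote_plugin_id: Some(remote_id.to_string()),
        }
    }
}
```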

## Testing

- `just test -p codex-core-plugins` (238 tests passed)
- `just test -p codex-plugin` (3 tests passed)
- Scoped Clippy/compile checks passed for `codex-plugin`,
`codex-core-plugins`, `codex-app-server`, and `codex-core`.

## Split Overview

```text
main
├── #27093  Debug analytics capture                 (merged)
├── #27099  Non-mutating plugin smoke               (merged)
├── #27100  Remote install/uninstall smoke          (merged)
└── #27102  Plugin telemetry metadata refactor      ← you are here
    └── #27669  Persist remote plugin identity

After #27102 and #27669 merge:
└── Final PR: add explicit local and remote IDs to plugin analytics
```

Review order and dependencies:

1. [#27093 Add debug-only analytics event
capture](#27093) (merged)
2. [#27099 Add a plugin analytics smoke
workflow](#27099) (merged)
3. [#27100 Add a remote plugin analytics mutation smoke
workflow](#27100) (merged)
4. This metadata refactor, independent and based on `main`
5. [#27669 Persist remote plugin
identity](#27669), stacked on this
PR
6. Final remote-ID behavior PR, created after the prerequisites merge

The original [#26281](#26281)
remains open as the aggregate reference until the final replacement PR
is published.
## Summary

[#26701](#26701) added remote plugin
identity support, [#26702](#26702)
added remote-section fetching and state, and
[#28768](#28768) extracted the
catalog rendering module. This PR builds the product-facing `/plugins`
catalog on that foundation so remote records appear as OpenAI Curated,
Workspace, and Shared with me sections rather than backend marketplace
implementation details.

Plugin details remain read-only for sharing metadata. This PR does not
add share-authoring actions or change the app-server protocol.

## Changes

- Renders OpenAI Curated, Workspace, and Shared with me sections with
loading, empty, and error states.
- Preserves section selection and stable tab ordering as remote sections
transition between fallback and populated states.
- Shows OpenAI Curated loading only when the explicit vertical fallback
request was issued.
- Centralizes remote marketplace identity matching around the existing
marketplace constants.
- Uses product labels for remote marketplaces and identifies the
personal marketplace as Local by its path.
- Shows read-only source, authentication, version, and sharing metadata
in plugin detail views.
- Applies narrow display deduplication for local and remote records
sharing a remote plugin ID:
  - installed records take precedence;
  - local mapped sources are preferred for details only when their
    installed state matches the selected record.
- Returns from detail and confirmation views through the current plugin
cache so newly loaded remote sections are not overwritten by an older
captured response.
- Keeps admin-disabled plugins view-only and labels default-installed
plugins as Available by default.
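The two deduplication rules above can be sketched as a pair of selection functions. `CatalogRecord`, `pick_display_row`, and `pick_detail_source` are hypothetical, and keeping the first record on a tie is an assumption.

```rust
/// Hypothetical catalog row.
#[derive(Debug, Clone, PartialEq)]
struct CatalogRecord {
    name: String,
    remote_plugin_id: Option<String>,
    installed: bool,
}

/// For two records sharing a remote plugin ID, an installed record wins;
/// otherwise the first record is kept (tie-break assumed).
fn pick_display_row<'a>(a: &'a CatalogRecord, b: &'a CatalogRecord) -> &'a CatalogRecord {
    if b.installed && !a.installed { b } else { a }
}

/// The local mapped record backs the detail view only when its installed
/// state matches the selected record; otherwise details stay on the selection.
fn pick_detail_source<'a>(
    selected: &'a CatalogRecord,
    local: Option<&'a CatalogRecord>,
) -> &'a CatalogRecord {
    match local {
        Some(local) if local.installed == selected.installed => local,
        _ => selected,
    }
}
```

This mirrors the scenario in the new test: an installed remote row keeps remote details when the local share is uninstalled.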

## Tests

New tests:

- `plugins_popup_admin_disabled_available_plugin_has_view_only_hint`
- `plugins_popup_remote_section_fallback_states_snapshot`
- `plugins_popup_installed_remote_row_keeps_remote_detail_when_local_share_is_uninstalled`

Updated existing plugin catalog tests and snapshots for product labels,
detail metadata, personal-marketplace labeling, and stable tab ordering.

Verification:

- `cargo clippy -p codex-tui --all-targets -- -D warnings`

## Follow-ups

- Local/remote duplicate normalization should eventually move into
app-server. This PR intentionally keeps the compatibility behavior
narrow and display-only.
- PR5 will sanitize sensitive components before displaying Git source
URLs.
## Why

#29113 moved remote sandbox setup and enforcement to the exec server.
That gives the executor ownership of the platform-specific work: a Linux
executor chooses and runs a Linux sandbox even when the Codex
orchestrator is running on macOS or Windows.

It also means the orchestrator no longer knows which concrete sandbox
the executor selected. When that sandbox blocks a remote command, the
orchestrator currently sees only a failed process and can treat the
denial as an ordinary command failure. The existing sandbox approval and
retry path is then skipped.

This PR lets the executor report one portable fact:

> This command probably failed because the executor sandbox blocked it.

The executor keeps its concrete sandbox type private. The protocol sends
only the semantic result.

## Example

Suppose a local macOS Codex session asks a Linux devbox to write outside
the allowed workspace.

Before this PR:

```text
Linux sandbox blocks the write
    -> remote process exits with "Permission denied"
    -> local orchestrator sees an ordinary command failure
    -> the normal sandbox approval and retry path can be skipped
```

With this PR:

```text
Linux sandbox blocks the write
    -> executor reports sandboxDenied: true
    -> unified exec returns UnifiedExecError::SandboxDenied
    -> the existing approval prompt is shown
    -> an approved retry runs through the existing unsandboxed retry path
```

## What changes

### The executor remembers its selected sandbox

The prepared remote process now retains the executor-selected
`SandboxType`. This value never crosses the executor boundary.

Commands started without a sandbox retain `SandboxType::None` and are
never reported as sandbox denials.

### The executor uses the existing denial heuristic

The existing local denial heuristic moves from `codex-core` into the
shared `codex-sandboxing` crate.

When a sandboxed remote process exits, the executor:

1. waits the same short output grace period used by local unified exec;
2. reads the output currently available in the existing retained output
buffer;
3. runs the existing heuristic using the exit code and common denial
messages;
4. stores the yes/no result before publishing the process exit.

This deliberately matches the old local unified-exec behavior. It does
not add a new streaming classifier, another output buffer, or stronger
output-retention guarantees.
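Step 3 of the flow above reduces to a yes/no check over the exit code and the retained output. This is a sketch only. `SandboxKind`, `likely_sandbox_denied`, and the marker strings are illustrative. The actual heuristic now lives in `codex-sandboxing` and may use different messages. The grace-period wait and buffer read (steps 1 and 2) are not modelled.

```rust
/// Hypothetical mirror of the executor-selected sandbox; it never leaves the executor.
#[derive(Debug, Clone, Copy, PartialEq)]
enum SandboxKind {
    None,
    Enforced,
}

/// Illustrative denial messages (assumed, not the crate's list).
const DENIAL_MARKERS: &[&str] = &[
    "permission denied",
    "operation not permitted",
    "read-only file system",
];

/// Unsandboxed commands and successful exits are never reported as denials.
fn likely_sandbox_denied(sandbox: SandboxKind, exit_code: i32, output: &str) -> bool {
    if sandbox == SandboxKind::None || exit_code == 0 {
        return false;
    }
    let lowered = output.to_lowercase();
    DENIAL_MARKERS.iter().any(|marker| lowered.contains(*marker))
}
```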

### The protocol reports a portable boolean

`process/read` gains `sandboxDenied`:

```json
{
  "exited": true,
  "exitCode": 1,
  "closed": false,
  "sandboxDenied": true
}
```

The field defaults to `false` when an older executor omits it. The
response does not expose the executor sandbox implementation or
executor-native paths.

### Unified exec uses the existing error path

The exec-server client carries `sandboxDenied` into the unified process
state. If it is true, unified exec returns the existing `SandboxDenied`
error instead of trying to classify remote output using an
orchestrator-side sandbox type.

Remote process exit remains visible as soon as the process exits. This
PR does not wait for stdout or stderr to close and does not change the
existing process lifecycle.
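On the client side, the flag only needs a default and a branch. In this sketch, `ProcessRead`, `ExecOutcome`, and `classify` are hypothetical stand-ins for the exec-server client state and `UnifiedExecError::SandboxDenied`. `closed` is omitted, and reporting a missing exit code as `-1` is an assumption.

```rust
/// Simplified `process/read` result. `sandbox_denied: None` models an older
/// executor that omits `sandboxDenied`.
struct ProcessRead {
    exited: bool,
    exit_code: Option<i32>,
    sandbox_denied: Option<bool>,
}

/// Hypothetical outcome type for unified exec.
#[derive(Debug, PartialEq)]
enum ExecOutcome {
    Running,
    Exited { exit_code: i32 },
    SandboxDenied { exit_code: i32 },
}

fn classify(read: &ProcessRead) -> ExecOutcome {
    if !read.exited {
        return ExecOutcome::Running;
    }
    let exit_code = read.exit_code.unwrap_or(-1); // assumed fallback
    // Trust the executor's portable boolean; an omitted field means false.
    if read.sandbox_denied.unwrap_or(false) {
        ExecOutcome::SandboxDenied { exit_code }
    } else {
        ExecOutcome::Exited { exit_code }
    }
}
```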

## Scope

This PR is intentionally limited to matching the existing local
unified-exec behavior for the initial command execution path.

It does not add:

- incremental denial tracking across the full output stream;
- new denial handling for commands completed later through
`write_stdin`;
- new guarantees for preserving the semantic flag during the narrow
reconnect-recovery race.

Those can be considered separately if the same behavior is added for
local execution.

## Test coverage

One remote end-to-end integration test covers the complete intended
flow:

```text
remote read-only sandbox
    -> denied write
    -> executor reports the denial
    -> Codex requests approval
    -> user approves
    -> retry succeeds on the remote executor
```

Existing lifecycle coverage continues to verify that remote process exit
is reported before late output streams close.
…28968)

## Description
This PR cuts Codex over from generic `ResponseItem.metadata` (introduced
here: #28355) to
`ResponseItem.internal_chat_message_metadata_passthrough`, which is the
blessed path and has strongly-typed keys.

For now, we have to drop the MAv2 usage of `metadata` from #28561 until
we decide where that data should live.
## Summary

- use generated image data URLs in the Python SDK examples and notebook
- document HTTP and HTTPS image URLs as deprecated and recommend
`LocalImageInput`
- replace the remote-URL integration test with data-URL coverage

`ImageInput` remains available for data URLs. The SDK does not duplicate
app-server URL validation.

## Testing

- `uv run --frozen --no-sync ruff check --output-format=full .`
- `uv run --frozen --no-sync ruff format --check .`
- full Python SDK test suite with an isolated writable
`CODEX_SQLITE_HOME` (119 passed, 38 skipped)
## Why

The reset flow introduced in #28154 still describes earned reset credits
as "rate-limit resets" and uses generic reset-scope copy. It can also
retain a stale available-credit count after redemption or an account
change, leaving the reset action enabled after the last credit is used.

This follow-up updates terminology only within that reset feature.
Existing rate-limit wording elsewhere in the CLI and TUI is unchanged.

## What changed

- Rename reset-specific `/usage` menu items, startup hints, and reset
dialogs to "usage limit reset."
- Describe monthly resets for Free, Go, and accounts that report a
monthly usage window; otherwise describe the current 5-hour and weekly
limits.
- Recheck a cached zero balance when `/usage` is reopened, and refresh
the balance after redemption so the final reset immediately disables the
action.
- Correlate async refresh results before updating snapshots and clear
account-derived reset state, warnings, prompts, and status surfaces when
the account changes.
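The scope copy and the stale-result guard can be sketched together. All names here are hypothetical: `Plan`, `reset_scope_description`, and `ResetState`. The strings are illustrative and are not the TUI's copy. A generation counter correlates each refresh with its result, so an older response cannot overwrite a newer balance, and an account change invalidates in-flight refreshes.

```rust
/// Hypothetical plan categories; only Free and Go are named in the PR.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Plan {
    Free,
    Go,
    Other,
}

fn reset_scope_description(plan: Plan, reports_monthly_window: bool) -> &'static str {
    if matches!(plan, Plan::Free | Plan::Go) || reports_monthly_window {
        "Resets your monthly usage limit."
    } else {
        "Resets your current 5-hour and weekly usage limits."
    }
}

#[derive(Default)]
struct ResetState {
    /// Bumped for every refresh request and on account changes.
    generation: u64,
    available_credits: Option<u32>,
}

impl ResetState {
    /// Starts a balance refresh and returns the token its result must carry.
    fn begin_refresh(&mut self) -> u64 {
        self.generation += 1;
        self.generation
    }

    /// Applies a result only if nothing newer happened since it was issued.
    fn apply_refresh(&mut self, generation: u64, credits: u32) -> bool {
        if generation != self.generation {
            return false;
        }
        self.available_credits = Some(credits);
        true
    }

    /// Clears account-derived state and invalidates in-flight refreshes.
    fn account_changed(&mut self) {
        self.generation += 1;
        self.available_credits = None;
    }

    /// Enabled only while at least one credit is known to remain.
    fn reset_action_enabled(&self) -> bool {
        self.available_credits.map_or(false, |credits| credits > 0)
    }
}
```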

## Validation

- `just test -p codex-tui chatwidget::tests::usage` — 29 passed.
- `just test -p codex-tui chatwidget::tests::status_command_tests` — 7
passed.
- Account-boundary prompt and plan-mode prompt regression tests passed.
- `cargo insta pending-snapshots` from `codex-rs/tui` — no pending
snapshots.

<img width="814" height="318" alt="image"
src="https://github.com/user-attachments/assets/2a460e96-458b-4805-8d9f-c759382d21a4"
/>
View for monthly resets:
<img width="905" height="243" alt="image"
src="https://github.com/user-attachments/assets/179f88e3-08fb-4af5-8dc6-ce6a944ed681"
/>
…ed (#27982)

## Why

The first auto-review currently creates its Guardian child session on
demand, adding avoidable latency before the review can begin. Creating
the ordinary Guardian child during parent-session initialization lets
that child use the existing session startup WebSocket prewarm before the
first escalation. This does not introduce a Guardian-specific prewarm
mechanism.

## What changed

- initialize the existing Guardian review-session manager owned by
`Session` when a thread starts with auto-review enabled and an approval
policy that routes to Guardian
- use the standard Guardian child-session construction and the existing
session startup WebSocket prewarm
- preserve the existing reuse-key invalidation and lazy creation
fallback when startup initialization fails or the effective review
configuration changes
- add an integration test that verifies normal root-session startup
emits a Guardian `generate=false` prewarm request

## Benchmark

I compared release builds against main. Each prompt first ran a
non-escalated `sleep 3`, then requested an escalated marker command.

| binary | count | avg Guardian duration | median Guardian duration | avg Guardian TTFT |
|---|---:|---:|---:|---:|
| origin-main | 10 | 4008.7 ms | 3949.5 ms | 3746.5 ms |
| session-fix | 10 | 2865.0 ms | 2594.0 ms | 2492.7 ms |

Guardian duration fell by 28.5% and Guardian TTFT fell by 33.5%. These
measurements cover Guardian review latency; they do not measure parent
thread-start latency.
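The reported reductions follow from the average columns of the benchmark table:

```math
\begin{aligned}
\Delta_{\text{duration}} &= \frac{4008.7 - 2865.0}{4008.7} = \frac{1143.7}{4008.7} \approx 0.285 = 28.5\% \\
\Delta_{\text{TTFT}} &= \frac{3746.5 - 2492.7}{3746.5} = \frac{1253.8}{3746.5} \approx 0.335 = 33.5\%
\end{aligned}
```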
## Why

`compile_scoped_filesystem_pattern()` accepted a `_policy_cwd` parameter
even though scoped glob compilation no longer uses the policy working
directory. Keeping that unused argument forced the surrounding
permissions compilation path to keep forwarding `policy_cwd` through
call sites that did not need it, making the API look more dependent on
cwd resolution than it is.

## What changed

Removed the unused cwd parameter from
`compile_scoped_filesystem_pattern()` and the callers that only
forwarded it: `compile_filesystem_permission()`,
`compile_permission_profile()`, and
`compile_permission_profile_selection()`. Workspace root resolution
still keeps `policy_cwd`, because that path still resolves relative
roots against the active policy cwd.

Relevant code:
[`codex-rs/core/src/config/permissions.rs`](https://github.com/openai/codex/blob/b8b9816102e064dae4488ec130cf560f63c1ab78/codex-rs/core/src/config/permissions.rs#L346).

## Verification

- `just test -p codex-core config::permissions`
- `just test -p codex-core` was also run after building
`test_stdio_server`; it passed the touched permissions coverage but
still reported unrelated existing failures in `cli_stream` and shell
snapshot tests.
## Summary

Stacked on #26706.

Adds the shared auth/system-proxy contract that later platform resolver
PRs plug into. This PR moves Codex-owned auth and startup HTTP clients
through a common route-aware boundary, but does not yet add Windows or
macOS system proxy resolution.

The default path remains unchanged when `respect_system_proxy` is absent
or disabled.

## Implementation

- Adds `codex-client/src/outbound_proxy.rs` with the shared
route-selection model:
  - `OutboundProxyConfig`;
  - `ClientRouteClass`;
  - `RouteFailureClass`;
  - `build_reqwest_client_for_route`.
- Preserves the existing reqwest/default-client behavior when no route
config is supplied.
- Uses the fixed MVP routing policy when route config is supplied:
platform system/PAC/WPAD discovery, then explicit env proxy variables,
then direct connection.
- Keeps platform-specific system discovery behind the shared client
boundary. This PR provides the contract and fallback behavior; later
resolver PRs plug in Windows and macOS discovery.
- Adds `login::AuthRouteConfig` so auth call sites depend on a small
policy type instead of platform resolver details.
- Maps the resolved `Config.respect_system_proxy` boolean into
`AuthRouteConfig` for auth-owned clients.
- Wires the route config through browser login, device-code login,
access-token login, login status, logout/revoke, token refresh, API-key
exchange, app-server account login, TUI/app startup, cloud-config
bootstrap, cloud tasks, plugin auth, and exec startup config loading.
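The fixed MVP routing order can be sketched as a pure function with injected discovery and environment lookups. `Route`, `select_route`, and `ENV_PROXY_VARS` are hypothetical. The real types are `OutboundProxyConfig` and `ClientRouteClass`, and the environment variable names and their order are assumptions. `NO_PROXY` handling is also not modelled.

```rust
/// Hypothetical route outcome.
#[derive(Debug, PartialEq)]
enum Route {
    /// No route config: keep the existing reqwest/default-client behavior.
    Default,
    SystemProxy(String),
    EnvProxy(String),
    Direct,
}

/// Assumed environment variable names, checked in this order.
const ENV_PROXY_VARS: &[&str] = &["HTTPS_PROXY", "https_proxy", "ALL_PROXY", "all_proxy"];

fn select_route<S, E>(respect_system_proxy: bool, discover_system: S, env_lookup: E) -> Route
where
    S: Fn() -> Option<String>,
    E: Fn(&str) -> Option<String>,
{
    if !respect_system_proxy {
        return Route::Default;
    }
    // 1. Platform system/PAC/WPAD discovery (None where no resolver exists yet).
    if let Some(proxy) = discover_system() {
        return Route::SystemProxy(proxy);
    }
    // 2. Explicit env proxy variables via an injected lookup, so tests never
    //    mutate the process environment.
    for var in ENV_PROXY_VARS.iter() {
        if let Some(proxy) = env_lookup(*var).filter(|value| !value.is_empty()) {
            return Route::EnvProxy(proxy);
        }
    }
    // 3. Direct connection.
    Route::Direct
}
```

In production, the lookup could be `|name| std::env::var(name).ok()`. Tests can pass a fixed closure instead.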

## End-user behavior

- No behavior changes by default.
- When `respect_system_proxy = true`, auth-owned clients opt into the
shared route-aware client path.
- On platforms without a resolver implementation in this PR, system
discovery is unavailable and the route-aware path falls back to explicit
env proxy handling, then direct connection.
- Custom CA handling remains separate from proxy route selection and
still runs through the shared client builder.
- No proxy URLs, PAC contents, or resolved platform details are exposed
through the public config surface introduced here.

## Tests

Adds or updates coverage for:

- preserving default auth-client fallback behavior when no route config
is provided;
- injected environment-proxy fallback without mutating process
environment;
- existing login-server E2E flows using explicit `auth_route_config:
None` to guard unchanged default behavior;
- updated auth manager, login, logout, cloud-config, startup, and
plugin-auth call sites passing route config explicitly.
# Summary

Codex required every ChatGPT account to have an email address. A
service-account personal access token can return valid account metadata
without one, so PAT login failed while decoding the metadata response.

This change makes email optional in the account metadata type that owns
it and preserves that absence through authentication, provider account
state, the app-server API, generated clients, and TUI bootstrap.
Existing accounts with email addresses keep the same behavior.

## Behavior-changing call sites

| Call site | Behavior after this change |
| --- | --- |
| `login/src/auth/personal_access_token.rs` | PAT metadata accepts a missing or null email and retains `None`. |
| `agent-identity/src/lib.rs` | Agent Identity JWT claims accept an omitted email. |
| `login/src/auth/storage.rs` and `login/src/auth/agent_identity.rs` | Stored and managed Agent Identity records carry `Option<String>`. Deserialization maps the legacy empty-string sentinel to `None`. |
| `login/src/auth/manager.rs` | `get_account_email` returns the stored option, and managed identity bootstrap no longer converts `None` to an empty string. |
| `model-provider/src/provider.rs` and `protocol/src/account.rs` | A ChatGPT provider account requires a plan type but may carry no email. |
| `app-server-protocol/src/protocol/v2/account.rs` | `account/read` keeps the `email` field on the wire and returns `null` when the account has no email. Generated TypeScript and JSON schemas describe a required, nullable field. |
| `sdk/python/src/openai_codex/generated/v2_all.py` | The generated Python `ChatgptAccount` model accepts `None` for email. |
| `tui/src/app_server_session.rs` | Email-less ChatGPT accounts bootstrap normally, keep external feedback routing, omit account-email telemetry, and display the plan in account status. |

## Design decisions

- Missing email remains `None` at every layer. The code never uses an
empty string as a substitute.
- The app-server response includes `"email": null` instead of omitting
the field. Clients retain a stable response shape.
- Plan type remains required for provider account state. This change
relaxes only the email assumption.
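The "`None` at every layer" rule comes down to normalizing the legacy sentinel once and then branching on the option. `normalize_stored_email` and `account_status_line` are hypothetical helpers, and the status wording is illustrative.

```rust
/// Maps the legacy empty-string sentinel in stored records to `None`.
fn normalize_stored_email(raw: Option<String>) -> Option<String> {
    raw.filter(|email| !email.is_empty())
}

/// Shows the email when present; otherwise only the plan.
fn account_status_line(email: Option<&str>, plan: &str) -> String {
    match email {
        Some(email) => format!("{} ({})", email, plan),
        None => format!("Plan: {}", plan),
    }
}
```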

## Testing

Tests: affected test targets compile, scoped Clippy and formatting pass,
a focused TUI snapshot covers plan-only account status, real
before/after PAT login smoke covers metadata without email, app-server
smoke covers `account/read` with `email: null`, and a regression smoke
covers an existing email-bearing PAT. Unit tests run in CI.

## Evidence

Visual smoke evidence will be attached here.
## Summary

Instead of:

    reminder_interval_tokens = 65_536

allow users to configure explicit remaining-token reminder thresholds:

    reminder_at_remaining_tokens = [65_536, 32_768, 16_384, 8_192, 4_096, 2_048, 1_024, 512]
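The following sketch shows one way explicit thresholds could be evaluated. It assumes a reminder fires once, when the remaining token count drops from above a threshold to at or below it, and that several thresholds crossed in one step are reported largest first. `crossed_thresholds` is a hypothetical name, and the PR does not specify these semantics.

```rust
/// Thresholds crossed when remaining tokens fall from `previous_remaining`
/// to `current_remaining`, largest first (assumed semantics).
fn crossed_thresholds(thresholds: &[u64], previous_remaining: u64, current_remaining: u64) -> Vec<u64> {
    let mut crossed: Vec<u64> = thresholds
        .iter()
        .copied()
        .filter(|&t| previous_remaining > t && current_remaining <= t)
        .collect();
    crossed.sort_unstable_by(|a, b| b.cmp(a));
    crossed
}
```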

## Validation

- `CARGO_INCREMENTAL=0 just test -p codex-core rollout_budget`: 9 passed
- `just fix -p codex-core`
- `just fmt`
## Why

`permissionProfile/list` currently advertises every built-in and
configured profile even when effective enterprise requirements prevent
selecting it. That forces each client to reconstruct policy from
lower-level requirement fields, which is easy to miss and difficult to
keep consistent.

The catalog should remain complete so clients can explain that an option
was disabled by an administrator, while also reporting whether each
profile is selectable.

## What

- Add an `allowed` field to each permission profile summary.
- Build a shared catalog from the effective config and current
requirements, including `allowed_sandbox_modes`, `allowed_permissions`,
and filesystem restrictions.
- Use the shared catalog in app-server and the TUI so disallowed
profiles remain visible but cannot be selected.
- Use the canonical `:danger-full-access` profile ID in the TUI.
- Update the app-server schemas, API documentation, behavioral tests,
and TUI snapshots.

## Scope

This PR targets `main` directly and is independent of #24852. It
preserves the current behavior where built-in profiles are constrained
by sandbox-mode requirements and `allowed_permissions` applies to
configured profiles.
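The catalog rule described above (every profile listed, each with an `allowed` flag) can be sketched as follows. `Profile`, `ProfileSource`, `Requirements`, and `build_catalog` are hypothetical simplifications, and the `:read-only` ID is illustrative. Filesystem restrictions are not modelled. Built-in profiles are checked against `allowed_sandbox_modes`, and configured profiles against `allowed_permissions`. A missing requirement allows everything.

```rust
#[derive(Debug, Clone, PartialEq)]
enum ProfileSource {
    /// Built-in profile, constrained by sandbox-mode requirements.
    BuiltIn { sandbox_mode: &'static str },
    /// Configured profile, constrained by `allowed_permissions`.
    Configured,
}

struct Profile {
    id: String,
    source: ProfileSource,
}

struct Requirements {
    allowed_sandbox_modes: Option<Vec<String>>,
    allowed_permissions: Option<Vec<String>>,
}

#[derive(Debug, PartialEq)]
struct ProfileSummary {
    id: String,
    allowed: bool,
}

/// Keeps every profile in the catalog and marks whether it is selectable.
fn build_catalog(profiles: &[Profile], req: &Requirements) -> Vec<ProfileSummary> {
    profiles
        .iter()
        .map(|p| {
            let allowed = match &p.source {
                ProfileSource::BuiltIn { sandbox_mode } => req
                    .allowed_sandbox_modes
                    .as_ref()
                    .map_or(true, |modes| modes.iter().any(|m| m.as_str() == *sandbox_mode)),
                ProfileSource::Configured => req
                    .allowed_permissions
                    .as_ref()
                    .map_or(true, |ids| ids.iter().any(|id| id == &p.id)),
            };
            ProfileSummary { id: p.id.clone(), allowed }
        })
        .collect()
}
```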

## Testing

- `just test -p codex-core
permission_profile_catalog_marks_profiles_disallowed_by_requirements`
- `just test -p codex-app-server permission_profile_list`
- `just test -p codex-app-server-protocol`
- `just test -p codex-tui profile_permissions`
- `just fix -p codex-core`
- `just fix -p codex-app-server-protocol`
- `just fix -p codex-app-server`
- `just fix -p codex-tui`
- `just fmt`

---------

Co-authored-by: Codex <noreply@openai.com>
Co-authored-by: Joey Trasatti <joey.trasatti@openai.com>
## Why

`codex-app-server-test-client` previously treated
`item/tool/requestUserInput` as an unsupported server request and
terminated the connection. That made it impossible to use the client for
end-to-end testing of interactive turns: an operator could observe the
request, but could not answer it and confirm that the same turn resumed.

## What changed

- Handle `ToolRequestUserInput` server requests in the test client's
central request dispatcher.
- Render numbered terminal choices, accept exact option labels, support
free-form `Other` and text-only questions, and collect multiple answers.
- Send a protocol-native `ToolRequestUserInputResponse` and continue
streaming the active turn.
- Fail clearly when interactive input is requested without a terminal.
- Document the interactive behavior and add focused tests for option
selection, free-form answers, multiple questions, and invalid-selection
retries.
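The answer-resolution rules listed above can be sketched as a single function: numbered choice, exact label, free-form text, or a retryable error. `resolve_answer` and `AnswerError` are hypothetical. Treating any unmatched input as the free-form `Other` answer when that option is allowed is a simplification.

```rust
#[derive(Debug, PartialEq)]
enum AnswerError {
    Empty,
    UnknownChoice(String),
}

/// Resolves one terminal answer. An `Err` means "ask again".
fn resolve_answer(input: &str, options: &[&str], allow_other: bool) -> Result<String, AnswerError> {
    let input = input.trim();
    if input.is_empty() {
        return Err(AnswerError::Empty);
    }
    // Text-only question: any non-empty input is the answer.
    if options.is_empty() {
        return Ok(input.to_string());
    }
    // Numbered choice (1-based).
    if let Ok(n) = input.parse::<usize>() {
        if n >= 1 && n <= options.len() {
            return Ok(options[n - 1].to_string());
        }
    }
    // Exact option label.
    if let Some(label) = options.iter().find(|label| **label == input) {
        return Ok(label.to_string());
    }
    // Free-form `Other` answer, if the question allows it.
    if allow_other {
        return Ok(input.to_string());
    }
    Err(AnswerError::UnknownChoice(input.to_string()))
}
```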

## Testing

- `just test -p codex-app-server-test-client`
- `just bazel-lock-check`
- Manually exercised the app-server flow, selected `TUI`, observed
`serverRequest/resolved`, and verified that the same turn completed with
the selected answer.
## Summary

- rename `Config::permission_profile_allowed` to
`is_permission_profile_allowed`
- use `BUILT_IN_PERMISSION_PROFILE_DANGER_FULL_ACCESS` in the TUI and
its assertion
- follow up on the late review comments from #26678

The previous `:danger-no-sandbox` value was an invalid built-in profile
ID. #26678 corrected it to `:danger-full-access`; this PR centralizes
the value to prevent future drift.

## Testing

- Not run, per request; only `cargo fmt` was run.

Co-authored-by: Codex <noreply@openai.com>
## Why

When Codex starts with a custom CA override such as
`SSL_CERT_FILE=/path/to/corp-ca.pem codex`, `rustls-native-certs` treats
that override as a replacement for the platform trust store. The managed
proxy then rewrites child CA variables to its generated bundle, so the
custom root or the ordinary platform roots can be lost. The proxy's
upstream TLS connector must trust the same roots or private and
corporate upstream certificates still fail after interception.

## What

- load platform-native roots without consulting inherited CA override
variables
- append certificates from the existing curated startup CA file
variables and `SSL_CERT_DIR`
- share those platform and startup roots with the MITM upstream rustls
connector
- exclude the Codex managed MITM CA from upstream trust
- normalize OpenSSL `TRUSTED CERTIFICATE` blocks while dropping trailing
trust metadata
- skip an inherited current Codex-managed bundle so nested launches do
not duplicate it
- append the Codex managed MITM CA to the child-facing bundle
- copy certificate material only, so a private key or unrelated text
colocated in a startup file is never exposed through the public bundle

This is intentionally limited to CA paths present when Codex starts. It
does not parse inline shell assignments or add per-command bundle
materialization.

This changes only `codex-network-proxy` and dependency metadata; it does
not touch `codex-core` or sandbox orchestration.

## Validation

- `just test -p codex-network-proxy`
- includes an end-to-end upstream TLS test using a server trusted only
by the startup custom CA
- `just fix -p codex-network-proxy`
- `just bazel-lock-check`
## Why

`openai-oss-forks/tokio-tungstenite` now includes the updated
`tungstenite` fork revision from
[openai-oss-forks/tokio-tungstenite#3](openai-oss-forks/tokio-tungstenite#3).
Codex should consume the merged fork commit and resolve its direct and
transitive `tungstenite` dependencies to the same revision instead of
retaining the older pins.

## What Changed

- Advanced the `tokio-tungstenite` git pin to
`0e5b2d73aa18dd9f0a50ee9ff199d5aef7594186`.
- Advanced the `tungstenite` fork pin to
`4fffad30fe373adbdcffab9545e9e9bf4f2fc19f` and adjusted the patch source
so the transitive dependency resolves to that revision.
- Updated `Cargo.lock` and `MODULE.bazel.lock` to match the dependency
graph.
## This PR

Remote plugin analytics cannot rely only on the in-memory
installed-plugin snapshot because that snapshot is refreshed
asynchronously after startup. This PR persists the authoritative backend
identity alongside each cached remote plugin bundle so later consumers
can resolve it without a network request.

### Behavior

- Store Codex-owned remote installation metadata in an atomic
`.codex-remote-plugin-install.json` sidecar under the plugin cache root.
- Use a versioned, snake_case schema:

  ```json
  {
    "schema_version": 1,
    "remote_plugin_id": "plugins~Plugin_..."
  }
  ```

- Write the metadata during remote bundle installation.
- Backfill it when bundle sync finds an already-current cached bundle.
- Clear it when a generic/local install replaces the cache.
- Let existing uninstall and stale-cache removal delete it with the
plugin cache root.
- Reject unsupported schema versions rather than silently misreading
future formats.

This PR does not change analytics serialization or event behavior.
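The sidecar lifecycle described under Behavior can be sketched with the standard library alone. `write_sidecar`, `check_schema_version`, and `remove_sidecar` are hypothetical names, and JSON is built by hand to stay dependency-free. The sketch writes a temporary file in the same directory and renames it over the final path so readers never see a partial file. It sets mode `0600` on Unix, rejects unknown schema versions, and treats removing a missing sidecar as success.

```rust
use std::fs;
use std::io::{self, Write};
use std::path::Path;

const SIDECAR_FILE: &str = ".codex-remote-plugin-install.json";
const SCHEMA_VERSION: u32 = 1;

fn write_sidecar(cache_root: &Path, remote_plugin_id: &str) -> io::Result<()> {
    // Minimal JSON string escaping for this sketch.
    let escaped = remote_plugin_id.replace('\\', "\\\\").replace('"', "\\\"");
    let json = format!(
        "{{\n  \"schema_version\": {},\n  \"remote_plugin_id\": \"{}\"\n}}\n",
        SCHEMA_VERSION, escaped
    );
    let tmp_path = cache_root.join(format!("{}.tmp", SIDECAR_FILE));
    {
        let mut file = fs::File::create(&tmp_path)?;
        file.write_all(json.as_bytes())?;
        file.sync_all()?;
    }
    #[cfg(unix)]
    {
        use std::os::unix::fs::PermissionsExt;
        fs::set_permissions(&tmp_path, fs::Permissions::from_mode(0o600))?;
    }
    // Atomic replace within the same directory.
    fs::rename(&tmp_path, cache_root.join(SIDECAR_FILE))
}

/// Rejects formats this reader does not understand instead of guessing.
fn check_schema_version(version: u32) -> Result<(), String> {
    if version == SCHEMA_VERSION {
        Ok(())
    } else {
        Err(format!("unsupported schema_version {}", version))
    }
}

/// Used when a generic/local install replaces the cache; missing is fine.
fn remove_sidecar(cache_root: &Path) -> io::Result<()> {
    match fs::remove_file(cache_root.join(SIDECAR_FILE)) {
        Err(err) if err.kind() == io::ErrorKind::NotFound => Ok(()),
        other => other,
    }
}
```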

### Review surface

The implementation is limited to four `codex-core-plugins` files:

- `store.rs`: owns the versioned sidecar read/write/remove lifecycle.
- `remote_bundle.rs`: persists the backend ID after a remote bundle
install.
- `remote/remote_installed_plugin_sync.rs`: backfills metadata for an
already-current cached bundle.
- Tests cover the storage lifecycle and both remote write paths.

## Testing / Validation

### Automated

- `just test -p codex-core-plugins` (268 tests passed)
- `just fix -p codex-core-plugins` passes with one pre-existing
`large_enum_variant` warning in `manifest.rs`.
- Coverage verifies the exact filename and JSON schema, identity
replacement, local reinstall clearing, uninstall cleanup, remote bundle
installation, unsupported schema rejection, and installed-plugin sync
backfill.

### Live manual validation

Validated the production app-server RPC path with an isolated temporary
`CODEX_HOME` and the PR-built Codex binary. The app-server communicated
over stdio and did not bind a port.

Test plugin: `plugins~Plugin_b80dd84519148191a409cde181c9b3d6`
(`build-macos-apps@openai-curated-remote`).

1. Confirmed `plugin/read` initially reported the plugin uninstalled.
2. Installed it through `plugin/install` and confirmed version `0.1.4`
was cached.
3. Verified
`$CODEX_HOME/plugins/cache/openai-curated-remote/build-macos-apps/.codex-remote-plugin-install.json`
was created beside the `0.1.4/` bundle directory with mode `0600` and
the expected contents:

   ```json
   {
     "schema_version": 1,
     "remote_plugin_id": "plugins~Plugin_b80dd84519148191a409cde181c9b3d6"
   }
   ```

4. Deleted only the sidecar, restarted the app-server, and confirmed
installed-plugin startup sync recreated it with the same contents.
5. Uninstalled through `plugin/uninstall`, confirmed `plugin/read`
returned `installed: false`, and verified the local plugin cache root
was removed.
6. Restored the account's original uninstalled state and removed the
isolated home and copied credentials.

## Split Overview

```text
main
├── #27093  Debug analytics capture                     merged
│   └── #27099  Non-mutating plugin smoke               merged
│       └── #27100  Remote install/uninstall smoke      merged
└── #27102  Plugin telemetry metadata refactor          merged
    └── #27669  Persist remote plugin identity           ← this PR

Next:
└── Final PR: add explicit local and remote IDs to plugin analytics
```

This PR is based directly on `main`; prerequisite
#27102 has merged. The
original combined #26281
remains the aggregate reference until the final replacement PR is
published.
## Summary

Stacked on #26707.

Adds the Windows implementation of the shared system-proxy contract.
This allows Codex-owned auth clients to use the route Windows selects
for each auth URL, including explicit PAC configuration, WPAD
auto-detection, static proxies, and bypass rules.

The `respect_system_proxy` feature is disabled by default, so existing
client behavior remains unchanged unless explicitly enabled.

## Implementation

- Adds Windows-only `codex-client` dependencies:
  - `windows-sys` with `Win32_Foundation` and `Win32_Networking_WinHttp`;
  - `sha2` for redacted cache keys.
- Dispatches system-proxy resolution to `outbound_proxy/windows.rs` on
Windows.
- Reads the current-user WinHTTP/IE proxy configuration via
`WinHttpGetIEProxyConfigForCurrentUser`.
- Resolves explicit PAC URLs first, then OS-enabled WPAD auto-detection,
then static proxy and bypass settings.
- Uses `WinHttpGetProxyForUrl` for PAC/WPAD and maps results into the
shared `SystemProxyDecision::{Direct, Proxy, Unavailable}` contract.
- Parses `DIRECT`, `PROXY`, `HTTPS`, and keyed static proxy entries.
- Treats unsupported schemes such as SOCKS as unavailable so the shared
resolver can apply its environment-proxy fallback.
- Handles Windows bypass entries, including `<local>` and host, suffix,
wildcard, and port matching.
- Releases WinHTTP-owned strings with `GlobalFree` and closes sessions
with `WinHttpCloseHandle`.
- Hashes URL-specific cache keys with SHA-256 so PAC decisions remain
URL-specific without retaining raw request URLs or query strings.
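The token mapping above can be sketched as follows. This is a simplified, platform-independent model, not the `outbound_proxy/windows.rs` code. `SystemProxyDecision` here is a local stand-in for the shared contract. The sketch assumes that only the first PAC entry is considered, that `HTTPS host:port` maps to an `https://` proxy URL, and that keyed static lists use the WinHTTP `scheme=host:port` form with unkeyed entries as a fallback.

```rust
/// Local stand-in for the shared contract; the real type may differ.
#[derive(Debug, PartialEq)]
enum SystemProxyDecision {
    Direct,
    Proxy(String),
    Unavailable,
}

/// Maps the first entry of a PAC result such as
/// "PROXY proxy.internal:8080; DIRECT" to a decision.
fn decide_from_pac(result: &str) -> SystemProxyDecision {
    let Some(entry) = result.split(';').map(str::trim).find(|s| !s.is_empty()) else {
        return SystemProxyDecision::Unavailable;
    };
    let mut parts = entry.split_whitespace();
    match (parts.next(), parts.next()) {
        (Some(kw), None) if kw.eq_ignore_ascii_case("DIRECT") => SystemProxyDecision::Direct,
        (Some(kw), Some(hostport)) if kw.eq_ignore_ascii_case("PROXY") => {
            SystemProxyDecision::Proxy(format!("http://{hostport}"))
        }
        (Some(kw), Some(hostport)) if kw.eq_ignore_ascii_case("HTTPS") => {
            SystemProxyDecision::Proxy(format!("https://{hostport}"))
        }
        // SOCKS and anything unrecognized: let the shared resolver fall back
        // to environment proxies rather than silently changing the route.
        _ => SystemProxyDecision::Unavailable,
    }
}

/// Picks the static proxy entry keyed by `scheme` (e.g. "https=p:443"),
/// falling back to an unkeyed entry that applies to all schemes.
fn static_proxy_for_scheme<'a>(list: &'a str, scheme: &str) -> Option<&'a str> {
    let mut fallback = None;
    for entry in list.split(';').map(str::trim).filter(|s| !s.is_empty()) {
        match entry.split_once('=') {
            Some((key, value)) if key.eq_ignore_ascii_case(scheme) => return Some(value),
            Some(_) => {}
            None => fallback = fallback.or(Some(entry)),
        }
    }
    fallback
}
```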

## End-user behavior

- Disabled/default: existing client behavior is unchanged.
- Enabled with `[features.respect_system_proxy]`:
  - Windows auth clients honor explicit PAC configuration, OS-enabled
    WPAD, static proxies, and bypass rules;
  - valid OS/PAC `DIRECT` decisions use a direct connection;
  - unavailable system resolution falls back to explicit environment proxy
    variables, then `DIRECT`, through the shared contract from #26707.
- Unsupported proxy schemes are not silently translated into a different
route.
- Custom CA handling remains separate from proxy selection.

## Tests

Adds coverage for:

- PAC-style proxy tokens such as `PROXY proxy.internal:8080` and `HTTPS
proxy.internal:8443`;
- static WinHTTP proxy entries keyed by target scheme;
- `DIRECT` and unsupported proxy-token behavior;
- Windows bypass matching, including `<local>`, wildcard, suffix, and
port-qualified entries;
- preserving URL-specific PAC cache decisions without retaining the raw
URL on Windows.
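A rough model of the bypass matching these tests cover, under stated assumptions. `<local>` matches dotless host names, and `host:port` entries match only that port. `*` is a glob wildcard. A leading `.` is treated as a domain suffix, which is an assumption of this sketch, not a documented WinHTTP rule. IPv6 literals are not handled. `bypasses_proxy` and its helpers are hypothetical names.

```rust
/// Returns true if `host`/`port` matches an entry in a bypass list such as
/// "<local>;*.corp.example;10.*;intranet:8080".
fn bypasses_proxy(bypass_list: &str, host: &str, port: Option<u16>) -> bool {
    bypass_list
        .split(|c: char| c == ';' || c == ' ')
        .map(str::trim)
        .filter(|s| !s.is_empty())
        .any(|entry| entry_matches(entry, host, port))
}

fn entry_matches(entry: &str, host: &str, port: Option<u16>) -> bool {
    // "<local>": simple host names without a dot.
    if entry.eq_ignore_ascii_case("<local>") {
        return !host.contains('.');
    }
    // Optional ":port" suffix restricts the entry to that port.
    let (pattern, entry_port) = match entry.rsplit_once(':') {
        Some((p, port_str)) => match port_str.parse::<u16>() {
            Ok(n) => (p, Some(n)),
            Err(_) => (entry, None),
        },
        None => (entry, None),
    };
    if entry_port.is_some() && entry_port != port {
        return false;
    }
    let pattern = pattern.to_ascii_lowercase();
    let host = host.to_ascii_lowercase();
    // Assumption: ".example.com" means "any subdomain of example.com".
    if let Some(suffix) = pattern.strip_prefix('.') {
        return host.ends_with(&format!(".{suffix}"));
    }
    wildcard_match(pattern.as_bytes(), host.as_bytes())
}

/// Glob match where '*' matches any run of bytes, including none.
fn wildcard_match(pattern: &[u8], text: &[u8]) -> bool {
    match pattern.split_first() {
        None => text.is_empty(),
        Some((b'*', rest)) => (0..=text.len()).any(|i| wildcard_match(rest, &text[i..])),
        Some((c, rest)) => text.first() == Some(c) && wildcard_match(rest, &text[1..]),
    }
}
```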
register cdp requirements feature flag
## Summary

- fetch featured plugin IDs when the loaded catalog includes
`openai-curated-remote`
- extend the existing remote marketplace regression test to cover the
featured IDs response

## Why

When the remote plugin catalog was enabled, app-server loaded
`openai-curated-remote` but skipped `/plugins/featured` because the
request processor only fetched featured IDs for the local
`openai-curated` marketplace. As a result, the desktop app could not
render the backend-curated remote featured set.

This keeps the existing local behavior and also returns the curated
ranking for remote plugins.
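The gating change can be sketched as a single predicate. The two marketplace names come from the description. `should_fetch_featured` is a hypothetical name, and the real request processor may structure this check differently.

```rust
// Marketplace names from the PR description.
const LOCAL_CURATED: &str = "openai-curated";
const REMOTE_CURATED: &str = "openai-curated-remote";

/// Previously only the local curated marketplace triggered the
/// `/plugins/featured` request; this sketch also accepts the remote one.
fn should_fetch_featured<'a>(loaded: impl IntoIterator<Item = &'a str>) -> bool {
    loaded
        .into_iter()
        .any(|name| name == LOCAL_CURATED || name == REMOTE_CURATED)
}
```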

## Test plan

- `just fmt`
- `git diff --check`
- `just test -p codex-app-server
plugin_list_includes_remote_marketplaces_when_remote_plugin_enabled`
## Summary

- upgrade the bundled OpenSSL source from 3.5.5 to 3.6.3
- update the Bazel `openssl-sys` build dependency to use the upgraded
source crate
- refresh the Bazel module lockfile

## Why

OpenSSL 3.5.5 is within the affected ranges for security issues fixed in
later releases. The Rust `openssl-src` wrapper does not currently
publish OpenSSL 3.5.7, so this moves the vendored Linux musl build to
the available patched 3.6.3 release.
## Why

The TypeScript workspace resolved `esbuild` 0.25.10 transitively through
the SDK toolchain. `esbuild` 0.28.1 adds integrity verification to the
Deno binary download path addressed by
GHSA-gv7w-rqvm-qjhr,
preventing an attacker-controlled npm registry from supplying an
executable without a content check.

## What changed

- Add a root workspace resolution for `esbuild` 0.28.1.
- Regenerate `pnpm-lock.yaml` so `tsup`, `bundle-require`, and `ts-jest`
all resolve the patched version.

## Validation

- Frozen pnpm install, including the SDK's `tsup` build
- `pnpm --filter @openai/codex-sdk exec jest tests/exec.test.ts
--runInBand`
- Confirmed the installed dependency graph contains only `esbuild`
0.28.1
Adds dark-mode plugin logo metadata, as additive fields, across manifests,
remote catalogs, and the app-server protocol while keeping uninstalled Git
listings free of synthetic local paths.

Supersedes #28945. This replacement uses an upstream branch so trusted
CI can use the repository-provided remote Bazel configuration.

## Current state

Plugin interfaces expose only the default logo asset. Clients therefore
cannot select a dedicated dark-mode logo even when a plugin provides
one.

## What this PR changes

- Adds nullable `logoDark` and `logoUrlDark` fields to
`PluginInterface`.
- Resolves local `interface.logoDark` assets and maps remote
`logo_url_dark` values.
- Removes path-backed interface assets, including `logoDark`, from
uninstalled Git fallback listings until the plugin has a real local
root.
- Updates the bundled plugin validator and manifest reference.
- Regenerates the app-server JSON schemas and TypeScript types.

Local manifests expose `interface.logoDark` as a package-relative asset
path. Remote catalog responses expose `logo_url_dark`. These values map
into separate app-server fields so clients can preserve local-path and
remote-URL handling.
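The following sketch shows how the two sources stay in separate fields. The structs are hypothetical, simplified shapes, not the real manifest or protocol types. The sketch assumes that local assets resolve only against a real plugin root. When there is no root (for example, an uninstalled Git listing), path-backed fields stay `None`.

```rust
use std::path::{Path, PathBuf};

/// Hypothetical simplified inputs.
struct LocalInterface {
    logo: Option<String>,      // `interface.logo`, package-relative
    logo_dark: Option<String>, // `interface.logoDark`, package-relative
}

struct RemoteCatalogEntry {
    logo_url: Option<String>,
    logo_url_dark: Option<String>, // remote `logo_url_dark`
}

/// Hypothetical simplified output: paths and URLs never share a field.
#[derive(Debug, Default, PartialEq)]
struct PluginInterface {
    logo: Option<PathBuf>,
    logo_dark: Option<PathBuf>,
    logo_url: Option<String>,
    logo_url_dark: Option<String>,
}

/// Without a real plugin root, no path-backed asset is exposed.
fn from_local(iface: &LocalInterface, plugin_root: Option<&Path>) -> PluginInterface {
    let resolve = |rel: &Option<String>| {
        plugin_root.zip(rel.as_deref()).map(|(root, rel)| root.join(rel))
    };
    PluginInterface {
        logo: resolve(&iface.logo),
        logo_dark: resolve(&iface.logo_dark),
        ..PluginInterface::default()
    }
}

/// Remote URLs map only into the URL fields.
fn from_remote(entry: &RemoteCatalogEntry) -> PluginInterface {
    PluginInterface {
        logo_url: entry.logo_url.clone(),
        logo_url_dark: entry.logo_url_dark.clone(),
        ..PluginInterface::default()
    }
}
```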

## Risk

The fields are additive and nullable, so existing clients retain their
current logo behavior. The main risks are an incomplete mapping path or
exposing a synthetic local path for an uninstalled Git plugin.
Local-manifest, remote-catalog, fallback-listing, protocol
serialization, and app-server integration tests cover those paths.

Spiciness: 2/5

## Testing

- `just write-app-server-schema`
- `just fmt`
- Regression test first failed with `logo_dark` resolved to
`/assets/logo-dark.png`, then passed after the fallback-listing fix.
- `just test -p codex-core-plugins` (267 tests passed)
- `just test -p codex-app-server 'suite::v2::plugin'` (114 tests passed)
- `just test -p codex-app-server-protocol -p codex-core-plugins -p
codex-plugin -p codex-skills` (517 tests passed before the follow-up)
- `just test -p codex-tui plugin` (47 tests passed)
- Validated a local plugin manifest containing `interface.logoDark` with
the bundled validator.

## Manual verification

Create a local plugin with both `interface.logo` and
`interface.logoDark`, then call `plugin/list` or `plugin/read`. Confirm
the response contains separate `logo` and `logoDark` paths. For a remote
catalog entry, confirm `logoUrlDark` is populated from `logo_url_dark`.
For an uninstalled Git marketplace entry, confirm path-backed interface
assets remain absent until installation.

Issue: N/A - coordinated maintainer change.
owenlin0 and others added 30 commits June 26, 2026 12:32
## Description

This PR makes `thread.history_mode` immutable after the thread's
canonical first `SessionMeta` has been written. Later same-thread
`SessionMeta` lines are compatibility metadata writes, not a new thread
definition.

Without this, an older binary could append a `SessionMeta` that omits
`history_mode`; when a newer binary replays it, serde defaults that
missing field to `legacy` and SQLite could downgrade a paginated thread.

## Why

`history_mode` is the persisted thread storage contract.
Paginated-thread fail-closed behavior and SQLite memory filtering depend
on it staying aligned with canonical rollout metadata, especially when
multiple Codex binary versions can touch the same local rollout.

## What changed

- Stop generic rollout metadata replay from overwriting `history_mode`
from later `SessionMeta` items.
- Remove `history_mode` from `ThreadMetadataPatch`, so mutable metadata
sync and app-server metadata updates cannot rewrite it.
- When local metadata sync has to recreate a missing SQLite row, recover
`history_mode` from the rollout's canonical first `SessionMeta` instead
of from a mutable patch.
- Keep the in-memory thread store using the created thread's canonical
`history_mode` instead of metadata patches.
- Fill the one remaining core test `CreateThreadParams` initializer with
the new `history_mode` field; Bazel CI caught this after the parent
history-mode PR landed.
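The replay rule can be modeled in a few lines. The types are simplified stand-ins, and `cli_version` is a hypothetical mutable field used only to show that later lines can still update other metadata. The key point is that only the first `SessionMeta` may set `history_mode`. A later line whose missing field defaulted to `Legacy` therefore cannot downgrade the thread.

```rust
#[derive(Clone, Copy, Debug, Default, PartialEq)]
enum HistoryMode {
    #[default]
    Legacy, // what a missing serialized field defaults to
    Paginated,
}

/// Simplified stand-in for a rollout `SessionMeta` line.
struct SessionMeta {
    history_mode: HistoryMode,
    cli_version: Option<String>, // hypothetical mutable field
}

#[derive(Debug, Default)]
struct ThreadMetadata {
    history_mode: Option<HistoryMode>,
    cli_version: Option<String>,
}

/// Only the canonical first line sets `history_mode`; later lines are
/// compatibility writes that may update mutable fields only.
fn replay(metas: &[SessionMeta]) -> ThreadMetadata {
    let mut out = ThreadMetadata::default();
    for meta in metas {
        if out.history_mode.is_none() {
            out.history_mode = Some(meta.history_mode);
        }
        if meta.cli_version.is_some() {
            out.cli_version = meta.cli_version.clone();
        }
    }
    out
}
```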

## Validation

- `just fmt`
- `just test -p codex-thread-store`
- `just test -p codex-state
session_meta_does_not_set_model_or_reasoning_effort`
## Description

This adds stable optional `turnId` support to `thread/fork`. When
supplied, the fork copies persisted history through that terminal turn,
inclusive, and drops later turns from the new thread.

Omitting or passing `null` preserves the existing full-history fork
behavior, including the interruption marker when the stored source
history ends mid-turn.
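The truncation rule fits in a short sketch. `Turn` and `fork_turns` are hypothetical. Returning `None` for an unknown `turnId` is an assumption, because the description does not say how the server reports that case. The interruption-marker handling for mid-turn histories is also omitted.

```rust
#[derive(Clone, Debug, PartialEq)]
struct Turn {
    id: String,
}

/// With no `turn_id`, copy the full history; otherwise copy through the
/// selected turn, inclusive, and drop every later turn.
fn fork_turns(history: &[Turn], turn_id: Option<&str>) -> Option<Vec<Turn>> {
    match turn_id {
        None => Some(history.to_vec()),
        Some(id) => {
            let end = history.iter().position(|t| t.id == id)?;
            Some(history[..=end].to_vec())
        }
    }
}
```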

## Why

We're deprecating `thread/rollback`, and this lets UX flows that relied on
it use `thread/fork` with `turnId` instead.
## Why

I use the `$code-review` skill a lot and it'd be nice to add my own
additional review criteria in `$CODEX_HOME/skills/code-review-*`.

## What

Removes the phrasing about "code-review-* skills in this repository". In
practice, that seems to be enough to get Codex to consult my user-level
code review skills in addition to the repo-level ones.
## Summary

- add Sol (`openai.gpt-5.6-sol`), Terra (`openai.gpt-5.6-terra`), and
Luna (`openai.gpt-5.6-luna`) to the Amazon Bedrock static model catalog
- derive all three entries from the bundled GPT-5.5 metadata and add the
Bedrock-only `max` reasoning effort
- keep the new entries below the current GPT-5.5 and GPT-5.4 models at
priorities 2, 3, and 4, preserving GPT-5.5 as the default
- add deep-equality coverage for inherited model configuration, catalog
ordering, context windows, and service-tier behavior
### Summary

Release live thread persistence when a session ends because its
submission channel closes. This prevents a later same-process resume
from failing with `thread ... already has a live local writer`.

### Details

The issue is in the `codex-core` session teardown path used by Codex
hosts, rather than in Managed Agents API or exec-server itself.

Explicit shutdown already closes the `LiveThread`, which releases the
process-scoped writer held by `LocalThreadStore`. The
submission-channel-close fallback ran runtime and extension teardown but
skipped that persistence shutdown, leaving the thread ID registered as
having a live writer.

This change:

- closes the `LiveThread` on the channel-close fallback path;
- preserves the existing teardown order used by explicit shutdowns;
- extends the lifecycle regression test to assert that the thread store
receives `shutdown_thread`.
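A toy model of the bug and the fix. `WriterRegistry` is a hypothetical stand-in for the process-scoped writer bookkeeping in `LocalThreadStore`, not its real API. Once both exit paths release the writer, a later resume in the same process can register the thread again.

```rust
use std::collections::HashSet;

/// Hypothetical stand-in for the process-scoped live-writer registry.
#[derive(Default)]
struct WriterRegistry {
    live: HashSet<String>,
}

impl WriterRegistry {
    fn open_writer(&mut self, thread_id: &str) -> Result<(), String> {
        if !self.live.insert(thread_id.to_string()) {
            return Err(format!("thread {thread_id} already has a live local writer"));
        }
        Ok(())
    }

    fn shutdown_thread(&mut self, thread_id: &str) {
        self.live.remove(thread_id);
    }
}

enum SessionEnd {
    ExplicitShutdown,
    SubmissionChannelClosed,
}

/// After the fix, both exit paths finish with the same persistence shutdown.
fn teardown(registry: &mut WriterRegistry, thread_id: &str, _how: SessionEnd) {
    // Runtime and extension teardown would run here first.
    registry.shutdown_thread(thread_id);
}
```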

Context: [original
report](https://openai.slack.com/archives/C0B4NBHQGTV/p1782136364948039),
[recent occurrence
1](https://openai.slack.com/archives/C0B4NBHQGTV/p1782434817895839?thread_ts=1782136364.948039&cid=C0B4NBHQGTV),
[recent occurrence
2](https://openai.slack.com/archives/C0B4NBHQGTV/p1782335107474429?thread_ts=1782136364.948039&cid=C0B4NBHQGTV)

### Testing

- `just test -p codex-core
submission_loop_channel_close_runs_full_thread_teardown`
- `just test -p codex-core --lib` (1,989 passed; 3 skipped)
- `just fix -p codex-core`
- `just fmt`
- Native code review: no findings

I also attempted `just test -p codex-core`. The new regression passed;
79 unrelated integration tests failed in the local harness, primarily
because helper binaries such as `test_stdio_server` were unavailable,
plus local proxy/shell timing failures.
## Summary

- classify authentication-required RMCP startup failures, including
errors nested inside `ClientInitializeError::TransportError`
- let `codex-mcp` consume that classification so the existing
`reauthenticationRequired` startup failure reason is emitted
- add a regression test that performs real startup with an expired
persisted OAuth token and no refresh token

## Why

Follow-up to #29877.

RMCP stores streamable HTTP initialization failures inside a dynamic
transport error whose payload is not exposed through the standard Rust
error source chain. The original `anyhow::Error::chain()` check
therefore missed the nested `AuthError::AuthorizationRequired` seen
during real MCP startup and emitted `failureReason: null`.

The transport-specific inspection now lives in `codex-rmcp-client`,
while `codex-mcp` consumes only the domain-level authentication-required
result. This classifier does not distinguish first-time login from
reauthentication; the existing auth-state logic remains responsible for
that distinction.
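The following sketch shows why walking `source()` alone misses the error and how explicit payload inspection finds it. `AuthError` and `TransportError` are simplified local types that mimic the situation described, not RMCP's actual types. `is_authentication_required` is a hypothetical name.

```rust
use std::error::Error;
use std::fmt;

#[derive(Debug)]
enum AuthError {
    AuthorizationRequired,
}

impl fmt::Display for AuthError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "authorization required")
    }
}
impl Error for AuthError {}

/// Holds its cause in a dynamic payload but does not expose it through
/// `source()`, so a plain chain walk never reaches it.
#[derive(Debug)]
struct TransportError {
    payload: Box<dyn Error + Send + Sync>,
}

impl fmt::Display for TransportError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "transport error")
    }
}
impl Error for TransportError {} // default `source()` returns None

/// Walks the source chain and also opens `TransportError` payloads.
fn is_authentication_required(err: &(dyn Error + 'static)) -> bool {
    let mut current: Option<&(dyn Error + 'static)> = Some(err);
    while let Some(e) = current {
        if matches!(e.downcast_ref::<AuthError>(), Some(AuthError::AuthorizationRequired)) {
            return true;
        }
        if let Some(t) = e.downcast_ref::<TransportError>() {
            if is_authentication_required(&*t.payload) {
                return true;
            }
        }
        current = e.source();
    }
    false
}
```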

## User impact

When stored MCP OAuth credentials are expired and cannot be refreshed,
app clients now receive `failureReason: "reauthenticationRequired"` on
the failed startup update and can show the reconnect action. First-time
login and unrelated startup failures remain unchanged.

## Validation

- `just test -p codex-rmcp-client --test streamable_http_oauth_startup
identifies_expired_unrefreshable_token_startup_error`
- `just test -p codex-mcp
startup_outcome_error_identifies_authentication_required`
- `just test -p codex-mcp
mcp_startup_failure_reason_requires_existing_oauth_and_auth_failure`
- `cargo build -p codex-cli --bin codex`
- local app-server probe emitted `failureReason:
"reauthenticationRequired"`
- manual end-to-end reconnect flow confirmed
- `just fmt`
## Why

Marketplace source deserialization treated `{"source":"npm", ...}` as
unsupported. The loader logged and skipped the entry, so npm-backed
plugins never appeared in `plugin list --available` and `plugin add`
returned "plugin not found".

Codex plugins are installed from a plugin root, not from an npm
dependency tree. For npm-backed marketplace entries, Codex should fetch
the published package contents without running package scripts or
installing unrelated dependencies.

## What changed

- Add `npm` marketplace plugin sources with `package`, optional semver
`version` or version range, and optional HTTPS `registry`.
- Reject unsafe npm source fields before materialization, including
invalid package names, non-semver version selectors, plaintext or
credential-bearing registry URLs, and registry query/fragment data.
- Materialize npm plugins with `npm pack --ignore-scripts`, then unpack
the resulting tarball through the existing hardened plugin bundle
extractor.
- Enforce npm archive and extracted-size limits, require the standard
npm `package/` archive root, and verify the extracted `package.json`
name matches the requested package before installing.
- Keep plugin listings, install-source descriptions, CLI JSON/human
output, app-server v2 `PluginSource`, TUI source summaries, regenerated
schema fixtures, and app-server documentation in sync.
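A minimal sketch of two of the pre-materialization checks. The package-name rule is a rough subset of npm's naming rules: an optional `@scope/`, lowercase URL-safe characters, no leading `.` or `_`, and at most 214 characters. The registry check is string-based rather than a full URL parse. Both function names are hypothetical, and version-range validation is not shown.

```rust
/// Rough subset of npm package-name rules.
fn is_valid_npm_package_name(name: &str) -> bool {
    if name.is_empty() || name.len() > 214 {
        return false;
    }
    let (scope, bare) = match name.strip_prefix('@') {
        Some(rest) => match rest.split_once('/') {
            Some((scope, bare)) => (Some(scope), bare),
            None => return false, // "@scope" without a package name
        },
        None => (None, name),
    };
    let part_ok = |part: &str| {
        !part.is_empty()
            && !part.starts_with('.')
            && !part.starts_with('_')
            && part.chars().all(|c| {
                c.is_ascii_lowercase() || c.is_ascii_digit() || matches!(c, '-' | '.' | '_' | '~')
            })
    };
    scope.map_or(true, part_ok) && part_ok(bare)
}

/// Accepts only `https://` registries without userinfo, query, or fragment.
fn is_allowed_registry(url: &str) -> bool {
    let Some(rest) = url.strip_prefix("https://") else {
        return false;
    };
    let authority = rest.split('/').next().unwrap_or("");
    !authority.is_empty() && !authority.contains('@') && !rest.contains('?') && !rest.contains('#')
}
```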

## Impact

Marketplaces can distribute Codex plugins from public or configured
private HTTPS npm registries using the same install flow as existing
materialized plugin sources. `npm` must be available on `PATH` when an
npm-backed plugin is installed.

Fixes #27831

## Validation

- `just write-app-server-schema`
- `just test -p codex-core-plugins -p codex-app-server-protocol -p
codex-app-server -p codex-cli`
  - npm/schema/core-plugin coverage passed in the run.
- The full focused command finished with `1739 passed`, `11 failed`, and
`6 timed out`; the failures were unrelated local app-server environment
failures from `sandbox-exec: sandbox_apply: Operation not permitted`
plus one missing `test_stdio_server` helper binary.
- Installed an npm-published Codex plugin package through a throwaway
local marketplace and throwaway `CODEX_HOME` to exercise the real npm
materialization path end to end.
## Why

It's hard to change the set of required jobs when they're managed in the
GitHub UI, and when each workflow is responsible for choosing its own
scheduling, it's easy to end up with skew between what we enforce on PRs
and what we enforce on main.

## What

- add a `blocking-ci` caller workflow, triggered by pull requests and
pushes to `main`, for Bazel, blob size, cargo-deny, Codespell,
`repo-checks`, rust CI, and SDK CI
- add an `always()` terminal job named `CI required` that fails unless
every called workflow succeeds
- add a `postmerge-ci` caller workflow for `rust-ci-full` and
`v8-canary`, with a terminal `Postmerge CI results` job
- centralize V8 relevance detection in `v8_canary_changes.py`; unrelated
PR and postmerge runs execute metadata only and skip the expensive build
matrices
- leave `v8-canary` outside the blocking gate and leave the external
`cla` check independent

## Rollout

A repository admin must replace the existing required GitHub Actions
contexts with `CI required` in the main-branch ruleset. Retain `cla` as
a separate required check. Until that change is coordinated, this PR
cannot satisfy the old standalone check names. In-flight PRs will need
to be rebased after this lands.
## Description

This PR adds canonical core `TurnItem` shapes for command execution,
dynamic tool calls, collab agent tool calls, and sub-agent activity, to
be stored in the rollout file soon.

It also teaches app-server protocol / `ThreadHistoryBuilder` how to
render those items, and adds the small legacy fanout helpers needed for
existing event-based consumers. No core producer or rollout persistence
behavior changes here; that will be done in a follow-up.

## Making ThreadHistoryBuilder stateless

This is the first PR in a stack to make `ThreadHistoryBuilder` stateless
enough that we can materialize app-server `ThreadItem`s from only a
given slice of `RolloutItem` history, without ever needing to replay the
whole thread from the beginning.

The persisted legacy `RolloutItem::EventMsg` records are mostly shaped
like live UI events, not like materialized `ThreadItem`s. They work if
we replay the full rollout in order, but they often do not contain
enough stable identity or complete item state to project an arbitrary
suffix on its own.

A few examples:

- `UserMessageEvent` and `AgentMessageEvent` have content, but
historically do not carry the persisted app-server item ID that should
become the SQLite primary key.
- `AgentReasoningEvent` and `AgentReasoningRawContentEvent` are
fragments. `ThreadHistoryBuilder` currently merges them into the last
reasoning item, which means a slice starting in the middle of reasoning
cannot know whether to append to an earlier item or create a new one.
- `WebSearchEndEvent`, `McpToolCallEndEvent`, collab end events, and
similar legacy events can often render a final-looking item, but they
usually rely on prior replay state to know which turn owns the item.
- Begin/end legacy events are partial views of one logical item. The
builder correlates them by `call_id` and mutates prior state to
synthesize the final `ThreadItem`.

That is the problem this direction fixes. A persisted canonical
lifecycle record looks much closer to the read model we actually want
later:

```rust
ItemCompletedEvent {
    turn_id,
    item: TurnItem { id, ...full snapshot... },
    completed_at_ms,
}
```

Once rollout has explicit `turn_id`, stable `item.id`, and a canonical
completed item snapshot, the future SQLite projector can reduce only the
new rollout suffix and upsert the affected `thread_items` rows. It no
longer needs to synthesize `item-N`, infer item ownership from the
active turn, or replay earlier events just to reconstruct the current
item snapshot.
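A sketch of the suffix-only projection this enables. `ItemCompleted` and `Row` are hypothetical simplifications of the canonical record and of a `thread_items` row, and a `BTreeMap` stands in for SQLite. Each record carries its own turn and item identity, so applying a suffix is a plain upsert with no replay.

```rust
use std::collections::BTreeMap;

/// Simplified canonical completion record.
struct ItemCompleted {
    turn_id: String,
    item_id: String,
    snapshot: String, // stands in for the full `TurnItem`
}

#[derive(Debug, PartialEq)]
struct Row {
    turn_id: String,
    snapshot: String,
}

/// Upserts only the new rollout suffix, keyed by the stable item id.
fn apply_suffix(rows: &mut BTreeMap<String, Row>, suffix: &[ItemCompleted]) {
    for ev in suffix {
        rows.insert(
            ev.item_id.clone(),
            Row { turn_id: ev.turn_id.clone(), snapshot: ev.snapshot.clone() },
        );
    }
}
```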

## What changed

- Added core `TurnItem` variants and item structs for command execution,
dynamic tool calls, collab agent tool calls, and sub-agent activity.
- Added conversions from those canonical items back into the legacy
event shapes where current consumers still need them.
- Added app-server v2 `ThreadItem` conversion for the new core item
variants.
- Taught `ThreadHistoryBuilder` and rollout persistence metrics to
recognize the new item variants.

## Follow-up

The next PR #30283 switches the live
core producers for these item families onto canonical `ItemStarted` /
`ItemCompleted` events.
## Why

Remote-control websocket reconnects and pairing requests proactively
refresh their server token. When `/server/refresh` returns a transient
error such as `502`, the still-valid token was discarded as a usable
connection path, causing reconnect failures and repeated refresh
attempts that could amplify an upstream incident.

## What Changed

- Start proactive refresh five minutes before token expiry and
distinguish it from a required refresh for missing or expired tokens.
- Continue websocket and pairing operations with the existing valid
token after `429`, `5xx`, or timeout failures.
- Share an in-memory `next_refresh_at` throttle across websocket and
pairing callers, honoring both `Retry-After` formats and otherwise using
a jittered 24–36 second delay.
- Keep required refreshes strict, preserve `404` enrollment replacement,
and clear token/throttle state for `401` and `403` auth recovery.
- Preserve refresh response metadata internally and add focused
wire-level and integration coverage.
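The refresh policy can be sketched as three small functions. This is a deterministic model under several assumptions. A timeout is represented as a missing status code, and the caller supplies the jitter value in `[0, 1]`. Only the delta-seconds form of `Retry-After` is parsed here, although the change honors both formats. All names are hypothetical.

```rust
use std::time::Duration;

/// Window before expiry in which a refresh becomes proactive.
const PROACTIVE_WINDOW: Duration = Duration::from_secs(5 * 60);

#[derive(Debug, PartialEq)]
enum RefreshNeed {
    NotNeeded,
    Proactive, // token still valid: transient failures are tolerated
    Required,  // token missing or expired: failures are fatal
}

fn refresh_need(remaining: Option<Duration>) -> RefreshNeed {
    match remaining {
        None => RefreshNeed::Required,
        Some(d) if d.is_zero() => RefreshNeed::Required,
        Some(d) if d <= PROACTIVE_WINDOW => RefreshNeed::Proactive,
        Some(_) => RefreshNeed::NotNeeded,
    }
}

/// 429, 5xx, and timeouts (modelled as `None`) are transient.
fn is_transient(status: Option<u16>) -> bool {
    matches!(status, None | Some(429) | Some(500..=599))
}

/// Honors a delta-seconds `Retry-After`; otherwise waits 24 to 36 seconds.
fn next_refresh_delay(retry_after: Option<&str>, jitter: f64) -> Duration {
    if let Some(secs) = retry_after.and_then(|v| v.trim().parse::<u64>().ok()) {
        return Duration::from_secs(secs);
    }
    Duration::from_secs_f64(24.0 + 12.0 * jitter.clamp(0.0, 1.0))
}
```

Keeping jitter as a parameter makes the throttle testable; a real caller would draw it from a random source.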

## Verification

Added behavioral coverage proving that:

- a valid near-expiry token still completes websocket and pairing
requests after transient refresh failures;
- `Retry-After` suppresses a subsequent refresh across websocket and
pairing callers;
- request and response-body timeouts are classified as transient;
- an expired token, including one that expires during refresh, cannot
proceed to websocket connection;
- auth failures clear the attempted token without overwriting a
concurrently rotated token.
## Summary

- complete unified-exec processes from the ordered event stream instead
of issuing a final zero-wait `process/read`
- add optional executor sandbox-denial state to `process/exited`
- retain `process/read` as a retained-output and compatibility fallback
for receiver lag, sequence gaps, and legacy servers
- recover sandbox-denial state across transport reconnection
- cover the real `TestCodex` remote-exec path without adding a public
test-only event constructor

## Why

A successful one-shot tool call currently receives its output and
terminal notifications, then pays another wide-area `process/read` round
trip before returning. Staging traces showed that remote response wait
accounted for more than 99.8% of RPC time; local serialization,
queueing, and deserialization were below 0.6 ms.

## Measured impact

A direct staging A/B used the same build and route and changed only
completion mode. Each arm ran three times with 30 one-shot
`/usr/bin/true` calls per run. The table reports the median of the three
per-run percentiles.

| Metric | Final `process/read` | Pushed events | Change |
| --- | ---: | ---: | ---: |
| End-to-end completion p50 | 159.5 ms | 118.7 ms | -40.8 ms (-25.6%) |
| End-to-end completion p95 | 182.4 ms | 131.7 ms | -50.6 ms (-27.8%) |
| Completion-wait p50 | 80.1 ms | 41.5 ms | -38.5 ms (-48.1%) |
| Final `process/read` RPC p50 | 79.9 ms | eliminated | -79.9 ms |

TCP_NODELAY was enabled in both A/B arms, so its effect cancels out. The
successful, complete, in-order event path issued zero final
`process/read` calls.

## Compatibility and recovery

- new servers send `sandboxDenied` on `process/exited`
- legacy servers omit it, which triggers one compatibility
`process/read`
- broadcast lag or a sequence gap triggers a retained-output read
- recovery remains bounded by the server's existing 1 MiB
retained-output window
- complete, in-order event streams issue no completion read
- sandbox denial is attached to the exit event before consumers can
observe process completion
- server-first and client-first rollouts remain wire-compatible;
server-first realizes the latency win immediately
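The completion decision above can be modeled as follows. This is a simplified sketch, not the unified-exec code. It assumes that sequence numbers start at 0 and increase by 1. A gap, or an exit event without `sandboxDenied` (a legacy server), requests one `process/read`. A complete, in-order stream completes with zero reads.

```rust
/// Simplified events pushed for one process.
enum ProcessEvent {
    Output { seq: u64 },
    Exited { seq: u64, sandbox_denied: Option<bool> },
}

#[derive(Debug, PartialEq)]
enum Completion {
    /// Complete, in-order stream: no `process/read` needed.
    FromEvents { sandbox_denied: bool },
    /// Gap, lag, or legacy server: fall back to one `process/read`.
    NeedsRead,
    /// No exit event yet: keep waiting.
    Pending,
}

fn complete(events: &[ProcessEvent]) -> Completion {
    let mut expected = 0;
    for ev in events {
        let seq = match ev {
            ProcessEvent::Output { seq } | ProcessEvent::Exited { seq, .. } => *seq,
        };
        if seq != expected {
            return Completion::NeedsRead; // sequence gap or receiver lag
        }
        expected += 1;
        if let ProcessEvent::Exited { sandbox_denied, .. } = ev {
            return match sandbox_denied {
                Some(denied) => Completion::FromEvents { sandbox_denied: *denied },
                None => Completion::NeedsRead, // legacy server omitted the field
            };
        }
    }
    Completion::Pending
}
```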

## Integration coverage

The `TestCodex` suite exercises four distinct remote-exec contracts:

- complete pushed output/exit/close with zero reads
- direct pushed sandbox denial with zero reads
- legacy missing denial metadata with exactly one compatibility read
- count-bounded replay eviction recovered from retained output without
duplication

## Validation

- `just test -p codex-core
exec_command_consumes_pushed_remote_process_events`: 4 passed
- `just test -p codex-core unified_exec::process_tests::`: 4 passed
- `just test -p codex-exec-server`: 294 passed, 2 skipped
- `just test -p codex-exec-server-protocol`: 5 passed
- `just test -p codex-rmcp-client`: 89 passed, 2 skipped
- focused Bazel `//codex-rs/core:core-all-test`: passed across 16 shards
- scoped `just fix` passed for core and exec-server
- `just fmt` passed

The complete workspace suite was not rerun; focused Cargo and Bazel
coverage passed for the changed behavior.
## Why

Remote diff-root discovery is independent of world-state construction,
but it ran afterward and added filesystem metadata latency before the
first model request. Overlap the independent work so thread-cold turns
do not pay those waits serially.

## What

- Run `record_context_updates_and_set_reference_context_item` and
`turn_diff_display_roots` with `tokio::join!`.
- Reuse the same resolved display roots when constructing
`TurnDiffTracker`; no cache or behavior lifecycle changes are
introduced.

## Validation

In a synthetic executor-skill benchmark with artificial network delay,
thread-cold model-request p50 improved from about 1.79 s to 1.58 s.
## Why

`LOG_FORMAT=json` and `RUST_LOG` are supported by app-server, but the
behavior was only covered indirectly. We should verify the actual JSONL
written by both user-facing entry points: `codex app-server` and the
standalone `codex-app-server` binary.

The existing processor shutdown message also always said the channel
closed, even though the processor can exit for several different
reasons. Structured fields make that event more accurate and useful to
log consumers.

## What changed

- Record the processor `exit_reason`, remaining connection count, and
forced-shutdown state as structured tracing fields.
- Add a shared process-test helper that enables JSON logging, validates
every stderr line as JSON, and verifies the top-level timestamp is RFC
3339.
- Cover both `codex app-server` and `codex-app-server`, asserting the
stable `level`, `fields`, and `target` payload.
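The timestamp assertion could look roughly like this. It is a hand-rolled, loose RFC 3339 shape check: `YYYY-MM-DDTHH:MM:SS`, an optional fraction, then `Z` or a `±HH:MM` offset. Field ranges such as month 13 are not validated. `looks_like_rfc3339` is hypothetical, and the helper described above may use a proper date parser instead.

```rust
/// Loose RFC 3339 shape check; does not validate field ranges.
fn looks_like_rfc3339(ts: &str) -> bool {
    let b = ts.as_bytes();
    let digits = |start: usize, len: usize| {
        b.get(start..start + len)
            .map_or(false, |s| s.iter().all(|c| c.is_ascii_digit()))
    };
    let at = |i: usize, want: &str| b.get(i).map_or(false, |c| want.as_bytes().contains(c));
    // Date and time: "YYYY-MM-DDTHH:MM:SS".
    if !(digits(0, 4) && at(4, "-") && digits(5, 2) && at(7, "-") && digits(8, 2)
        && at(10, "Tt") && digits(11, 2) && at(13, ":") && digits(14, 2)
        && at(16, ":") && digits(17, 2))
    {
        return false;
    }
    // Optional fractional seconds: at least one digit after '.'.
    let mut i = 19;
    if at(i, ".") {
        i += 1;
        let start = i;
        while b.get(i).map_or(false, |c| c.is_ascii_digit()) {
            i += 1;
        }
        if i == start {
            return false;
        }
    }
    // Offset, with nothing after it.
    if at(i, "Zz") {
        return i + 1 == b.len();
    }
    at(i, "+-") && digits(i + 1, 2) && at(i + 3, ":") && digits(i + 4, 2) && i + 6 == b.len()
}
```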

## Test plan

- `just test -p codex-app-server
standalone_app_server_emits_json_info_events`
- `just test -p codex-cli app_server_emits_json_info_events`
## Summary

- Preserve the optional namespace on custom tool calls during response
deserialization and app-server replay.
- Use the namespaced tool identifier for streaming argument handling and
tool dispatch.
- Regenerate app-server protocol schemas.
- Add regression tests covering namespace serialization and routing.

## Testing

- Ran affected protocol and app-server test suites.
- Ran the full core test suite; two load-sensitive timing tests passed
when rerun individually.
- Ran Clippy and formatting checks.
- Verified with a local end-to-end app-server replay that the namespace
is preserved through the complete request/response flow.
## Why

Response item IDs represent stable conversation identity.
`ContextManager::for_prompt` repairs an unmatched call by synthesizing
an `"aborted"` output in the disposable prompt projection, but that
output previously had no ID. Assigning a fresh ID on every prompt build
would make retries and resumes change otherwise identical model context
and reduce prompt-cache reuse.

The concrete bug is that these normalization-created outputs bypass the
regular item-ID allocation path. Even with item IDs enabled, a prompt
could therefore contain an identified call paired with a synthetic
output whose `id` was missing. This change closes that gap by deriving
the output ID from the source call's item ID. For legacy calls that have
no item ID, the output remains ID-less because there is no stable source
identity to derive from.

The originating call already has a stable item ID under the item-ID
model introduced in #28814. A prompt-only output can therefore derive
stable identity from that call without mutating canonical history or
persisted rollouts. This addresses the failure exposed by #30311 while
keeping normalization read-only outside its detached prompt snapshot.

UUIDv5 is intentional here because it is the standard namespaced,
deterministic UUID construction. Using the output kind and source call
ID as the name produces the same UUID on every projection while keeping
output kinds in separate name domains. UUIDv7 would introduce randomness
and time, so keeping it stable would require persisting the synthetic
repair. UUIDv5 uses SHA-1 internally, but this is only an identity
mapping—not an authenticity or security boundary.

## What changed

- Derive a deterministic UUIDv5 ID for each synthesized call output from
the source call item ID.
- Use the Responses API prefix appropriate for function, custom-tool,
tool-search, and local-shell outputs.
- Preserve the existing insertion position immediately after the
unmatched call.
- Keep synthesized outputs prompt-only; no rollout, task-lifecycle,
compaction, or raw-response behavior changes.
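A dependency-free sketch of the derivation: standard name-based UUIDv5 computes SHA-1 over the namespace bytes followed by the name bytes, keeps 16 bytes, and then sets the version and variant bits. Several parts are assumptions: the namespace bytes, the `kind:call_id` name layout, and `synthetic_output_id`. The Responses API prefix is omitted. A production implementation would more likely use a UUID library than this inline SHA-1.

```rust
/// Minimal SHA-1 (FIPS 180-4), included only because UUIDv5 requires it.
fn sha1(data: &[u8]) -> [u8; 20] {
    let mut h: [u32; 5] = [0x6745_2301, 0xEFCD_AB89, 0x98BA_DCFE, 0x1032_5476, 0xC3D2_E1F0];
    let bit_len = (data.len() as u64).wrapping_mul(8);
    let mut msg = data.to_vec();
    msg.push(0x80);
    while msg.len() % 64 != 56 {
        msg.push(0);
    }
    msg.extend_from_slice(&bit_len.to_be_bytes());
    for block in msg.chunks(64) {
        let mut w = [0u32; 80];
        for (i, word) in block.chunks(4).enumerate() {
            w[i] = u32::from_be_bytes([word[0], word[1], word[2], word[3]]);
        }
        for i in 16..80 {
            w[i] = (w[i - 3] ^ w[i - 8] ^ w[i - 14] ^ w[i - 16]).rotate_left(1);
        }
        let [mut a, mut b, mut c, mut d, mut e] = h;
        for (i, wi) in w.iter().enumerate() {
            let (f, k) = match i {
                0..=19 => ((b & c) | (!b & d), 0x5A82_7999),
                20..=39 => (b ^ c ^ d, 0x6ED9_EBA1),
                40..=59 => ((b & c) | (b & d) | (c & d), 0x8F1B_BCDC),
                _ => (b ^ c ^ d, 0xCA62_C1D6),
            };
            let t = a.rotate_left(5).wrapping_add(f).wrapping_add(e).wrapping_add(k).wrapping_add(*wi);
            e = d;
            d = c;
            c = b.rotate_left(30);
            b = a;
            a = t;
        }
        for (hv, v) in h.iter_mut().zip([a, b, c, d, e]) {
            *hv = hv.wrapping_add(v);
        }
    }
    let mut out = [0u8; 20];
    for (chunk, v) in out.chunks_mut(4).zip(h) {
        chunk.copy_from_slice(&v.to_be_bytes());
    }
    out
}

/// Name-based UUID version 5 rendered in the usual 8-4-4-4-12 form.
fn uuid_v5(namespace: [u8; 16], name: &str) -> String {
    let mut input = namespace.to_vec();
    input.extend_from_slice(name.as_bytes());
    let mut bytes = [0u8; 16];
    bytes.copy_from_slice(&sha1(&input)[..16]);
    bytes[6] = (bytes[6] & 0x0F) | 0x50; // version 5
    bytes[8] = (bytes[8] & 0x3F) | 0x80; // RFC 4122 variant
    let hex: String = bytes.iter().map(|b| format!("{b:02x}")).collect();
    format!("{}-{}-{}-{}-{}", &hex[0..8], &hex[8..12], &hex[12..16], &hex[16..20], &hex[20..32])
}

/// Standard DNS namespace (6ba7b810-9dad-11d1-80b4-00c04fd430c8).
const NAMESPACE_DNS: [u8; 16] = [
    0x6b, 0xa7, 0xb8, 0x10, 0x9d, 0xad, 0x11, 0xd1, 0x80, 0xb4, 0x00, 0xc0, 0x4f, 0xd4, 0x30, 0xc8,
];
/// Hypothetical fixed namespace for synthetic outputs; it must never change.
const SYNTHETIC_OUTPUT_NAMESPACE: [u8; 16] = [
    0x3c, 0x1e, 0x5a, 0x92, 0x7b, 0x44, 0x4f, 0x0d, 0x9e, 0x61, 0x2a, 0xc8, 0x15, 0xf0, 0x6b, 0xd3,
];

/// The same output kind and call item id always yield the same id.
fn synthetic_output_id(output_kind: &str, call_item_id: &str) -> String {
    uuid_v5(SYNTHETIC_OUTPUT_NAMESPACE, &format!("{output_kind}:{call_item_id}"))
}
```

Because the output kind is part of the name, a function-call output and a custom-tool output for the same call id land in different name domains.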

## Testing

- `just test -p codex-core
for_prompt_assigns_stable_id_to_synthetic_output_without_reordering_history`
- `just test -p codex-core
synthetic_call_output_id_is_stable_across_resumes`
- `just test -p codex-core normalize_adds_missing_output`
- `just test -p codex-core response_item_ids`
## Why

App-server clients that configure named execution environments need to
discover an environment's shell and working directory before selecting
it for a thread or turn. Because the environment can run on a different
operating system than app-server, its working directory is represented
as a canonical `file:` URI rather than a host-local path string. The
probe also needs a bounded response time: an exec-server that completes
initialization but never answers `environment/info` must not hold the
environment serialization queue indefinitely.

## What changed

- Add an experimental `environment/info` app-server RPC for named
environments.
- Route the probe through the managed environment connection and return
target-native shell metadata plus the default working directory as a
`PathUri`.
- Return connection and protocol failures as JSON-RPC errors.
- Bound the exec-server probe response to 30 seconds and remove
timed-out calls from the pending-request table so later environment
mutations can proceed.
- Cover successful responses, omitted working directories, unknown
environments, connection failures, and pending-call cleanup.
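The timeout bullet depends on one detail: a timed-out call must delete its own entry from the pending-request table. A response that never arrives would otherwise leave the entry behind forever. The sketch below illustrates this with std threads and channels rather than the async runtime the exec-server presumably uses. `Pending`, `call`, and `complete` are hypothetical names, and sending the request frame is omitted.

```rust
use std::collections::HashMap;
use std::sync::mpsc::{self, Sender};
use std::sync::{Arc, Mutex};
use std::time::Duration;

/// Hypothetical pending-request table: request ID -> channel awaiting its response.
#[derive(Clone, Default)]
struct Pending(Arc<Mutex<HashMap<u64, Sender<String>>>>);

impl Pending {
    /// Reader side: route an incoming response to its waiting caller.
    fn complete(&self, id: u64, response: String) {
        if let Some(tx) = self.0.lock().unwrap().remove(&id) {
            let _ = tx.send(response);
        }
    }

    /// Caller side: register, wait up to `timeout`, and clean up on timeout.
    fn call(&self, id: u64, method: &str, timeout: Duration) -> Result<String, String> {
        let (tx, rx) = mpsc::channel();
        self.0.lock().unwrap().insert(id, tx);
        // (Writing the JSON-RPC request frame is omitted in this sketch.)
        match rx.recv_timeout(timeout) {
            Ok(response) => Ok(response),
            Err(_) => {
                // Remove the stale entry so later calls are not blocked behind it.
                self.0.lock().unwrap().remove(&id);
                Err(format!("timed out waiting for exec-server `{method}` response after {timeout:?}"))
            }
        }
    }

    fn len(&self) -> usize {
        self.0.lock().unwrap().len()
    }
}
```

With a 30-second timeout, `Duration`'s `Debug` output renders as `30s`, matching the error text in the timeout example.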

## Protocol examples

Request:

```json
{
  "id": 42,
  "method": "environment/info",
  "params": {
    "environmentId": "remote-a"
  }
}
```

Successful response:

```json
{
  "id": 42,
  "result": {
    "shell": {
      "name": "zsh",
      "path": "/bin/zsh"
    },
    "cwd": "file:///workspace"
  }
}
```

If the exec-server initializes but does not answer the probe within 30
seconds:

```json
{
  "id": 42,
  "error": {
    "code": -32603,
    "message": "failed to get info for environment `remote-a`: exec-server protocol error: timed out waiting for exec-server `environment/info` response after 30s"
  }
}
```

## Testing

- App-server integration coverage for successful info (including omitted
`cwd`), unknown environments, and connection failures.
- Exec-server RPC coverage verifying a timed-out call is removed from
the pending-request table.

---------

Co-authored-by: Michael Bolin <mbolin@openai.com>
## Summary

- project effective marketplace/plugin config through the enterprise
source policy so blocked installed plugins become inactive
- filter plugin list/read/discovery and CLI marketplace source/snapshot
reporting using the same policy
- enforce source admission for background marketplace cache refreshes
- continue refreshing/upgrading independent marketplaces and plugins
when one entry fails, returning per-entry errors
- include policy-projected plugin state in cache and refresh keys so
requirement changes invalidate stale results
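The admission-plus-per-entry-error pattern from the list above might look like the sketch below. Only sources the policy admits are refreshed, and each refresh result is recorded independently, so one failing marketplace does not stop the rest. `SourcePolicy`, `Marketplace`, and `refresh_all` are hypothetical. Skipping blocked sources silently, rather than reporting them, is also an assumption.

```rust
use std::collections::BTreeMap;

/// Hypothetical enterprise source policy: an allow-list of marketplace sources.
struct SourcePolicy {
    allowed_sources: Vec<String>,
}

impl SourcePolicy {
    fn admits(&self, source: &str) -> bool {
        self.allowed_sources.iter().any(|s| s == source)
    }
}

struct Marketplace {
    name: String,
    source: String,
}

/// Refreshes each admitted marketplace independently, collecting per-entry results.
fn refresh_all(
    marketplaces: &[Marketplace],
    policy: &SourcePolicy,
    refresh_one: impl Fn(&Marketplace) -> Result<(), String>,
) -> BTreeMap<String, Result<(), String>> {
    let mut results = BTreeMap::new();
    for m in marketplaces.iter().filter(|m| policy.admits(&m.source)) {
        // A failure here is recorded, not propagated; the loop continues.
        results.insert(m.name.clone(), refresh_one(m));
    }
    results
}
```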

## Stack

This is PR 2 of 2 and is based on #29690. Review the admission model and
source matcher in #29690 first; this PR contains only runtime
enforcement.

## Test plan

- `just test -p codex-core-plugins` (287 tests)
- `just test -p codex-cli
plugin_list_ignores_implicit_system_marketplace_roots_without_manifests`
- `cargo check -p codex-cli -p codex-app-server --tests`
## Summary

Increase the external currentTime/read request timeout from 5 seconds to
10 seconds.

## Validation

- `just fmt`
- A focused app-server test build was started but stopped early; validation is deferred to CI.
## Summary

- enable the remote plugin feature by default
- promote the remote plugin feature from under development to stable
- preserve the existing `features.remote_plugin` override for explicitly
disabling it
- keep legacy disabled-path coverage explicit in TUI and app-server
tests

## Impact

Remote plugin functionality is enabled by default for configurations
that do not set the feature flag. The existing Codex backend
authentication gate still applies.
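The resulting resolution rule can be summarized in a few lines. An unset flag now means enabled, an explicit `false` still disables the feature, and backend authentication is required either way. The function name and parameters are hypothetical and stand in for the real feature and auth plumbing.

```rust
/// Hypothetical: effective state of `features.remote_plugin` after this change.
fn remote_plugin_enabled(configured: Option<bool>, has_codex_backend_auth: bool) -> bool {
    // Default flipped to `true`; an explicit override still wins.
    configured.unwrap_or(true) && has_codex_backend_auth
}
```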

## Validation

- `just fmt`
- `just test -p codex-features`
- `just test -p codex-tui
plugins_popup_remote_section_fallback_states_snapshot`
- targeted `codex-app-server` plugin-list and skills-list tests
- `git diff --check`

The full TUI and app-server suites were also exercised locally. All
remote-plugin-related coverage passed; unrelated local
sandbox/test-binary failures remain outside this change.
## Why

The safety-buffering prompt is a modal TUI view, but the normal
successful-turn path only hid the running status indicator. If the turn
completed while the prompt was open, the stale modal remained over the
composer until the user dismissed it or another turn started.

This aligns the TUI with the app behavior: keep the safety notice
visible while the turn is active, then remove it when the turn becomes
terminal. It also prevents the stale retry action from changing the
model and reasoning effort for a future turn after the buffered turn has
already completed.

| New copy |
|---|
| <img width="1014" height="313" alt="CleanShot 2026-06-28 at 20 27 18"
src="https://github.com/user-attachments/assets/f0f37359-5d77-442f-add2-9d1874bdc422"
/> |

## What changed

- Clear the active safety-buffering view and retry state when a turn
completes successfully.
- Update the retry-capable message to say “Hang tight or retry with a
faster model”.
- Extend the safety-buffering regression coverage to verify that the
prompt remains visible after assistant output starts and disappears when
the turn completes.
- Update the TUI snapshot for the revised copy.
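The lifecycle described above can be modeled as a small state machine. Assistant output leaves the notice and its retry target in place, and a terminal turn clears both, so a stale retry cannot change the model for a later turn. `SafetyBufferingState` and its methods are hypothetical stand-ins for the TUI's actual view and retry handling.

```rust
/// Hypothetical TUI state for the safety-buffering modal.
#[derive(Default)]
struct SafetyBufferingState {
    view_visible: bool,
    /// Faster model offered by the retry action, if any.
    retry_model: Option<String>,
}

impl SafetyBufferingState {
    /// `model/safetyBuffering/updated` with `showBufferingUi: true`.
    fn on_buffering_shown(&mut self, faster_model: Option<String>) {
        self.view_visible = true;
        self.retry_model = faster_model;
    }

    /// Streaming output does not dismiss the notice while the turn is active.
    fn on_assistant_output(&mut self) {}

    /// A successful terminal turn removes the modal and the stale retry action.
    fn on_turn_completed(&mut self) {
        self.view_visible = false;
        self.retry_model = None;
    }
}
```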

This is a follow-up to #29919.

## How to Test

1. Start a TUI turn that receives `model/safetyBuffering/updated` with
`showBufferingUi: true` and a `fasterModel`.
2. Confirm the prompt says “Hang tight or retry with a faster model”.
3. Let the turn continue and confirm the prompt remains visible while
the turn is active.
4. Let the turn finish successfully and confirm the prompt disappears
and the composer is restored without requiring an extra keypress.
5. Confirm a buffering update without a faster model still shows the
shorter non-retry message.

Targeted automated coverage:

- `just test -p codex-tui safety_buffering` — 4 passed.
- `just test -p codex-tui` — 2,951 passed; two unrelated Guardian
feature-flag tests failed identically on `main` in this environment.

The argument-comment lint was also audited manually. The workspace Bazel
invocation was blocked by a missing external LLVM `compiler-rt` BUILD
file, and the packaged per-crate fallback uses a nightly older than the
current `sqlx` minimum Rust version.
## Summary

- add a false-by-default `include_skills_usage_instructions` model
metadata field
- enable the field for the bundled `gpt-5.5` model metadata
- consume the metadata in both core and extension skill rendering
- remove hardcoded legacy-model matching and its marker plumbing
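The point of the change is that rendering consults a metadata flag instead of matching model slugs. In the sketch below, `ModelMetadata`, `render_skills_section`, the heading, and the placeholder instruction text are all hypothetical. `#[derive(Default)]` gives the new `bool` field its `false` default.

```rust
/// Hypothetical slice of model metadata; the new flag defaults to `false`.
#[derive(Debug, Default, Clone)]
struct ModelMetadata {
    slug: String,
    include_skills_usage_instructions: bool,
}

/// Placeholder; the real instruction text is not reproduced here.
const SKILLS_USAGE_INSTRUCTIONS: &str = "<skills usage instructions>";

fn render_skills_section(model: &ModelMetadata, skills: &[&str]) -> String {
    let mut out = String::from("## Skills\n");
    for s in skills {
        out.push_str(&format!("- {s}\n"));
    }
    // Decided by metadata only; `model.slug` is never inspected.
    if model.include_skills_usage_instructions {
        out.push_str(SKILLS_USAGE_INSTRUCTIONS);
        out.push('\n');
    }
    out
}
```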
## Summary

- restore the v1 clarification that requests for depth, research, or
investigation do not authorize subagent spawning
- restore guidance for keeping critical-path, urgent, tightly coupled,
or difficult work local
- update the focused v1 tool-search and spawn-description coverage

## Why

PR #27919 simplified the v1 `spawn_agent` prompt by removing its
delegation decision guidance. That left the authorization rule intact,
but removed the instructions that constrained what should be delegated
after spawning was authorized.

Restore those guardrails while preserving later support for explicit
delegation authorization from applicable AGENTS.md and skill
instructions. Multi-agent v2 prompts are unchanged.

## User impact

Models using the v1 multi-agent tool surface receive clearer guidance to
delegate independent side work while keeping blocking work on the main
rollout.

## Validation

- `just fmt`
- `git diff --check`
- tests not run locally per repository guidance; CI will validate the
focused coverage
## Why

The Bedrock GPT-5.6 catalog advertises `max`, but Codex treated it as an
opaque custom effort. That made the reasoning picker render it as
lowercase `max` while known efforts use productized labels.

Making `max` a known effort aligns catalog data, parsing, and UI
presentation without changing the `max` wire value or persisted
representation.

## What changed

- Add first-class `ReasoningEffort::Max` parsing and serialization.
- Use the typed effort in the Bedrock catalog and render it as `Max` in
the TUI.
- Preserve forward-compatible custom-effort coverage with a genuinely
unknown `future` value.
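Promoting `max` to a known variant leaves the wire value unchanged. Only the parsed type and the display label change, and unknown strings still round-trip through a custom variant. This sketch uses `FromStr`/`Display` instead of whatever serde attributes the real type uses. The variant list is an illustrative subset, and `label` is a hypothetical helper.

```rust
use std::fmt;
use std::str::FromStr;

/// Illustrative subset of effort levels.
#[derive(Debug, Clone, PartialEq, Eq)]
enum ReasoningEffort {
    Low,
    Medium,
    High,
    Max,
    /// Unknown values are preserved for forward compatibility.
    Custom(String),
}

impl FromStr for ReasoningEffort {
    type Err = std::convert::Infallible;
    fn from_str(s: &str) -> Result<Self, Self::Err> {
        Ok(match s {
            "low" => Self::Low,
            "medium" => Self::Medium,
            "high" => Self::High,
            "max" => Self::Max,
            other => Self::Custom(other.to_string()),
        })
    }
}

impl fmt::Display for ReasoningEffort {
    /// Wire/persisted form: still lowercase `max`.
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        f.write_str(match self {
            Self::Low => "low",
            Self::Medium => "medium",
            Self::High => "high",
            Self::Max => "max",
            Self::Custom(v) => v.as_str(),
        })
    }
}

impl ReasoningEffort {
    /// Productized picker label; custom values are shown as-is.
    fn label(&self) -> String {
        match self {
            Self::Low => "Low".into(),
            Self::Medium => "Medium".into(),
            Self::High => "High".into(),
            Self::Max => "Max".into(),
            Self::Custom(v) => v.clone(),
        }
    }
}
```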

### Before
<img width="559" height="124" alt="Screenshot 2026-06-28 at 12 08 47 PM"
src="https://github.com/user-attachments/assets/7c43cf4f-020b-4605-9239-0a9c97eb7364"
/>

### After
<img width="558" height="107" alt="Screenshot 2026-06-28 at 12 09 10 PM"
src="https://github.com/user-attachments/assets/b9cc5ded-c940-43b4-b024-bba25abe0a17"
/>
## Summary

Bio/Cyber safety surfaces in the TUI could send users to stale Trusted
Access pages, and safety buffering did not always expose the Help
Center.

This follow-up to #30317 adds the missing Learn more action, refreshes
the Bio access URL and block copy, and updates the affected snapshots
while preserving the existing retry and wait behavior.
## Summary

AWS Bedrock issues currently fall under broader labels, which makes
provider-specific reports harder to find. The issue tracker now has an
`aws-bedrock` label, but the automated labeler does not know to apply
it.

Teach the issue labeler to select `aws-bedrock` for Amazon Bedrock
provider or Bedrock Mantle issues while excluding generic AWS
references.
## Summary

Disable Nagle unconditionally for both exec-server Rendezvous WebSocket
connections.

- pass `disable_nagle=true` at the executor and harness connection call
sites
- keep the existing signed URL, protocol, and connection flow unchanged
- add no feature flag, rollout schema, path variant, or
experiment-specific telemetry

The companion internal PR enables `TCP_NODELAY` on accepted Rendezvous
sockets: openai/openai#1082463
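At the socket level, "disabling Nagle" means setting `TCP_NODELAY`, which Rust's standard library exposes as `TcpStream::set_nodelay`. Nagle's algorithm governs each socket's outgoing data, so the client and the accepting side each set the option on their own socket. The std-only sketch below shows both sides. The helper names are hypothetical. The real change only passes `disable_nagle=true` through existing WebSocket connection call sites.

```rust
use std::io;
use std::net::{TcpListener, TcpStream, ToSocketAddrs};

/// Hypothetical client helper: connect, then disable Nagle before any frames are sent.
fn connect(addr: impl ToSocketAddrs, disable_nagle: bool) -> io::Result<TcpStream> {
    let stream = TcpStream::connect(addr)?;
    stream.set_nodelay(disable_nagle)?;
    Ok(stream)
}

/// Hypothetical server helper: the accepted socket needs the option set separately.
fn accept_nodelay(listener: &TcpListener) -> io::Result<TcpStream> {
    let (stream, _peer) = listener.accept()?;
    stream.set_nodelay(true)?;
    Ok(stream)
}
```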

## Why

Rendezvous carries small, latency-sensitive relay and JSON-RPC frames.
Three staging runs of 30 steady-state `process/read` calls per
configuration measured p50 improving from 139.1 ms to 81.5 ms and p95
from 162.0 ms to 95.8 ms with Nagle disabled.
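Expressed as relative reductions, these measurements work out to roughly 41% at both percentiles:

```math
\begin{aligned}
\text{p50:}\quad \frac{139.1 - 81.5}{139.1} &= \frac{57.6}{139.1} \approx 41.4\% \\
\text{p95:}\quad \frac{162.0 - 95.8}{162.0} &= \frac{66.2}{162.0} \approx 40.9\%
\end{aligned}
```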

The expected packet overhead is small at the current connection scale.
We will use existing latency, error, packet, and CPU monitoring and
revert normally if production regresses.

## Rollout and rollback

The client and accepted-socket changes can deploy independently. New
connections receive the setting as each side deploys. Rollback is a
normal code revert; there is no persisted assignment or gate state to
unwind.

## Validation

- `just test -p codex-exec-server --lib`: 164 passed
- `just fix -p codex-exec-server`: passed
- `just fmt`: passed
- independent final review found no actionable issue
## Summary

The TUI biosafety block still included obsolete copy telling approved
researchers they may be able to apply for Trusted Access.

Remove that sentence and update the UI snapshot to match the approved
wording.
## Summary
This is a follow-up to #29432 that removes one additional trace statement
not covered by the filtering in #29457.

## Testing
- [x] Unit tests pass