[pull] main from openai:main#58
Open
pull[bot] wants to merge 3521 commits into
## Description

Restore `thread_source` in `x-codex-turn-metadata`. It was inadvertently removed from `x-codex-turn-metadata` in #27122: I didn't realize it was a top-level thread field in the app-server API rather than a value passed in `responsesapi_client_metadata`. This PR also reserves the key so `responsesapi_client_metadata` cannot override it.
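The reservation rule can be sketched as a merge in which caller-supplied keys never replace the top-level value. This is a minimal sketch, assuming the header payload can be modeled as a string map; `build_turn_metadata` and `RESERVED_KEYS` are hypothetical names, and the real serialization of `x-codex-turn-metadata` is not shown.

```rust
use std::collections::BTreeMap;

/// Keys owned by the thread itself; caller metadata may not override them.
/// Assumption: only `thread_source` is listed here.
const RESERVED_KEYS: &[&str] = &["thread_source"];

/// Builds turn metadata from the top-level `thread_source` plus caller
/// metadata, dropping any caller entry that collides with a reserved key.
fn build_turn_metadata(
    thread_source: Option<&str>,
    client_metadata: &BTreeMap<String, String>,
) -> BTreeMap<String, String> {
    let mut out: BTreeMap<String, String> = client_metadata
        .iter()
        .filter(|(k, _)| !RESERVED_KEYS.contains(&k.as_str()))
        .map(|(k, v)| (k.clone(), v.clone()))
        .collect();
    // The authoritative value always comes from the thread, never the caller.
    if let Some(source) = thread_source {
        out.insert("thread_source".to_string(), source.to_string());
    }
    out
}
```

Dropping the reserved key even when no top-level value exists keeps a caller from smuggling in a value the thread never set.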
## Why

The local SQLite log sink currently enables TRACE for every target. As a result, it persists high-volume dependency logs bridged through `target=log` and duplicates OpenTelemetry mirror events in `codex_otel.log_only` and `codex_otel.trace_safe`. These records rapidly consume the per-partition log budget and cause unnecessary SQLite insert-and-prune churn.

## What changed

- Keep TRACE persistence for other targets.
- Exclude bridged `target=log` events from the SQLite sink.
- Exclude the two `codex_otel` mirror targets from the SQLite sink.
- Share the same filter between app-server and TUI.

Remote OpenTelemetry export and metrics are unchanged.
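A minimal sketch of the shared predicate, assuming exact target-name matching (the PR does not say whether child targets are also excluded). `sqlite_sink_accepts` is a hypothetical name; in a `tracing-subscriber` setup, a predicate like this could be wrapped with `tracing_subscriber::filter::filter_fn` and attached to the SQLite layer only, leaving OpenTelemetry export untouched.

```rust
/// Targets excluded from the local SQLite sink, as listed in the PR.
const EXCLUDED_TARGETS: &[&str] = &["log", "codex_otel.log_only", "codex_otel.trace_safe"];

/// Returns true if an event with this target should be written to SQLite.
/// Every other target keeps TRACE-level persistence.
fn sqlite_sink_accepts(target: &str) -> bool {
    !EXCLUDED_TARGETS.contains(&target)
}
```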
## What

- Make Fjord's centralized response-item image preparation unconditional for new and resumed history.
- Have local user images and `view_image` outputs always defer decoding and resizing to that path.
- Retain `resize_all_images` as an ignored, removed compatibility key for released clients.
- Delete the flag-off producer paths and obsolete policy-specific tests.

## Why

Centralized preparation is now the intended image path. Keeping the runtime feature checks also kept two image-processing implementations alive and allowed client config to select the legacy behavior. This is a clean replacement for #28975, rebuilt from the latest `main`.

## How

`prepare_response_items` now runs whenever items enter history and whenever persisted history is reconstructed. Producers emit deferred image data, so malformed images become the existing model-visible placeholder instead of failing the session at the producer.

## Test plan

- `just fmt`
- `just fix -p codex-core -p codex-features`
- `just test -p codex-features` (52 passed)
- Focused affected `codex-core` set (20 passed)
- `just test -p codex-core handle_accepts_explicit_high_detail` (1 passed)
- Full `just test -p codex-core` attempt: 2,723 passed; 88 unrelated environment failures from read-only `~/.codex` SQLite state and unavailable integration helper binaries
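The producer/preparer split might look roughly like the following. This sketch assumes a simplified, hypothetical `ImageContent` type and uses magic-byte sniffing in place of real decoding and resizing; the placeholder text is also made up. What it illustrates is the control flow: producers only wrap raw bytes, and the single preparation pass turns undecodable input into a placeholder rather than an error.

```rust
/// Hypothetical item shape: producers emit raw bytes and defer decoding.
enum ImageContent {
    Deferred(Vec<u8>),
    Prepared { mime: &'static str, bytes: Vec<u8> },
    Placeholder(String),
}

/// Sketch of the centralized preparation pass. Only format sniffing is shown;
/// real decoding and resizing are out of scope.
fn prepare_image(content: ImageContent) -> ImageContent {
    match content {
        ImageContent::Deferred(bytes) => {
            let mime = if bytes.starts_with(&[0x89, b'P', b'N', b'G']) {
                Some("image/png")
            } else if bytes.starts_with(&[0xFF, 0xD8, 0xFF]) {
                Some("image/jpeg")
            } else {
                None
            };
            match mime {
                Some(mime) => ImageContent::Prepared { mime, bytes },
                // Malformed input becomes a model-visible placeholder
                // instead of an error that fails the session.
                None => ImageContent::Placeholder("[image could not be decoded]".to_string()),
            }
        }
        // Already-prepared items pass through unchanged, so reconstructing
        // persisted history can safely run the pass again.
        other => other,
    }
}
```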
The custom Windows argument-comment-lint job was temporarily moved to `windows-2022` in #28940 after hermetic LLVM source extraction failed on the newer runner. This PR takes the upstream extraction fix so the job can return to the intended custom runner.

It upgrades `llvm` to `0.7.9` and `rules_cc` to `0.2.18`, refreshes the module lock, rebases the remaining Windows and custom libc++ patches, drops the obsolete symlink-extraction workaround, and restores the `windows-x64` runner configuration.

Validation:

- Verified that all LLVM patches apply cleanly against the `0.7.9` source.
- Built `@llvm-project//compiler-rt:clang_rt.builtins.static`.
This PR moves construction of `PluginTelemetryMetadata` from loader and
model helpers into `PluginsManager`, which already owns installed plugin
state and will eventually perform remote identity enrichment. The
metadata type remains in `codex-plugin`, and serialized analytics events
remain unchanged.
## Before
```mermaid
flowchart LR
subgraph Events["Analytics event paths"]
direction TB
Lifecycle["Local install / uninstall"]
Config["Enable / disable"]
Remote["Remote install"]
Used["Plugin used"]
end
subgraph Construction["Metadata construction"]
direction TB
Loader["Loader telemetry helpers"]
Summary["PluginCapabilitySummary::telemetry_metadata"]
Override["Caller adds remote_plugin_id"]
end
Metadata["PluginTelemetryMetadata"]
Lifecycle --> Loader
Config --> Loader
Remote --> Loader
Loader -->|"local events"| Metadata
Loader -->|"remote install"| Override
Override --> Metadata
Used --> Summary
Summary --> Metadata
```
Telemetry metadata was constructed through loader helpers, a
capability-summary method, and a remote-install call-site override.
## After
```mermaid
flowchart LR
subgraph Events["Analytics event paths"]
direction TB
Lifecycle["Local install / uninstall"]
Config["Enable / disable"]
Remote["Remote install"]
Used["Plugin used"]
end
Manager["PluginsManager — single construction owner"]
Metadata["PluginTelemetryMetadata"]
Lifecycle --> Manager
Config --> Manager
Remote -->|"authoritative remote ID"| Manager
Used -->|"capability summary"| Manager
Manager --> Metadata
```
Every analytics path delegates metadata construction to
`PluginsManager`. Remote install still supplies its authoritative
backend ID explicitly.
## What Changes
- Make loader code return a focused plugin capability summary instead of
constructing analytics metadata.
- Centralize immutable plugin telemetry metadata construction in
`PluginsManager`.
- Route local install/uninstall, remote install, enable/disable, and
plugin-used emitters through the manager.
- Preserve the current serialized analytics contract exactly.
Normal metadata still has no remote override. Remote install continues
to provide its authoritative backend ID explicitly, so the existing
serializer continues reporting that ID through `plugin_id`.
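The ID rule in the paragraph above can be sketched as follows. The field names `local_plugin_id` and `remote_plugin_id` and both manager methods are hypothetical; only the behavior (no override for normal metadata, an explicit backend ID for remote install, and `plugin_id` reporting the remote ID when present) comes from this PR description.

```rust
/// Hypothetical shape; the real `PluginTelemetryMetadata` lives in `codex-plugin`.
#[derive(Debug, Clone, PartialEq)]
struct PluginTelemetryMetadata {
    local_plugin_id: String,
    remote_plugin_id: Option<String>,
}

/// Stand-in for the single construction owner.
struct PluginsManager;

impl PluginsManager {
    /// Normal metadata carries no remote override.
    fn telemetry_metadata(&self, local_plugin_id: &str) -> PluginTelemetryMetadata {
        PluginTelemetryMetadata {
            local_plugin_id: local_plugin_id.to_string(),
            remote_plugin_id: None,
        }
    }

    /// Remote install supplies its authoritative backend ID explicitly.
    fn remote_install_metadata(
        &self,
        local_plugin_id: &str,
        remote_plugin_id: &str,
    ) -> PluginTelemetryMetadata {
        PluginTelemetryMetadata {
            remote_plugin_id: Some(remote_plugin_id.to_string()),
            ..self.telemetry_metadata(local_plugin_id)
        }
    }
}

/// Serializer rule: report the remote ID through `plugin_id` when present.
fn serialized_plugin_id(meta: &PluginTelemetryMetadata) -> &str {
    meta.remote_plugin_id
        .as_deref()
        .unwrap_or(meta.local_plugin_id.as_str())
}
```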
Snapshot-based enrichment is intentionally deferred to the final PR.
## Testing
- `just test -p codex-core-plugins` (238 tests passed)
- `just test -p codex-plugin` (3 tests passed)
- Scoped Clippy/compile checks passed for `codex-plugin`,
`codex-core-plugins`, `codex-app-server`, and `codex-core`.
## Split Overview
```text
main
├── #27093 Debug analytics capture (merged)
├── #27099 Non-mutating plugin smoke (merged)
├── #27100 Remote install/uninstall smoke (merged)
└── #27102 Plugin telemetry metadata refactor ← you are here
└── #27669 Persist remote plugin identity
After #27102 and #27669 merge:
└── Final PR: add explicit local and remote IDs to plugin analytics
```
Review order and dependencies:
1. [#27093 Add debug-only analytics event
capture](#27093) (merged)
2. [#27099 Add a plugin analytics smoke
workflow](#27099) (merged)
3. [#27100 Add a remote plugin analytics mutation smoke
workflow](#27100) (merged)
4. This metadata refactor, independent and based on `main`
5. [#27669 Persist remote plugin
identity](#27669), stacked on this
PR
6. Final remote-ID behavior PR, created after the prerequisites merge
The original [#26281](#26281)
remains open as the aggregate reference until the final replacement PR
is published.
## Summary

[#26701](#26701) added remote plugin identity support, [#26702](#26702) added remote-section fetching and state, and [#28768](#28768) extracted the catalog rendering module. This PR builds the product-facing `/plugins` catalog on that foundation so remote records appear as OpenAI Curated, Workspace, and Shared with me sections rather than as backend marketplace implementation details.

Plugin details remain read-only for sharing metadata. This PR does not add share-authoring actions or change the app-server protocol.

## Changes

- Renders OpenAI Curated, Workspace, and Shared with me sections with loading, empty, and error states.
- Preserves section selection and stable tab ordering as remote sections transition between fallback and populated states.
- Shows OpenAI Curated loading only when the explicit vertical fallback request was issued.
- Centralizes remote marketplace identity matching around the existing marketplace constants.
- Uses product labels for remote marketplaces and identifies the personal marketplace as Local by its path.
- Shows read-only source, authentication, version, and sharing metadata in plugin detail views.
- Applies narrow display deduplication for local and remote records sharing a remote plugin ID:
  - installed records take precedence;
  - local mapped sources are preferred for details only when their installed state matches the selected record.
- Returns from detail and confirmation views through the current plugin cache so newly loaded remote sections are not overwritten by an older captured response.
- Keeps admin-disabled plugins view-only and labels default-installed plugins as Available by default.
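The two deduplication rules can be sketched over a pre-grouped set of records that share one remote plugin ID. The types and function names here are hypothetical, and grouping by `remote_plugin_id` is assumed to have happened already; the test mirrors the scenario named by `plugins_popup_installed_remote_row_keeps_remote_detail_when_local_share_is_uninstalled`.

```rust
#[derive(Debug, Clone, PartialEq)]
enum Source {
    Local,
    Remote,
}

#[derive(Debug, Clone, PartialEq)]
struct CatalogRecord {
    remote_plugin_id: String,
    source: Source,
    installed: bool,
}

/// Picks the row to display among records sharing one remote plugin ID:
/// an installed record wins; otherwise the first record is kept.
fn pick_display_row(records: &[CatalogRecord]) -> Option<&CatalogRecord> {
    records.iter().find(|r| r.installed).or_else(|| records.first())
}

/// Picks the record whose details are shown for the selected row: a local
/// mapped source is preferred only when its installed state matches.
fn pick_detail_source<'a>(
    selected: &'a CatalogRecord,
    records: &'a [CatalogRecord],
) -> &'a CatalogRecord {
    records
        .iter()
        .find(|r| r.source == Source::Local && r.installed == selected.installed)
        .unwrap_or(selected)
}
```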
## Tests

New tests:

- `plugins_popup_admin_disabled_available_plugin_has_view_only_hint`
- `plugins_popup_remote_section_fallback_states_snapshot`
- `plugins_popup_installed_remote_row_keeps_remote_detail_when_local_share_is_uninstalled`

Updated existing plugin catalog tests and snapshots for product labels, detail metadata, personal-marketplace labeling, and stable tab ordering.

Verification:

- `cargo clippy -p codex-tui --all-targets -- -D warnings`

## Follow-ups

- Local/remote duplicate normalization should eventually move into app-server. This PR intentionally keeps the compatibility behavior narrow and display-only.
- PR5 will sanitize sensitive components before displaying Git source URLs.
## Why

#29113 moved remote sandbox setup and enforcement to the exec server. That gives the executor ownership of the platform-specific work: a Linux executor chooses and runs a Linux sandbox even when the Codex orchestrator is running on macOS or Windows.

It also means the orchestrator no longer knows which concrete sandbox the executor selected. When that sandbox blocks a remote command, the orchestrator currently sees only a failed process and can treat the denial as an ordinary command failure. The existing sandbox approval and retry path is then skipped.

This PR lets the executor report one portable fact:

> This command probably failed because the executor sandbox blocked it.

The executor keeps its concrete sandbox type private. The protocol sends only the semantic result.

## Example

Suppose a local macOS Codex session asks a Linux devbox to write outside the allowed workspace.

Before this PR:

```text
Linux sandbox blocks the write
-> remote process exits with "Permission denied"
-> local orchestrator sees an ordinary command failure
-> the normal sandbox approval and retry path can be skipped
```

With this PR:

```text
Linux sandbox blocks the write
-> executor reports sandboxDenied: true
-> unified exec returns UnifiedExecError::SandboxDenied
-> the existing approval prompt is shown
-> an approved retry runs through the existing unsandboxed retry path
```

## What changes

### The executor remembers its selected sandbox

The prepared remote process now retains the executor-selected `SandboxType`. This value never crosses the executor boundary. Commands started without a sandbox retain `SandboxType::None` and are never reported as sandbox denials.

### The executor uses the existing denial heuristic

The existing local denial heuristic moves from `codex-core` into the shared `codex-sandboxing` crate. When a sandboxed remote process exits, the executor:

1. waits the same short output grace period used by local unified exec;
2. reads the output currently available in the existing retained output buffer;
3. runs the existing heuristic using the exit code and common denial messages;
4. stores the yes/no result before publishing the process exit.

This deliberately matches the old local unified-exec behavior. It does not add a new streaming classifier, another output buffer, or stronger output-retention guarantees.

### The protocol reports a portable boolean

`process/read` gains `sandboxDenied`:

```json
{
  "exited": true,
  "exitCode": 1,
  "closed": false,
  "sandboxDenied": true
}
```

The field defaults to `false` when an older executor omits it. The response does not expose the executor sandbox implementation or executor-native paths.

### Unified exec uses the existing error path

The exec-server client carries `sandboxDenied` into the unified process state. If it is true, unified exec returns the existing `SandboxDenied` error instead of trying to classify remote output using an orchestrator-side sandbox type.

Remote process exit remains visible as soon as the process exits. This PR does not wait for stdout or stderr to close and does not change the existing process lifecycle.

## Scope

This PR is intentionally limited to matching the existing local unified-exec behavior for the initial command execution path. It does not add:

- incremental denial tracking across the full output stream;
- new denial handling for commands completed later through `write_stdin`;
- new guarantees for preserving the semantic flag during the narrow reconnect-recovery race.

Those can be considered separately if the same behavior is added for local execution.
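A minimal sketch of the executor-side classification and the orchestrator-side decoding. `SandboxType::Restricted`, the phrase list, and the nonzero-exit check are assumptions standing in for the shared `codex-sandboxing` heuristic, whose exact rules are not described here; the grace-period wait and output buffering are omitted.

```rust
/// Hypothetical stand-in for the executor-selected sandbox type.
#[derive(Clone, Copy, PartialEq)]
enum SandboxType {
    None,
    Restricted,
}

/// Hypothetical denial phrases; the real heuristic's list may differ.
const DENIAL_MARKERS: &[&str] = &[
    "permission denied",
    "operation not permitted",
    "read-only file system",
];

/// Executor side: classify a finished process. Unsandboxed commands are
/// never reported as sandbox denials.
fn sandbox_denied(sandbox: SandboxType, exit_code: i32, output: &str) -> bool {
    if sandbox == SandboxType::None || exit_code == 0 {
        return false;
    }
    let lower = output.to_lowercase();
    DENIAL_MARKERS.iter().any(|marker| lower.contains(*marker))
}

/// Orchestrator side: an older executor omits `sandboxDenied`, which reads as false.
fn decode_sandbox_denied(field: Option<bool>) -> bool {
    field.unwrap_or(false)
}
```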
## Test coverage

One remote end-to-end integration test covers the complete intended flow:

```text
remote read-only sandbox
-> denied write
-> executor reports the denial
-> Codex requests approval
-> user approves
-> retry succeeds on the remote executor
```

Existing lifecycle coverage continues to verify that remote process exit is reported before late output streams close.
…28968)

## Description

This PR cuts Codex over from generic `ResponseItem.metadata` (introduced in #28355) to `ResponseItem.internal_chat_message_metadata_passthrough`, which is the blessed path and has strongly typed keys. For now, we have to drop the MAv2 usage of `metadata` from #28561 until we figure out where it should live.
## Summary

- Use generated image data URLs in the Python SDK examples and notebook.
- Document HTTP and HTTPS image URLs as deprecated and recommend `LocalImageInput`.
- Replace the remote-URL integration test with data-URL coverage.

`ImageInput` remains available for data URLs. The SDK does not duplicate app-server URL validation.

## Testing

- `uv run --frozen --no-sync ruff check --output-format=full .`
- `uv run --frozen --no-sync ruff format --check .`
- Full Python SDK test suite with an isolated writable `CODEX_SQLITE_HOME` (119 passed, 38 skipped)
## Why

The reset flow introduced in #28154 still describes earned reset credits as "rate-limit resets" and uses generic reset-scope copy. It can also retain a stale available-credit count after redemption or an account change, leaving the reset action enabled after the last credit is used.

This follow-up updates terminology only within that reset feature. Existing rate-limit wording elsewhere in the CLI and TUI is unchanged.

## What changed

- Rename reset-specific `/usage` menu items, startup hints, and reset dialogs to "usage limit reset."
- Describe monthly resets for Free, Go, and accounts that report a monthly usage window; otherwise describe the current 5-hour and weekly limits.
- Recheck a cached zero balance when `/usage` is reopened, and refresh the balance after redemption so the final reset immediately disables the action.
- Correlate async refresh results before updating snapshots, and clear account-derived reset state, warnings, prompts, and status surfaces when the account changes.

## Validation

- `just test -p codex-tui chatwidget::tests::usage` (29 passed)
- `just test -p codex-tui chatwidget::tests::status_command_tests` (7 passed)
- Account-boundary prompt and plan-mode prompt regression tests passed.
- `cargo insta pending-snapshots` from `codex-rs/tui`: no pending snapshots.

<img width="814" height="318" alt="image" src="https://github.com/user-attachments/assets/2a460e96-458b-4805-8d9f-c759382d21a4" />

View for monthly:

<img width="905" height="243" alt="image" src="https://github.com/user-attachments/assets/179f88e3-08fb-4af5-8dc6-ce6a944ed681" />
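One way to correlate asynchronous refresh results is a generation counter that each request captures and each response must echo. This is a sketch under that assumption; `ResetCredits` and its methods are hypothetical and do not reflect the TUI's actual state types.

```rust
/// Hypothetical reset-credit state. Each refresh is tagged with a generation
/// so a late response from an older request or account cannot overwrite newer state.
struct ResetCredits {
    generation: u64,
    account_id: Option<String>,
    available: Option<u32>,
}

impl ResetCredits {
    fn new() -> Self {
        ResetCredits { generation: 0, account_id: None, available: None }
    }

    /// Starts a refresh and returns the token the response must echo.
    fn begin_refresh(&mut self) -> u64 {
        self.generation += 1;
        self.generation
    }

    /// Applies a refresh result only if it answers the latest request.
    fn apply_refresh(&mut self, generation: u64, available: u32) -> bool {
        if generation != self.generation {
            return false;
        }
        self.available = Some(available);
        true
    }

    /// Clears account-derived state and invalidates any in-flight refresh.
    fn on_account_changed(&mut self, account_id: Option<String>) {
        self.account_id = account_id;
        self.available = None;
        self.generation += 1;
    }

    /// The reset action is enabled only with a known, positive balance.
    fn reset_action_enabled(&self) -> bool {
        matches!(self.available, Some(n) if n > 0)
    }
}
```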
…ed (#27982)

## Why

The first auto-review currently creates its Guardian child session on demand, adding avoidable latency before the review can begin. Creating the ordinary Guardian child during parent-session initialization lets that child use the existing session startup WebSocket prewarm before the first escalation. This does not introduce a Guardian-specific prewarm mechanism.

## What changed

- Initialize the existing Guardian review-session manager owned by `Session` when a thread starts with auto-review enabled and an approval policy that routes to Guardian.
- Use the standard Guardian child-session construction and the existing session startup WebSocket prewarm.
- Preserve the existing reuse-key invalidation and lazy creation fallback when startup initialization fails or the effective review configuration changes.
- Add an integration test that verifies normal root-session startup emits a Guardian `generate=false` prewarm request.

## Benchmark

I compared release builds against `main`. Each prompt first ran a non-escalated `sleep 3`, then requested an escalated marker command.

| binary | count | avg Guardian duration | median Guardian duration | avg Guardian TTFT |
|---|---:|---:|---:|---:|
| origin-main | 10 | 4008.7 ms | 3949.5 ms | 3746.5 ms |
| session-fix | 10 | 2865.0 ms | 2594.0 ms | 2492.7 ms |

Average Guardian duration fell by 28.5% and average Guardian TTFT fell by 33.5%. These measurements cover Guardian review latency; they do not measure parent thread-start latency.
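The eager-start-with-lazy-fallback pattern can be sketched as follows. `GuardianManager`, `ReuseKey`, and the `start` callback (standing in for child-session construction plus the WebSocket prewarm) are hypothetical; the sketch only shows that a failed startup leaves the slot empty and that a changed reuse key forces a new session.

```rust
/// Hypothetical key describing the effective review configuration.
#[derive(Clone, PartialEq, Debug)]
struct ReuseKey {
    model: String,
    policy: String,
}

/// Stand-in for a Guardian child session.
struct GuardianSession {
    key: ReuseKey,
}

struct GuardianManager {
    session: Option<GuardianSession>,
    created: u32,
}

impl GuardianManager {
    fn new() -> Self {
        GuardianManager { session: None, created: 0 }
    }

    /// Eager startup path. On failure the slot stays empty, so the lazy path still works.
    fn init_at_startup(
        &mut self,
        key: ReuseKey,
        start: impl FnOnce(&ReuseKey) -> Result<(), String>,
    ) {
        if start(&key).is_ok() {
            self.session = Some(GuardianSession { key });
            self.created += 1;
        }
    }

    /// Review path: reuse the session if the key matches, otherwise create lazily.
    fn session_for(&mut self, key: &ReuseKey) -> &GuardianSession {
        let reusable = matches!(&self.session, Some(s) if s.key == *key);
        if !reusable {
            self.session = Some(GuardianSession { key: key.clone() });
            self.created += 1;
        }
        self.session.as_ref().expect("session was just set")
    }
}
```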
## Why

`compile_scoped_filesystem_pattern()` accepted a `_policy_cwd` parameter even though scoped glob compilation no longer uses the policy working directory. Keeping that unused argument forced the surrounding permissions compilation path to keep forwarding `policy_cwd` through call sites that did not need it, making the API look more dependent on cwd resolution than it is.

## What changed

Removed the unused cwd parameter from `compile_scoped_filesystem_pattern()` and the callers that only forwarded it: `compile_filesystem_permission()`, `compile_permission_profile()`, and `compile_permission_profile_selection()`.

Workspace root resolution still keeps `policy_cwd`, because that path still resolves relative roots against the active policy cwd.

Relevant code: [`codex-rs/core/src/config/permissions.rs`](https://github.com/openai/codex/blob/b8b9816102e064dae4488ec130cf560f63c1ab78/codex-rs/core/src/config/permissions.rs#L346).

## Verification

- `just test -p codex-core config::permissions`
- `just test -p codex-core` was also run after building `test_stdio_server`; it passed the touched permissions coverage but still reported unrelated existing failures in `cli_stream` and shell snapshot tests.
## Summary

Stacked on #26706. Adds the shared auth/system-proxy contract that later platform resolver PRs plug into.

This PR moves Codex-owned auth and startup HTTP clients through a common route-aware boundary, but does not yet add Windows or macOS system proxy resolution. The default path remains unchanged when `respect_system_proxy` is absent or disabled.

## Implementation

- Adds `codex-client/src/outbound_proxy.rs` with the shared route-selection model:
  - `OutboundProxyConfig`;
  - `ClientRouteClass`;
  - `RouteFailureClass`;
  - `build_reqwest_client_for_route`.
- Preserves the existing reqwest/default-client behavior when no route config is supplied.
- Uses the fixed MVP routing policy when route config is supplied: platform system/PAC/WPAD discovery, then explicit env proxy variables, then direct connection.
- Keeps platform-specific system discovery behind the shared client boundary. This PR provides the contract and fallback behavior; later resolver PRs plug in Windows and macOS discovery.
- Adds `login::AuthRouteConfig` so auth call sites depend on a small policy type instead of platform resolver details.
- Maps the resolved `Config.respect_system_proxy` boolean into `AuthRouteConfig` for auth-owned clients.
- Wires the route config through browser login, device-code login, access-token login, login status, logout/revoke, token refresh, API-key exchange, app-server account login, TUI/app startup, cloud-config bootstrap, cloud tasks, plugin auth, and exec startup config loading.

## End-user behavior

- No behavior changes by default.
- When `respect_system_proxy = true`, auth-owned clients opt into the shared route-aware client path.
- On platforms without a resolver implementation in this PR, system discovery is unavailable and the route-aware path falls back to explicit env proxy handling, then direct connection.
- Custom CA handling remains separate from proxy route selection and still runs through the shared client builder.
- No proxy URLs, PAC contents, or resolved platform details are exposed through the public config surface introduced here.

## Tests

Adds or updates coverage for:

- preserving default auth-client fallback behavior when no route config is provided;
- injected environment-proxy fallback without mutating process environment;
- existing login-server E2E flows using explicit `auth_route_config: None` to guard unchanged default behavior;
- updated auth manager, login, logout, cloud-config, startup, and plugin-auth call sites passing route config explicitly.
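The fixed MVP routing order can be sketched as a pure function. `SystemProxyDecision` borrows the variant names from the later Windows PR, while `Route` and `select_route` are hypothetical; passing the environment proxy in as a parameter reflects the injected, non-mutating environment fallback listed under Tests.

```rust
/// Outcome of platform system/PAC/WPAD discovery (shape assumed).
#[derive(Debug, Clone, PartialEq)]
enum SystemProxyDecision {
    Direct,
    Proxy(String),
    Unavailable,
}

/// Hypothetical final route for one request.
#[derive(Debug, Clone, PartialEq)]
enum Route {
    Direct,
    Proxy(String),
}

/// System discovery first, then the explicit env proxy, then a direct connection.
/// `env_proxy` is injected rather than read from the process environment.
fn select_route(system: SystemProxyDecision, env_proxy: Option<&str>) -> Route {
    match system {
        // A valid system decision, including DIRECT, is honored as-is.
        SystemProxyDecision::Direct => Route::Direct,
        SystemProxyDecision::Proxy(url) => Route::Proxy(url),
        // No resolver on this platform, or resolution failed.
        SystemProxyDecision::Unavailable => match env_proxy {
            Some(url) => Route::Proxy(url.to_string()),
            None => Route::Direct,
        },
    }
}
```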
## Summary

Codex required every ChatGPT account to have an email address. A service-account personal access token (PAT) can return valid account metadata without one, so PAT login failed while decoding the metadata response.

This change makes email optional in the account metadata type that owns it and preserves that absence through authentication, provider account state, the app-server API, generated clients, and TUI bootstrap. Existing accounts with email addresses keep the same behavior.

## Behavior-changing call sites

| Call site | Behavior after this change |
| --- | --- |
| `login/src/auth/personal_access_token.rs` | PAT metadata accepts a missing or null email and retains `None`. |
| `agent-identity/src/lib.rs` | Agent Identity JWT claims accept an omitted email. |
| `login/src/auth/storage.rs` and `login/src/auth/agent_identity.rs` | Stored and managed Agent Identity records carry `Option<String>`. Deserialization maps the legacy empty-string sentinel to `None`. |
| `login/src/auth/manager.rs` | `get_account_email` returns the stored option, and managed identity bootstrap no longer converts `None` to an empty string. |
| `model-provider/src/provider.rs` and `protocol/src/account.rs` | A ChatGPT provider account requires a plan type but may carry no email. |
| `app-server-protocol/src/protocol/v2/account.rs` | `account/read` keeps the `email` field on the wire and returns `null` when the account has no email. Generated TypeScript and JSON schemas describe a required, nullable field. |
| `sdk/python/src/openai_codex/generated/v2_all.py` | The generated Python `ChatgptAccount` model accepts `None` for email. |
| `tui/src/app_server_session.rs` | Email-less ChatGPT accounts bootstrap normally, keep external feedback routing, omit account-email telemetry, and display the plan in account status. |

## Design decisions

- Missing email remains `None` at every layer. The code never substitutes an empty string.
- The app-server response includes `"email": null` instead of omitting the field, so clients retain a stable response shape.
- Plan type remains required for provider account state. This change relaxes only the email assumption.

## Testing

- Affected test targets compile; scoped Clippy and formatting pass.
- A focused TUI snapshot covers plan-only account status.
- A real before/after PAT login smoke test covers metadata without email.
- An app-server smoke test covers `account/read` with `email: null`.
- A regression smoke test covers an existing email-bearing PAT.
- Unit tests run in CI.

## Evidence

Visual smoke evidence will be attached here.
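Two of the rules above, mapping the legacy empty-string sentinel to `None` and displaying the plan alone when there is no email, can be sketched as below. Both function names and the status-line format are hypothetical.

```rust
/// Deserialization rule: the legacy empty-string sentinel becomes `None`;
/// real addresses pass through unchanged.
fn normalize_stored_email(raw: Option<String>) -> Option<String> {
    raw.filter(|email| !email.is_empty())
}

/// Account status: show the plan alone when there is no email
/// (the exact TUI wording is an assumption).
fn account_status_line(email: Option<&str>, plan: &str) -> String {
    match email {
        Some(email) => format!("{email} ({plan})"),
        None => plan.to_string(),
    }
}
```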
## Summary
Instead of:

`reminder_interval_tokens = 65_536`

allow users to configure explicit remaining-token reminder thresholds:

`reminder_at_remaining_tokens = [65_536, 32_768, 16_384, 8_192, 4_096, 2_048, 1_024, 512]`
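A sketch of how explicit thresholds could be evaluated, assuming a reminder fires for every threshold `t` with `after <= t < before` as the remaining budget drops from `before` to `after`. Whether the real implementation uses these bounds or can emit several reminders at once is not stated; `crossed_thresholds` is a hypothetical name.

```rust
/// Returns the thresholds crossed when remaining tokens drop from `before`
/// to `after` (every `t` with `after <= t < before`), largest first.
fn crossed_thresholds(thresholds: &[u64], before: u64, after: u64) -> Vec<u64> {
    let mut crossed: Vec<u64> = thresholds
        .iter()
        .copied()
        .filter(|&t| after <= t && t < before)
        .collect();
    crossed.sort_unstable_by(|a, b| b.cmp(a));
    crossed
}
```

Unlike a fixed interval, an explicit list lets reminders grow denser as the budget nears zero.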
## Validation
- `CARGO_INCREMENTAL=0 just test -p codex-core rollout_budget` (9 passed)
- `just fix -p codex-core`
- `just fmt`
## Why

`permissionProfile/list` currently advertises every built-in and configured profile even when effective enterprise requirements prevent selecting it. That forces each client to reconstruct policy from lower-level requirement fields, which is easy to miss and difficult to keep consistent. The catalog should remain complete so clients can explain that an option was disabled by an administrator, while also reporting whether each profile is selectable.

## What

- Add an `allowed` field to each permission profile summary.
- Build a shared catalog from the effective config and current requirements, including `allowed_sandbox_modes`, `allowed_permissions`, and filesystem restrictions.
- Use the shared catalog in app-server and the TUI so disallowed profiles remain visible but cannot be selected.
- Use the canonical `:danger-full-access` profile ID in the TUI.
- Update the app-server schemas, API documentation, behavioral tests, and TUI snapshots.

## Scope

This PR targets `main` directly and is independent of #24852. It preserves the current behavior where built-in profiles are constrained by sandbox-mode requirements and `allowed_permissions` applies to configured profiles.

## Testing

- `just test -p codex-core permission_profile_catalog_marks_profiles_disallowed_by_requirements`
- `just test -p codex-app-server permission_profile_list`
- `just test -p codex-app-server-protocol`
- `just test -p codex-tui profile_permissions`
- `just fix -p codex-core`
- `just fix -p codex-app-server-protocol`
- `just fix -p codex-app-server`
- `just fix -p codex-tui`
- `just fmt`

---------

Co-authored-by: Codex <noreply@openai.com>
Co-authored-by: Joey Trasatti <joey.trasatti@openai.com>
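A minimal sketch of a catalog that stays complete while marking each entry's `allowed` flag. The `Requirements` shape is hypothetical: built-in profiles are checked against an allow-list assumed to be derived from `allowed_sandbox_modes`, configured profiles against `allowed_permissions`, and filesystem restrictions are omitted. The built-in IDs other than `:danger-full-access` are made up.

```rust
#[derive(Debug, Clone, PartialEq)]
struct ProfileSummary {
    id: String,
    allowed: bool,
}

/// Hypothetical requirements; `None` means unrestricted.
struct Requirements {
    /// Assumed to be derived from `allowed_sandbox_modes`.
    allowed_builtin_ids: Option<Vec<String>>,
    /// Applies to configured profiles.
    allowed_permissions: Option<Vec<String>>,
}

fn permits(list: &Option<Vec<String>>, id: &str) -> bool {
    list.as_ref()
        .map_or(true, |ids| ids.iter().any(|allowed| allowed.as_str() == id))
}

/// Every profile is listed; disallowed ones are marked rather than dropped.
fn build_catalog(builtins: &[&str], configured: &[&str], req: &Requirements) -> Vec<ProfileSummary> {
    let builtin_entries = builtins.iter().map(|id| ProfileSummary {
        id: id.to_string(),
        allowed: permits(&req.allowed_builtin_ids, id),
    });
    let configured_entries = configured.iter().map(|id| ProfileSummary {
        id: id.to_string(),
        allowed: permits(&req.allowed_permissions, id),
    });
    builtin_entries.chain(configured_entries).collect()
}
```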
## Why

`codex-app-server-test-client` previously treated `item/tool/requestUserInput` as an unsupported server request and terminated the connection. That made it impossible to use the client for end-to-end testing of interactive turns: an operator could observe the request but could not answer it and confirm that the same turn resumed.

## What changed

- Handle `ToolRequestUserInput` server requests in the test client's central request dispatcher.
- Render numbered terminal choices, accept exact option labels, support free-form `Other` and text-only questions, and collect multiple answers.
- Send a protocol-native `ToolRequestUserInputResponse` and continue streaming the active turn.
- Fail clearly when interactive input is requested without a terminal.
- Document the interactive behavior and add focused tests for option selection, free-form answers, multiple questions, and invalid-selection retries.

## Testing

- `just test -p codex-app-server-test-client`
- `just bazel-lock-check`
- Manually exercised the app-server flow, selected `TUI`, observed `serverRequest/resolved`, and verified that the same turn completed with the selected answer.
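The answer-parsing rules can be sketched as one function that returns `None` when the operator should be re-prompted. `Answer` and `parse_selection` are hypothetical, and treating an out-of-range number as an invalid selection (rather than as free-form text) is an assumption.

```rust
#[derive(Debug, PartialEq)]
enum Answer {
    Choice(String),
    Other(String),
}

/// Parses one line of terminal input against numbered options. Accepts a
/// 1-based number or an exact option label; when `allow_other` is set, any
/// other non-empty text becomes a free-form answer. `None` means re-prompt.
fn parse_selection(input: &str, options: &[&str], allow_other: bool) -> Option<Answer> {
    let input = input.trim();
    if input.is_empty() {
        return None;
    }
    if let Ok(n) = input.parse::<usize>() {
        // 0 and out-of-range numbers are invalid selections.
        return n
            .checked_sub(1)
            .and_then(|i| options.get(i))
            .map(|label| Answer::Choice(label.to_string()));
    }
    if let Some(label) = options.iter().find(|label| **label == input) {
        return Some(Answer::Choice(label.to_string()));
    }
    if allow_other {
        Some(Answer::Other(input.to_string()))
    } else {
        None
    }
}
```

A text-only question is the case of an empty `options` slice with `allow_other` set.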
## Summary

- Rename `Config::permission_profile_allowed` to `is_permission_profile_allowed`.
- Use `BUILT_IN_PERMISSION_PROFILE_DANGER_FULL_ACCESS` in the TUI and its assertion.
- Follow up on the late review comments from #26678.

The previous `:danger-no-sandbox` value was an invalid built-in profile ID. #26678 corrected it to `:danger-full-access`; this PR centralizes the value to prevent future drift.

## Testing

- Not run, per request; `cargo fmt` only.

Co-authored-by: Codex <noreply@openai.com>
## Why

When Codex starts with a custom CA override such as `SSL_CERT_FILE=/path/to/corp-ca.pem codex`, `rustls-native-certs` treats that override as a replacement for the platform trust store. The managed proxy then rewrites child CA variables to its generated bundle, so the custom root or the ordinary platform roots can be lost. The proxy's upstream TLS connector must also trust the same roots; otherwise, private and corporate upstream certificates still fail after interception.

## What

- Load platform-native roots without consulting inherited CA override variables.
- Append certificates from the existing curated startup CA file variables and `SSL_CERT_DIR`.
- Share those platform and startup roots with the MITM upstream rustls connector.
- Exclude the Codex managed MITM CA from upstream trust.
- Normalize OpenSSL `TRUSTED CERTIFICATE` blocks while dropping trailing trust metadata.
- Skip an inherited current Codex-managed bundle so nested launches do not duplicate it.
- Append the Codex managed MITM CA to the child-facing bundle.
- Copy certificate material only, so a private key or unrelated text colocated in a startup file is never exposed through the public bundle.

This is intentionally limited to CA paths present when Codex starts. It does not parse inline shell assignments or add per-command bundle materialization. This changes only `codex-network-proxy` and dependency metadata; it does not touch `codex-core` or sandbox orchestration.

## Validation

- `just test -p codex-network-proxy`
  - includes an end-to-end upstream TLS test using a server trusted only by the startup custom CA
- `just fix -p codex-network-proxy`
- `just bazel-lock-check`
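The "copy certificate material only" rule can be sketched as a PEM block filter. This is a simplified sketch with a hypothetical name: it keeps `CERTIFICATE` and `TRUSTED CERTIFICATE` blocks and drops everything else, including private keys. It does not perform the normalization step, because removing the trailing trust metadata from a `TRUSTED CERTIFICATE` requires parsing its DER body.

```rust
/// Returns (label, base64 body) for each CERTIFICATE or TRUSTED CERTIFICATE
/// PEM block in `pem`. Private keys, comments, and other text are skipped.
fn extract_certificate_blocks(pem: &str) -> Vec<(String, String)> {
    let mut out = Vec::new();
    // The label and body lines of the certificate block currently open, if any.
    let mut current: Option<(&str, Vec<&str>)> = None;
    for raw in pem.lines() {
        let line = raw.trim();
        if let Some(label) = line.strip_prefix("-----BEGIN ").and_then(|r| r.strip_suffix("-----")) {
            current = match label {
                "CERTIFICATE" | "TRUSTED CERTIFICATE" => Some((label, Vec::new())),
                _ => None, // e.g. PRIVATE KEY: never copied
            };
        } else if let Some(label) = line.strip_prefix("-----END ").and_then(|r| r.strip_suffix("-----")) {
            if let Some((open, body)) = current.take() {
                // Only accept a block whose END label matches its BEGIN label.
                if open == label {
                    out.push((open.to_string(), body.concat()));
                }
            }
        } else if let Some((_, body)) = current.as_mut() {
            body.push(line);
        }
    }
    out
}
```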
## Why

`openai-oss-forks/tokio-tungstenite` now includes the updated `tungstenite` fork revision from [openai-oss-forks/tokio-tungstenite#3](openai-oss-forks/tokio-tungstenite#3). Codex should consume the merged fork commit and resolve its direct and transitive `tungstenite` dependencies to the same revision instead of retaining the older pins.

## What Changed

- Advanced the `tokio-tungstenite` git pin to `0e5b2d73aa18dd9f0a50ee9ff199d5aef7594186`.
- Advanced the `tungstenite` fork pin to `4fffad30fe373adbdcffab9545e9e9bf4f2fc19f` and adjusted the patch source so the transitive dependency resolves to that revision.
- Updated `Cargo.lock` and `MODULE.bazel.lock` to match the dependency graph.
## This PR
Remote plugin analytics cannot rely only on the in-memory
installed-plugin snapshot because that snapshot is refreshed
asynchronously after startup. This PR persists the authoritative backend
identity alongside each cached remote plugin bundle so later consumers
can resolve it without a network request.
### Behavior
- Store Codex-owned remote installation metadata in an atomic
`.codex-remote-plugin-install.json` sidecar under the plugin cache root.
- Use a versioned, snake_case schema:
```json
{
"schema_version": 1,
"remote_plugin_id": "plugins~Plugin_..."
}
```
- Write the metadata during remote bundle installation.
- Backfill it when bundle sync finds an already-current cached bundle.
- Clear it when a generic/local install replaces the cache.
- Let existing uninstall and stale-cache removal delete it with the
plugin cache root.
- Reject unsupported schema versions rather than silently misreading
future formats.
This PR does not change analytics serialization or event behavior.
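The sidecar lifecycle above can be sketched with the standard library alone. This assumes a write-to-temp-then-rename strategy for atomicity and hand-formats the JSON, so it rejects IDs that would need escaping; a real implementation would more likely use a JSON serializer. Function names are hypothetical, and the `0600` file mode is not shown.

```rust
use std::fs;
use std::io;
use std::path::Path;

const SIDECAR_FILE: &str = ".codex-remote-plugin-install.json";
const SCHEMA_VERSION: u32 = 1;

/// Writes the sidecar atomically: write a temp file in the same directory,
/// then rename it over the final name so readers never see a partial file.
fn write_sidecar(cache_root: &Path, remote_plugin_id: &str) -> io::Result<()> {
    // Hand-formatted JSON: refuse anything that would need escaping.
    if remote_plugin_id.chars().any(|c| c == '"' || c == '\\' || c.is_control()) {
        return Err(io::Error::new(io::ErrorKind::InvalidInput, "id needs JSON escaping"));
    }
    let tmp = cache_root.join(format!("{}.tmp", SIDECAR_FILE));
    let body = format!(
        "{{\n  \"schema_version\": {},\n  \"remote_plugin_id\": \"{}\"\n}}\n",
        SCHEMA_VERSION, remote_plugin_id
    );
    fs::write(&tmp, body)?;
    fs::rename(&tmp, cache_root.join(SIDECAR_FILE))
}

/// Rejects unsupported schema versions instead of misreading future formats.
fn validate_schema_version(found: u32) -> Result<(), String> {
    if found == SCHEMA_VERSION {
        Ok(())
    } else {
        Err(format!("unsupported schema_version {}", found))
    }
}

/// Clears the sidecar (e.g. when a local install replaces the cache); a missing file is fine.
fn remove_sidecar(cache_root: &Path) -> io::Result<()> {
    match fs::remove_file(cache_root.join(SIDECAR_FILE)) {
        Err(e) if e.kind() == io::ErrorKind::NotFound => Ok(()),
        other => other,
    }
}
```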
### Review surface
The implementation is limited to four `codex-core-plugins` files:
- `store.rs`: owns the versioned sidecar read/write/remove lifecycle.
- `remote_bundle.rs`: persists the backend ID after a remote bundle
install.
- `remote/remote_installed_plugin_sync.rs`: backfills metadata for an
already-current cached bundle.
- Tests cover the storage lifecycle and both remote write paths.
## Testing / Validation
### Automated
- `just test -p codex-core-plugins` (268 tests passed)
- `just fix -p codex-core-plugins` passes with one pre-existing
`large_enum_variant` warning in `manifest.rs`.
- Coverage verifies the exact filename and JSON schema, identity
replacement, local reinstall clearing, uninstall cleanup, remote bundle
installation, unsupported schema rejection, and installed-plugin sync
backfill.
### Live manual validation
Validated the production app-server RPC path with an isolated temporary
`CODEX_HOME` and the PR-built Codex binary. The app-server communicated
over stdio and did not bind a port.
Test plugin: `plugins~Plugin_b80dd84519148191a409cde181c9b3d6`
(`build-macos-apps@openai-curated-remote`).
1. Confirmed `plugin/read` initially reported the plugin uninstalled.
2. Installed it through `plugin/install` and confirmed version `0.1.4`
was cached.
3. Verified
`$CODEX_HOME/plugins/cache/openai-curated-remote/build-macos-apps/.codex-remote-plugin-install.json`
was created beside the `0.1.4/` bundle directory with mode `0600` and
the expected contents:
```json
{
"schema_version": 1,
"remote_plugin_id": "plugins~Plugin_b80dd84519148191a409cde181c9b3d6"
}
```
4. Deleted only the sidecar, restarted the app-server, and confirmed
installed-plugin startup sync recreated it with the same contents.
5. Uninstalled through `plugin/uninstall`, confirmed `plugin/read`
returned `installed: false`, and verified the local plugin cache root
was removed.
6. Restored the account's original uninstalled state and removed the
isolated home and copied credentials.
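The startup backfill in step 4 can be sketched using only the standard library. This is a Unix-only illustration under stated assumptions:

- `backfill_sidecar` and `json_string` are hypothetical helpers.
- Using `create_new`, so that a backfill never overwrites an existing sidecar, is this sketch's choice. It is not necessarily how `remote_installed_plugin_sync.rs` works.

The `0o600` mode and the JSON fields match the validated file.

```rust
use std::fs::OpenOptions;
use std::io::{self, Write};
use std::os::unix::fs::OpenOptionsExt;
use std::path::Path;

/// Minimal JSON string encoder: escapes quotes, backslashes, and control characters.
fn json_string(s: &str) -> String {
    let mut out = String::with_capacity(s.len() + 2);
    out.push('"');
    for c in s.chars() {
        match c {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            c if (c as u32) < 0x20 => out.push_str(&format!("\\u{:04x}", c as u32)),
            c => out.push(c),
        }
    }
    out.push('"');
    out
}

/// Create the sidecar only if it is missing. Returns `Ok(true)` when a file
/// was written and `Ok(false)` when one already existed.
fn backfill_sidecar(path: &Path, remote_plugin_id: &str) -> io::Result<bool> {
    let body = format!(
        "{{\n  \"schema_version\": 1,\n  \"remote_plugin_id\": {}\n}}\n",
        json_string(remote_plugin_id)
    );
    let mut file = match OpenOptions::new()
        .write(true)
        .create_new(true) // fail rather than overwrite an existing sidecar
        .mode(0o600) // owner read/write only
        .open(path)
    {
        Ok(file) => file,
        Err(err) if err.kind() == io::ErrorKind::AlreadyExists => return Ok(false),
        Err(err) => return Err(err),
    };
    file.write_all(body.as_bytes())?;
    Ok(true)
}
```

Because `create_new` fails with `AlreadyExists` when the file is present, calling a helper like this on every startup is idempotent.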
## Split Overview
```text
main
├── #27093 Debug analytics capture merged
│ └── #27099 Non-mutating plugin smoke merged
│ └── #27100 Remote install/uninstall smoke merged
└── #27102 Plugin telemetry metadata refactor merged
└── #27669 Persist remote plugin identity ← this PR
Next:
└── Final PR: add explicit local and remote IDs to plugin analytics
```
This PR is based directly on `main`; prerequisite
#27102 has merged. The
original combined #26281
remains the aggregate reference until the final replacement PR is
published.
## Summary
Stacked on #26707. Adds the Windows implementation of the shared system-proxy contract. This allows Codex-owned auth clients to use the route Windows selects for each auth URL, including explicit PAC configuration, WPAD auto-detection, static proxies, and bypass rules.

The `respect_system_proxy` feature is disabled by default, so existing client behavior remains unchanged unless explicitly enabled.
## Implementation
- Adds Windows-only `codex-client` dependencies:
  - `windows-sys` with `Win32_Foundation` and `Win32_Networking_WinHttp`;
  - `sha2` for redacted cache keys.
- Dispatches system-proxy resolution to `outbound_proxy/windows.rs` on Windows.
- Reads the current-user WinHTTP/IE proxy configuration via `WinHttpGetIEProxyConfigForCurrentUser`.
- Resolves explicit PAC URLs first, then OS-enabled WPAD auto-detection, then static proxy and bypass settings.
- Uses `WinHttpGetProxyForUrl` for PAC/WPAD and maps results into the shared `SystemProxyDecision::{Direct, Proxy, Unavailable}` contract.
- Parses `DIRECT`, `PROXY`, `HTTPS`, and keyed static proxy entries.
- Treats unsupported schemes such as SOCKS as unavailable so the shared resolver can apply its environment-proxy fallback.
- Handles Windows bypass entries, including `<local>` and host, suffix, wildcard, and port matching.
- Releases WinHTTP-owned strings with `GlobalFree` and closes sessions with `WinHttpCloseHandle`.
- Hashes URL-specific cache keys with SHA-256 so PAC decisions remain URL-specific without retaining raw request URLs or query strings.
## End-user behavior
- Disabled/default: existing client behavior is unchanged.
- Enabled with `[features.respect_system_proxy]`:
  - Windows auth clients honor explicit PAC configuration, OS-enabled WPAD, static proxies, and bypass rules;
  - valid OS/PAC `DIRECT` decisions use a direct connection;
  - unavailable system resolution falls back to explicit environment proxy variables, then `DIRECT`, through the shared contract from #26707.
- Unsupported proxy schemes are not silently translated into a different route.
- Custom CA handling remains separate from proxy selection.
## Tests
Adds coverage for:
- PAC-style proxy tokens such as `PROXY proxy.internal:8080` and `HTTPS proxy.internal:8443`;
- static WinHTTP proxy entries keyed by target scheme;
- `DIRECT` and unsupported proxy-token behavior;
- Windows bypass matching, including `<local>`, wildcard, suffix, and port-qualified entries;
- preserving URL-specific PAC cache decisions without retaining the raw URL on Windows.
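The token mapping described above can be illustrated without WinHTTP. This is a platform-neutral sketch that assumes the resolver receives a PAC-style result string. `ProxyDecision` only mirrors the three outcomes of `SystemProxyDecision`. The helper name, and the choice to inspect only the first `;`-separated entry, are assumptions.

```rust
/// Hypothetical mirror of the shared contract's three outcomes.
#[derive(Debug, PartialEq)]
enum ProxyDecision {
    Direct,
    Proxy { scheme: &'static str, authority: String },
    Unavailable,
}

/// Map a PAC-style result such as `PROXY host:port` to a decision.
/// PAC results may list several `;`-separated entries; this sketch only
/// looks at the first one.
fn parse_pac_result(result: &str) -> ProxyDecision {
    let first = result.split(';').next().unwrap_or("").trim();
    let mut tokens = first.split_whitespace();
    match (tokens.next(), tokens.next()) {
        (Some(kw), None) if kw.eq_ignore_ascii_case("DIRECT") => ProxyDecision::Direct,
        (Some(kw), Some(authority)) if kw.eq_ignore_ascii_case("PROXY") => ProxyDecision::Proxy {
            scheme: "http",
            authority: authority.to_string(),
        },
        (Some(kw), Some(authority)) if kw.eq_ignore_ascii_case("HTTPS") => ProxyDecision::Proxy {
            scheme: "https",
            authority: authority.to_string(),
        },
        // SOCKS and anything else is not translated into a different route;
        // `Unavailable` lets the caller apply its environment-proxy fallback.
        _ => ProxyDecision::Unavailable,
    }
}
```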
register cdp requirements feature flag
## Summary
- fetch featured plugin IDs when the loaded catalog includes `openai-curated-remote`
- extend the existing remote marketplace regression test to cover the featured IDs response
## Why
When the remote plugin catalog was enabled, app-server loaded `openai-curated-remote` but skipped `/plugins/featured` because the request processor only fetched featured IDs for the local `openai-curated` marketplace. As a result, the desktop app could not render the backend-curated remote featured set. This keeps the existing local behavior and also returns the curated ranking for remote plugins.
## Test plan
- `just fmt`
- `git diff --check`
- `just test -p codex-app-server plugin_list_includes_remote_marketplaces_when_remote_plugin_enabled`
## Summary
- upgrade the bundled OpenSSL source from 3.5.5 to 3.6.3
- update the Bazel `openssl-sys` build dependency to use the upgraded source crate
- refresh the Bazel module lockfile
## Why
OpenSSL 3.5.5 is within the affected ranges for security issues fixed in later releases. The Rust `openssl-src` wrapper does not currently publish OpenSSL 3.5.7, so this moves the vendored Linux musl build to the available patched 3.6.3 release.
## Why
The TypeScript workspace resolved `esbuild` 0.25.10 transitively through the SDK toolchain. `esbuild` 0.28.1 adds integrity verification to the Deno binary download path addressed by GHSA-gv7w-rqvm-qjhr, preventing an attacker-controlled npm registry from supplying an executable without a content check.
## What changed
- Add a root workspace resolution for `esbuild` 0.28.1.
- Regenerate `pnpm-lock.yaml` so `tsup`, `bundle-require`, and `ts-jest` all resolve the patched version.
## Validation
- Frozen pnpm install, including the SDK's `tsup` build
- `pnpm --filter @openai/codex-sdk exec jest tests/exec.test.ts --runInBand`
- Confirmed the installed dependency graph contains only `esbuild` 0.28.1
Adds additive dark-mode plugin logo metadata across manifests, remote catalogs, and the app-server protocol while keeping uninstalled Git listings free of synthetic local paths.

Supersedes #28945. This replacement uses an upstream branch so trusted CI can use the repository-provided remote Bazel configuration.
## Current state
Plugin interfaces expose only the default logo asset. Clients therefore cannot select a dedicated dark-mode logo even when a plugin provides one.
## What this PR changes
- Adds nullable `logoDark` and `logoUrlDark` fields to `PluginInterface`.
- Resolves local `interface.logoDark` assets and maps remote `logo_url_dark` values.
- Removes path-backed interface assets, including `logoDark`, from uninstalled Git fallback listings until the plugin has a real local root.
- Updates the bundled plugin validator and manifest reference.
- Regenerates the app-server JSON schemas and TypeScript types.

Local manifests expose `interface.logoDark` as a package-relative asset path. Remote catalog responses expose `logo_url_dark`. These values map into separate app-server fields so clients can preserve local-path and remote-URL handling.
## Risk
The fields are additive and nullable, so existing clients retain their current logo behavior. The main risks are an incomplete mapping path or exposing a synthetic local path for an uninstalled Git plugin. Local-manifest, remote-catalog, fallback-listing, protocol serialization, and app-server integration tests cover those paths.

Spiciness: 2/5
## Testing
- `just write-app-server-schema`
- `just fmt`
- Regression test first failed with `logo_dark` resolved to `/assets/logo-dark.png`, then passed after the fallback-listing fix.
- `just test -p codex-core-plugins` (267 tests passed)
- `just test -p codex-app-server 'suite::v2::plugin'` (114 tests passed)
- `just test -p codex-app-server-protocol -p codex-core-plugins -p codex-plugin -p codex-skills` (517 tests passed before the follow-up)
- `just test -p codex-tui plugin` (47 tests passed)
- Validated a local plugin manifest containing `interface.logoDark` with the bundled validator.
## Manual verification
Create a local plugin with both `interface.logo` and `interface.logoDark`, then call `plugin/list` or `plugin/read`. Confirm the response contains separate `logo` and `logoDark` paths. For a remote catalog entry, confirm `logoUrlDark` is populated from `logo_url_dark`. For an uninstalled Git marketplace entry, confirm path-backed interface assets remain absent until installation.

Issue: N/A - coordinated maintainer change.
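Here is a hedged sketch of the fallback-listing rule. `PluginInterface` here is a hypothetical three-field subset, and `fallback_listing_interface` is an illustrative helper, not the crate's API. When an uninstalled Git entry has no real local root, the sketch drops the path-backed fields. Leaving URL-backed fields untouched is an assumption of this sketch.

```rust
/// Hypothetical subset of the app-server `PluginInterface` fields.
#[derive(Debug, Clone, PartialEq)]
struct PluginInterface {
    logo: Option<String>,          // local, package-relative asset path
    logo_dark: Option<String>,     // local, package-relative asset path
    logo_url_dark: Option<String>, // remote catalog URL
}

/// Without a real local root, path-backed assets would resolve against a
/// synthetic directory, so they are cleared. URL-backed fields pass through.
fn fallback_listing_interface(mut interface: PluginInterface, has_local_root: bool) -> PluginInterface {
    if !has_local_root {
        interface.logo = None;
        interface.logo_dark = None;
    }
    interface
}
```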
## Description
This PR makes `thread.history_mode` immutable after the thread's canonical first `SessionMeta` has been written. Later same-thread `SessionMeta` lines are compatibility metadata writes, not a new thread definition.

Without this, an older binary could append a `SessionMeta` that omits `history_mode`; when a newer binary replays it, serde defaults that missing field to `legacy` and SQLite could downgrade a paginated thread.
## Why
`history_mode` is the persisted thread storage contract. Paginated-thread fail-closed behavior and SQLite memory filtering depend on it staying aligned with canonical rollout metadata, especially when multiple Codex binary versions can touch the same local rollout.
## What changed
- Stop generic rollout metadata replay from overwriting `history_mode` from later `SessionMeta` items.
- Remove `history_mode` from `ThreadMetadataPatch`, so mutable metadata sync and app-server metadata updates cannot rewrite it.
- When local metadata sync has to recreate a missing SQLite row, recover `history_mode` from the rollout's canonical first `SessionMeta` instead of from a mutable patch.
- Keep the in-memory thread store using the created thread's canonical `history_mode` instead of metadata patches.
- Fill the one remaining core test `CreateThreadParams` initializer with the new `history_mode` field; Bazel CI caught this after the parent history-mode PR landed.
## Validation
- `just fmt`
- `just test -p codex-thread-store`
- `just test -p codex-state session_meta_does_not_set_model_or_reasoning_effort`
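The replay rule reduces to "the first `SessionMeta` wins". In this sketch the types are hypothetical stand-ins for the rollout records. A later line written by an older binary is modeled as having already decoded to `Legacy` through the serde default.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
enum HistoryMode {
    Legacy,
    Paginated,
}

/// Hypothetical rollout record with only the fields this sketch needs.
enum RolloutItem {
    SessionMeta { history_mode: HistoryMode },
    Other,
}

/// Only the canonical first `SessionMeta` defines `history_mode`; later
/// `SessionMeta` lines (e.g. a `Legacy` default from an older writer) are ignored.
fn replay_history_mode(items: &[RolloutItem]) -> Option<HistoryMode> {
    items.iter().find_map(|item| match item {
        RolloutItem::SessionMeta { history_mode } => Some(*history_mode),
        RolloutItem::Other => None,
    })
}
```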
## Description
This adds stable optional `turnId` support to `thread/fork`. When supplied, the fork copies persisted history through that terminal turn, inclusive, and drops later turns from the new thread. Omitting or passing `null` preserves the existing full-history fork behavior, including the interruption marker when the stored source history ends mid-turn.
## Why
We're deprecating `thread/rollback`, and this will help certain UX use cases work around it by using `thread/fork` with `turn_id` instead.
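A minimal sketch of the inclusive truncation, assuming history is a list of turns with string IDs. `Turn`, `ForkError`, and `fork_turns` are hypothetical. Returning an error for an unknown `turnId` is an assumption. The sketch omits the interruption-marker handling for the full-history case.

```rust
#[derive(Debug, Clone, PartialEq)]
struct Turn {
    id: String,
    // Items, timestamps, and other turn data omitted.
}

#[derive(Debug, PartialEq)]
enum ForkError {
    UnknownTurn(String),
}

/// `None` keeps the full history; `Some(id)` keeps every turn up to and
/// including `id` and drops the later ones.
fn fork_turns(source: &[Turn], turn_id: Option<&str>) -> Result<Vec<Turn>, ForkError> {
    match turn_id {
        None => Ok(source.to_vec()),
        Some(id) => {
            let end = source
                .iter()
                .position(|turn| turn.id == id)
                .ok_or_else(|| ForkError::UnknownTurn(id.to_string()))?;
            Ok(source[..=end].to_vec())
        }
    }
}
```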
## Why
I use the `$code-review` skill a lot, and it'd be nice to add my own additional review criteria in `$CODEX_HOME/skills/code-review-*`.
## What
Removes the phrasing about "code-review-* skills in this repository". In practice, that change seems to be enough to get Codex to consult my user-level code review skills in addition to the repo-level ones.
## Summary
- add Sol (`openai.gpt-5.6-sol`), Terra (`openai.gpt-5.6-terra`), and Luna (`openai.gpt-5.6-luna`) to the Amazon Bedrock static model catalog
- derive all three entries from the bundled GPT-5.5 metadata and add the Bedrock-only `max` reasoning effort
- keep the new entries below the current GPT-5.5 and GPT-5.4 models at priorities 2, 3, and 4, preserving GPT-5.5 as the default
- add deep-equality coverage for inherited model configuration, catalog ordering, context windows, and service-tier behavior
### Summary
Release live thread persistence when a session ends because its submission channel closes. This prevents a later same-process resume from failing with `thread ... already has a live local writer`.
### Details
The issue is in the `codex-core` session teardown path used by Codex hosts, rather than in the Managed Agents API or exec-server itself. Explicit shutdown already closes the `LiveThread`, which releases the process-scoped writer held by `LocalThreadStore`. The submission-channel-close fallback ran runtime and extension teardown but skipped that persistence shutdown, leaving the thread ID registered as having a live writer.

This change:
- closes the `LiveThread` on the channel-close fallback path;
- preserves the existing teardown order used by explicit shutdowns;
- extends the lifecycle regression test to assert that the thread store receives `shutdown_thread`.

Context: [original report](https://openai.slack.com/archives/C0B4NBHQGTV/p1782136364948039), [recent occurrence 1](https://openai.slack.com/archives/C0B4NBHQGTV/p1782434817895839?thread_ts=1782136364.948039&cid=C0B4NBHQGTV), [recent occurrence 2](https://openai.slack.com/archives/C0B4NBHQGTV/p1782335107474429?thread_ts=1782136364.948039&cid=C0B4NBHQGTV)
### Testing
- `just test -p codex-core submission_loop_channel_close_runs_full_thread_teardown`
- `just test -p codex-core --lib` (1,989 passed; 3 skipped)
- `just fix -p codex-core`
- `just fmt`
- Native code review: no findings

I also attempted `just test -p codex-core`. The new regression passed; 79 unrelated integration tests failed in the local harness, primarily because helper binaries such as `test_stdio_server` were unavailable, plus local proxy/shell timing failures.
## Summary
- classify authentication-required RMCP startup failures, including errors nested inside `ClientInitializeError::TransportError`
- let `codex-mcp` consume that classification so the existing `reauthenticationRequired` startup failure reason is emitted
- add a regression test that performs real startup with an expired persisted OAuth token and no refresh token
## Why
Follow-up to #29877. RMCP stores streamable HTTP initialization failures inside a dynamic transport error whose payload is not exposed through the standard Rust error source chain. The original `anyhow::Error::chain()` check therefore missed the nested `AuthError::AuthorizationRequired` seen during real MCP startup and emitted `failureReason: null`.

The transport-specific inspection now lives in `codex-rmcp-client`, while `codex-mcp` consumes only the domain-level authentication-required result. This classifier does not distinguish first-time login from reauthentication; the existing auth-state logic remains responsible for that distinction.
## User impact
When stored MCP OAuth credentials are expired and cannot be refreshed, app clients now receive `failureReason: "reauthenticationRequired"` on the failed startup update and can show the reconnect action. First-time login and unrelated startup failures remain unchanged.
## Validation
- `just test -p codex-rmcp-client --test streamable_http_oauth_startup identifies_expired_unrefreshable_token_startup_error`
- `just test -p codex-mcp startup_outcome_error_identifies_authentication_required`
- `just test -p codex-mcp mcp_startup_failure_reason_requires_existing_oauth_and_auth_failure`
- `cargo build -p codex-cli --bin codex`
- local app-server probe emitted `failureReason: "reauthenticationRequired"`
- manual end-to-end reconnect flow confirmed
- `just fmt`
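The classification idea can be sketched with `std::error::Error` alone. `AuthorizationRequired` and `TransportError` are hypothetical stand-ins for the RMCP types. A payload that `source()` does not return is invisible to a plain chain walk, so the classifier must also inspect known wrappers explicitly.

```rust
use std::error::Error;
use std::fmt;

/// Stand-in for an auth error such as `AuthError::AuthorizationRequired`.
#[derive(Debug)]
struct AuthorizationRequired;

impl fmt::Display for AuthorizationRequired {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        f.write_str("authorization required")
    }
}

impl Error for AuthorizationRequired {}

/// Stand-in for a transport error that stores its cause in a dynamic
/// payload but does not expose it through `Error::source()`.
#[derive(Debug)]
struct TransportError {
    payload: Box<dyn Error + Send + Sync>,
}

impl fmt::Display for TransportError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "transport error: {}", self.payload)
    }
}

// The default `source()` returns `None`, hiding `payload` from chain walks.
impl Error for TransportError {}

/// Walk the standard source chain, and also look inside known transport
/// wrappers whose payload is not reachable through `source()`.
fn is_authentication_required(err: &(dyn Error + 'static)) -> bool {
    let mut current = Some(err);
    while let Some(e) = current {
        if e.is::<AuthorizationRequired>() {
            return true;
        }
        if let Some(transport) = e.downcast_ref::<TransportError>() {
            let inner: &(dyn Error + 'static) = &*transport.payload;
            if is_authentication_required(inner) {
                return true;
            }
        }
        current = e.source();
    }
    false
}
```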
## Why
Marketplace source deserialization treated `{"source":"npm", ...}` as
unsupported. The loader logged and skipped the entry, so npm-backed
plugins never appeared in `plugin list --available` and `plugin add`
returned "plugin not found".
Codex plugins are installed from a plugin root, not from an npm
dependency tree. For npm-backed marketplace entries, Codex should fetch
the published package contents without running package scripts or
installing unrelated dependencies.
## What changed
- Add `npm` marketplace plugin sources with `package`, optional semver
`version` or version range, and optional HTTPS `registry`.
- Reject unsafe npm source fields before materialization, including
invalid package names, non-semver version selectors, plaintext or
credential-bearing registry URLs, and registry query/fragment data.
- Materialize npm plugins with `npm pack --ignore-scripts`, then unpack
the resulting tarball through the existing hardened plugin bundle
extractor.
- Enforce npm archive and extracted-size limits, require the standard
npm `package/` archive root, and verify the extracted `package.json`
name matches the requested package before installing.
- Keep plugin listings, install-source descriptions, CLI JSON/human
output, app-server v2 `PluginSource`, TUI source summaries, regenerated
schema fixtures, and app-server documentation in sync.
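A simplified sketch of two of the pre-materialization checks. The package-name rule here only approximates npm's naming rules; it is not the full validator. The registry check uses plain string handling where a real implementation would use a URL parser. `is_valid_package_name`, `validate_registry_url`, and `RegistryError` are hypothetical.

```rust
/// Approximation of npm package-name rules: optional `@scope/`, lowercase
/// URL-safe characters, no leading `.` or `_`, at most 214 characters.
fn is_valid_package_name(name: &str) -> bool {
    if name.is_empty() || name.len() > 214 {
        return false;
    }
    let valid_part = |part: &str| {
        !part.is_empty()
            && !part.starts_with('.')
            && !part.starts_with('_')
            && part
                .chars()
                .all(|c| c.is_ascii_lowercase() || c.is_ascii_digit() || "-._~".contains(c))
    };
    match name.strip_prefix('@') {
        Some(scoped) => match scoped.split_once('/') {
            Some((scope, pkg)) => valid_part(scope) && valid_part(pkg),
            None => false,
        },
        None => valid_part(name),
    }
}

#[derive(Debug, PartialEq)]
enum RegistryError {
    NotHttps,
    HasQueryOrFragment,
    HasCredentials,
    MissingHost,
}

/// Accept only HTTPS registry URLs without userinfo, query, or fragment.
fn validate_registry_url(url: &str) -> Result<(), RegistryError> {
    let rest = url.strip_prefix("https://").ok_or(RegistryError::NotHttps)?;
    if rest.contains('?') || rest.contains('#') {
        return Err(RegistryError::HasQueryOrFragment);
    }
    let authority = rest.split('/').next().unwrap_or("");
    if authority.contains('@') {
        return Err(RegistryError::HasCredentials);
    }
    if authority.is_empty() {
        return Err(RegistryError::MissingHost);
    }
    Ok(())
}
```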
## Impact
Marketplaces can distribute Codex plugins from public or configured
private HTTPS npm registries using the same install flow as existing
materialized plugin sources. `npm` must be available on `PATH` when an
npm-backed plugin is installed.
Fixes #27831
## Validation
- `just write-app-server-schema`
- `just test -p codex-core-plugins -p codex-app-server-protocol -p
codex-app-server -p codex-cli`
- npm/schema/core-plugin coverage passed in the run.
- The full focused command finished with `1739 passed`, `11 failed`, and
`6 timed out`; the failures were unrelated local app-server environment
failures from `sandbox-exec: sandbox_apply: Operation not permitted`
plus one missing `test_stdio_server` helper binary.
- Installed an npm-published Codex plugin package through a throwaway
local marketplace and throwaway `CODEX_HOME` to exercise the real npm
materialization path end to end.
## Why
It's hard to change the set of required jobs when they're managed in the GitHub UI, and when each workflow is responsible for choosing its own scheduling, it's easy to end up with skew between what we enforce on PRs and on `main`.
## What
- add a `blocking-ci` caller workflow, triggered by pull requests and pushes to `main`, for Bazel, blob size, cargo-deny, Codespell, `repo-checks`, rust CI, and SDK CI
- add an `always()` terminal job named `CI required` that fails unless every called workflow succeeds
- add a `postmerge-ci` caller workflow for `rust-ci-full` and `v8-canary`, with a terminal `Postmerge CI results` job
- centralize V8 relevance detection in `v8_canary_changes.py`; unrelated PR and postmerge runs execute metadata only and skip the expensive build matrices
- leave `v8-canary` outside the blocking gate and leave the external `cla` check independent
## Rollout
A repository admin must replace the existing required GitHub Actions contexts with `CI required` in the main-branch ruleset. Retain `cla` as a separate required check. Until that change is coordinated, this PR cannot satisfy the old standalone check names. In-flight PRs will need to be rebased after this lands.
## Description
This PR adds canonical core `TurnItem` shapes for command execution,
dynamic tool calls, collab agent tool calls, and sub-agent activity, to
be stored in the rollout file soon.
It also teaches app-server protocol / `ThreadHistoryBuilder` how to
render those items, and adds the small legacy fanout helpers needed for
existing event-based consumers. No core producer or rollout persistence
behavior changes here; that will be done in a follow-up.
## Making ThreadHistoryBuilder stateless
This is the first PR in a stack to make `ThreadHistoryBuilder` stateless
enough that we can materialize app-server `ThreadItem`s from only a
given slice of `RolloutItem` history, without ever needing to replay the
whole thread from the beginning.
The persisted legacy `RolloutItem::EventMsg` records are mostly shaped
like live UI events, not like materialized `ThreadItem`s. They work if
we replay the full rollout in order, but they often do not contain
enough stable identity or complete item state to project an arbitrary
suffix on its own.
A few examples:
- `UserMessageEvent` and `AgentMessageEvent` have content, but
historically do not carry the persisted app-server item ID that should
become the SQLite primary key.
- `AgentReasoningEvent` and `AgentReasoningRawContentEvent` are
fragments. `ThreadHistoryBuilder` currently merges them into the last
reasoning item, which means a slice starting in the middle of reasoning
cannot know whether to append to an earlier item or create a new one.
- `WebSearchEndEvent`, `McpToolCallEndEvent`, collab end events, and
similar legacy events can often render a final-looking item, but they
usually rely on prior replay state to know which turn owns the item.
- Begin/end legacy events are partial views of one logical item. The
builder correlates them by `call_id` and mutates prior state to
synthesize the final `ThreadItem`.
That is the problem this direction fixes. A persisted canonical
lifecycle record looks much closer to the read model we actually want
later:
```rust
ItemCompletedEvent {
    turn_id,
    item: TurnItem { id, /* ...full snapshot... */ },
    completed_at_ms,
}
```
Once rollout has explicit `turn_id`, stable `item.id`, and a canonical
completed item snapshot, the future SQLite projector can reduce only the
new rollout suffix and upsert the affected `thread_items` rows. It no
longer needs to synthesize `item-N`, infer item ownership from the
active turn, or replay earlier events just to reconstruct the current
item snapshot.
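The suffix-only projection can be illustrated with an in-memory map standing in for the future `thread_items` table. `ItemCompleted` and `ThreadItems` are hypothetical, and `snapshot` is a placeholder string for the full `TurnItem`. The sketch only shows why stable IDs and self-contained records make an idempotent upsert possible.

```rust
use std::collections::HashMap;

/// Hypothetical canonical lifecycle record: enough to upsert one row.
#[derive(Debug, Clone, PartialEq)]
struct ItemCompleted {
    turn_id: String,
    item_id: String,
    snapshot: String, // placeholder for the full `TurnItem` snapshot
}

/// Hypothetical `thread_items` table keyed by stable item ID.
#[derive(Default)]
struct ThreadItems {
    rows: HashMap<String, ItemCompleted>,
}

impl ThreadItems {
    /// Reduce only the new rollout suffix. Each record carries its own turn
    /// ownership and full snapshot, so no earlier events are replayed.
    fn apply_suffix(&mut self, suffix: &[ItemCompleted]) {
        for record in suffix {
            self.rows.insert(record.item_id.clone(), record.clone());
        }
    }
}
```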
## What changed
- Added core `TurnItem` variants and item structs for command execution,
dynamic tool calls, collab agent tool calls, and sub-agent activity.
- Added conversions from those canonical items back into the legacy
event shapes where current consumers still need them.
- Added app-server v2 `ThreadItem` conversion for the new core item
variants.
- Taught `ThreadHistoryBuilder` and rollout persistence metrics to
recognize the new item variants.
## Follow-up
The next PR #30283 switches the live
core producers for these item families onto canonical `ItemStarted` /
`ItemCompleted` events.
## Why
Remote-control websocket reconnects and pairing requests proactively refresh their server token. When `/server/refresh` returned a transient error such as `502`, the still-valid token was discarded as a usable connection path, causing reconnect failures and repeated refresh attempts that could amplify an upstream incident.
## What Changed
- Start proactive refresh five minutes before token expiry and distinguish it from a required refresh for missing or expired tokens.
- Continue websocket and pairing operations with the existing valid token after `429`, `5xx`, or timeout failures.
- Share an in-memory `next_refresh_at` throttle across websocket and pairing callers, honoring both `Retry-After` formats and otherwise using a jittered 24–36 second delay.
- Keep required refreshes strict, preserve `404` enrollment replacement, and clear token/throttle state for `401` and `403` auth recovery.
- Preserve refresh response metadata internally and add focused wire-level and integration coverage.
## Verification
Added behavioral coverage proving that:
- a valid near-expiry token still completes websocket and pairing requests after transient refresh failures;
- `Retry-After` suppresses a subsequent refresh across websocket and pairing callers;
- request and response-body timeouts are classified as transient;
- an expired token, including one that expires during refresh, cannot proceed to websocket connection;
- auth failures clear the attempted token without overwriting a concurrently rotated token.
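The refresh policy above can be sketched as three small decisions. The names are hypothetical, and the sketch simplifies in three ways:

- It handles only the delta-seconds form of `Retry-After`; the PR also honors the HTTP-date form.
- It takes jitter as an input instead of drawing it.
- It models a missing token as `None`.

```rust
use std::time::{Duration, SystemTime};

/// Proactive refresh begins this long before the token expires.
const PROACTIVE_WINDOW: Duration = Duration::from_secs(5 * 60);

#[derive(Debug, PartialEq)]
enum RefreshNeed {
    NotNeeded,
    /// Token is still valid, so a transient refresh failure can fall back to it.
    Proactive,
    /// Token is missing or expired; the caller must not proceed without a new one.
    Required,
}

/// `expires_at` is `None` when no token is stored.
fn refresh_need(now: SystemTime, expires_at: Option<SystemTime>) -> RefreshNeed {
    match expires_at {
        None => RefreshNeed::Required,
        Some(exp) if exp <= now => RefreshNeed::Required,
        Some(exp) if exp <= now + PROACTIVE_WINDOW => RefreshNeed::Proactive,
        Some(_) => RefreshNeed::NotNeeded,
    }
}

/// `status` is `None` for a request or response-body timeout.
fn is_transient(status: Option<u16>) -> bool {
    match status {
        None | Some(429) => true,
        Some(code) => (500..600).contains(&code),
    }
}

/// Delay before the next refresh attempt after a transient failure.
/// `jitter` is a value in [0, 1) that a real caller would draw at random.
fn next_refresh_delay(retry_after_secs: Option<u64>, jitter: f64) -> Duration {
    match retry_after_secs {
        Some(secs) => Duration::from_secs(secs),
        None => Duration::from_secs_f64(24.0 + 12.0 * jitter.clamp(0.0, 1.0)),
    }
}
```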
## Summary
- complete unified-exec processes from the ordered event stream instead of issuing a final zero-wait `process/read`
- add optional executor sandbox-denial state to `process/exited`
- retain `process/read` as a retained-output and compatibility fallback for receiver lag, sequence gaps, and legacy servers
- recover sandbox-denial state across transport reconnection
- cover the real `TestCodex` remote-exec path without adding a public test-only event constructor
## Why
A successful one-shot tool call currently receives its output and terminal notifications, then pays another wide-area `process/read` round trip before returning. Staging traces showed that remote response wait accounted for more than 99.8% of RPC time; local serialization, queueing, and deserialization were below 0.6 ms.
## Measured impact
A direct staging A/B used the same build and route and changed only completion mode. Each arm ran three times with 30 one-shot `/usr/bin/true` calls per run. The table reports the median of the three per-run percentiles.

| Metric | Final `process/read` | Pushed events | Change |
| --- | ---: | ---: | ---: |
| End-to-end completion p50 | 159.5 ms | 118.7 ms | -40.8 ms (-25.6%) |
| End-to-end completion p95 | 182.4 ms | 131.7 ms | -50.6 ms (-27.8%) |
| Completion-wait p50 | 80.1 ms | 41.5 ms | -38.5 ms (-48.1%) |
| Final `process/read` RPC p50 | 79.9 ms | eliminated | -79.9 ms |

TCP_NODELAY was enabled in both A/B arms, so its effect cancels out. The successful, complete, in-order event path issued zero final `process/read` calls.
## Compatibility and recovery
- new servers send `sandboxDenied` on `process/exited`
- legacy servers omit it, which triggers one compatibility `process/read`
- broadcast lag or a sequence gap triggers a retained-output read
- recovery remains bounded by the server's existing 1 MiB retained-output window
- complete, in-order event streams issue no completion read
- sandbox denial is attached to the exit event before consumers can observe process completion
- server-first and client-first rollouts remain wire-compatible; server-first realizes the latency win immediately
## Integration coverage
The `TestCodex` suite exercises four distinct remote-exec contracts:
- complete pushed output/exit/close with zero reads
- direct pushed sandbox denial with zero reads
- legacy missing denial metadata with exactly one compatibility read
- count-bounded replay eviction recovered from retained output without duplication
## Validation
- `just test -p codex-core exec_command_consumes_pushed_remote_process_events`: 4 passed
- `just test -p codex-core unified_exec::process_tests::`: 4 passed
- `just test -p codex-exec-server`: 294 passed, 2 skipped
- `just test -p codex-exec-server-protocol`: 5 passed
- `just test -p codex-rmcp-client`: 89 passed, 2 skipped
- focused Bazel `//codex-rs/core:core-all-test`: passed across 16 shards
- scoped `just fix` passed for core and exec-server
- `just fmt` passed

The complete workspace suite was not rerun; focused Cargo and Bazel coverage passed for the changed behavior.
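Here is a sketch of how a client might decide whether a completion read is needed, assuming each pushed event carries a per-process sequence number. The event and decision types are hypothetical. The sketch models sequence gaps only; it does not model receiver-lag detection, which would come from the broadcast channel.

```rust
#[derive(Debug, Clone, PartialEq)]
enum ProcessEvent {
    Output { seq: u64, bytes: Vec<u8> },
    /// `sandbox_denied` is `None` when a legacy server omits the field.
    Exited { seq: u64, sandbox_denied: Option<bool> },
    Closed { seq: u64 },
}

#[derive(Debug, PartialEq)]
enum Completion {
    /// Complete, in-order stream: finish with no `process/read`.
    FromEvents { sandbox_denied: bool },
    /// Exit arrived without denial metadata: one compatibility read.
    CompatibilityRead,
    /// Sequence gap: recover from the server's retained output.
    RetainedOutputRead,
    /// Exit or close not observed yet: keep waiting for events.
    Pending,
}

fn plan_completion(events: &[ProcessEvent], first_seq: u64) -> Completion {
    let mut expected = first_seq;
    let mut exit_denial: Option<Option<bool>> = None;
    let mut closed = false;
    for event in events {
        let seq = match event {
            ProcessEvent::Output { seq, .. }
            | ProcessEvent::Exited { seq, .. }
            | ProcessEvent::Closed { seq } => *seq,
        };
        if seq != expected {
            return Completion::RetainedOutputRead;
        }
        expected += 1;
        match event {
            ProcessEvent::Exited { sandbox_denied, .. } => exit_denial = Some(*sandbox_denied),
            ProcessEvent::Closed { .. } => closed = true,
            ProcessEvent::Output { .. } => {}
        }
    }
    match (exit_denial, closed) {
        (Some(Some(denied)), true) => Completion::FromEvents { sandbox_denied: denied },
        (Some(None), true) => Completion::CompatibilityRead,
        _ => Completion::Pending,
    }
}
```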
## Why
Remote diff-root discovery is independent of world-state construction, but it ran afterward and added filesystem metadata latency before the first model request. Overlap the independent work so thread-cold turns do not pay those waits serially.
## What
- Run `record_context_updates_and_set_reference_context_item` and `turn_diff_display_roots` with `tokio::join!`.
- Reuse the same resolved display roots when constructing `TurnDiffTracker`; no cache or behavior lifecycle changes are introduced.
## Validation
In a synthetic executor-skill benchmark with artificial network delay, thread-cold model-request p50 improved from about 1.79 s to 1.58 s.
## Why
`LOG_FORMAT=json` and `RUST_LOG` are supported by app-server, but the behavior was only covered indirectly. We should verify the actual JSONL written by both user-facing entry points: `codex app-server` and the standalone `codex-app-server` binary.

The existing processor shutdown message also always said the channel closed, even though the processor can exit for several different reasons. Structured fields make that event more accurate and useful to log consumers.
## What changed
- Record the processor `exit_reason`, remaining connection count, and forced-shutdown state as structured tracing fields.
- Add a shared process-test helper that enables JSON logging, validates every stderr line as JSON, and verifies the top-level timestamp is RFC 3339.
- Cover both `codex app-server` and `codex-app-server`, asserting the stable `level`, `fields`, and `target` payload.
## Test plan
- `just test -p codex-app-server standalone_app_server_emits_json_info_events`
- `just test -p codex-cli app_server_emits_json_info_events`
## Summary
- Preserve the optional namespace on custom tool calls during response deserialization and app-server replay.
- Use the namespaced tool identifier for streaming argument handling and tool dispatch.
- Regenerate app-server protocol schemas.
- Add regression tests covering namespace serialization and routing.
## Testing
- Ran affected protocol and app-server test suites.
- Ran the full core test suite; two load-sensitive timing tests passed when rerun individually.
- Ran Clippy and formatting checks.
- Verified with a local end-to-end app-server replay that the namespace is preserved through the complete request/response flow.
## Why
Response item IDs represent stable conversation identity. `ContextManager::for_prompt` repairs an unmatched call by synthesizing an `"aborted"` output in the disposable prompt projection, but that output previously had no ID. Assigning a fresh ID on every prompt build would make retries and resumes change otherwise identical model context and reduce prompt-cache reuse.

The concrete bug is that these normalization-created outputs bypass the regular item-ID allocation path. Even with item IDs enabled, a prompt could therefore contain an identified call paired with a synthetic output whose `id` was missing. This change closes that gap by deriving the output ID from the source call's item ID. For legacy calls that have no item ID, the output remains ID-less because there is no stable source identity to derive from.

The originating call already has a stable item ID under the item-ID model introduced in #28814. A prompt-only output can therefore derive stable identity from that call without mutating canonical history or persisted rollouts. This addresses the failure exposed by #30311 while keeping normalization read-only outside its detached prompt snapshot.

UUIDv5 is intentional here because it is the standard namespaced, deterministic UUID construction. Using the output kind and source call ID as the name produces the same UUID on every projection while keeping output kinds in separate name domains. UUIDv7 would introduce randomness and time, so keeping it stable would require persisting the synthetic repair. UUIDv5 uses SHA-1 internally, but this is only an identity mapping, not an authenticity or security boundary.
## What changed
- Derive a deterministic UUIDv5 ID for each synthesized call output from the source call item ID.
- Use the Responses API prefix appropriate for function, custom-tool, tool-search, and local-shell outputs.
- Preserve the existing insertion position immediately after the unmatched call.
- Keep synthesized outputs prompt-only; no rollout, task-lifecycle, compaction, or raw-response behavior changes.
## Testing
- `just test -p codex-core for_prompt_assigns_stable_id_to_synthetic_output_without_reordering_history`
- `just test -p codex-core synthetic_call_output_id_is_stable_across_resumes`
- `just test -p codex-core normalize_adds_missing_output`
- `just test -p codex-core response_item_ids`
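The ID derivation can be sketched with the standard RFC 4122 version-5 construction. To stay dependency-free, this sketch includes a compact SHA-1. The following parts are hypothetical:

- the namespace UUID;
- the `kind:call_id` name layout;
- `synthetic_output_id`.

The Responses API prefix is omitted because the PR does not list it.

```rust
/// Compact SHA-1, used here only as the hash inside UUIDv5.
fn sha1(data: &[u8]) -> [u8; 20] {
    let mut h: [u32; 5] = [0x67452301, 0xEFCDAB89, 0x98BADCFE, 0x10325476, 0xC3D2E1F0];
    let bit_len = (data.len() as u64).wrapping_mul(8);
    let mut msg = data.to_vec();
    msg.push(0x80);
    while msg.len() % 64 != 56 {
        msg.push(0);
    }
    msg.extend_from_slice(&bit_len.to_be_bytes());
    for chunk in msg.chunks(64) {
        let mut w = [0u32; 80];
        for i in 0..16 {
            w[i] = u32::from_be_bytes([chunk[4 * i], chunk[4 * i + 1], chunk[4 * i + 2], chunk[4 * i + 3]]);
        }
        for i in 16..80 {
            w[i] = (w[i - 3] ^ w[i - 8] ^ w[i - 14] ^ w[i - 16]).rotate_left(1);
        }
        let [mut a, mut b, mut c, mut d, mut e] = h;
        for (i, &wi) in w.iter().enumerate() {
            let (f, k) = match i {
                0..=19 => ((b & c) | (!b & d), 0x5A827999),
                20..=39 => (b ^ c ^ d, 0x6ED9EBA1),
                40..=59 => ((b & c) | (b & d) | (c & d), 0x8F1BBCDC),
                _ => (b ^ c ^ d, 0xCA62C1D6),
            };
            let temp = a.rotate_left(5).wrapping_add(f).wrapping_add(e).wrapping_add(k).wrapping_add(wi);
            e = d;
            d = c;
            c = b.rotate_left(30);
            b = a;
            a = temp;
        }
        for (slot, v) in h.iter_mut().zip([a, b, c, d, e]) {
            *slot = slot.wrapping_add(v);
        }
    }
    let mut out = [0u8; 20];
    for (i, word) in h.iter().enumerate() {
        out[4 * i..4 * i + 4].copy_from_slice(&word.to_be_bytes());
    }
    out
}

/// RFC 4122 name-based UUID, version 5: SHA-1 over namespace bytes, then name.
fn uuid_v5(namespace: [u8; 16], name: &[u8]) -> String {
    let mut input = namespace.to_vec();
    input.extend_from_slice(name);
    let hash = sha1(&input);
    let mut b = [0u8; 16];
    b.copy_from_slice(&hash[..16]);
    b[6] = (b[6] & 0x0f) | 0x50; // version 5
    b[8] = (b[8] & 0x3f) | 0x80; // RFC 4122 variant
    let hex: String = b.iter().map(|byte| format!("{byte:02x}")).collect();
    format!("{}-{}-{}-{}-{}", &hex[0..8], &hex[8..12], &hex[12..16], &hex[16..20], &hex[20..32])
}

/// Hypothetical name layout keeping output kinds in separate name domains.
fn synthetic_output_id(namespace: [u8; 16], output_kind: &str, call_item_id: &str) -> String {
    uuid_v5(namespace, format!("{output_kind}:{call_item_id}").as_bytes())
}
```

A published UUIDv5 test vector is a quick way to check a construction like this. Python's `uuid` documentation gives `uuid5(NAMESPACE_DNS, "python.org")` as `886313e1-3b8a-5372-9b90-0c9aee199e5d`.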
## Why
App-server clients that configure named execution environments need to
discover an environment's shell and working directory before selecting
it for a thread or turn. Because the environment can run on a different
operating system than app-server, its working directory is represented
as a canonical `file:` URI rather than a host-local path string. The
probe also needs a bounded response time: an exec-server that completes
initialization but never answers `environment/info` must not hold the
environment serialization queue indefinitely.
## What changed
- Add an experimental `environment/info` app-server RPC for named
environments.
- Route the probe through the managed environment connection and return
target-native shell metadata plus the default working directory as a
`PathUri`.
- Return connection and protocol failures as JSON-RPC errors.
- Bound the exec-server probe response to 30 seconds and remove
timed-out calls from the pending-request table so later environment
mutations can proceed.
- Cover successful responses, omitted working directories, unknown
environments, connection failures, and pending-call cleanup.
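Here is a std-only sketch of the bounded probe. It uses a pending-request table in which a timed-out call removes its own entry. `PendingCalls` is hypothetical. It uses blocking `std::sync::mpsc` channels in place of whatever async machinery the exec-server client uses. The PR's limit is 30 seconds; the duration here is a parameter.

```rust
use std::collections::HashMap;
use std::sync::mpsc::{self, Receiver, RecvTimeoutError, Sender};
use std::sync::{Arc, Mutex};
use std::time::Duration;

/// Hypothetical pending-request table keyed by JSON-RPC request ID.
#[derive(Clone, Default)]
struct PendingCalls {
    inner: Arc<Mutex<HashMap<u64, Sender<String>>>>,
}

impl PendingCalls {
    fn register(&self, id: u64) -> Receiver<String> {
        let (tx, rx) = mpsc::channel();
        self.inner.lock().unwrap().insert(id, tx);
        rx
    }

    /// Called by the connection reader when a response arrives.
    fn resolve(&self, id: u64, response: String) {
        if let Some(tx) = self.inner.lock().unwrap().remove(&id) {
            let _ = tx.send(response); // the waiter may already have given up
        }
    }

    /// Wait for a response; on timeout, remove the entry so an unanswered
    /// call does not stay registered.
    fn wait(&self, id: u64, rx: Receiver<String>, timeout: Duration) -> Result<String, String> {
        match rx.recv_timeout(timeout) {
            Ok(response) => Ok(response),
            Err(RecvTimeoutError::Timeout) => {
                self.inner.lock().unwrap().remove(&id);
                Err(format!("timed out waiting for `environment/info` response after {timeout:?}"))
            }
            Err(RecvTimeoutError::Disconnected) => Err("connection closed before a response".to_string()),
        }
    }

    fn len(&self) -> usize {
        self.inner.lock().unwrap().len()
    }
}
```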
## Protocol examples
Request:
```json
{
"id": 42,
"method": "environment/info",
"params": {
"environmentId": "remote-a"
}
}
```
Successful response:
```json
{
"id": 42,
"result": {
"shell": {
"name": "zsh",
"path": "/bin/zsh"
},
"cwd": "file:///workspace"
}
}
```
If the exec-server initializes but does not answer the probe within 30
seconds:
```json
{
"id": 42,
"error": {
"code": -32603,
"message": "failed to get info for environment `remote-a`: exec-server protocol error: timed out waiting for exec-server `environment/info` response after 30s"
}
}
```
## Testing
- App-server integration coverage for successful info (including omitted
`cwd`), unknown environments, and connection failures.
- Exec-server RPC coverage verifying a timed-out call is removed from
the pending-request table.
---------
Co-authored-by: Michael Bolin <mbolin@openai.com>
## Summary
- project effective marketplace/plugin config through the enterprise source policy so blocked installed plugins become inactive
- filter plugin list/read/discovery and CLI marketplace source/snapshot reporting using the same policy
- enforce source admission for background marketplace cache refreshes
- continue refreshing/upgrading independent marketplaces and plugins when one entry fails, returning per-entry errors
- include policy-projected plugin state in cache and refresh keys so requirement changes invalidate stale results
## Stack
This is PR 2 of 2 and is based on #29690. Review the admission model and source matcher in #29690 first; this PR contains only runtime enforcement.
## Test plan
- `just test -p codex-core-plugins` (287 tests)
- `just test -p codex-cli plugin_list_ignores_implicit_system_marketplace_roots_without_manifests`
- `cargo check -p codex-cli -p codex-app-server --tests`
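The "continue on failure, return per-entry errors" behavior described above can be sketched as a loop that records each entry's outcome instead of returning at the first error. A minimal sketch, assuming a caller-supplied `refresh` closure stands in for the real network and cache work; `EntryResult` and `refresh_all` are hypothetical names.

```rust
/// Outcome of refreshing one marketplace or plugin (hypothetical shape).
#[derive(Debug, PartialEq)]
struct EntryResult {
    name: String,
    error: Option<String>,
}

/// Refresh every entry, recording a failure against that entry and moving
/// on, so one broken marketplace does not block independent ones.
fn refresh_all<F>(entries: &[&str], mut refresh: F) -> Vec<EntryResult>
where
    F: FnMut(&str) -> Result<(), String>,
{
    entries
        .iter()
        .map(|&name| EntryResult {
            name: name.to_string(),
            error: refresh(name).err(),
        })
        .collect()
}
```

The alternative, propagating the first error with `?`, would leave every later entry stale; collecting results keeps the failure visible while the rest still refresh.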
## Summary
Increase the external `currentTime/read` request timeout from 5 seconds to 10 seconds.
## Validation
- `just fmt`
- Focused app-server test build was stopped to defer validation to CI.
## Summary
- enable the remote plugin feature by default
- promote the remote plugin feature from under development to stable
- preserve the existing `features.remote_plugin` override for explicitly disabling it
- keep legacy disabled-path coverage explicit in TUI and app-server tests
## Impact
Remote plugin functionality is enabled by default for configurations that do not set the feature flag. The existing Codex backend authentication gate still applies.
## Validation
- `just fmt`
- `just test -p codex-features`
- `just test -p codex-tui plugins_popup_remote_section_fallback_states_snapshot`
- targeted `codex-app-server` plugin-list and skills-list tests
- `git diff --check`

The full TUI and app-server suites were also exercised locally. All remote-plugin-related coverage passed; unrelated local sandbox/test-binary failures remain outside this change.
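The default-on semantics above amount to "an explicit setting wins, otherwise use the new default", with backend authentication still gating the feature. A minimal sketch, assuming `features.remote_plugin` deserializes to an `Option<bool>`; both function names are hypothetical.

```rust
/// Hypothetical flag resolution: an explicit config value (such as
/// `features.remote_plugin = false`) overrides the built-in default.
fn feature_enabled(default_enabled: bool, config_override: Option<bool>) -> bool {
    config_override.unwrap_or(default_enabled)
}

/// Remote plugins also require the existing Codex backend authentication gate.
fn remote_plugins_active(config_override: Option<bool>, has_backend_auth: bool) -> bool {
    const REMOTE_PLUGIN_DEFAULT: bool = true; // stable and on by default after this change
    feature_enabled(REMOTE_PLUGIN_DEFAULT, config_override) && has_backend_auth
}
```

Modeling the setting as `Option<bool>` rather than `bool` is what lets "not set" pick up the new default while an explicit `false` keeps the feature off.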
## Why
The safety-buffering prompt is a modal TUI view, but the normal successful-turn path only hid the running status indicator. If the turn completed while the prompt was open, the stale modal remained over the composer until the user dismissed it or another turn started.

This aligns the TUI with the app behavior: keep the safety notice visible while the turn is active, then remove it when the turn becomes terminal. It also prevents the stale retry action from changing the model and reasoning effort for a future turn after the buffered turn has already completed.

| New copy |
|---|
| <img width="1014" height="313" alt="CleanShot 2026-06-28 at 20 27 18" src="https://github.com/user-attachments/assets/f0f37359-5d77-442f-add2-9d1874bdc422" /> |
## What changed
- Clear the active safety-buffering view and retry state when a turn completes successfully.
- Update the retry-capable message to say “Hang tight or retry with a faster model”.
- Extend the safety-buffering regression coverage to verify that the prompt remains visible after assistant output starts and disappears when the turn completes.
- Update the TUI snapshot for the revised copy.

This is a follow-up to #29919.
## How to Test
1. Start a TUI turn that receives `model/safetyBuffering/updated` with `showBufferingUi: true` and a `fasterModel`.
2. Confirm the prompt says “Hang tight or retry with a faster model”.
3. Let the turn continue and confirm the prompt remains visible while the turn is active.
4. Let the turn finish successfully and confirm the prompt disappears and the composer is restored without requiring an extra keypress.
5. Confirm a buffering update without a faster model still shows the shorter non-retry message.

Targeted automated coverage:
- `just test -p codex-tui safety_buffering`: 4 passed.
- `just test -p codex-tui`: 2,951 passed; two unrelated Guardian feature-flag tests failed identically on `main` in this environment.

The argument-comment lint was also audited manually. The workspace Bazel invocation was blocked by a missing external LLVM `compiler-rt` BUILD file, and the packaged per-crate fallback uses a nightly older than the current `sqlx` minimum Rust version.
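The fix for the stale safety-buffering modal can be sketched as a small state machine in which only a terminal turn event clears both the view and the retry target. A minimal sketch under stated assumptions: `ChatState`, `Event`, and their fields are hypothetical, with `show_buffering_ui` and `faster_model` mirroring the `showBufferingUi` and `fasterModel` fields of `model/safetyBuffering/updated`.

```rust
/// Hypothetical TUI state for the safety-buffering prompt.
#[derive(Debug, Default)]
struct ChatState {
    buffering_view_visible: bool,
    retry_model: Option<String>,
}

#[derive(Debug)]
enum Event {
    SafetyBufferingUpdated { show_buffering_ui: bool, faster_model: Option<String> },
    AssistantOutput,
    TurnCompleted,
}

impl ChatState {
    fn handle(&mut self, event: Event) {
        match event {
            Event::SafetyBufferingUpdated { show_buffering_ui, faster_model } => {
                self.buffering_view_visible = show_buffering_ui;
                // A retry target only exists while the prompt is shown.
                self.retry_model = if show_buffering_ui { faster_model } else { None };
            }
            // Assistant output alone does not dismiss the prompt mid-turn.
            Event::AssistantOutput => {}
            // A terminal turn clears the view and the stale retry target, so a
            // later retry cannot switch models for an unrelated future turn.
            Event::TurnCompleted => {
                self.buffering_view_visible = false;
                self.retry_model = None;
            }
        }
    }
}
```

This mirrors the manual test steps: the prompt survives assistant output and disappears only when the turn completes.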
## Summary
- add a false-by-default `include_skills_usage_instructions` model metadata field
- enable the field for the bundled `gpt-5.5` model metadata
- consume the metadata in both core and extension skill rendering
- remove hardcoded legacy-model matching and its marker plumbing
## Summary
- restore the v1 clarification that requests for depth, research, or investigation do not authorize subagent spawning
- restore guidance for keeping critical-path, urgent, tightly coupled, or difficult work local
- update the focused v1 tool-search and spawn-description coverage
## Why
PR #27919 simplified the v1 `spawn_agent` prompt by removing its delegation decision guidance. That left the authorization rule intact, but removed the instructions that constrained what should be delegated after spawning was authorized. Restore those guardrails while preserving later support for explicit delegation authorization from applicable AGENTS.md and skill instructions.

Multi-agent v2 prompts are unchanged.
## User impact
Models using the v1 multi-agent tool surface receive clearer guidance to delegate independent side work while keeping blocking work on the main rollout.
## Validation
- `just fmt`
- `git diff --check`
- tests not run locally per repository guidance; CI will validate the focused coverage
## Why
The Bedrock GPT-5.6 catalog advertises `max`, but Codex treated it as an opaque custom effort. That made the reasoning picker render it as lowercase `max` while known efforts use productized labels. Making `max` a known effort aligns catalog data, parsing, and UI presentation without changing the `max` wire value or persisted representation.
## What changed
- Add first-class `ReasoningEffort::Max` parsing and serialization.
- Use the typed effort in the Bedrock catalog and render it as `Max` in the TUI.
- Preserve forward-compatible custom-effort coverage with a genuinely unknown `future` value.
### Before
<img width="559" height="124" alt="Screenshot 2026-06-28 at 12 08 47 PM" src="https://github.com/user-attachments/assets/7c43cf4f-020b-4605-9239-0a9c97eb7364" />
### After
<img width="558" height="107" alt="Screenshot 2026-06-28 at 12 09 10 PM" src="https://github.com/user-attachments/assets/b9cc5ded-c940-43b4-b024-bba25abe0a17" />
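The distinction above between the wire value (`max`) and the display label (`Max`), plus the forward-compatible fallback for unknown values, can be sketched as an enum with a string-carrying catch-all variant. This is a hypothetical subset: only `Max` and the `future` test value come from the PR, and the other variants and their labels are assumptions.

```rust
use std::fmt;
use std::str::FromStr;

/// Hypothetical subset of reasoning efforts; the real enum has other variants.
#[derive(Debug, Clone, PartialEq, Eq)]
enum ReasoningEffort {
    Low,
    Medium,
    High,
    Max,
    /// Forward-compatible fallback for values this client does not know yet.
    Custom(String),
}

impl FromStr for ReasoningEffort {
    type Err = std::convert::Infallible;

    fn from_str(s: &str) -> Result<Self, Self::Err> {
        Ok(match s {
            "low" => Self::Low,
            "medium" => Self::Medium,
            "high" => Self::High,
            "max" => Self::Max,
            other => Self::Custom(other.to_string()),
        })
    }
}

impl fmt::Display for ReasoningEffort {
    /// Wire and persisted value: `Max` still serializes as lowercase `max`.
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        let s = match self {
            Self::Low => "low",
            Self::Medium => "medium",
            Self::High => "high",
            Self::Max => "max",
            Self::Custom(value) => value.as_str(),
        };
        f.write_str(s)
    }
}

impl ReasoningEffort {
    /// Picker label: known efforts get productized labels; unknown values
    /// are shown verbatim.
    fn label(&self) -> String {
        match self {
            Self::Low => "Low".to_string(),
            Self::Medium => "Medium".to_string(),
            Self::High => "High".to_string(),
            Self::Max => "Max".to_string(),
            Self::Custom(value) => value.clone(),
        }
    }
}
```

Keeping parsing infallible means a catalog that advertises a newer effort still round-trips unchanged instead of failing to load.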
## Summary
Bio/Cyber safety surfaces in the TUI could send users to stale Trusted Access pages, and safety buffering did not always expose the Help Center. This follow-up to #30317 adds the missing Learn more action, refreshes the Bio access URL and block copy, and updates the affected snapshots while preserving the existing retry and wait behavior.
## Summary
AWS Bedrock issues currently fall under broader labels, which makes provider-specific reports harder to find. The issue tracker now has an `aws-bedrock` label, but the automated labeler does not know to apply it. Teach the issue labeler to select `aws-bedrock` for Amazon Bedrock provider or Bedrock Mantle issues while excluding generic AWS references.
## Summary
Disable Nagle unconditionally for both exec-server Rendezvous WebSocket connections.
- pass `disable_nagle=true` at the executor and harness connection call sites
- keep the existing signed URL, protocol, and connection flow unchanged
- add no feature flag, rollout schema, path variant, or experiment-specific telemetry

The companion internal PR enables `TCP_NODELAY` on accepted Rendezvous sockets: openai/openai#1082463
## Why
Rendezvous carries small, latency-sensitive relay and JSON-RPC frames. Three staging runs of 30 steady-state `process/read` calls per configuration measured p50 improving from 139.1 ms to 81.5 ms and p95 from 162.0 ms to 95.8 ms with Nagle disabled. The expected packet overhead is small at the current connection scale. We will use existing latency, error, packet, and CPU monitoring and revert normally if production regresses.
## Rollout and rollback
The client and accepted-socket changes can deploy independently. New connections receive the setting as each side deploys. Rollback is a normal code revert; there is no persisted assignment or gate state to unwind.
## Validation
- `just test -p codex-exec-server --lib`: 164 passed
- `just fix -p codex-exec-server`: passed
- `just fmt`: passed
- independent final review found no actionable issue
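At the socket level, disabling Nagle means setting `TCP_NODELAY`, which Rust's standard library exposes as `TcpStream::set_nodelay`. A minimal sketch of both sides described above, the connecting client and the accepted socket, assuming plain TCP streams; the real change passes `disable_nagle=true` through WebSocket connection helpers not shown here, and both function names are hypothetical.

```rust
use std::io;
use std::net::{TcpListener, TcpStream};

/// Client side: connect, then disable Nagle's algorithm so small frames are
/// sent immediately instead of being held back to coalesce with later writes.
fn connect_without_nagle(addr: &str) -> io::Result<TcpStream> {
    let stream = TcpStream::connect(addr)?;
    stream.set_nodelay(true)?;
    Ok(stream)
}

/// Server side: set the same option on an accepted socket, mirroring the
/// companion change. Each direction is controlled by its own sender's option,
/// which is why both sides can deploy independently.
fn accept_without_nagle(listener: &TcpListener) -> io::Result<TcpStream> {
    let (stream, _peer) = listener.accept()?;
    stream.set_nodelay(true)?;
    Ok(stream)
}
```

The trade-off is the one the PR names: more, smaller packets in exchange for lower latency on small request/response frames.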
## Summary
The TUI biosafety block still included obsolete copy telling approved researchers they may be able to apply for Trusted Access. Remove that sentence and update the UI snapshot to match the approved wording.
See Commits and Changes for more details.
Created by
pull[bot] (v2.0.0-alpha.4)