diff --git a/.github/workflows/build_images.yaml b/.github/workflows/build_images.yaml index aaf97fc88c..78a965efd0 100644 --- a/.github/workflows/build_images.yaml +++ b/.github/workflows/build_images.yaml @@ -55,6 +55,7 @@ jobs: cp ./LICENSE ./ci/docker/context/LICENSE cp ./VERSION ./ci/docker/context/VERSION cp ./thirdparty/THIRD_PARTY_LICENSES ./ci/docker/context/THIRD_PARTY_LICENSES + cp ./ci/docker/entrypoint.sh ./ci/docker/context/entrypoint.sh - name: Copy Commit SHA and commit time run: | git rev-parse HEAD > ./ci/docker/context/COMMIT_SHA diff --git a/CONTRIBUTING.md b/CONTRIBUTING.md index 2835786ae4..8d03641fde 100644 --- a/CONTRIBUTING.md +++ b/CONTRIBUTING.md @@ -212,6 +212,17 @@ export RAPIDS_DATASET_ROOT_DIR=$CUOPT_HOME/datasets/ cd $CUOPT_HOME/python pytest -v ${CUOPT_HOME}/python/cuopt/cuopt/tests ``` +## gRPC Remote Execution + +NVIDIA cuOpt includes a gRPC-based remote execution system for running solves on a +GPU server from a program using the API locally. User documentation lives under `docs/cuopt/source/cuopt-grpc/` (Sphinx **gRPC remote execution** section): + +- `quick-start.rst` — Install/Docker/selector, how remote execution works, minimal LP and CLI examples (default C bundle). +- `advanced.rst` — TLS, tuning, limitations, troubleshooting. +- `examples.rst`, `api.rst` — Sample patterns and RPC overview. +- `docs/cuopt/source/cuopt-grpc/grpc-server-architecture.md` — Short **gRPC server behavior** page in user docs. +- `cpp/docs/grpc-server-architecture.md` — Full contributor reference (IPC, C++ source map, streaming). + ## Debugging cuOpt ### Building in debug mode from source diff --git a/GRPC_INTERFACE.md b/GRPC_INTERFACE.md deleted file mode 100644 index cdcffead97..0000000000 --- a/GRPC_INTERFACE.md +++ /dev/null @@ -1,392 +0,0 @@ -# gRPC Interface Architecture - -## Overview - -The cuOpt remote execution system uses gRPC for client-server communication. 
The interface -supports arbitrarily large optimization problems (multi-GB) through a chunked array transfer -protocol that uses only unary (request-response) RPCs — no bidirectional streaming. - -All client-server serialization uses protocol buffers generated by `protoc` and -`grpc_cpp_plugin`. The internal server-to-worker pipe uses protobuf for metadata -headers and raw byte transfer for bulk array data (see Security Notes). - -## Directory Layout - -All gRPC-related C++ source lives under a single tree: - -``` -cpp/src/grpc/ -├── cuopt_remote.proto # Base protobuf messages (job status, settings, etc.) -├── cuopt_remote_service.proto # Service definition + messages (SubmitJob, ChunkedUpload, Incumbent, etc.) -├── grpc_problem_mapper.{hpp,cpp} # CPU problem ↔ proto (incl. chunked header) -├── grpc_solution_mapper.{hpp,cpp} # LP/MIP solution ↔ proto (unary + chunked) -├── grpc_settings_mapper.{hpp,cpp} # PDLP/MIP settings ↔ proto -├── grpc_service_mapper.{hpp,cpp} # Request/response builders (status, cancel, stream logs, etc.) 
-├── client/ -│ ├── grpc_client.{hpp,cpp} # High-level client: connect, submit, poll, get result -│ └── solve_remote.cpp # solve_lp_remote / solve_mip_remote (uses grpc_client) -└── server/ - ├── grpc_server_main.cpp # main(), argument parsing, gRPC server setup - ├── grpc_service_impl.cpp # CuOptRemoteServiceImpl — all RPC handlers - ├── grpc_server_types.hpp # Shared types, globals, forward declarations - ├── grpc_field_element_size.hpp # ArrayFieldId → element byte size (codegen target) - ├── grpc_pipe_serialization.hpp # Pipe I/O: protobuf headers + raw byte arrays (request/result) - ├── grpc_incumbent_proto.hpp # Incumbent proto build/parse (codegen target) - ├── grpc_worker.cpp # worker_process(), incumbent callback, store_simple_result - ├── grpc_worker_infra.cpp # Pipes, spawn, wait_for_workers, mark_worker_jobs_failed - ├── grpc_server_threads.cpp # result_retrieval, incumbent_retrieval, session_reaper - └── grpc_job_management.cpp # Pipe I/O, submit_job_async, check_status, cancel, etc. -``` - -- **Protos**: Live in `cpp/src/grpc/`. CMake generates C++ in the build dir (`cuopt_remote.pb.h`, `cuopt_remote_service.pb.h`, `cuopt_remote_service.grpc.pb.h`). -- **Mappers**: Shared by client and server; convert between host C++ types and protobuf. Used for unary and chunked paths. -- **Client**: Solver-level utility (not public API). Used by `solve_lp_remote`/`solve_mip_remote` and tests. -- **Server**: Standalone executable `cuopt_grpc_server`. See `GRPC_SERVER_ARCHITECTURE.md` for process model and file roles. - -## Protocol Files - -| File | Purpose | -|------|---------| -| `cpp/src/grpc/cuopt_remote.proto` | Message definitions (problems, settings, solutions, field IDs) | -| `cpp/src/grpc/cuopt_remote_service.proto` | gRPC service definition (RPCs) | - -Generated code is placed in the CMake build directory (not checked into source). 
- -## Service Interface - -```protobuf -service CuOptRemoteService { - // Job submission (small problems, single message) - rpc SubmitJob(SubmitJobRequest) returns (SubmitJobResponse); - - // Chunked upload (large problems, multiple unary RPCs) - rpc StartChunkedUpload(StartChunkedUploadRequest) returns (StartChunkedUploadResponse); - rpc SendArrayChunk(SendArrayChunkRequest) returns (SendArrayChunkResponse); - rpc FinishChunkedUpload(FinishChunkedUploadRequest) returns (SubmitJobResponse); - - // Job management - rpc CheckStatus(StatusRequest) returns (StatusResponse); - rpc CancelJob(CancelRequest) returns (CancelResponse); - rpc DeleteResult(DeleteRequest) returns (DeleteResponse); - - // Result retrieval (small results, single message) - rpc GetResult(GetResultRequest) returns (ResultResponse); - - // Chunked download (large results, multiple unary RPCs) - rpc StartChunkedDownload(StartChunkedDownloadRequest) returns (StartChunkedDownloadResponse); - rpc GetResultChunk(GetResultChunkRequest) returns (GetResultChunkResponse); - rpc FinishChunkedDownload(FinishChunkedDownloadRequest) returns (FinishChunkedDownloadResponse); - - // Blocking wait (returns status only, use GetResult afterward) - rpc WaitForCompletion(WaitRequest) returns (WaitResponse); - - // Real-time streaming - rpc StreamLogs(StreamLogsRequest) returns (stream LogMessage); - rpc GetIncumbents(IncumbentRequest) returns (IncumbentResponse); -} -``` - -## Chunked Array Transfer Protocol - -### Why Chunking? - -gRPC has per-message size limits (configurable, default set to 256 MiB in cuOpt), and -protobuf has a hard 2 GB serialization limit. Optimization problems and their solutions -can exceed several gigabytes, so a chunked transfer mechanism is needed. - -The protocol uses only **unary RPCs** (no bidirectional streaming), which simplifies -error handling, load balancing, and proxy compatibility. 
- -### Upload Protocol (Large Problems) - -When the estimated serialized problem size exceeds 75% of `max_message_bytes`, the client -splits large arrays into chunks and sends them via multiple unary RPCs: - -``` -Client Server - | | - |-- StartChunkedUpload(header, settings) -----> | - |<-- upload_id, max_message_bytes -------------- | - | | - |-- SendArrayChunk(upload_id, field, data) ----> | - |<-- ok ---------------------------------------- | - | | - |-- SendArrayChunk(upload_id, field, data) ----> | - |<-- ok ---------------------------------------- | - | ... | - | | - |-- FinishChunkedUpload(upload_id) ------------> | - |<-- job_id ------------------------------------ | -``` - -**Key features:** -- `StartChunkedUpload` sends a `ChunkedProblemHeader` with all scalar fields and - array metadata (`ArrayDescriptor` for each large array: field ID, total elements, - element size) -- Each `SendArrayChunk` carries one chunk of one array, identified by `ArrayFieldId` - and `element_offset` -- The server reports `max_message_bytes` so the client can adapt chunk sizing -- `FinishChunkedUpload` triggers server-side reassembly and job submission - -### Download Protocol (Large Results) - -When the result exceeds the gRPC max message size, the client fetches it via -chunked unary RPCs (mirrors the upload pattern): - -``` -Client Server - | | - |-- StartChunkedDownload(job_id) --------------> | - |<-- download_id, ChunkedResultHeader ---------- | - | | - |-- GetResultChunk(download_id, field, off) ----> | - |<-- data bytes --------------------------------- | - | | - |-- GetResultChunk(download_id, field, off) ----> | - |<-- data bytes --------------------------------- | - | ... 
| - | | - |-- FinishChunkedDownload(download_id) ---------> | - |<-- ok ----------------------------------------- | -``` - -**Key features:** -- `ChunkedResultHeader` carries all scalar fields (termination status, objectives, - residuals, solve time, warm start scalars) plus `ResultArrayDescriptor` entries - for each array (solution vectors, warm start arrays) -- Each `GetResultChunk` fetches a slice of one array, identified by `ResultFieldId` - and `element_offset` -- `FinishChunkedDownload` releases the server-side download session state -- LP results include PDLP warm start data (9 arrays + 8 scalars) for subsequent - warm-started solves - -### Automatic Routing - -The client handles size-based routing transparently: - -1. **Upload**: Estimate serialized problem size - - Below 75% of `max_message_bytes` → unary `SubmitJob` - - Above threshold → `StartChunkedUpload` + `SendArrayChunk` + `FinishChunkedUpload` -2. **Download**: Check `result_size_bytes` from `CheckStatus` - - Below `max_message_bytes` → unary `GetResult` - - Above limit (or `RESOURCE_EXHAUSTED`) → chunked download RPCs - -## Error Handling - -### gRPC Status Codes - -| Code | Meaning | Client Action | -|------|---------|---------------| -| `OK` | Success | Process result | -| `NOT_FOUND` | Job ID not found | Check job ID | -| `RESOURCE_EXHAUSTED` | Message too large | Use chunked transfer | -| `CANCELLED` | Job was cancelled | Handle gracefully | -| `DEADLINE_EXCEEDED` | Timeout | Retry or increase timeout | -| `UNAVAILABLE` | Server not reachable | Retry with backoff | -| `INTERNAL` | Server error | Report to user | -| `INVALID_ARGUMENT` | Bad request | Fix request | - -### Connection Handling - -- Client detects `context->IsCancelled()` for graceful disconnect -- Server cleans up job state on client disconnect during upload -- Automatic reconnection is NOT built-in (caller should retry) - -## Completion Strategy - -The `solve_lp` and `solve_mip` methods poll `CheckStatus` every `poll_interval_ms` 
-until the job reaches a terminal state (COMPLETED/FAILED/CANCELLED) or `timeout_seconds` -is exceeded. During polling, MIP incumbent callbacks are invoked on the main thread. - -The `WaitForCompletion` RPC is available as a public async API primitive for callers -managing jobs directly, but it is not used by the convenience `solve_*` methods because -polling provides timeout protection and enables incumbent callbacks. - -## Client API (`grpc_client_t`) - -### Configuration - -```cpp -struct grpc_client_config_t { - std::string server_address = "localhost:8765"; - int poll_interval_ms = 1000; - int timeout_seconds = 3600; // Max wait for job completion (1 hour) - bool stream_logs = false; // Stream solver logs from server - - // Callbacks - std::function log_callback; - std::function debug_log_callback; // Internal client debug messages - std::function&)> incumbent_callback; - int incumbent_poll_interval_ms = 1000; - - // TLS configuration - bool enable_tls = false; - std::string tls_root_certs; // CA certificate (PEM) - std::string tls_client_cert; // Client certificate (mTLS) - std::string tls_client_key; // Client private key (mTLS) - - // Transfer configuration - int64_t max_message_bytes = 256 * 1024 * 1024; // 256 MiB - int64_t chunk_size_bytes = 16 * 1024 * 1024; // 16 MiB per chunk - // Chunked upload threshold is computed as 75% of max_message_bytes. 
- bool enable_transfer_hash = false; // FNV-1a hash logging -}; -``` - -### Synchronous Operations - -```cpp -// Blocking solve — handles chunked transfer automatically -auto result = client.solve_lp(problem, settings); -auto result = client.solve_mip(problem, settings, enable_incumbents); -``` - -### Asynchronous Operations - -```cpp -// Submit and get job ID -auto submit = client.submit_lp(problem, settings); -std::string job_id = submit.job_id; - -// Poll for status -auto status = client.check_status(job_id); - -// Get result when ready -auto result = client.get_lp_result(job_id); - -// Cancel or delete -client.cancel_job(job_id); -client.delete_job(job_id); -``` - -### Real-Time Streaming - -```cpp -// Log streaming (callback-based) -client.stream_logs(job_id, 0, [](const std::string& line, bool done) { - std::cout << line; - return true; // continue streaming -}); - -// Incumbent polling (during MIP solve) -config.incumbent_callback = [](int64_t idx, double obj, const auto& sol) { - std::cout << "Incumbent " << idx << ": " << obj << "\n"; - return true; // return false to cancel solve -}; -``` - -## Environment Variables - -| Variable | Default | Description | -|----------|---------|-------------| -| `CUOPT_REMOTE_HOST` | `localhost` | Server hostname for remote solves | -| `CUOPT_REMOTE_PORT` | `8765` | Server port for remote solves | -| `CUOPT_CHUNK_SIZE` | 16 MiB | Override `chunk_size_bytes` | -| `CUOPT_MAX_MESSAGE_BYTES` | 256 MiB | Override `max_message_bytes` | -| `CUOPT_GRPC_DEBUG` | `0` | Enable client debug/throughput logging (`0` or `1`) | -| `CUOPT_TLS_ENABLED` | `0` | Enable TLS for client connections (`0` or `1`) | -| `CUOPT_TLS_ROOT_CERT` | *(none)* | Path to PEM root CA file (server verification) | -| `CUOPT_TLS_CLIENT_CERT` | *(none)* | Path to PEM client certificate file (for mTLS) | -| `CUOPT_TLS_CLIENT_KEY` | *(none)* | Path to PEM client private key file (for mTLS) | - -## TLS Configuration - -### Server-Side TLS - -```bash 
-./cuopt_grpc_server --port 8765 \ - --tls \ - --tls-cert server.crt \ - --tls-key server.key -``` - -### Mutual TLS (mTLS) - -Server requires client certificate: - -```bash -./cuopt_grpc_server --port 8765 \ - --tls \ - --tls-cert server.crt \ - --tls-key server.key \ - --tls-root ca.crt \ - --require-client-cert -``` - -Client provides certificate via environment variables (applies to Python, `cuopt_cli`, and C API): - -```bash -export CUOPT_TLS_ENABLED=1 -export CUOPT_TLS_ROOT_CERT=ca.crt -export CUOPT_TLS_CLIENT_CERT=client.crt -export CUOPT_TLS_CLIENT_KEY=client.key -``` - -Or programmatically via `grpc_client_config_t`: - -```cpp -config.enable_tls = true; -config.tls_root_certs = read_file("ca.crt"); -config.tls_client_cert = read_file("client.crt"); -config.tls_client_key = read_file("client.key"); -``` - -## Message Size Limits - -| Configuration | Default | Notes | -|---------------|---------|-------| -| Server `--max-message-mb` | 256 MiB | Per-message limit (also `--max-message-bytes` for exact byte values) | -| Server clamping | [4 KiB, ~2 GiB] | Enforced at startup to stay within protobuf's serialization limit | -| Client `max_message_bytes` | 256 MiB | Clamped to [4 MiB, ~2 GiB] at construction | -| Chunk size | 16 MiB | Payload per `SendArrayChunk`/`GetResultChunk` | -| Chunked threshold | 75% of max_message_bytes | Problems above this use chunked upload (e.g. 192 MiB when max is 256 MiB) | - -Chunked transfer allows unlimited total payload size; only individual -chunks must fit within the per-message limit. Neither client nor server -allows "unlimited" message size — both clamp to the protobuf 2 GiB ceiling. - -## Security Notes - -1. **gRPC Layer**: All client-server message parsing uses protobuf-generated code -2. **Internal Pipe**: The server-to-worker pipe uses protobuf for metadata headers - and length-prefixed raw `read()`/`write()` for bulk array data. 
This pipe is - internal to the server process (main → forked worker) and not exposed to clients. -3. **Standard gRPC Security**: HTTP/2 framing, flow control, standard status codes -4. **TLS Support**: Optional encryption with mutual authentication -5. **Input Validation**: Server validates all incoming gRPC messages before processing - -## Data Flow Summary - -``` -┌─────────┐ ┌─────────────┐ -│ Client │ │ Server │ -│ │ SubmitJob (small) │ │ -│ problem ├───────────────────────────────────►│ deserialize │ -│ │ -or- Chunked Upload (large) │ ↓ │ -│ │ │ worker │ -│ │ │ process │ -│ │ GetResult (small) │ ↓ │ -│ solution│◄───────────────────────────────────┤ serialize │ -│ │ -or- Chunked Download (large) │ │ -└─────────┘ └─────────────┘ -``` - -See `GRPC_SERVER_ARCHITECTURE.md` for details on internal server architecture. - -## Code Generation - -The `cpp/codegen` directory (optional) generates conversion snippets from `field_registry.yaml`. Targets include: - -- **Settings**: PDLP/MIP settings ↔ proto (replacing hand-written blocks in the settings mapper). -- **Result header/scalars/arrays**: ChunkedResultHeader and array field handling. -- **Field element size**: `grpc_field_element_size.hpp` (ArrayFieldId → byte size). -- **Incumbent**: `grpc_incumbent_proto.hpp` (build/parse `Incumbent` messages). - -Adding or changing a proto field can be done via YAML and regenerate instead of editing mapper code by hand. - -## Build - -- **libcuopt**: Includes the mapper `.cpp` files, `grpc_client.cpp`, and `solve_remote.cpp`. Requires `CUOPT_ENABLE_GRPC`, gRPC, and protobuf. Proto generation is done by CMake custom commands that depend on the `.proto` files in `cpp/src/grpc/`. -- **cuopt_grpc_server**: Executable built from `cpp/src/grpc/server/*.cpp`; links libcuopt, gRPC, protobuf. - -Tests that use the client (e.g. `grpc_client_test.cpp`, `grpc_integration_test.cpp`) get `cpp/src/grpc` and `cpp/src/grpc/client` in their include path. 
diff --git a/GRPC_QUICK_START.md b/GRPC_QUICK_START.md deleted file mode 100644 index a3864c101e..0000000000 --- a/GRPC_QUICK_START.md +++ /dev/null @@ -1,248 +0,0 @@ -# cuOpt gRPC Remote Execution Quick Start - -This guide shows how to start the cuOpt gRPC server and solve -optimization problems remotely from Python, `cuopt_cli`, or the C API. - -All three interfaces use the same environment variables for remote -configuration. Once the env vars are set, your code works exactly the -same as a local solve — no API changes required. - -## Prerequisites - -- A host with an NVIDIA GPU and cuOpt installed (server side). -- cuOpt client libraries installed on the client host (can be CPU-only). -- `cuopt_grpc_server` binary available (ships with the cuOpt package). - -## 1. Start the Server - -### Basic (no TLS) - -```bash -cuopt_grpc_server --port 8765 --workers 1 -``` - -### TLS (server authentication) - -```bash -cuopt_grpc_server --port 8765 \ - --tls \ - --tls-cert server.crt \ - --tls-key server.key -``` - -### mTLS (mutual authentication) - -```bash -cuopt_grpc_server --port 8765 \ - --tls \ - --tls-cert server.crt \ - --tls-key server.key \ - --tls-root ca.crt \ - --require-client-cert -``` - -See `GRPC_SERVER_ARCHITECTURE.md` for the full set of server flags. - -### How mTLS Works - -With mTLS the server verifies every client, and the client verifies the -server. The trust model is based on Certificate Authorities (CAs), not -individual certificates: - -- **`--tls-root ca.crt`** tells the server which CA to trust. Any client - presenting a certificate signed by this CA is accepted. The server - never sees or stores individual client certificates. -- **`--require-client-cert`** makes client verification mandatory. Without - it the server requests a client cert but still allows unauthenticated - connections. -- On the client side, `CUOPT_TLS_ROOT_CERT` is the CA that signed the - *server* certificate, so the client can verify the server's identity. 
- -### Restricting Access with a Custom CA - -To limit which clients can reach your server, create a private CA and -only issue client certificates to authorized users. Anyone without a -certificate signed by your CA is rejected at the TLS handshake before -any solver traffic is exchanged. - -**1. Create a private CA (one-time setup):** - -```bash -# Generate CA private key -openssl genrsa -out ca.key 4096 - -# Generate self-signed CA certificate (valid 10 years) -openssl req -new -x509 -key ca.key -sha256 -days 3650 \ - -subj "/CN=cuopt-internal-ca" -out ca.crt -``` - -**2. Issue a client certificate:** - -```bash -# Generate client key -openssl genrsa -out client.key 2048 - -# Create a certificate signing request -openssl req -new -key client.key \ - -subj "/CN=team-member-alice" -out client.csr - -# Sign with your CA -openssl x509 -req -in client.csr -CA ca.crt -CAkey ca.key \ - -CAcreateserial -days 365 -sha256 -out client.crt -``` - -Repeat step 2 for each authorized client. Keep `ca.key` private; -distribute only `ca.crt` (to the server) and the per-client -`client.crt` + `client.key` pairs. - -**3. Issue a server certificate (signed by the same CA):** - -```bash -# Generate server key -openssl genrsa -out server.key 2048 - -# Create CSR with subjectAltName matching the hostname clients will use -openssl req -new -key server.key \ - -subj "/CN=server.example.com" -out server.csr - -# Write a SAN extension file (DNS and/or IP must match client's target) -cat > server.ext < **Note:** `server.crt` must be signed by the same CA distributed to -> clients, and its `subjectAltName` must match the hostname or IP that -> clients connect to. gRPC (BoringSSL) requires SAN — `CN` alone is -> not sufficient for hostname verification. - -**4. Start the server with your CA:** - -```bash -cuopt_grpc_server --port 8765 \ - --tls \ - --tls-cert server.crt \ - --tls-key server.key \ - --tls-root ca.crt \ - --require-client-cert -``` - -**5. 
Configure an authorized client:** - -```bash -export CUOPT_REMOTE_HOST=server.example.com -export CUOPT_REMOTE_PORT=8765 -export CUOPT_TLS_ENABLED=1 -export CUOPT_TLS_ROOT_CERT=ca.crt # verifies the server -export CUOPT_TLS_CLIENT_CERT=client.crt # proves client identity -export CUOPT_TLS_CLIENT_KEY=client.key -``` - -**Revoking access:** gRPC's built-in TLS does not support Certificate -Revocation Lists (CRL) or OCSP. To revoke a client, either stop issuing -new certs from the compromised CA and rotate to a new one, or deploy a -reverse proxy (e.g., Envoy) in front of the server that supports CRL -checking. - -## 2. Configure the Client (All Interfaces) - -Set these environment variables before running any cuOpt client. -They apply identically to the Python API, `cuopt_cli`, and the C API. - -### Required - -```bash -export CUOPT_REMOTE_HOST= -export CUOPT_REMOTE_PORT=8765 -``` - -When both `CUOPT_REMOTE_HOST` and `CUOPT_REMOTE_PORT` are set, every -call to `solve_lp` / `solve_mip` is transparently forwarded to the -remote server. No code changes are needed. - -### TLS (optional) - -```bash -export CUOPT_TLS_ENABLED=1 -export CUOPT_TLS_ROOT_CERT=ca.crt # verify server certificate -``` - -For mTLS, also provide the client identity: - -```bash -export CUOPT_TLS_CLIENT_CERT=client.crt -export CUOPT_TLS_CLIENT_KEY=client.key -``` - -### Tuning (optional) - -| Variable | Default | Description | -|----------|---------|-------------| -| `CUOPT_CHUNK_SIZE` | 16 MiB | Bytes per chunk for large problem transfer | -| `CUOPT_MAX_MESSAGE_BYTES` | 256 MiB | Client-side gRPC max message size | -| `CUOPT_GRPC_DEBUG` | `0` | Enable debug / throughput logging (`1` to enable) | - -## 3. Usage Examples - -Once the env vars are set, write your solver code exactly as you would -for a local solve. The remote transport is handled automatically. 
- -### Python - -```python -import cuopt_mps_parser -from cuopt import linear_programming - -# Parse an MPS file -dm = cuopt_mps_parser.ParseMps("model.mps") - -# Solve (routed to remote server via env vars) -solution = linear_programming.Solve(dm, linear_programming.SolverSettings()) - -print("Objective:", solution.get_primal_objective()) -print("Primal: ", solution.get_primal_solution()[:5], "...") -``` - -### cuopt_cli - -```bash -cuopt_cli model.mps -``` - -With solver options: - -```bash -cuopt_cli model.mps --time-limit 30 --relaxation -``` - -### C++ API - -```cpp -#include -#include - -// Build problem using cpu_optimization_problem_t ... -auto solution = cuopt::linear_programming::solve_lp(cpu_problem, settings); -``` - -The same `solve_lp` / `solve_mip` functions automatically detect the -`CUOPT_REMOTE_HOST` / `CUOPT_REMOTE_PORT` env vars and forward to the -gRPC server when they are set. - -## Troubleshooting - -| Symptom | Check | -|---------|-------| -| Connection refused | Verify the server is running and the host/port are correct. | -| TLS handshake failure | Ensure `CUOPT_TLS_ENABLED=1` is set and certificate paths are correct. | -| `Cannot open TLS file: ...` | The path in the TLS env var does not exist or is not readable. | -| Timeout on large problems | Increase the solver `time_limit` or the client `timeout_seconds`. | - -## Further Reading - -- `GRPC_INTERFACE.md` — Protocol details, chunked transfer, client config, message sizes. -- `GRPC_SERVER_ARCHITECTURE.md` — Server process model, IPC, threads, job lifecycle. 
diff --git a/ci/docker/Dockerfile b/ci/docker/Dockerfile index 1d49a4c04a..6df4159d81 100644 --- a/ci/docker/Dockerfile +++ b/ci/docker/Dockerfile @@ -45,6 +45,7 @@ RUN ln -sf /usr/bin/python${PYTHON_SHORT_VER} /usr/bin/python FROM python-env AS install-env +ARG CUDA_VER ARG CUOPT_VER ARG PYTHON_SHORT_VER @@ -68,36 +69,18 @@ FROM install-env AS cuopt-final ARG PYTHON_SHORT_VER -# Consolidate all directory creation, permissions, and file operations into a single layer +# Make cuopt_grpc_server, cuopt_cli, and shared libraries available to all processes +# (profile.d scripts are only sourced by login shells; ENV works for all containers) +ENV PATH="/usr/local/cuda/bin:/usr/bin:/usr/local/bin:/usr/local/nvidia/bin/:/usr/local/lib/python${PYTHON_SHORT_VER}/dist-packages/libcuopt/bin:${PATH}" +ENV LD_LIBRARY_PATH="/usr/lib/x86_64-linux-gnu:/usr/lib/aarch64-linux-gnu:/usr/local/cuda/lib64:/usr/local/nvidia/lib:/usr/local/nvidia/lib64:/usr/lib/wsl/lib:/usr/lib/wsl/lib/libnvidia-container:/usr/lib/nvidia:/usr/lib/nvidia-current:/usr/local/lib/python${PYTHON_SHORT_VER}/dist-packages/libcuopt/lib/:/usr/local/lib/python${PYTHON_SHORT_VER}/dist-packages/rapids_logger/lib64:${LD_LIBRARY_PATH}" + +# Directory creation, permissions RUN mkdir -p /opt/cuopt && \ chmod 777 /opt/cuopt && \ - # Create profile.d script for universal access - echo '#!/bin/bash' > /etc/profile.d/cuopt.sh && \ - echo 'export PATH="/usr/local/cuda/bin:/usr/bin:/usr/local/bin:/usr/local/nvidia/bin/:/usr/local/lib/python${PYTHON_SHORT_VER}/dist-packages/libcuopt/bin:$PATH"' >> /etc/profile.d/cuopt.sh && \ - echo 'export 
LD_LIBRARY_PATH="/usr/lib/x86_64-linux-gnu:/usr/lib/aarch64-linux-gnu:/usr/local/cuda/lib64:/usr/local/nvidia/lib:/usr/local/nvidia/lib64:/usr/lib/wsl/lib:/usr/lib/wsl/lib/libnvidia-container:/usr/lib/nvidia:/usr/lib/nvidia-current:/usr/local/lib/python${PYTHON_SHORT_VER}/dist-packages/libcuopt/lib/:/usr/local/lib/python${PYTHON_SHORT_VER}/dist-packages/rapids_logger/lib64:${LD_LIBRARY_PATH}"' >> /etc/profile.d/cuopt.sh && \ - chmod +x /etc/profile.d/cuopt.sh && \ - # Set in /etc/environment for system-wide access - echo 'PATH="/usr/local/cuda/bin:/usr/bin:/usr/local/bin:/usr/local/nvidia/bin/:/usr/local/lib/python${PYTHON_SHORT_VER}/dist-packages/libcuopt/bin:$PATH"' >> /etc/environment && \ - echo 'LD_LIBRARY_PATH="/usr/lib/x86_64-linux-gnu:/usr/lib/aarch64-linux-gnu:/usr/local/cuda/lib64:/usr/local/nvidia/lib:/usr/local/nvidia/lib64:/usr/lib/wsl/lib:/usr/lib/wsl/lib/libnvidia-container:/usr/lib/nvidia:/usr/lib/nvidia-current:/usr/local/lib/python${PYTHON_SHORT_VER}/dist-packages/libcuopt/lib/:/usr/local/lib/python${PYTHON_SHORT_VER}/dist-packages/rapids_logger/lib64:${LD_LIBRARY_PATH}"' >> /etc/environment && \ - # Set proper permissions for cuOpt installation chmod -R 755 /usr/local/lib/python${PYTHON_SHORT_VER}/dist-packages/cuopt* && \ chmod -R 755 /usr/local/lib/python${PYTHON_SHORT_VER}/dist-packages/libcuopt* && \ chmod -R 755 /usr/local/lib/python${PYTHON_SHORT_VER}/dist-packages/cuopt_* && \ - chmod -R 755 /usr/local/bin/* && \ - # Create entrypoint script in a single operation - echo '#!/bin/bash' > /opt/cuopt/entrypoint.sh && \ - echo 'set -e' >> /opt/cuopt/entrypoint.sh && \ - echo '' >> /opt/cuopt/entrypoint.sh && \ - echo '# Get current user info from Docker environment variables' >> /opt/cuopt/entrypoint.sh && \ - echo 'CURRENT_UID=${UID:-1000}' >> /opt/cuopt/entrypoint.sh && \ - echo 'CURRENT_GID=${GID:-1000}' >> /opt/cuopt/entrypoint.sh && \ - echo '' >> /opt/cuopt/entrypoint.sh && \ - echo '# Set environment variables for the current user' >> 
/opt/cuopt/entrypoint.sh && \ - echo 'export HOME="/opt/cuopt"' >> /opt/cuopt/entrypoint.sh && \ - echo '' >> /opt/cuopt/entrypoint.sh && \ - echo '# Execute the command' >> /opt/cuopt/entrypoint.sh && \ - echo 'exec "$@"' >> /opt/cuopt/entrypoint.sh && \ - chmod +x /opt/cuopt/entrypoint.sh + chmod -R 755 /usr/local/bin/* # Set the default working directory to the cuopt folder WORKDIR /opt/cuopt @@ -112,6 +95,10 @@ COPY --from=cuda-libs /usr/local/cuda/lib64/libnvJitLink* /usr/local/cuda/lib64/ # Copy CUDA headers needed for runtime compilation (e.g., CuPy NVRTC). COPY --from=cuda-headers /usr/local/cuda/include/ /usr/local/cuda/include/ -# Use the flexible entrypoint +# Entrypoint supports server selection: +# Default: Python REST server +# CUOPT_SERVER_TYPE=grpc: gRPC server (uses CUOPT_SERVER_PORT, CUOPT_GPU_COUNT) +# Explicit command: docker run cuopt_grpc_server [args...] +COPY ./entrypoint.sh /opt/cuopt/entrypoint.sh ENTRYPOINT ["/opt/cuopt/entrypoint.sh"] CMD ["python", "-m", "cuopt_server.cuopt_service"] diff --git a/ci/docker/entrypoint.sh b/ci/docker/entrypoint.sh new file mode 100755 index 0000000000..3ee22dd086 --- /dev/null +++ b/ci/docker/entrypoint.sh @@ -0,0 +1,44 @@ +#!/bin/bash +# SPDX-FileCopyrightText: Copyright (c) 2026, NVIDIA CORPORATION & AFFILIATES. All rights reserved. +# SPDX-License-Identifier: Apache-2.0 +# +# Entrypoint for the cuOpt container image. +# +# Server selection (in order of precedence): +# 1. Explicit command: docker run cuopt_grpc_server [args...] +# 2. Environment variable: CUOPT_SERVER_TYPE=grpc +# 3. Default: Python REST server (cuopt_server.cuopt_service) +# +# When CUOPT_SERVER_TYPE=grpc, the following env vars configure the gRPC server: +# CUOPT_SERVER_PORT — listen port (default: 5001) +# CUOPT_GPU_COUNT — worker processes (default: 1) +# CUOPT_GRPC_ARGS — additional CLI flags passed verbatim +# (e.g. 
"--tls --tls-cert server.crt --log-to-console") +# See docs/cuopt/source/cuopt-grpc/advanced.rst (flags/env); +# cpp/docs/grpc-server-architecture.md for contributor IPC details. +# for all available flags. + +set -e + +export HOME="/opt/cuopt" + +# If CUOPT_SERVER_TYPE=grpc, build a command line from env vars and launch. +if [ "${CUOPT_SERVER_TYPE}" = "grpc" ]; then + GRPC_CMD=(cuopt_grpc_server) + + GRPC_CMD+=(--port "${CUOPT_SERVER_PORT:-5001}") + + if [ -n "${CUOPT_GPU_COUNT}" ]; then + GRPC_CMD+=(--workers "${CUOPT_GPU_COUNT}") + fi + + # Allow arbitrary extra flags (e.g. --tls, --log-to-console) + if [ -n "${CUOPT_GRPC_ARGS}" ]; then + read -ra EXTRA <<< "${CUOPT_GRPC_ARGS}" + GRPC_CMD+=("${EXTRA[@]}") + fi + + exec "${GRPC_CMD[@]}" +fi + +exec "$@" diff --git a/cpp/docs/DEVELOPER_GUIDE.md b/cpp/docs/DEVELOPER_GUIDE.md index 716248b245..ba074b0e88 100644 --- a/cpp/docs/DEVELOPER_GUIDE.md +++ b/cpp/docs/DEVELOPER_GUIDE.md @@ -3,6 +3,7 @@ This document serves as a guide for contributors to cuOpt C++ code. Developers should also refer to these additional files for further documentation of cuOpt best practices. +* [gRPC server architecture](grpc-server-architecture.md) — full `cuopt_grpc_server` IPC, source file map, and streaming internals (end-user summary lives under `docs/cuopt/source/cuopt-grpc/`). * [Documentation Guide](TODO) for guidelines on documenting cuOpt code. * [Testing Guide](TODO) for guidelines on writing unit tests. * [Benchmarking Guide](TODO) for guidelines on writing unit benchmarks. 
diff --git a/GRPC_SERVER_ARCHITECTURE.md b/cpp/docs/grpc-server-architecture.md similarity index 65% rename from GRPC_SERVER_ARCHITECTURE.md rename to cpp/docs/grpc-server-architecture.md index 2d6c2c324b..77fe093b28 100644 --- a/GRPC_SERVER_ARCHITECTURE.md +++ b/cpp/docs/grpc-server-architecture.md @@ -1,38 +1,45 @@ -# Server Architecture +# NVIDIA cuOpt gRPC server architecture -## Overview + -The cuOpt gRPC server (`cuopt_grpc_server`) is a multi-process architecture designed for: +> **Audience:** cuOpt contributors and advanced integrators debugging the server. +> +> End users should start with the cuOpt documentation **gRPC remote execution** section — Quick start, **Advanced configuration** (flags, TLS, Docker, client env vars), and the short **gRPC server behavior** overview (`docs/cuopt/source/cuopt-grpc/grpc-server-architecture.md` in this repository). Those pages intentionally omit the C++-level detail below. + +The NVIDIA cuOpt gRPC server (`cuopt_grpc_server`) is a multi-process architecture designed for: - **Isolation**: Each solve runs in a separate worker process for fault tolerance - **Parallelism**: Multiple workers can process jobs concurrently - **Large Payloads**: Handles multi-GB problems and solutions - **Real-Time Feedback**: Log streaming and incumbent callbacks during solve -For gRPC protocol and client API, see `GRPC_INTERFACE.md`. Server source files live under `cpp/src/grpc/server/`. +Server source files live under `cpp/src/grpc/server/`. 
## Process Model ```text -┌────────────────────────────────────────────────────────────────────┐ +┌─────────────────────────────────────────────────────────────────────┐ │ Main Server Process │ -│ │ +│ │ │ ┌─────────────┐ ┌──────────────┐ ┌─────────────────────────────┐ │ │ │ gRPC │ │ Job │ │ Background Threads │ │ │ │ Service │ │ Tracker │ │ - Result retrieval │ │ │ │ Handler │ │ (job status,│ │ - Incumbent retrieval │ │ │ │ │ │ results) │ │ - Worker monitor │ │ │ └─────────────┘ └──────────────┘ └─────────────────────────────┘ │ -│ │ ▲ │ -│ │ shared memory │ pipes │ -│ ▼ │ │ +│ │ ▲ │ +│ │ shared memory │ pipes │ +│ ▼ │ │ │ ┌─────────────────────────────────────────────────────────────────┐│ -│ │ Shared Memory Queues ││ +│ │ Shared Memory Queues ││ │ │ ┌─────────────────┐ ┌─────────────────────┐ ││ │ │ │ Job Queue │ │ Result Queue │ ││ │ │ │ (MAX_JOBS=100) │ │ (MAX_RESULTS=100) │ ││ │ │ └─────────────────┘ └─────────────────────┘ ││ │ └─────────────────────────────────────────────────────────────────┘│ -└────────────────────────────────────────────────────────────────────┘ +└─────────────────────────────────────────────────────────────────────┘ │ ▲ │ fork() │ ▼ │ @@ -61,19 +68,19 @@ Each worker has dedicated pipes for data transfer: ```cpp struct WorkerPipes { - int to_worker_fd; // Main → Worker: job data (server writes) - int from_worker_fd; // Worker → Main: result data (server reads) - int worker_read_fd; // Worker end of input pipe (worker reads) - int worker_write_fd; // Worker end of output pipe (worker writes) - int incumbent_from_worker_fd; // Worker → Main: incumbent solutions (server reads) - int worker_incumbent_write_fd; // Worker end of incumbent pipe (worker writes) + int to_worker_fd; // Main -> Worker: server writes job data + int from_worker_fd; // Worker -> Main: server reads result data + int worker_read_fd; // Main -> Worker: worker reads job data + int worker_write_fd; // Worker -> Main: worker writes result data + int 
incumbent_from_worker_fd; // Worker -> Main: server reads incumbent solutions + int worker_incumbent_write_fd; // Worker -> Main: worker writes incumbent solutions }; ``` **Why pipes instead of shared memory for data?** - Pipes handle backpressure naturally (blocking writes) - No need to manage large shared memory segments -- Works well with streaming uploads (data flows through) +- Simpler lifecycle: data is consumed by the worker read and requires no explicit cleanup ### Source File Roles @@ -81,23 +88,26 @@ All paths below are under `cpp/src/grpc/server/`. | File | Role | |------|------| -| `grpc_server_main.cpp` | `main()`, `print_usage()`, argument parsing, shared-memory init, gRPC server run/stop. | -| `grpc_service_impl.cpp` | `CuOptRemoteServiceImpl`: all 14 RPC handlers (SubmitJob, CheckStatus, GetResult, chunked upload/download, StreamLogs, GetIncumbents, CancelJob, DeleteResult, WaitForCompletion, Status probe). Uses mappers and job_management to enqueue jobs and trigger pipe I/O. | +| `grpc_server_main.cpp` | `main()`, argument parsing (via argparse), shared-memory init, gRPC server run/stop. | +| `grpc_service_impl.cpp` | `CuOptRemoteServiceImpl`: all 14 RPC handlers (SubmitJob, CheckStatus, GetResult, StartChunkedUpload, SendArrayChunk, FinishChunkedUpload, StartChunkedDownload, GetResultChunk, FinishChunkedDownload, StreamLogs, GetIncumbents, CancelJob, DeleteResult, WaitForCompletion). Uses mappers and job_management to enqueue jobs and trigger pipe I/O. | | `grpc_server_types.hpp` | Shared structs (e.g. `JobQueueEntry`, `ResultQueueEntry`, `ServerConfig`, `JobInfo`), enums, globals (atomics, mutexes, condition variables), and forward declarations used across server .cpp files. | +| `grpc_server_logger.hpp` | Server operational logger declaration (`server_logger()`, `init_server_logger()`) and `SERVER_LOG_*` convenience macros built on `rapids_logger`. Separate from the solver logger. 
| +| `grpc_server_logger.cpp` | Server logger implementation: constructs a `rapids_logger::logger` with configurable console/file sinks and verbose/quiet levels. Created before `fork()` so both main and worker processes share the same output. | | `grpc_field_element_size.hpp` | Maps `cuopt::remote::ArrayFieldId` to element byte size; used by pipe deserialization and chunked logic. | -| `grpc_pipe_serialization.hpp` | Streaming pipe I/O: write/read individual length-prefixed protobuf messages (ChunkedProblemHeader, ChunkedResultHeader, ArrayChunk) directly to/from pipe fds. Avoids large intermediate buffers. Also serializes SubmitJobRequest for unary pipe transfer. | +| `grpc_pipe_io.cpp` | Low-level pipe I/O primitives: `write_to_pipe()` (blocking retry loop) and `read_from_pipe()` (poll-based timeout + blocking read). Used by all higher-level pipe functions. | +| `grpc_pipe_serialization.hpp` | Protobuf-level pipe serialization: write/read length-prefixed protobuf messages and raw arrays to/from pipe fds. Also serializes `SubmitJobRequest` for unary pipe transfer. Defines `kPipeBufferSize` and `kMaxProtobufMessageBytes`. | | `grpc_incumbent_proto.hpp` | Build `Incumbent` proto from (job_id, objective, assignment) and parse it back; used by worker when pushing incumbents and by main when reading from the incumbent pipe. | -| `grpc_worker.cpp` | `worker_process(worker_index)`: loop over job queue, receive job data via pipe (unary or chunked), call solver, send result (and optionally incumbents) back. Contains `IncumbentPipeCallback` and `store_simple_result`. | -| `grpc_worker_infra.cpp` | Pipe creation/teardown, `spawn_worker` / `spawn_workers`, `wait_for_workers`, `mark_worker_jobs_failed`, `cleanup_shared_memory`. | -| `grpc_server_threads.cpp` | `worker_monitor_thread`, `result_retrieval_thread`, `incumbent_retrieval_thread`, `session_reaper_thread`. 
| -| `grpc_job_management.cpp` | Low-level pipe read/write, `send_job_data_pipe` / `recv_job_data_pipe`, `submit_job_async`, `check_job_status`, `cancel_job`, `generate_job_id`, log-dir helpers. | +| `grpc_worker.cpp` | `worker_process(worker_index)`: loop over job queue, receive job data via pipe (unary or chunked), call solver, send result (and optionally incumbents) back. Contains `IncumbentPipeCallback`, `store_simple_result`, and `publish_result`. | +| `grpc_worker_infra.cpp` | Pipe creation/teardown, `spawn_worker` / `spawn_workers` / `spawn_single_worker`, `wait_for_workers`, `mark_worker_jobs_failed`, `cleanup_shared_memory`. | +| `grpc_server_threads.cpp` | `worker_monitor_thread`, `result_retrieval_thread` (also dispatches job data to workers), `incumbent_retrieval_thread`, `session_reaper_thread`. | +| `grpc_job_management.cpp` | Pipe-level send/recv (`send_job_data_pipe`, `recv_job_data_pipe`, `send_incumbent_pipe`, `recv_incumbent_pipe`), `submit_job_async`, `submit_chunked_job_async`, `check_job_status`, `cancel_job`, `generate_job_id`, log-dir helpers. | ### Large Payload Handling For large problems uploaded via chunked gRPC RPCs: 1. Server holds chunked upload state in memory (`ChunkedUploadState`: header + array chunks per `upload_id`). -2. When `FinishChunkedUpload` is called, the header and chunks are stored in `pending_chunked_data`. The data dispatch thread streams them directly to the worker pipe as individual length-prefixed protobuf messages — no intermediate blob is created. +2. When `FinishChunkedUpload` is called, the header and chunks are stored in `pending_chunked_data`. The result retrieval thread (which also handles job dispatch) streams them directly to the worker pipe as individual length-prefixed protobuf messages — no intermediate blob is created. 3. 
Worker reads the streamed messages from the pipe, reassembles arrays, runs the solver, and writes the result (and optionally incumbents) back via pipes using the same streaming format. 4. Main process result-retrieval thread reads the streamed result messages from the pipe and stores the result for `GetResult` or chunked download. @@ -112,11 +122,11 @@ No disk spooling: chunked data is kept in memory in the main process until forwa ```text Client Server Worker │ │ │ - │─── SubmitJob ──────────►│ │ + │─── SubmitJob ──────────► │ │ │ │ Create job entry │ │ │ Store problem data │ │ │ job_queue[slot].ready=true│ - │◄── job_id ──────────────│ │ + │◄── job_id ────────────── │ │ ``` ### 2. Processing @@ -126,14 +136,14 @@ Client Server Worker │ │ │ │ │ │ Poll job_queue │ │ │ Claim job (CAS) - │ │◄─────────────────────────│ Read problem via pipe + │ │ ── job data via pipe ───> │ │ │ │ │ │ │ Convert CPU→GPU │ │ │ solve_lp/solve_mip │ │ │ Convert GPU→CPU │ │ │ - │ │ result_queue[slot].ready │◄────────────────── - │ │◄── result data via pipe ─│ + │ │ result_queue[slot].ready │ (worker sets flag) + │ │ <── result data via pipe ─│ ``` ### 3. 
Result Retrieval @@ -173,8 +183,16 @@ Client Worker ### Result Retrieval Thread +This thread handles both job dispatch and result retrieval: + +**Job dispatch** (first scan): +- Scans `job_queue` for claimed jobs with `data_sent == false` +- Sends job data to the worker's pipe (unary or chunked) +- Marks `data_sent = true` on success + +**Result retrieval** (second scan): - Monitors `result_queue` for completed jobs -- Reads result data from worker pipes +- Reads streamed result data from worker pipes - Updates `job_tracker` with results - Notifies waiting clients (via condition variable) @@ -229,13 +247,15 @@ The `StreamLogs` RPC: ```bash cuopt_grpc_server [options] - -p, --port PORT gRPC listen port (default: 8765) + -p, --port PORT gRPC listen port (default: 5001) -w, --workers NUM Number of worker processes (default: 1) --max-message-mb N Max gRPC message size in MiB (default: 256; clamped to [4 KiB, ~2 GiB]) --max-message-bytes N Max gRPC message size in bytes (exact; min 4096) - --enable-transfer-hash Log data hashes for streaming transfers (for testing) + --chunk-timeout N Per-chunk timeout in seconds for streaming (0=disabled, default: 60) --log-to-console Echo solver logs to server console + -v, --verbose Increase verbosity (default: on) -q, --quiet Reduce verbosity (verbose is the default) + --server-log PATH Path to server operational log file (in addition to console) TLS Options: --tls Enable TLS encryption @@ -245,6 +265,20 @@ TLS Options: --require-client-cert Require client certificate (mTLS) ``` +### NVIDIA cuOpt container image + +When you use the official NVIDIA cuOpt container **without** an explicit command, the entrypoint chooses between the Python REST server and `cuopt_grpc_server`. User-facing Docker and client configuration is documented in `docs/cuopt/source/cuopt-grpc/advanced.rst` in this repository (the published **Advanced configuration** page). 
+ +When **`CUOPT_SERVER_TYPE=grpc`**, the entrypoint maps: + +| Variable | Role | +|----------|------| +| `CUOPT_SERVER_PORT` | Passed as `--port` (default `5001`). | +| `CUOPT_GPU_COUNT` | When set, passed as `--workers`. When unset, `--workers` is omitted and the server uses its default worker count. | +| `CUOPT_GRPC_ARGS` | Optional whitespace-separated **extra** `cuopt_grpc_server` flags (TLS, message limits, logging, and so on). Each token becomes one argv word; embedded spaces inside a single flag value are not supported through this variable—invoke `cuopt_grpc_server` directly if you need complex quoting. | + +Any flag listed in *Configuration options* above can be supplied on the host CLI or inside `CUOPT_GRPC_ARGS`. + ## Fault Tolerance ### Worker Crashes @@ -267,9 +301,8 @@ On SIGINT/SIGTERM: When `CancelJob` is called: 1. Set `job_queue[slot].cancelled = true` -2. Worker checks the flag before starting the solve -3. If cancelled, worker stores CANCELLED result and skips to the next job -4. If the solve has already started, it runs to completion (no mid-solve cancellation) +2. If the job is **queued** (no worker yet): the worker checks the flag before starting and skips to the next job +3. If the job is **running** (worker has claimed it): the worker process is killed with `SIGKILL`, the worker-monitor thread detects the exit and posts a `RESULT_CANCELLED` status, and a replacement worker is spawned automatically ## Memory Management @@ -289,7 +322,7 @@ When `CancelJob` is called: - Each worker needs a GPU (or shares with others) - Too many workers: GPU memory contention - Too few workers: Underutilized when jobs queue -- Recommendation: 1-2 workers per GPU +- Recommendation: 1 worker per GPU. 
Higher values are possible depending on the problems being solved but there is no specific guidance at this time ### Pipe Buffering @@ -306,11 +339,22 @@ When `CancelJob` is called: ## File Locations +### POSIX Shared Memory + +These names are passed to `shm_open()` and live under `/dev/shm/` (a kernel tmpfs), not on the regular filesystem. Writable on virtually all Linux systems and standard container runtimes. + +| Name | Purpose | +|------|---------| +| `/cuopt_job_queue` | Job metadata (slots, flags, job IDs) | +| `/cuopt_result_queue` | Result metadata (status, error messages) | +| `/cuopt_control` | Server control (shutdown flag, worker count) | + +### Filesystem + | Path | Purpose | |------|---------| | `/tmp/cuopt_logs/` | Per-job solver log files | -| `/cuopt_job_queue` | Shared memory (job metadata) | -| `/cuopt_result_queue` | Shared memory (result metadata) | -| `/cuopt_control` | Shared memory (server control) | + +The log directory is hardcoded. `ensure_log_dir_exists()` calls `mkdir()` but does not check the return value — if the process lacks write permission on `/tmp`, log file creation will silently fail. Chunked upload state is held in memory in the main process (no upload directory). diff --git a/cpp/src/grpc/client/grpc_client.cpp b/cpp/src/grpc/client/grpc_client.cpp index 49839cd9b3..59c6bfcb5d 100644 --- a/cpp/src/grpc/client/grpc_client.cpp +++ b/cpp/src/grpc/client/grpc_client.cpp @@ -1034,7 +1034,7 @@ bool grpc_client_t::download_chunked_result(const std::string& job_id, GRPC_CLIENT_DEBUG_LOG(config_, "[grpc_client] ChunkedDownload started, download_id=" << download_id << " arrays=" << header->arrays_size() - << " is_mip=" << header->is_mip()); + << " problem_category=" << header->problem_category()); // --- 2. 
Fetch each array via GetResultChunk RPCs ---
   int64_t chunk_data_budget = config_.chunk_size_bytes;
diff --git a/cpp/src/grpc/client/grpc_client.hpp b/cpp/src/grpc/client/grpc_client.hpp
index f8579b3271..4c3aa0c3d3 100644
--- a/cpp/src/grpc/client/grpc_client.hpp
+++ b/cpp/src/grpc/client/grpc_client.hpp
@@ -10,6 +10,8 @@
 #include
 #include
+#include "../cuopt_default_grpc_port.h"
+
 #include
 #include
 #include
@@ -52,7 +54,7 @@ void grpc_test_mark_as_connected(class grpc_client_t& client);
  * - Result retrieval uses chunked download for results exceeding max_message_bytes.
  */
 struct grpc_client_config_t {
-  std::string server_address = "localhost:8765";
+  std::string server_address = std::string("localhost:") + std::to_string(cuopt_default_grpc_port);
   int poll_interval_ms = 1000;  // How often to poll for job status
   int timeout_seconds = 0;      // Max time to wait for job completion (0 = no limit)
   bool stream_logs = false;     // Whether to stream logs from server
@@ -93,10 +95,6 @@ struct grpc_client_config_t {
   // Controlled by CUOPT_GRPC_DEBUG env var (0|1). Default: off.
   bool enable_debug_log = false;
 
-  // Log FNV-1a hashes of uploaded/downloaded data on both client and server.
-  // Comparing the two hashes confirms data was not corrupted in transit.
-  bool enable_transfer_hash = false;
-
   // Override for the chunked upload threshold (bytes). Normally computed
   // automatically as 75% of max_message_bytes. Set to 0 to force chunked
   // upload for all problems, or a positive value to override. -1 = auto.
@@ -204,7 +202,7 @@
  *
  * Usage:
  * @code
- *   grpc_client_t client("localhost:8765");
+ *   grpc_client_t client;  // default server: localhost:5001 (cuopt_default_grpc_port)
 *   if (!client.connect()) { ... handle error ... 
} * * auto result = client.solve_lp(problem, settings); diff --git a/cpp/src/grpc/cuopt_default_grpc_port.h b/cpp/src/grpc/cuopt_default_grpc_port.h new file mode 100644 index 0000000000..7b2783ca64 --- /dev/null +++ b/cpp/src/grpc/cuopt_default_grpc_port.h @@ -0,0 +1,12 @@ +/* + * SPDX-FileCopyrightText: Copyright (c) 2025-2026, NVIDIA CORPORATION & AFFILIATES. All rights + * reserved. SPDX-License-Identifier: Apache-2.0 + */ + +#ifndef CUOPT_DEFAULT_GRPC_PORT_H +#define CUOPT_DEFAULT_GRPC_PORT_H + +/** Default TCP listen port for cuopt_grpc_server and matching client defaults. */ +#define cuopt_default_grpc_port 5001 + +#endif // CUOPT_DEFAULT_GRPC_PORT_H diff --git a/cpp/src/grpc/cuopt_remote.proto b/cpp/src/grpc/cuopt_remote.proto index cc7af2a1f7..d58145a8e6 100644 --- a/cpp/src/grpc/cuopt_remote.proto +++ b/cpp/src/grpc/cuopt_remote.proto @@ -16,6 +16,11 @@ enum ProblemCategory { MIP = 1; } +enum VariableType { + CONTINUOUS = 0; + INTEGER = 1; +} + // Optimization problem representation (field names match cpu_optimization_problem_t) message OptimizationProblem { // Problem metadata @@ -26,40 +31,39 @@ message OptimizationProblem { double objective_offset = 5; // Variable and row names (optional) - repeated string variable_names = 6; - repeated string row_names = 7; + repeated string variable_names = 7; + repeated string row_names = 8; // Constraint matrix A in CSR format - repeated double A = 8; - repeated int32 A_indices = 9; - repeated int32 A_offsets = 10; + repeated double A_values = 9; + repeated int32 A_indices = 10; + repeated int32 A_offsets = 11; // Problem vectors - repeated double c = 11; // objective coefficients - repeated double b = 12; // constraint bounds (RHS) - repeated double variable_lower_bounds = 13; - repeated double variable_upper_bounds = 14; + repeated double c = 12; // objective coefficients + repeated double b = 13; // constraint bounds (RHS) + repeated double variable_lower_bounds = 14; + repeated double variable_upper_bounds = 
15; // Constraint bounds (alternative to b + row_types) - repeated double constraint_lower_bounds = 15; - repeated double constraint_upper_bounds = 16; - bytes row_types = 17; // char array: 'E' (=), 'L' (<=), 'G' (>=), 'N' (objective) + repeated double constraint_lower_bounds = 16; + repeated double constraint_upper_bounds = 17; + bytes row_types = 18; // char array: 'E' (=), 'L' (<=), 'G' (>=), 'N' (objective) - // Variable types - bytes variable_types = 18; // char array: 'C' (continuous), 'I' (integer), 'B' (binary) + // Variable types (enum-based: CONTINUOUS or INTEGER) + repeated VariableType variable_types = 19; // Initial solutions - repeated double initial_primal_solution = 19; - repeated double initial_dual_solution = 20; + repeated double initial_primal_solution = 20; + repeated double initial_dual_solution = 21; // Quadratic objective matrix Q in CSR format - repeated double Q_values = 21; - repeated int32 Q_indices = 22; - repeated int32 Q_offsets = 23; + repeated double Q_values = 22; + repeated int32 Q_indices = 23; + repeated int32 Q_offsets = 24; } // PDLP solver mode enum (matches cuOpt pdlp_solver_mode_t) -// Matches cuOpt pdlp_solver_mode_t enum values enum PDLPSolverMode { Stable1 = 0; Stable2 = 1; @@ -77,6 +81,7 @@ enum LPMethod { } // PDLP solver settings (field names match cuOpt Python/C++ API) +// Dense numbering: tolerances 1-8, limits 9-10, config 11-30 message PDLPSolverSettings { // Termination tolerances double absolute_gap_tolerance = 1; @@ -89,39 +94,42 @@ message PDLPSolverSettings { double relative_primal_tolerance = 8; // Limits - double time_limit = 10; + double time_limit = 9; // Iteration limit. Sentinel: set to -1 to mean "unset/use server defaults". // Note: proto3 numeric fields default to 0 when omitted, so clients should // explicitly use -1 (or a positive value) to avoid accidentally requesting 0 iterations. 
- int64 iteration_limit = 11; + int64 iteration_limit = 10; // Solver configuration - bool log_to_console = 20; - bool detect_infeasibility = 21; - bool strict_infeasibility = 22; - PDLPSolverMode pdlp_solver_mode = 23; - LPMethod method = 24; - int32 presolver = 25; - bool dual_postsolve = 26; - bool crossover = 27; - int32 num_gpus = 28; - - bool per_constraint_residual = 30; - bool cudss_deterministic = 31; - int32 folding = 32; - int32 augmented = 33; - int32 dualize = 34; - int32 ordering = 35; - int32 barrier_dual_initial_point = 36; - bool eliminate_dense_columns = 37; - bool save_best_primal_so_far = 38; - bool first_primal_feasible = 39; - int32 pdlp_precision = 40; + bool log_to_console = 11; + bool detect_infeasibility = 12; + bool strict_infeasibility = 13; + PDLPSolverMode pdlp_solver_mode = 14; + LPMethod method = 15; + int32 presolver = 16; + bool dual_postsolve = 17; + bool crossover = 18; + int32 num_gpus = 19; + + bool per_constraint_residual = 20; + bool cudss_deterministic = 21; + int32 folding = 22; + int32 augmented = 23; + int32 dualize = 24; + int32 ordering = 25; + int32 barrier_dual_initial_point = 26; + bool eliminate_dense_columns = 27; + bool save_best_primal_so_far = 28; + bool first_primal_feasible = 29; + int32 pdlp_precision = 30; // Warm start data (if provided) PDLPWarmStartData warm_start_data = 50; } +// PDLP warm start data for continuing from a previous solve. +// Array field numbers 1-9, scalar field numbers in 3000-range +// (shared with ChunkedResultHeader for consistent chunked transfer numbering). 
message PDLPWarmStartData { repeated double current_primal_solution = 1; repeated double current_dual_solution = 2; @@ -133,17 +141,18 @@ message PDLPWarmStartData { repeated double last_restart_duality_gap_primal_solution = 8; repeated double last_restart_duality_gap_dual_solution = 9; - double initial_primal_weight = 10; - double initial_step_size = 11; - int32 total_pdlp_iterations = 12; - int32 total_pdhg_iterations = 13; - double last_candidate_kkt_score = 14; - double last_restart_kkt_score = 15; - double sum_solution_weight = 16; - int32 iterations_since_last_restart = 17; + double initial_primal_weight = 3000; + double initial_step_size = 3001; + int32 total_pdlp_iterations = 3002; + int32 total_pdhg_iterations = 3003; + double last_candidate_kkt_score = 3004; + double last_restart_kkt_score = 3005; + double sum_solution_weight = 3006; + int32 iterations_since_last_restart = 3007; } // MIP solver settings (field names match cuOpt Python/C++ API) +// Dense numbering 1-28 message MIPSolverSettings { // Limits double time_limit = 1; @@ -157,12 +166,35 @@ message MIPSolverSettings { double presolve_absolute_tolerance = 7; // Solver configuration - bool log_to_console = 10; - bool heuristics_only = 11; - int32 num_cpu_threads = 12; - int32 num_gpus = 13; - int32 presolver = 14; - int32 mip_scaling = 15; + bool log_to_console = 8; + bool heuristics_only = 9; + int32 num_cpu_threads = 10; + int32 num_gpus = 11; + int32 presolver = 12; + int32 mip_scaling = 13; + + // Additional limits + double work_limit = 14; + int32 node_limit = 15; + + // Branching + int32 reliability_branching = 16; + int32 mip_batch_pdlp_strong_branching = 17; + + // Cut configuration + int32 max_cut_passes = 18; + int32 mir_cuts = 19; + int32 mixed_integer_gomory_cuts = 20; + int32 knapsack_cuts = 21; + int32 clique_cuts = 22; + int32 strong_chvatal_gomory_cuts = 23; + int32 reduced_cost_strengthening = 24; + double cut_change_threshold = 25; + double cut_min_orthogonality = 26; + + // 
Determinism and reproducibility + int32 determinism_mode = 27; + int32 seed = 28; } // LP solve request @@ -181,6 +213,8 @@ message SolveMIPRequest { } // LP solution +// Array field numbers 1-4, scalar field numbers in 1000-range +// (shared with ChunkedResultHeader for consistent chunked transfer numbering). message LPSolution { // Solution vectors repeated double primal_solution = 1; @@ -191,18 +225,18 @@ message LPSolution { PDLPWarmStartData warm_start_data = 4; // Termination information - PDLPTerminationStatus termination_status = 10; - string error_message = 11; + PDLPTerminationStatus lp_termination_status = 1000; + string error_message = 1001; // Solution statistics - double l2_primal_residual = 20; - double l2_dual_residual = 21; - double primal_objective = 22; - double dual_objective = 23; - double gap = 24; - int32 nb_iterations = 25; - double solve_time = 26; - int32 solved_by = 27; + double l2_primal_residual = 1002; + double l2_dual_residual = 1003; + double primal_objective = 1004; + double dual_objective = 1005; + double gap = 1006; + int32 nb_iterations = 1007; + double solve_time = 1008; + int32 solved_by = 1009; } enum PDLPTerminationStatus { @@ -218,22 +252,24 @@ enum PDLPTerminationStatus { } // MIP solution +// Array field number 1, scalar field numbers in 2000-range +// (shared with ChunkedResultHeader for consistent chunked transfer numbering). 
message MIPSolution { - repeated double solution = 1; - - MIPTerminationStatus termination_status = 10; - string error_message = 11; - - double objective = 20; - double mip_gap = 21; - double solution_bound = 22; - double total_solve_time = 23; - double presolve_time = 24; - double max_constraint_violation = 25; - double max_int_violation = 26; - double max_variable_bound_violation = 27; - int32 nodes = 28; - int32 simplex_iterations = 29; + repeated double mip_solution = 1; + + MIPTerminationStatus mip_termination_status = 2000; + string mip_error_message = 2001; + + double mip_objective = 2002; + double mip_gap = 2003; + double solution_bound = 2004; + double total_solve_time = 2005; + double presolve_time = 2006; + double max_constraint_violation = 2007; + double max_int_violation = 2008; + double max_variable_bound_violation = 2009; + int32 nodes = 2010; + int32 simplex_iterations = 2011; } enum MIPTerminationStatus { @@ -246,47 +282,46 @@ enum MIPTerminationStatus { MIP_WORK_LIMIT = 6; } -// Array field identifiers for chunked array transfers -// Used to identify which problem array a chunk belongs to +// Array field identifiers for chunked array transfers. +// Numbering matches codegen field_registry.yaml array_id values. 
enum ArrayFieldId { - FIELD_A_VALUES = 0; - FIELD_A_INDICES = 1; - FIELD_A_OFFSETS = 2; - FIELD_C = 3; - FIELD_B = 4; - FIELD_VARIABLE_LOWER_BOUNDS = 5; - FIELD_VARIABLE_UPPER_BOUNDS = 6; - FIELD_CONSTRAINT_LOWER_BOUNDS = 7; - FIELD_CONSTRAINT_UPPER_BOUNDS = 8; - FIELD_ROW_TYPES = 9; - FIELD_VARIABLE_TYPES = 10; - FIELD_Q_VALUES = 11; - FIELD_Q_INDICES = 12; - FIELD_Q_OFFSETS = 13; - FIELD_INITIAL_PRIMAL = 14; - FIELD_INITIAL_DUAL = 15; - // String arrays (null-separated bytes, sent as chunks alongside numeric data) - FIELD_VARIABLE_NAMES = 20; - FIELD_ROW_NAMES = 21; + FIELD_VARIABLE_NAMES = 0; + FIELD_ROW_NAMES = 1; + FIELD_A_VALUES = 2; + FIELD_A_INDICES = 3; + FIELD_A_OFFSETS = 4; + FIELD_C = 5; + FIELD_B = 6; + FIELD_VARIABLE_LOWER_BOUNDS = 7; + FIELD_VARIABLE_UPPER_BOUNDS = 8; + FIELD_CONSTRAINT_LOWER_BOUNDS = 9; + FIELD_CONSTRAINT_UPPER_BOUNDS = 10; + FIELD_ROW_TYPES = 11; + FIELD_VARIABLE_TYPES = 12; + FIELD_INITIAL_PRIMAL_SOLUTION = 13; + FIELD_INITIAL_DUAL_SOLUTION = 14; + FIELD_Q_VALUES = 15; + FIELD_Q_INDICES = 16; + FIELD_Q_OFFSETS = 17; } -// Result array field identifiers for chunked result downloads -// Used to identify which result array a chunk belongs to +// Result array field identifiers for chunked result downloads. +// Numbering matches codegen field_registry.yaml array_id values. 
enum ResultFieldId { RESULT_PRIMAL_SOLUTION = 0; RESULT_DUAL_SOLUTION = 1; RESULT_REDUCED_COST = 2; - RESULT_MIP_SOLUTION = 3; // Warm start arrays (LP only) - RESULT_WS_CURRENT_PRIMAL = 10; - RESULT_WS_CURRENT_DUAL = 11; - RESULT_WS_INITIAL_PRIMAL_AVG = 12; - RESULT_WS_INITIAL_DUAL_AVG = 13; - RESULT_WS_CURRENT_ATY = 14; - RESULT_WS_SUM_PRIMAL = 15; - RESULT_WS_SUM_DUAL = 16; - RESULT_WS_LAST_RESTART_GAP_PRIMAL = 17; - RESULT_WS_LAST_RESTART_GAP_DUAL = 18; + RESULT_WS_CURRENT_PRIMAL = 3; + RESULT_WS_CURRENT_DUAL = 4; + RESULT_WS_INITIAL_PRIMAL_AVG = 5; + RESULT_WS_INITIAL_DUAL_AVG = 6; + RESULT_WS_CURRENT_ATY = 7; + RESULT_WS_SUM_PRIMAL = 8; + RESULT_WS_SUM_DUAL = 9; + RESULT_WS_LAST_RESTART_GAP_PRIMAL = 10; + RESULT_WS_LAST_RESTART_GAP_DUAL = 11; + RESULT_MIP_SOLUTION = 12; } // Job status for async operations diff --git a/cpp/src/grpc/cuopt_remote_service.proto b/cpp/src/grpc/cuopt_remote_service.proto index 24c7517781..16fb1a8d80 100644 --- a/cpp/src/grpc/cuopt_remote_service.proto +++ b/cpp/src/grpc/cuopt_remote_service.proto @@ -192,47 +192,49 @@ message ResultArrayDescriptor { // Header for chunked result download - carries all scalar/enum/string fields // from LPSolution or MIPSolution. Array data is sent via GetResultChunk. +// Field numbers use the same 1000/2000/3000 ranges as LPSolution, MIPSolution, +// and PDLPWarmStartData for consistency with the codegen numbering scheme. 
message ChunkedResultHeader { - bool is_mip = 1; - - // LP result scalars - PDLPTerminationStatus lp_termination_status = 10; - string error_message = 11; - double l2_primal_residual = 12; - double l2_dual_residual = 13; - double primal_objective = 14; - double dual_objective = 15; - double gap = 16; - int32 nb_iterations = 17; - double solve_time = 18; - int32 solved_by = 19; - - // MIP result scalars - MIPTerminationStatus mip_termination_status = 30; - string mip_error_message = 31; - double mip_objective = 32; - double mip_gap = 33; - double solution_bound = 34; - double total_solve_time = 35; - double presolve_time = 36; - double max_constraint_violation = 37; - double max_int_violation = 38; - double max_variable_bound_violation = 39; - int32 nodes = 40; - int32 simplex_iterations = 41; - - // LP warm start scalars (included in header since they are small) - double ws_initial_primal_weight = 60; - double ws_initial_step_size = 61; - int32 ws_total_pdlp_iterations = 62; - int32 ws_total_pdhg_iterations = 63; - double ws_last_candidate_kkt_score = 64; - double ws_last_restart_kkt_score = 65; - double ws_sum_solution_weight = 66; - int32 ws_iterations_since_last_restart = 67; + ProblemCategory problem_category = 1; // Array metadata so client knows what to fetch repeated ResultArrayDescriptor arrays = 50; + + // LP result scalars (1000-range, same as LPSolution) + PDLPTerminationStatus lp_termination_status = 1000; + string error_message = 1001; + double l2_primal_residual = 1002; + double l2_dual_residual = 1003; + double primal_objective = 1004; + double dual_objective = 1005; + double gap = 1006; + int32 nb_iterations = 1007; + double solve_time = 1008; + int32 solved_by = 1009; + + // MIP result scalars (2000-range, same as MIPSolution) + MIPTerminationStatus mip_termination_status = 2000; + string mip_error_message = 2001; + double mip_objective = 2002; + double mip_gap = 2003; + double solution_bound = 2004; + double total_solve_time = 2005; + double 
presolve_time = 2006; + double max_constraint_violation = 2007; + double max_int_violation = 2008; + double max_variable_bound_violation = 2009; + int32 nodes = 2010; + int32 simplex_iterations = 2011; + + // LP warm start scalars (3000-range, same as PDLPWarmStartData) + double ws_initial_primal_weight = 3000; + double ws_initial_step_size = 3001; + int32 ws_total_pdlp_iterations = 3002; + int32 ws_total_pdhg_iterations = 3003; + double ws_last_candidate_kkt_score = 3004; + double ws_last_restart_kkt_score = 3005; + double ws_sum_solution_weight = 3006; + int32 ws_iterations_since_last_restart = 3007; } message StartChunkedDownloadRequest { diff --git a/cpp/src/grpc/grpc_problem_mapper.cpp b/cpp/src/grpc/grpc_problem_mapper.cpp index 7a7bdde642..bc5342defe 100644 --- a/cpp/src/grpc/grpc_problem_mapper.cpp +++ b/cpp/src/grpc/grpc_problem_mapper.cpp @@ -40,7 +40,7 @@ void map_problem_to_proto(const cpu_optimization_problem_t& cpu_proble // Constraint matrix A in CSR format for (const auto& val : values) { - pb_problem->add_a(static_cast(val)); + pb_problem->add_a_values(static_cast(val)); } for (const auto& idx : indices) { pb_problem->add_a_indices(static_cast(idx)); @@ -107,19 +107,15 @@ void map_problem_to_proto(const cpu_optimization_problem_t& cpu_proble // Variable types (for MIP problems) auto var_types = cpu_problem.get_variable_types_host(); if (!var_types.empty()) { - // Convert var_t enum to char representation - std::string var_types_str; - var_types_str.reserve(var_types.size()); for (const auto& vt : var_types) { switch (vt) { - case var_t::CONTINUOUS: var_types_str.push_back('C'); break; - case var_t::INTEGER: var_types_str.push_back('I'); break; + case var_t::CONTINUOUS: pb_problem->add_variable_types(cuopt::remote::CONTINUOUS); break; + case var_t::INTEGER: pb_problem->add_variable_types(cuopt::remote::INTEGER); break; default: throw std::runtime_error("map_problem_to_proto: unknown var_t value " + std::to_string(static_cast(vt))); } } - 
pb_problem->set_variable_types(var_types_str); } // Quadratic objective matrix Q (for QPS problems) @@ -152,7 +148,7 @@ void map_proto_to_problem(const cuopt::remote::OptimizationProblem& pb_problem, cpu_problem.set_objective_offset(pb_problem.objective_offset()); // Constraint matrix A in CSR format - std::vector values(pb_problem.a().begin(), pb_problem.a().end()); + std::vector values(pb_problem.a_values().begin(), pb_problem.a_values().end()); std::vector indices(pb_problem.a_indices().begin(), pb_problem.a_indices().end()); std::vector offsets(pb_problem.a_offsets().begin(), pb_problem.a_offsets().end()); @@ -211,19 +207,16 @@ void map_proto_to_problem(const cuopt::remote::OptimizationProblem& pb_problem, } // Variable types - if (!pb_problem.variable_types().empty()) { - const std::string& var_types_str = pb_problem.variable_types(); - // Convert char representation to var_t enum + if (pb_problem.variable_types_size() > 0) { std::vector var_types; - var_types.reserve(var_types_str.size()); - for (char c : var_types_str) { - switch (c) { - case 'C': var_types.push_back(var_t::CONTINUOUS); break; - case 'I': - case 'B': var_types.push_back(var_t::INTEGER); break; + var_types.reserve(pb_problem.variable_types_size()); + for (int i = 0; i < pb_problem.variable_types_size(); ++i) { + switch (pb_problem.variable_types(i)) { + case cuopt::remote::CONTINUOUS: var_types.push_back(var_t::CONTINUOUS); break; + case cuopt::remote::INTEGER: var_types.push_back(var_t::INTEGER); break; default: - throw std::runtime_error(std::string("Unknown variable type character '") + c + - "' in variable_types string (expected 'C', 'I', or 'B')"); + throw std::runtime_error("Unknown VariableType enum value " + + std::to_string(pb_problem.variable_types(i))); } } cpu_problem.set_variable_types(var_types.data(), static_cast(var_types.size())); @@ -244,11 +237,10 @@ void map_proto_to_problem(const cuopt::remote::OptimizationProblem& pb_problem, } // Infer problem category from variable 
types - if (!pb_problem.variable_types().empty()) { - const std::string& var_types_str = pb_problem.variable_types(); - bool has_integers = false; - for (char c : var_types_str) { - if (c == 'I' || c == 'B') { + if (pb_problem.variable_types_size() > 0) { + bool has_integers = false; + for (int i = 0; i < pb_problem.variable_types_size(); ++i) { + if (pb_problem.variable_types(i) == cuopt::remote::INTEGER) { has_integers = true; break; } @@ -509,22 +501,21 @@ void map_chunked_arrays_to_problem(const cuopt::remote::ChunkedProblemHeader& he } // Variable types + problem category - auto var_types_str = get_bytes(cuopt::remote::FIELD_VARIABLE_TYPES); - if (!var_types_str.empty()) { + auto var_types_ints = get_ints(cuopt::remote::FIELD_VARIABLE_TYPES); + if (!var_types_ints.empty()) { std::vector vtypes; - vtypes.reserve(var_types_str.size()); + vtypes.reserve(var_types_ints.size()); bool has_ints = false; - for (char c : var_types_str) { - switch (c) { - case 'C': vtypes.push_back(var_t::CONTINUOUS); break; - case 'I': - case 'B': + for (const auto& v : var_types_ints) { + switch (static_cast(v)) { + case cuopt::remote::CONTINUOUS: vtypes.push_back(var_t::CONTINUOUS); break; + case cuopt::remote::INTEGER: vtypes.push_back(var_t::INTEGER); has_ints = true; break; default: - throw std::runtime_error(std::string("Unknown variable type character '") + c + - "' in chunked variable_types (expected 'C', 'I', or 'B')"); + throw std::runtime_error("Unknown VariableType enum value " + std::to_string(v) + + " in chunked variable_types"); } } cpu_problem.set_variable_types(vtypes.data(), static_cast(vtypes.size())); @@ -644,19 +635,19 @@ std::vector build_array_chunk_requests( auto var_types = problem.get_variable_types_host(); if (!var_types.empty()) { - std::vector vt_bytes; - vt_bytes.reserve(var_types.size()); + std::vector vt_enums; + vt_enums.reserve(var_types.size()); for (const auto& vt : var_types) { switch (vt) { - case var_t::CONTINUOUS: vt_bytes.push_back('C'); break; 
- case var_t::INTEGER: vt_bytes.push_back('I'); break; + case var_t::CONTINUOUS: vt_enums.push_back(cuopt::remote::CONTINUOUS); break; + case var_t::INTEGER: vt_enums.push_back(cuopt::remote::INTEGER); break; default: throw std::runtime_error("chunk_problem_to_proto: unknown var_t value " + std::to_string(static_cast(vt))); } } - chunk_byte_blob( - requests, cuopt::remote::FIELD_VARIABLE_TYPES, vt_bytes, upload_id, chunk_size_bytes); + chunk_typed_array( + requests, cuopt::remote::FIELD_VARIABLE_TYPES, vt_enums, upload_id, chunk_size_bytes); } if (problem.has_quadratic_objective()) { diff --git a/cpp/src/grpc/grpc_settings_mapper.cpp b/cpp/src/grpc/grpc_settings_mapper.cpp index 0c52d766b0..9b503b388e 100644 --- a/cpp/src/grpc/grpc_settings_mapper.cpp +++ b/cpp/src/grpc/grpc_settings_mapper.cpp @@ -202,6 +202,33 @@ void map_mip_settings_to_proto(const mip_solver_settings_t& settings, pb_settings->set_num_gpus(settings.num_gpus); pb_settings->set_presolver(static_cast(settings.presolver)); pb_settings->set_mip_scaling(settings.mip_scaling); + + // Additional limits + pb_settings->set_work_limit(settings.work_limit); + if (settings.node_limit == std::numeric_limits::max()) { + pb_settings->set_node_limit(-1); + } else { + pb_settings->set_node_limit(static_cast(settings.node_limit)); + } + + // Branching + pb_settings->set_reliability_branching(settings.reliability_branching); + pb_settings->set_mip_batch_pdlp_strong_branching(settings.mip_batch_pdlp_strong_branching); + + // Cut configuration + pb_settings->set_max_cut_passes(settings.max_cut_passes); + pb_settings->set_mir_cuts(settings.mir_cuts); + pb_settings->set_mixed_integer_gomory_cuts(settings.mixed_integer_gomory_cuts); + pb_settings->set_knapsack_cuts(settings.knapsack_cuts); + pb_settings->set_clique_cuts(settings.clique_cuts); + pb_settings->set_strong_chvatal_gomory_cuts(settings.strong_chvatal_gomory_cuts); + pb_settings->set_reduced_cost_strengthening(settings.reduced_cost_strengthening); + 
pb_settings->set_cut_change_threshold(settings.cut_change_threshold); + pb_settings->set_cut_min_orthogonality(settings.cut_min_orthogonality); + + // Determinism and reproducibility + pb_settings->set_determinism_mode(settings.determinism_mode); + pb_settings->set_seed(settings.seed); } template @@ -236,6 +263,31 @@ void map_proto_to_mip_settings(const cuopt::remote::MIPSolverSettings& pb_settin ? sv : CUOPT_MIP_SCALING_ON; } + + // Additional limits + settings.work_limit = pb_settings.work_limit(); + if (pb_settings.node_limit() >= 0) { + settings.node_limit = static_cast(pb_settings.node_limit()); + } + + // Branching + settings.reliability_branching = pb_settings.reliability_branching(); + settings.mip_batch_pdlp_strong_branching = pb_settings.mip_batch_pdlp_strong_branching(); + + // Cut configuration + settings.max_cut_passes = pb_settings.max_cut_passes(); + settings.mir_cuts = pb_settings.mir_cuts(); + settings.mixed_integer_gomory_cuts = pb_settings.mixed_integer_gomory_cuts(); + settings.knapsack_cuts = pb_settings.knapsack_cuts(); + settings.clique_cuts = pb_settings.clique_cuts(); + settings.strong_chvatal_gomory_cuts = pb_settings.strong_chvatal_gomory_cuts(); + settings.reduced_cost_strengthening = pb_settings.reduced_cost_strengthening(); + settings.cut_change_threshold = pb_settings.cut_change_threshold(); + settings.cut_min_orthogonality = pb_settings.cut_min_orthogonality(); + + // Determinism and reproducibility + settings.determinism_mode = pb_settings.determinism_mode(); + settings.seed = pb_settings.seed(); } // Explicit template instantiations diff --git a/cpp/src/grpc/grpc_solution_mapper.cpp b/cpp/src/grpc/grpc_solution_mapper.cpp index 096b466804..3be106cdca 100644 --- a/cpp/src/grpc/grpc_solution_mapper.cpp +++ b/cpp/src/grpc/grpc_solution_mapper.cpp @@ -84,7 +84,7 @@ template void map_lp_solution_to_proto(const cpu_lp_solution_t& solution, cuopt::remote::LPSolution* pb_solution) { - 
pb_solution->set_termination_status(to_proto_pdlp_status(solution.get_termination_status())); + pb_solution->set_lp_termination_status(to_proto_pdlp_status(solution.get_termination_status())); pb_solution->set_error_message(solution.get_error_status().what()); // Solution vectors - CPU solution already has data in host memory @@ -157,7 +157,7 @@ cpu_lp_solution_t map_proto_to_lp_solution(const cuopt::remote::LPSolu std::vector reduced_cost(pb_solution.reduced_cost().begin(), pb_solution.reduced_cost().end()); - auto status = from_proto_pdlp_status(pb_solution.termination_status()); + auto status = from_proto_pdlp_status(pb_solution.lp_termination_status()); auto obj = static_cast(pb_solution.primal_objective()); auto dual_obj = static_cast(pb_solution.dual_objective()); auto solve_t = pb_solution.solve_time(); @@ -233,17 +233,17 @@ template void map_mip_solution_to_proto(const cpu_mip_solution_t& solution, cuopt::remote::MIPSolution* pb_solution) { - pb_solution->set_termination_status(to_proto_mip_status(solution.get_termination_status())); - pb_solution->set_error_message(solution.get_error_status().what()); + pb_solution->set_mip_termination_status(to_proto_mip_status(solution.get_termination_status())); + pb_solution->set_mip_error_message(solution.get_error_status().what()); // Solution vector - CPU solution already has data in host memory const auto& sol_vec = solution.get_solution_host(); for (const auto& v : sol_vec) { - pb_solution->add_solution(static_cast(v)); + pb_solution->add_mip_solution(static_cast(v)); } // Solution statistics - pb_solution->set_objective(solution.get_objective_value()); + pb_solution->set_mip_objective(solution.get_objective_value()); pb_solution->set_mip_gap(solution.get_mip_gap()); pb_solution->set_solution_bound(solution.get_solution_bound()); pb_solution->set_total_solve_time(solution.get_solve_time()); @@ -260,12 +260,13 @@ cpu_mip_solution_t map_proto_to_mip_solution( const cuopt::remote::MIPSolution& pb_solution) { // 
Convert solution vector - std::vector solution_vec(pb_solution.solution().begin(), pb_solution.solution().end()); + std::vector solution_vec(pb_solution.mip_solution().begin(), + pb_solution.mip_solution().end()); // Create CPU MIP solution with data return cpu_mip_solution_t(std::move(solution_vec), - from_proto_mip_status(pb_solution.termination_status()), - static_cast(pb_solution.objective()), + from_proto_mip_status(pb_solution.mip_termination_status()), + static_cast(pb_solution.mip_objective()), static_cast(pb_solution.mip_gap()), static_cast(pb_solution.solution_bound()), pb_solution.total_solve_time(), @@ -344,7 +345,7 @@ template void populate_chunked_result_header_lp(const cpu_lp_solution_t& solution, cuopt::remote::ChunkedResultHeader* header) { - header->set_is_mip(false); + header->set_problem_category(cuopt::remote::LP); header->set_lp_termination_status(to_proto_pdlp_status(solution.get_termination_status())); header->set_error_message(solution.get_error_status().what()); header->set_l2_primal_residual(solution.get_l2_primal_residual()); @@ -416,7 +417,7 @@ template void populate_chunked_result_header_mip(const cpu_mip_solution_t& solution, cuopt::remote::ChunkedResultHeader* header) { - header->set_is_mip(true); + header->set_problem_category(cuopt::remote::MIP); header->set_mip_termination_status(to_proto_mip_status(solution.get_termination_status())); header->set_mip_error_message(solution.get_error_status().what()); header->set_mip_objective(solution.get_objective_value()); diff --git a/cpp/src/grpc/server/grpc_field_element_size.hpp b/cpp/src/grpc/server/grpc_field_element_size.hpp index f53e99dbf6..dd75d9c66c 100644 --- a/cpp/src/grpc/server/grpc_field_element_size.hpp +++ b/cpp/src/grpc/server/grpc_field_element_size.hpp @@ -25,14 +25,14 @@ inline int64_t array_field_element_size(cuopt::remote::ArrayFieldId field_id) case cuopt::remote::FIELD_CONSTRAINT_LOWER_BOUNDS: case cuopt::remote::FIELD_CONSTRAINT_UPPER_BOUNDS: case 
cuopt::remote::FIELD_Q_VALUES: - case cuopt::remote::FIELD_INITIAL_PRIMAL: - case cuopt::remote::FIELD_INITIAL_DUAL: return 8; + case cuopt::remote::FIELD_INITIAL_PRIMAL_SOLUTION: + case cuopt::remote::FIELD_INITIAL_DUAL_SOLUTION: return 8; case cuopt::remote::FIELD_A_INDICES: case cuopt::remote::FIELD_A_OFFSETS: case cuopt::remote::FIELD_Q_INDICES: - case cuopt::remote::FIELD_Q_OFFSETS: return 4; + case cuopt::remote::FIELD_Q_OFFSETS: + case cuopt::remote::FIELD_VARIABLE_TYPES: return 4; case cuopt::remote::FIELD_ROW_TYPES: - case cuopt::remote::FIELD_VARIABLE_TYPES: case cuopt::remote::FIELD_VARIABLE_NAMES: case cuopt::remote::FIELD_ROW_NAMES: return 1; } diff --git a/cpp/src/grpc/server/grpc_server_main.cpp b/cpp/src/grpc/server/grpc_server_main.cpp index 5cc947a81a..d638c191b1 100644 --- a/cpp/src/grpc/server/grpc_server_main.cpp +++ b/cpp/src/grpc/server/grpc_server_main.cpp @@ -65,7 +65,10 @@ int main(int argc, char** argv) argparse::ArgumentParser program("cuopt_grpc_server", version_string); - program.add_argument("-p", "--port").help("Listen port").default_value(8765).scan<'i', int>(); + program.add_argument("-p", "--port") + .help("Listen port") + .default_value(cuopt_default_grpc_port) + .scan<'i', int>(); program.add_argument("-w", "--workers") .help("Number of worker processes") @@ -86,11 +89,6 @@ int main(int argc, char** argv) .default_value(60) .scan<'i', int>(); - program.add_argument("--enable-transfer-hash") - .help("Log data hashes for streaming transfers (for testing)") - .default_value(false) - .implicit_value(true); - program.add_argument("--tls") .help("Enable TLS (requires --tls-cert and --tls-key)") .default_value(false) @@ -144,7 +142,6 @@ int main(int argc, char** argv) } config.chunk_timeout_seconds = program.get("--chunk-timeout"); - config.enable_transfer_hash = program.get("--enable-transfer-hash"); config.enable_tls = program.get("--tls"); config.require_client = program.get("--require-client-cert"); config.log_to_console = 
program.get("--log-to-console"); diff --git a/cpp/src/grpc/server/grpc_server_types.hpp b/cpp/src/grpc/server/grpc_server_types.hpp index 7afc668fb9..dc6684dea5 100644 --- a/cpp/src/grpc/server/grpc_server_types.hpp +++ b/cpp/src/grpc/server/grpc_server_types.hpp @@ -7,6 +7,8 @@ #ifdef CUOPT_ENABLE_GRPC +#include "../cuopt_default_grpc_port.h" + #include #include "cuopt_remote.pb.h" #include "cuopt_remote_service.grpc.pb.h" @@ -156,7 +158,7 @@ struct JobWaiter { // ============================================================================= struct ServerConfig { - int port = 8765; + int port = cuopt_default_grpc_port; int num_workers = 1; bool verbose = true; bool log_to_console = false; @@ -165,7 +167,6 @@ struct ServerConfig { // Clamped at startup to [kServerMinMessageBytes, kServerMaxMessageBytes]. int64_t max_message_bytes = 256LL * 1024 * 1024; // 256 MiB int chunk_timeout_seconds = 60; // 0 = disabled - bool enable_transfer_hash = false; bool enable_tls = false; bool require_client = false; std::string tls_cert_path; diff --git a/cpp/tests/linear_programming/grpc/grpc_client_test.cpp b/cpp/tests/linear_programming/grpc/grpc_client_test.cpp index 46a18dc026..ef97f4a414 100644 --- a/cpp/tests/linear_programming/grpc/grpc_client_test.cpp +++ b/cpp/tests/linear_programming/grpc/grpc_client_test.cpp @@ -17,12 +17,16 @@ #include "grpc_client_test_helper.hpp" #include +#include #include #include #include #include #include "grpc_client.hpp" +#include "grpc_problem_mapper.hpp" #include "grpc_service_mapper.hpp" +#include "grpc_settings_mapper.hpp" +#include "grpc_solution_mapper.hpp" #include #include @@ -834,7 +838,7 @@ TEST_F(GrpcClientTest, ChunkedDownload_FallbackOnResourceExhausted) cuopt::remote::StartChunkedDownloadResponse* resp) { resp->set_download_id("dl-001"); auto* h = resp->mutable_header(); - h->set_is_mip(false); + h->set_problem_category(cuopt::remote::LP); h->set_lp_termination_status(cuopt::remote::PDLP_OPTIMAL); h->set_primal_objective(-464.753); 
auto* arr = h->add_arrays(); @@ -1124,7 +1128,7 @@ TEST_F(GrpcClientTest, SolveLP_SuccessWithPolling) cuopt::remote::LPSolution solution; solution.add_primal_solution(1.0); solution.set_primal_objective(1.0); - solution.set_termination_status(cuopt::remote::PDLP_OPTIMAL); + solution.set_lp_termination_status(cuopt::remote::PDLP_OPTIMAL); resp->mutable_lp_solution()->CopyFrom(solution); resp->set_status(cuopt::remote::SUCCESS); return grpc::Status::OK; @@ -1172,7 +1176,7 @@ TEST_F(GrpcClientTest, SolveLP_SuccessWithWait) cuopt::remote::LPSolution solution; solution.add_primal_solution(1.0); solution.set_primal_objective(1.0); - solution.set_termination_status(cuopt::remote::PDLP_OPTIMAL); + solution.set_lp_termination_status(cuopt::remote::PDLP_OPTIMAL); resp->mutable_lp_solution()->CopyFrom(solution); resp->set_status(cuopt::remote::SUCCESS); return grpc::Status::OK; @@ -1305,9 +1309,9 @@ TEST_F(GrpcClientTest, SolveMIP_Success) const cuopt::remote::GetResultRequest&, cuopt::remote::ResultResponse* resp) { cuopt::remote::MIPSolution solution; - solution.add_solution(1.0); - solution.set_objective(1.0); - solution.set_termination_status(cuopt::remote::MIP_OPTIMAL); + solution.add_mip_solution(1.0); + solution.set_mip_objective(1.0); + solution.set_mip_termination_status(cuopt::remote::MIP_OPTIMAL); resp->mutable_mip_solution()->CopyFrom(solution); resp->set_status(cuopt::remote::SUCCESS); return grpc::Status::OK; @@ -1632,3 +1636,347 @@ TEST_F(GrpcClientTest, SubmitLP_UnaryForSmallPayload) EXPECT_TRUE(result.success) << "Error: " << result.error_message; EXPECT_EQ(result.job_id, "unary-lp-001"); } + +// ============================================================================= +// Mapper Roundtrip Tests +// ============================================================================= + +TEST(MapperRoundtrip, MIPSettingsAllFields) +{ + mip_solver_settings_t orig; + + // Limits + orig.time_limit = 42.5; + orig.work_limit = 1000.0; + orig.node_limit = 5000; + + // 
Tolerances + orig.tolerances.relative_mip_gap = 1e-3; + orig.tolerances.absolute_mip_gap = 1e-8; + orig.tolerances.integrality_tolerance = 1e-4; + orig.tolerances.absolute_tolerance = 2e-6; + orig.tolerances.relative_tolerance = 3e-12; + orig.tolerances.presolve_absolute_tolerance = 5e-7; + + // Solver configuration + orig.log_to_console = false; + orig.heuristics_only = true; + orig.num_cpu_threads = 8; + orig.num_gpus = 2; + orig.presolver = presolver_t::Default; + orig.mip_scaling = true; + + // Branching + orig.reliability_branching = 32; + orig.mip_batch_pdlp_strong_branching = 16; + + // Cut configuration + orig.max_cut_passes = 20; + orig.mir_cuts = 1; + orig.mixed_integer_gomory_cuts = 2; + orig.knapsack_cuts = 0; + orig.clique_cuts = 3; + orig.strong_chvatal_gomory_cuts = -1; + orig.reduced_cost_strengthening = 1; + orig.cut_change_threshold = 0.05; + orig.cut_min_orthogonality = 0.8; + + // Determinism and reproducibility + orig.determinism_mode = CUOPT_MODE_DETERMINISTIC; + orig.seed = 12345; + + // Roundtrip: C++ -> proto -> C++ + cuopt::remote::MIPSolverSettings pb; + map_mip_settings_to_proto(orig, &pb); + + mip_solver_settings_t restored; + map_proto_to_mip_settings(pb, restored); + + // Limits + EXPECT_DOUBLE_EQ(restored.time_limit, 42.5); + EXPECT_DOUBLE_EQ(restored.work_limit, 1000.0); + EXPECT_EQ(restored.node_limit, 5000); + + // Tolerances + EXPECT_DOUBLE_EQ(restored.tolerances.relative_mip_gap, 1e-3); + EXPECT_DOUBLE_EQ(restored.tolerances.absolute_mip_gap, 1e-8); + EXPECT_DOUBLE_EQ(restored.tolerances.integrality_tolerance, 1e-4); + EXPECT_DOUBLE_EQ(restored.tolerances.absolute_tolerance, 2e-6); + EXPECT_DOUBLE_EQ(restored.tolerances.relative_tolerance, 3e-12); + EXPECT_DOUBLE_EQ(restored.tolerances.presolve_absolute_tolerance, 5e-7); + + // Solver configuration + EXPECT_EQ(restored.log_to_console, false); + EXPECT_EQ(restored.heuristics_only, true); + EXPECT_EQ(restored.num_cpu_threads, 8); + EXPECT_EQ(restored.num_gpus, 2); + 
EXPECT_EQ(restored.presolver, presolver_t::Default); + EXPECT_EQ(restored.mip_scaling, true); + + // Branching + EXPECT_EQ(restored.reliability_branching, 32); + EXPECT_EQ(restored.mip_batch_pdlp_strong_branching, 16); + + // Cut configuration + EXPECT_EQ(restored.max_cut_passes, 20); + EXPECT_EQ(restored.mir_cuts, 1); + EXPECT_EQ(restored.mixed_integer_gomory_cuts, 2); + EXPECT_EQ(restored.knapsack_cuts, 0); + EXPECT_EQ(restored.clique_cuts, 3); + EXPECT_EQ(restored.strong_chvatal_gomory_cuts, -1); + EXPECT_EQ(restored.reduced_cost_strengthening, 1); + EXPECT_DOUBLE_EQ(restored.cut_change_threshold, 0.05); + EXPECT_DOUBLE_EQ(restored.cut_min_orthogonality, 0.8); + + // Determinism and reproducibility + EXPECT_EQ(restored.determinism_mode, CUOPT_MODE_DETERMINISTIC); + EXPECT_EQ(restored.seed, 12345); +} + +TEST(MapperRoundtrip, MIPSettingsNodeLimitSentinel) +{ + mip_solver_settings_t orig; + orig.node_limit = std::numeric_limits::max(); + + cuopt::remote::MIPSolverSettings pb; + map_mip_settings_to_proto(orig, &pb); + EXPECT_EQ(pb.node_limit(), -1) << "max() should map to -1 sentinel in proto"; + + mip_solver_settings_t restored; + restored.node_limit = 0; + map_proto_to_mip_settings(pb, restored); + EXPECT_EQ(restored.node_limit, 0) << "Negative sentinel should leave node_limit unchanged"; +} + +TEST(MapperRoundtrip, ProblemWithVariableTypes) +{ + cpu_optimization_problem_t orig; + + std::vector obj = {1.0, 2.0, 3.0}; + std::vector var_lb = {0.0, 0.0, 0.0}; + std::vector var_ub = {10.0, 10.0, 10.0}; + std::vector var_ty = {var_t::CONTINUOUS, var_t::INTEGER, var_t::CONTINUOUS}; + std::vector con_lb = {1.0}; + std::vector con_ub = {1e20}; + std::vector A_vals = {1.0, 1.0, 1.0}; + std::vector A_idx = {0, 1, 2}; + std::vector A_off = {0, 3}; + + orig.set_objective_coefficients(obj.data(), 3); + orig.set_maximize(true); + orig.set_variable_lower_bounds(var_lb.data(), 3); + orig.set_variable_upper_bounds(var_ub.data(), 3); + orig.set_variable_types(var_ty.data(), 3); + 
orig.set_csr_constraint_matrix(A_vals.data(), 3, A_idx.data(), 3, A_off.data(), 2); + orig.set_constraint_lower_bounds(con_lb.data(), 1); + orig.set_constraint_upper_bounds(con_ub.data(), 1); + + cuopt::remote::OptimizationProblem pb; + map_problem_to_proto(orig, &pb); + + ASSERT_EQ(pb.variable_types_size(), 3); + EXPECT_EQ(pb.variable_types(0), cuopt::remote::CONTINUOUS); + EXPECT_EQ(pb.variable_types(1), cuopt::remote::INTEGER); + EXPECT_EQ(pb.variable_types(2), cuopt::remote::CONTINUOUS); + + cpu_optimization_problem_t restored; + map_proto_to_problem(pb, restored); + + auto restored_types = restored.get_variable_types_host(); + ASSERT_EQ(restored_types.size(), 3u); + EXPECT_EQ(restored_types[0], var_t::CONTINUOUS); + EXPECT_EQ(restored_types[1], var_t::INTEGER); + EXPECT_EQ(restored_types[2], var_t::CONTINUOUS); + + EXPECT_EQ(restored.get_sense(), true); + auto restored_obj = restored.get_objective_coefficients_host(); + ASSERT_EQ(restored_obj.size(), 3u); + EXPECT_DOUBLE_EQ(restored_obj[0], 1.0); + EXPECT_DOUBLE_EQ(restored_obj[1], 2.0); + EXPECT_DOUBLE_EQ(restored_obj[2], 3.0); +} + +TEST(MapperRoundtrip, MIPSolutionAllFields) +{ + std::vector sol_vec = {1.0, 0.0, 1.0, 0.0, 1.0}; + + cpu_mip_solution_t orig(std::move(sol_vec), + mip_termination_status_t::FeasibleFound, + 42.5, // objective + 0.015, // mip_gap + 40.0, // solution_bound + 12.34, // total_solve_time + 0.56, // presolve_time + 1e-8, // max_constraint_violation + 1e-9, // max_int_violation + 1e-10, // max_variable_bound_violation + 1234, // num_nodes + 56789); // num_simplex_iterations + + cuopt::remote::MIPSolution pb; + map_mip_solution_to_proto(orig, &pb); + + EXPECT_EQ(pb.mip_termination_status(), cuopt::remote::MIP_FEASIBLE_FOUND); + EXPECT_EQ(pb.mip_solution_size(), 5); + EXPECT_DOUBLE_EQ(pb.mip_objective(), 42.5); + EXPECT_DOUBLE_EQ(pb.mip_gap(), 0.015); + + auto restored = map_proto_to_mip_solution(pb); + + EXPECT_EQ(restored.get_termination_status(), 
mip_termination_status_t::FeasibleFound); + EXPECT_DOUBLE_EQ(restored.get_objective_value(), 42.5); + EXPECT_DOUBLE_EQ(restored.get_mip_gap(), 0.015); + EXPECT_DOUBLE_EQ(restored.get_solution_bound(), 40.0); + EXPECT_DOUBLE_EQ(restored.get_solve_time(), 12.34); + EXPECT_DOUBLE_EQ(restored.get_presolve_time(), 0.56); + EXPECT_DOUBLE_EQ(restored.get_max_constraint_violation(), 1e-8); + EXPECT_DOUBLE_EQ(restored.get_max_int_violation(), 1e-9); + EXPECT_DOUBLE_EQ(restored.get_max_variable_bound_violation(), 1e-10); + EXPECT_EQ(restored.get_num_nodes(), 1234); + EXPECT_EQ(restored.get_num_simplex_iterations(), 56789); + + auto restored_sol = restored.get_solution_host(); + ASSERT_EQ(restored_sol.size(), 5u); + EXPECT_DOUBLE_EQ(restored_sol[0], 1.0); + EXPECT_DOUBLE_EQ(restored_sol[1], 0.0); + EXPECT_DOUBLE_EQ(restored_sol[4], 1.0); +} + +TEST(MapperRoundtrip, LPSolutionAllFields) +{ + std::vector primal = {1.5, 2.5, 3.5}; + std::vector dual = {0.1, 0.2}; + std::vector reduced_cost = {0.0, 0.0, 0.5}; + + cpu_lp_solution_t orig(std::move(primal), + std::move(dual), + std::move(reduced_cost), + pdlp_termination_status_t::Optimal, + -464.753, // primal_objective + -464.0, // dual_objective + 1.23, // solve_time + 1e-8, // l2_primal_residual + 2e-8, // l2_dual_residual + 3e-8, // gap + 500, // num_iterations + method_t::PDLP); // solved_by + + cuopt::remote::LPSolution pb; + map_lp_solution_to_proto(orig, &pb); + + EXPECT_EQ(pb.lp_termination_status(), cuopt::remote::PDLP_OPTIMAL); + EXPECT_EQ(pb.primal_solution_size(), 3); + EXPECT_EQ(pb.dual_solution_size(), 2); + EXPECT_EQ(pb.reduced_cost_size(), 3); + + auto restored = map_proto_to_lp_solution(pb); + + EXPECT_EQ(restored.get_termination_status(), pdlp_termination_status_t::Optimal); + EXPECT_NEAR(restored.get_objective_value(), -464.753, 1e-6); + EXPECT_NEAR(restored.get_dual_objective_value(), -464.0, 1e-6); + EXPECT_DOUBLE_EQ(restored.get_solve_time(), 1.23); + EXPECT_DOUBLE_EQ(restored.get_l2_primal_residual(), 1e-8); 
+ EXPECT_DOUBLE_EQ(restored.get_l2_dual_residual(), 2e-8); + EXPECT_DOUBLE_EQ(restored.get_gap(), 3e-8); + EXPECT_EQ(restored.get_num_iterations(), 500); + EXPECT_EQ(restored.solved_by(), method_t::PDLP); + + auto restored_primal = restored.get_primal_solution_host(); + ASSERT_EQ(restored_primal.size(), 3u); + EXPECT_DOUBLE_EQ(restored_primal[0], 1.5); + EXPECT_DOUBLE_EQ(restored_primal[2], 3.5); + + auto restored_dual = restored.get_dual_solution_host(); + ASSERT_EQ(restored_dual.size(), 2u); + EXPECT_DOUBLE_EQ(restored_dual[0], 0.1); +} + +TEST(MapperRoundtrip, PDLPSettingsAllFields) +{ + pdlp_solver_settings_t orig; + + orig.tolerances.absolute_gap_tolerance = 1e-7; + orig.tolerances.relative_gap_tolerance = 1e-6; + orig.tolerances.primal_infeasible_tolerance = 1e-5; + orig.tolerances.dual_infeasible_tolerance = 2e-5; + orig.tolerances.absolute_dual_tolerance = 3e-7; + orig.tolerances.relative_dual_tolerance = 4e-7; + orig.tolerances.absolute_primal_tolerance = 5e-7; + orig.tolerances.relative_primal_tolerance = 6e-7; + + orig.time_limit = 99.5; + orig.iteration_limit = 10000; + orig.log_to_console = false; + orig.detect_infeasibility = true; + orig.strict_infeasibility = true; + orig.pdlp_solver_mode = pdlp_solver_mode_t::Fast1; + orig.method = method_t::Barrier; + orig.presolver = presolver_t::Default; + orig.dual_postsolve = true; + orig.crossover = true; + orig.num_gpus = 4; + orig.per_constraint_residual = true; + orig.cudss_deterministic = true; + orig.folding = 1; + orig.augmented = 1; + orig.dualize = 1; + orig.ordering = 2; + orig.barrier_dual_initial_point = 1; + orig.eliminate_dense_columns = true; + orig.pdlp_precision = pdlp_precision_t::MixedPrecision; + orig.save_best_primal_so_far = true; + orig.first_primal_feasible = true; + + cuopt::remote::PDLPSolverSettings pb; + map_pdlp_settings_to_proto(orig, &pb); + + pdlp_solver_settings_t restored; + map_proto_to_pdlp_settings(pb, restored); + + 
EXPECT_DOUBLE_EQ(restored.tolerances.absolute_gap_tolerance, 1e-7); + EXPECT_DOUBLE_EQ(restored.tolerances.relative_gap_tolerance, 1e-6); + EXPECT_DOUBLE_EQ(restored.tolerances.primal_infeasible_tolerance, 1e-5); + EXPECT_DOUBLE_EQ(restored.tolerances.dual_infeasible_tolerance, 2e-5); + EXPECT_DOUBLE_EQ(restored.tolerances.absolute_dual_tolerance, 3e-7); + EXPECT_DOUBLE_EQ(restored.tolerances.relative_dual_tolerance, 4e-7); + EXPECT_DOUBLE_EQ(restored.tolerances.absolute_primal_tolerance, 5e-7); + EXPECT_DOUBLE_EQ(restored.tolerances.relative_primal_tolerance, 6e-7); + + EXPECT_DOUBLE_EQ(restored.time_limit, 99.5); + EXPECT_EQ(restored.iteration_limit, 10000); + EXPECT_EQ(restored.log_to_console, false); + EXPECT_EQ(restored.detect_infeasibility, true); + EXPECT_EQ(restored.strict_infeasibility, true); + EXPECT_EQ(restored.pdlp_solver_mode, pdlp_solver_mode_t::Fast1); + EXPECT_EQ(restored.method, method_t::Barrier); + EXPECT_EQ(restored.presolver, presolver_t::Default); + EXPECT_EQ(restored.dual_postsolve, true); + EXPECT_EQ(restored.crossover, true); + EXPECT_EQ(restored.num_gpus, 4); + EXPECT_EQ(restored.per_constraint_residual, true); + EXPECT_EQ(restored.cudss_deterministic, true); + EXPECT_EQ(restored.folding, 1); + EXPECT_EQ(restored.augmented, 1); + EXPECT_EQ(restored.dualize, 1); + EXPECT_EQ(restored.ordering, 2); + EXPECT_EQ(restored.barrier_dual_initial_point, 1); + EXPECT_EQ(restored.eliminate_dense_columns, true); + EXPECT_EQ(restored.pdlp_precision, pdlp_precision_t::MixedPrecision); + EXPECT_EQ(restored.save_best_primal_so_far, true); + EXPECT_EQ(restored.first_primal_feasible, true); +} + +TEST(MapperRoundtrip, PDLPSettingsIterationLimitSentinel) +{ + pdlp_solver_settings_t orig; + orig.iteration_limit = std::numeric_limits::max(); + + cuopt::remote::PDLPSolverSettings pb; + map_pdlp_settings_to_proto(orig, &pb); + EXPECT_EQ(pb.iteration_limit(), -1) << "max() should map to -1 sentinel"; + + pdlp_solver_settings_t restored; + auto default_limit = 
restored.iteration_limit; + map_proto_to_pdlp_settings(pb, restored); + EXPECT_EQ(restored.iteration_limit, default_limit) << "Negative sentinel should keep default"; +} diff --git a/cpp/tests/linear_programming/grpc/grpc_integration_test.cpp b/cpp/tests/linear_programming/grpc/grpc_integration_test.cpp index fe1db25490..b86e2d41b9 100644 --- a/cpp/tests/linear_programming/grpc/grpc_integration_test.cpp +++ b/cpp/tests/linear_programming/grpc/grpc_integration_test.cpp @@ -355,8 +355,6 @@ class GrpcIntegrationTestBase : public ::testing::Test { if (config.timeout_seconds == 3600) { config.timeout_seconds = 60; } - config.enable_transfer_hash = true; - auto client = std::make_unique(config); if (!client->connect()) { return nullptr; } return client; @@ -519,7 +517,7 @@ class DefaultServerTests : public GrpcIntegrationTestBase { { s_port_ = get_test_port(); s_server_ = std::make_unique(); - ASSERT_TRUE(s_server_->start(s_port_, {"--enable-transfer-hash"})) + ASSERT_TRUE(s_server_->start(s_port_, {})) << "Failed to start shared default server on port " << s_port_; } @@ -1586,8 +1584,7 @@ class TlsServerTests : public GrpcIntegrationTestBase { "--tls-key", g_tls_certs_dir + "/server.key", "--tls-root", - g_tls_certs_dir + "/ca.crt", - "--enable-transfer-hash"}; + g_tls_certs_dir + "/ca.crt"}; if (!s_server_->start(s_port_, args)) { s_server_.reset(); @@ -1688,8 +1685,7 @@ class MtlsServerTests : public GrpcIntegrationTestBase { g_tls_certs_dir + "/server.key", "--tls-root", g_tls_certs_dir + "/ca.crt", - "--require-client-cert", - "--enable-transfer-hash"}; + "--require-client-cert"}; if (!s_server_->start(s_port_, args)) { s_server_.reset(); diff --git a/cpp/tests/linear_programming/grpc/grpc_pipe_serialization_test.cpp b/cpp/tests/linear_programming/grpc/grpc_pipe_serialization_test.cpp index cf237d6119..632f9ce8d2 100644 --- a/cpp/tests/linear_programming/grpc/grpc_pipe_serialization_test.cpp +++ b/cpp/tests/linear_programming/grpc/grpc_pipe_serialization_test.cpp @@ 
-294,7 +294,7 @@ TEST(PipeSerialization, Result_RoundTrip) PipePair pp; ChunkedResultHeader header; - header.set_is_mip(false); + header.set_problem_category(cuopt::remote::LP); header.set_lp_termination_status(PDLP_OPTIMAL); header.set_primal_objective(42.5); header.set_solve_time(1.23); @@ -319,7 +319,7 @@ TEST(PipeSerialization, Result_RoundTrip) ASSERT_TRUE(write_ok); ASSERT_TRUE(read_ok); - EXPECT_FALSE(header_out.is_mip()); + EXPECT_EQ(header_out.problem_category(), cuopt::remote::LP); EXPECT_EQ(header_out.lp_termination_status(), PDLP_OPTIMAL); EXPECT_DOUBLE_EQ(header_out.primal_objective(), 42.5); EXPECT_DOUBLE_EQ(header_out.solve_time(), 1.23); @@ -334,11 +334,11 @@ TEST(PipeSerialization, Result_MIPFields) PipePair pp; ChunkedResultHeader header; - header.set_is_mip(true); + header.set_problem_category(cuopt::remote::MIP); header.set_mip_termination_status(MIP_OPTIMAL); header.set_mip_objective(99.0); header.set_mip_gap(0.001); - header.set_error_message(""); + header.set_mip_error_message(""); auto solution = make_pattern(2000 * 8, 0x33); std::map> arrays; @@ -356,7 +356,7 @@ TEST(PipeSerialization, Result_MIPFields) ASSERT_TRUE(write_ok); ASSERT_TRUE(read_ok); - EXPECT_TRUE(header_out.is_mip()); + EXPECT_EQ(header_out.problem_category(), cuopt::remote::MIP); EXPECT_EQ(header_out.mip_termination_status(), MIP_OPTIMAL); EXPECT_DOUBLE_EQ(header_out.mip_objective(), 99.0); @@ -369,7 +369,7 @@ TEST(PipeSerialization, Result_EmptyArrays) PipePair pp; ChunkedResultHeader header; - header.set_is_mip(false); + header.set_problem_category(cuopt::remote::LP); header.set_error_message("solver failed"); std::map> arrays; // no arrays (error case) @@ -398,7 +398,7 @@ TEST(PipeSerialization, ProtobufRoundTrip) PipePair pp; ChunkedResultHeader msg; - msg.set_is_mip(true); + msg.set_problem_category(cuopt::remote::MIP); msg.set_primal_objective(3.14); msg.set_error_message("hello"); @@ -412,7 +412,7 @@ TEST(PipeSerialization, ProtobufRoundTrip) ASSERT_TRUE(write_ok); 
ASSERT_TRUE(read_ok); - EXPECT_TRUE(msg_out.is_mip()); + EXPECT_EQ(msg_out.problem_category(), cuopt::remote::MIP); EXPECT_DOUBLE_EQ(msg_out.primal_objective(), 3.14); EXPECT_EQ(msg_out.error_message(), "hello"); } @@ -426,7 +426,7 @@ TEST(PipeSerialization, Result_LargeArray) PipePair pp; ChunkedResultHeader header; - header.set_is_mip(false); + header.set_problem_category(cuopt::remote::LP); header.set_primal_objective(0.0); // ~4 MiB array — large enough to require many kernel-level pipe iterations. diff --git a/docs/cuopt/source/_static/install-selector.js b/docs/cuopt/source/_static/install-selector.js index d0d309b897..0f2c4ccf44 100644 --- a/docs/cuopt/source/_static/install-selector.js +++ b/docs/cuopt/source/_static/install-selector.js @@ -36,6 +36,30 @@ var V_CONDA_NEXT = nextMajor + "." + (nextMinor < 10 ? "0" : "") + nextMinor; var V_NEXT = nextMajor + "." + nextMinor; + /* Shared Docker image lines: same tags are typically published to Docker Hub and NGC */ + var CONTAINER_CUOPT_LIB = { + stable: { + cu12: { + default: "docker pull nvidia/cuopt:latest-cuda12.9-py3.13", + run: "docker run --gpus all -it --rm nvidia/cuopt:latest-cuda12.9-py3.13 /bin/bash", + }, + cu13: { + default: "docker pull nvidia/cuopt:latest-cuda13.0-py3.13", + run: "docker run --gpus all -it --rm nvidia/cuopt:latest-cuda13.0-py3.13 /bin/bash", + }, + }, + nightly: { + cu12: { + default: "docker pull nvidia/cuopt:" + V_NEXT + ".0a-cuda12.9-py3.13", + run: "docker run --gpus all -it --rm nvidia/cuopt:" + V_NEXT + ".0a-cuda12.9-py3.13 /bin/bash", + }, + cu13: { + default: "docker pull nvidia/cuopt:" + V_NEXT + ".0a-cuda13.0-py3.13", + run: "docker run --gpus all -it --rm nvidia/cuopt:" + V_NEXT + ".0a-cuda13.0-py3.13 /bin/bash", + }, + }, + }; + var COMMANDS = { python: { pip: { @@ -82,48 +106,27 @@ ".* cuda-version=13.0", }, }, - container: { - stable: { - cu12: { - default: "docker pull nvidia/cuopt:latest-cuda12.9-py3.13", - run: "docker run --gpus all -it --rm 
nvidia/cuopt:latest-cuda12.9-py3.13 /bin/bash", - }, - cu13: { - default: "docker pull nvidia/cuopt:latest-cuda13.0-py3.13", - run: "docker run --gpus all -it --rm nvidia/cuopt:latest-cuda13.0-py3.13 /bin/bash", - }, - }, - nightly: { - cu12: { - default: "docker pull nvidia/cuopt:" + V_NEXT + ".0a-cuda12.9-py3.13", - run: "docker run --gpus all -it --rm nvidia/cuopt:" + V_NEXT + ".0a-cuda12.9-py3.13 /bin/bash", - }, - cu13: { - default: "docker pull nvidia/cuopt:" + V_NEXT + ".0a-cuda13.0-py3.13", - run: "docker run --gpus all -it --rm nvidia/cuopt:" + V_NEXT + ".0a-cuda13.0-py3.13 /bin/bash", - }, - }, - }, + container: CONTAINER_CUOPT_LIB, }, c: { pip: { stable: { cu12: - "pip uninstall -y cuopt-thin-client 2>/dev/null; pip install --extra-index-url=https://pypi.nvidia.com 'libcuopt-cu12==" + + "pip install --extra-index-url=https://pypi.nvidia.com 'libcuopt-cu12==" + V + ".*'", cu13: - "pip uninstall -y cuopt-thin-client 2>/dev/null; pip install --extra-index-url=https://pypi.nvidia.com 'libcuopt-cu13==" + + "pip install --extra-index-url=https://pypi.nvidia.com 'libcuopt-cu13==" + V + ".*'", }, nightly: { cu12: - "pip uninstall -y cuopt-thin-client 2>/dev/null; pip install --pre --extra-index-url=https://pypi.nvidia.com --extra-index-url=https://pypi.anaconda.org/rapidsai-wheels-nightly/simple/ 'libcuopt-cu12==" + + "pip install --pre --extra-index-url=https://pypi.nvidia.com --extra-index-url=https://pypi.anaconda.org/rapidsai-wheels-nightly/simple/ 'libcuopt-cu12==" + V_NEXT + ".*'", cu13: - "pip uninstall -y cuopt-thin-client 2>/dev/null; pip install --pre --extra-index-url=https://pypi.nvidia.com --extra-index-url=https://pypi.anaconda.org/rapidsai-wheels-nightly/simple/ 'libcuopt-cu13==" + + "pip install --pre --extra-index-url=https://pypi.nvidia.com --extra-index-url=https://pypi.anaconda.org/rapidsai-wheels-nightly/simple/ 'libcuopt-cu13==" + V_NEXT + ".*'", }, @@ -131,11 +134,11 @@ conda: { stable: { cu12: - "conda remove cuopt-thin-client --yes 
2>/dev/null; conda install -c rapidsai -c conda-forge -c nvidia libcuopt=" + + "conda install -c rapidsai -c conda-forge -c nvidia libcuopt=" + V_CONDA + ".* cuda-version=12.9", cu13: - "conda remove cuopt-thin-client --yes 2>/dev/null; conda install -c rapidsai -c conda-forge -c nvidia libcuopt=" + + "conda install -c rapidsai -c conda-forge -c nvidia libcuopt=" + V_CONDA + ".* cuda-version=13.0", }, @@ -150,7 +153,7 @@ ".* cuda-version=13.0", }, }, - container: null, + container: CONTAINER_CUOPT_LIB, }, server: { pip: { @@ -228,9 +231,9 @@ var SUPPORTED_METHODS = { python: ["pip", "conda", "container"], - c: ["pip", "conda"], + c: ["pip", "conda", "container"], server: ["pip", "conda", "container"], - cli: ["pip", "conda"], + cli: ["pip", "conda", "container"], }; function getSelectedValue(name) { @@ -264,7 +267,41 @@ if (method === "container") { var cudaKey = cuda || "cu12"; var c = data[release][cudaKey] || data[release].cu12; - cmd = c.default + "\n\n# Run the container:\n" + c.run; + var hubPull = c.default; + var tag = "latest-cuda12.9-py3.13"; + var tm = hubPull.match(/docker pull nvidia\/cuopt:(\S+)/); + if (tm) tag = tm[1]; + var registry = getSelectedValue("cuopt-registry") || "hub"; + var runLine = c.run; + if (registry === "ngc" && release === "nightly") { + cmd = + "# Nightly cuOpt container images are not published to NVIDIA NGC; use Docker Hub for nightly builds.\n" + + "# (Select \"Docker Hub\" above for the same commands without this note.)\n\n" + + "# Docker Hub (docker.io) — no registry login required for public pulls\n" + + hubPull + + "\n\n" + + "# Run the container:\n" + + runLine; + } else if (registry === "ngc") { + runLine = runLine.replace(/nvidia\/cuopt:/g, "nvcr.io/nvidia/cuopt/cuopt:"); + cmd = + "# NVIDIA NGC (nvcr.io) — authenticate once per session, then pull:\n" + + "docker login nvcr.io\n" + + "# Username: $oauthtoken\n" + + "# Password: \n\n" + + "docker pull nvcr.io/nvidia/cuopt/cuopt:" + + tag + + "\n\n" + + "# Run the 
container:\n" + + runLine; + } else { + cmd = + "# Docker Hub (docker.io) — no registry login required for public pulls\n" + + hubPull + + "\n\n" + + "# Run the container:\n" + + runLine; + } } else { var key = data[release].cu12 && data[release].cu13 ? cuda : "default"; cmd = data[release][key] || data[release].cu12 || data[release].cu13 || data[release].default || ""; @@ -302,9 +339,17 @@ var cudaRow = document.getElementById("cuopt-cuda-row"); var releaseRow = document.getElementById("cuopt-release-row"); var releaseVisible = iface !== "cli"; - var showCuda = releaseVisible && (method === "pip" || method === "conda" || method === "container") && hasCudaVariants(iface, method); + var ifaceForVariants = iface === "cli" ? "c" : iface; + var showCuda = + releaseVisible && + (method === "pip" || method === "conda" || method === "container") && + hasCudaVariants(ifaceForVariants, method); cudaRow.style.display = showCuda ? "table-row" : "none"; releaseRow.style.display = releaseVisible ? "table-row" : "none"; + var registryRow = document.getElementById("cuopt-registry-row"); + if (registryRow) { + registryRow.style.display = method === "container" ? "table-row" : "none"; + } updateOutput(); } @@ -350,13 +395,17 @@ '' + '' + '' + + 'Registry' + + '' + + '' + + '' + "" + '
' + '' + '
' + "
"; - ["cuopt-iface", "cuopt-method", "cuopt-release", "cuopt-cuda"].forEach( + ["cuopt-iface", "cuopt-method", "cuopt-release", "cuopt-cuda", "cuopt-registry"].forEach( function (name) { var inputs = document.querySelectorAll('input[name="' + name + '"]'); inputs.forEach(function (input) { diff --git a/docs/cuopt/source/cuopt-grpc/advanced.rst b/docs/cuopt/source/cuopt-grpc/advanced.rst new file mode 100644 index 0000000000..07e41c8972 --- /dev/null +++ b/docs/cuopt/source/cuopt-grpc/advanced.rst @@ -0,0 +1,314 @@ +.. + SPDX-FileCopyrightText: Copyright (c) 2026, NVIDIA CORPORATION & AFFILIATES. All rights reserved. + SPDX-License-Identifier: Apache-2.0 + +======================= +Advanced configuration +======================= + +This page lists **configuration parameters** first, then **usage** walkthroughs (TLS, Docker, private CA). Complete :doc:`quick-start` first (install, plain TCP server, and minimal example). + +For RPC summaries and server behavior, see :doc:`api` and :doc:`grpc-server-architecture`. Example entry points with ``CUOPT_REMOTE_*``: :doc:`examples`. Contributor-only internals: ``cpp/docs/grpc-server-architecture.md`` in the repository. + +Configuration parameters +======================== + +``cuopt_grpc_server`` (host or explicit container command) +------------------------------------------------------------ + +Run ``cuopt_grpc_server --help`` for the full list. Typical flags (also passable inside ``CUOPT_GRPC_ARGS`` when using the container entrypoint): + +.. 
code-block:: text + + cuopt_grpc_server [options] + + -p, --port PORT gRPC listen port (default: 5001) + -w, --workers NUM Number of worker processes (default: 1) + --max-message-mb N Max gRPC message size in MiB (default: 256; clamped to [4 KiB, ~2 GiB]) + --max-message-bytes N Max gRPC message size in bytes (exact; min 4096) + --chunk-timeout N Per-chunk timeout in seconds for streaming (0=disabled, default: 60) + --log-to-console Echo solver logs to server console + -v, --verbose Increase verbosity (default: on) + -q, --quiet Reduce verbosity (verbose is the default) + --server-log PATH Path to server operational log file (in addition to console) + + TLS Options: + --tls Enable TLS encryption + --tls-cert PATH Server certificate (PEM) + --tls-key PATH Server private key (PEM) + --tls-root PATH Root CA certificate (for client verification) + --require-client-cert Require client certificate (mTLS) + +NVIDIA cuOpt container (gRPC via entrypoint) +-------------------------------------------- + +These variables apply when the container **entrypoint** builds a ``cuopt_grpc_server`` command (see *Docker: gRPC server in container* under Usage). If you pass an explicit command after the image name, this table does not apply. + +.. list-table:: + :header-rows: 1 + :widths: 22 18 60 + + * - Variable + - Default + - Description + * - ``CUOPT_SERVER_TYPE`` + - *(unset)* + - Set to ``grpc`` for entrypoint-built gRPC. Unset with no explicit command: **Python REST** server. + * - ``CUOPT_SERVER_PORT`` + - ``5001`` + - Passed as ``--port`` to ``cuopt_grpc_server``. + * - ``CUOPT_GPU_COUNT`` + - *(unset)* + - When set, passed as ``--workers``. When unset, ``--workers`` is omitted (server default, typically 1). + * - ``CUOPT_GRPC_ARGS`` + - *(empty)* + - Extra flags split on **whitespace** and appended (TLS, ``--max-message-mb``, ``--log-to-console``, etc.). Paths with spaces: prefer mounts without spaces or run ``cuopt_grpc_server`` manually with proper quoting. 
+ +The REST server path in the same image still uses ``CUOPT_SERVER_PORT`` for HTTP in other docs; that is separate from the gRPC defaults above. + +Bundled remote client (Python, C API, ``cuopt_cli``) +---------------------------------------------------- + +Remote mode is active when **both** ``CUOPT_REMOTE_HOST`` and ``CUOPT_REMOTE_PORT`` are set. A **custom** gRPC client does not read these automatically; it must configure the channel and protos itself (see :doc:`api`). + +.. list-table:: + :header-rows: 1 + :widths: 26 14 18 42 + + * - Variable + - Required + - Default + - Description + * - ``CUOPT_REMOTE_HOST`` + - For remote + - — + - Server hostname or IP + * - ``CUOPT_REMOTE_PORT`` + - For remote + - — + - Server port (e.g. ``5001``) + * - ``CUOPT_TLS_ENABLED`` + - No + - ``0`` + - Non-zero enables TLS on the client + * - ``CUOPT_TLS_ROOT_CERT`` + - If TLS + - — + - PEM path to verify the **server** certificate + * - ``CUOPT_TLS_CLIENT_CERT`` + - mTLS + - — + - Client certificate PEM + * - ``CUOPT_TLS_CLIENT_KEY`` + - mTLS + - — + - Client private key PEM + * - ``CUOPT_CHUNK_SIZE`` + - No + - 16 MiB (lib) + - Chunk size in **bytes** for large transfers (clamped in library code) + * - ``CUOPT_MAX_MESSAGE_BYTES`` + - No + - 256 MiB (lib) + - Client gRPC max message size in **bytes** (clamped in library code) + * - ``CUOPT_GRPC_DEBUG`` + - No + - ``0`` + - Non-zero: extra gRPC client logging + +Usage +===== + +Start the server with TLS +-------------------------- + +Basic (no TLS), plain TCP, is in :doc:`quick-start`. Encrypted server: + +.. code-block:: bash + + cuopt_grpc_server --port 5001 \ + --tls \ + --tls-cert server.crt \ + --tls-key server.key + +mTLS (mutual TLS): + +.. code-block:: bash + + cuopt_grpc_server --port 5001 \ + --tls \ + --tls-cert server.crt \ + --tls-key server.key \ + --tls-root ca.crt \ + --require-client-cert + +How mTLS works +-------------- + +With mTLS the server verifies every client, and the client verifies the server. 
Trust is based on **Certificate Authorities** (CAs), not individual certificate lists: + +* ``--tls-root ca.crt`` tells the server which CA to trust; any client cert signed by that CA is accepted. The server does not store per-client certificates. +* ``--require-client-cert`` makes client verification **mandatory**. Without it, the server may still allow connections without a client cert. +* On the client, ``CUOPT_TLS_ROOT_CERT`` is the CA that signed the **server** certificate so the client can verify the server. + +Restricting access with a private CA +------------------------------------ + +To limit which clients can connect, run your own CA and issue client certs only to authorized actors. + +**1. Create a private CA (one-time):** + +.. code-block:: bash + + openssl genrsa -out ca.key 4096 + openssl req -new -x509 -key ca.key -sha256 -days 3650 \ + -subj "/CN=cuopt-internal-ca" -out ca.crt + +**2. Issue a client certificate:** + +.. code-block:: bash + + openssl genrsa -out client.key 2048 + openssl req -new -key client.key \ + -subj "/CN=team-member-alice" -out client.csr + openssl x509 -req -in client.csr -CA ca.crt -CAkey ca.key \ + -CAcreateserial -days 365 -sha256 -out client.crt + +Repeat for each authorized client. Keep ``ca.key`` private; distribute ``ca.crt`` to the server and per-client ``client.crt`` + ``client.key`` pairs. + +**3. Issue a server certificate (same CA):** + +.. code-block:: bash + + openssl genrsa -out server.key 2048 + openssl req -new -key server.key \ + -subj "/CN=server.example.com" -out server.csr + + cat > server.ext <`_ so devices are visible inside the container. + +Typical run: + +.. code-block:: bash + + docker run --gpus all -p 5001:5001 \ + -e CUOPT_SERVER_TYPE=grpc \ + nvcr.io/nvidia/cuopt/cuopt:latest + +TLS example with a cert volume: + +.. 
code-block:: bash + + docker run --gpus all -p 5001:5001 \ + -e CUOPT_SERVER_TYPE=grpc \ + -e CUOPT_GRPC_ARGS="--tls --tls-cert /certs/server.crt --tls-key /certs/server.key --log-to-console" \ + -v ./certs:/certs:ro \ + nvcr.io/nvidia/cuopt/cuopt:latest + +Bypass the entrypoint: + +.. code-block:: bash + + docker run --gpus all -p 5001:5001 \ + nvcr.io/nvidia/cuopt/cuopt:latest \ + cuopt_grpc_server --port 5001 --workers 2 + +Client environment (examples) +------------------------------ + +**Required** for remote (see *Bundled remote client* table for all variables): + +.. code-block:: bash + + export CUOPT_REMOTE_HOST= + export CUOPT_REMOTE_PORT=5001 + +**TLS** (optional): + +.. code-block:: bash + + export CUOPT_TLS_ENABLED=1 + export CUOPT_TLS_ROOT_CERT=ca.crt + +For mTLS, also: + +.. code-block:: bash + + export CUOPT_TLS_CLIENT_CERT=client.crt + export CUOPT_TLS_CLIENT_KEY=client.key + +Limitations and scope +===================== + +* **Problem types** — **LP**, **MILP**, and **QP** are supported on the gRPC remote path. **Routing** (VRP, TSP, PDP) is **not** supported yet; use the :doc:`REST self-hosted server <../cuopt-server/index>` for remote routing until a future release adds routing over ``CuOptRemoteService``. +* **Message size** — Large problems use chunking; very large models can still hit gRPC max message / timeout limits. Tune ``CUOPT_CHUNK_SIZE``, ``CUOPT_MAX_MESSAGE_BYTES``, server ``--max-message-mb``, and solver ``time_limit`` as needed. +* **``CUOPT_GRPC_ARGS``** — Parsed on whitespace only; arguments containing spaces are awkward unless you invoke ``cuopt_grpc_server`` directly. +* **CRL / OCSP** — Not handled by the bundled gRPC TLS stack; use a private CA rotation strategy or a TLS-terminating proxy if you need revocation workflows. + +Troubleshooting +=============== + +.. 
list-table:: + :header-rows: 1 + :widths: 28 72 + + * - Symptom + - Check + * - Connection refused + - Server running; host/port match; firewalls and Docker port mapping. + * - TLS handshake failure + - ``CUOPT_TLS_ENABLED=1``; correct CA and cert paths; SAN matches server name. + * - Cannot open TLS file + - Path exists and is readable inside the client/server environment (including container mounts). + * - Timeout on large problems + - Increase solver ``time_limit`` and client/server message limits. + +Further reading +=============== + +* :doc:`quick-start` — Plain TCP quick path. +* :doc:`examples` — Links to Python, C, and CLI example sections (use with ``CUOPT_REMOTE_*`` on the client). +* :doc:`grpc-server-architecture` — Process model and job behavior (operator overview). diff --git a/docs/cuopt/source/cuopt-grpc/api.rst b/docs/cuopt/source/cuopt-grpc/api.rst new file mode 100644 index 0000000000..3d44857b7a --- /dev/null +++ b/docs/cuopt/source/cuopt-grpc/api.rst @@ -0,0 +1,98 @@ +.. + SPDX-FileCopyrightText: Copyright (c) 2026, NVIDIA CORPORATION & AFFILIATES. All rights reserved. + SPDX-License-Identifier: Apache-2.0 + +====================== +gRPC API (reference) +====================== + +The **CuOptRemoteService** gRPC API is defined in Protocol Buffers under the ``cuopt.remote`` package. Source files in the repository: + +* ``cpp/src/grpc/cuopt_remote_service.proto`` — service and job/chunk/log RPCs +* ``cpp/src/grpc/cuopt_remote.proto`` — LP/MIP problem, settings, and result messages + +Most users do **not** call these RPCs directly: the NVIDIA cuOpt **Python** API, **C API**, and **cuopt_cli** submit jobs using solver APIs plus :doc:`environment variables `. **Custom** clients call ``CuOptRemoteService`` over gRPC using these definitions. This page summarizes the service for custom integrators and debugging. + +Service: ``CuOptRemoteService`` +================================ + +Asynchronous jobs +----------------- + +.. 
list-table:: + :header-rows: 1 + :widths: 28 72 + + * - RPC + - Purpose + * - ``SubmitJob`` + - Submit an LP or MILP job in one message (within gRPC message size limits). + * - ``CheckStatus`` + - Poll job status by ``job_id``. + * - ``GetResult`` + - Fetch a completed result (unary, when the payload fits one message). + * - ``DeleteResult`` + - Remove a stored result from server memory. + * - ``CancelJob`` + - Cancel a queued or running job. + * - ``WaitForCompletion`` + - Block until the job finishes (status only; use ``GetResult`` for the solution). + +Chunked upload (large problems) +-------------------------------- + +.. list-table:: + :header-rows: 1 + :widths: 28 72 + + * - RPC + - Purpose + * - ``StartChunkedUpload`` + - Begin a session; send problem metadata and settings (arrays follow as chunks). + * - ``SendArrayChunk`` + - Upload one slice of a numeric array field. + * - ``FinishChunkedUpload`` + - Finalize the upload and return ``job_id`` (same as ``SubmitJob``). + +Chunked download (large results) +-------------------------------- + +.. list-table:: + :header-rows: 1 + :widths: 28 72 + + * - RPC + - Purpose + * - ``StartChunkedDownload`` + - Begin a download session; returns scalar result fields and array descriptors. + * - ``GetResultChunk`` + - Fetch one chunk of a result array. + * - ``FinishChunkedDownload`` + - End the download session and release server state. + +Streaming and callbacks +----------------------- + +.. list-table:: + :header-rows: 1 + :widths: 28 72 + + * - RPC + - Purpose + * - ``StreamLogs`` + - Server-streaming solver log lines for a job. + * - ``GetIncumbents`` + - MILP incumbent solutions since a given index. + +Messages and constraints +======================== + +* **Problem types** — LP and MILP in the enum; the problem payload can include quadratic objective data for **QP**-style solves where the client API supports it. 
**Routing** over this gRPC service is **not** available yet; it is planned for an **upcoming** release (use REST for remote routing today). +* **Solver settings** — Carried as ``PDLPSolverSettings`` or ``MIPSolverSettings`` inside the request or chunked header, aligned with the NVIDIA cuOpt solver options documentation. +* **Errors** — gRPC status codes carry failures (see comments at the end of ``cuopt_remote_service.proto``). + +Further reading +=============== + +* :doc:`grpc-server-architecture` — Server process model and job lifecycle (overview); :doc:`advanced` for ``cuopt_grpc_server`` flags. Contributor details: ``cpp/docs/grpc-server-architecture.md``. +* :doc:`advanced` — TLS, Docker, client environment variables, and limitations. diff --git a/docs/cuopt/source/cuopt-grpc/examples.rst b/docs/cuopt/source/cuopt-grpc/examples.rst new file mode 100644 index 0000000000..730db5f612 --- /dev/null +++ b/docs/cuopt/source/cuopt-grpc/examples.rst @@ -0,0 +1,66 @@ +.. + SPDX-FileCopyrightText: Copyright (c) 2026, NVIDIA CORPORATION & AFFILIATES. All rights reserved. + SPDX-License-Identifier: Apache-2.0 + +======== +Examples +======== + +gRPC remote execution uses the same **Python**, **C API**, and **cuopt_cli** entry points as a local solve. After you start ``cuopt_grpc_server`` on the GPU host (:doc:`quick-start`), set the client environment and run **any** of the examples below **unchanged** — no code edits are required. + +On the **client** host, before running the example commands or scripts: + +.. code-block:: bash + + export CUOPT_REMOTE_HOST= + export CUOPT_REMOTE_PORT=5001 + +Add TLS or tuning variables from :doc:`advanced` if your deployment uses them. + +.. note:: + + Routing solve over gRPC is not supported. For solving routing problems remotely today, use the HTTP/JSON :doc:`REST self-hosted server <../cuopt-server/index>` and :doc:`Examples <../cuopt-server/examples/index>`. 
+ +Where to find examples +====================== + +Python (LP / QP / MILP) +----------------------- + +* :doc:`../cuopt-python/lp-qp-milp/lp-qp-milp-examples` — runnable Python samples (LP, QP, MILP). With ``CUOPT_REMOTE_HOST`` and ``CUOPT_REMOTE_PORT`` set on the client, solves go to the remote server automatically. + +C API (LP / QP / MILP) +---------------------- + +* :doc:`../cuopt-c/lp-qp-milp/lp-qp-example` — LP and QP C examples. +* :doc:`../cuopt-c/lp-qp-milp/milp-examples` — MILP C examples. + + Compile and run these programs with the same exports in the shell; ``solve_lp`` / ``solve_mip`` use gRPC when both remote variables are set (see :doc:`../cuopt-c/lp-qp-milp/lp-qp-milp-c-api` for API reference). + +``cuopt_cli`` +------------- + +* :doc:`../cuopt-cli/cli-examples` — ``cuopt_cli`` invocations. With the exports above, the CLI forwards solves to ``cuopt_grpc_server``. + +Minimal demos (this section) +---------------------------- + +Bundled with the gRPC docs source for a quick copy-paste path (also walked through in :doc:`quick-start`): + +* :download:`remote_lp_demo.py ` +* :download:`remote_lp_demo.mps ` + +Custom gRPC client +------------------ + +Integrations that do **not** use the bundled Python / C / CLI stack should speak ``CuOptRemoteService`` directly. See :doc:`api`, :doc:`grpc-server-architecture`, and ``cpp/docs/grpc-server-architecture.md`` in the repository for protos and server behavior. + +More samples +============ + +* `NVIDIA cuOpt examples on GitHub `_ — set the remote environment on the **client** before running notebooks or scripts. + +REST vs gRPC +============ + +* **Self-hosted HTTP/JSON** — :doc:`../cuopt-server/examples/index` targets the REST server; request shapes follow the OpenAPI workflow, not the ``CuOptRemoteService`` protos. 
diff --git a/docs/cuopt/source/cuopt-grpc/examples/remote_lp_demo.mps b/docs/cuopt/source/cuopt-grpc/examples/remote_lp_demo.mps new file mode 100644 index 0000000000..95d342250c --- /dev/null +++ b/docs/cuopt/source/cuopt-grpc/examples/remote_lp_demo.mps @@ -0,0 +1,13 @@ +NAME good-1 +ROWS + N COST + L ROW1 + L ROW2 +COLUMNS + VAR1 COST -0.2 + VAR1 ROW1 3 ROW2 2.7 + VAR2 COST 0.1 + VAR2 ROW1 4 ROW2 10.1 +RHS + RHS1 ROW1 5.4 ROW2 4.9 +ENDATA diff --git a/docs/cuopt/source/cuopt-grpc/examples/remote_lp_demo.py b/docs/cuopt/source/cuopt-grpc/examples/remote_lp_demo.py new file mode 100644 index 0000000000..4b24938c6c --- /dev/null +++ b/docs/cuopt/source/cuopt-grpc/examples/remote_lp_demo.py @@ -0,0 +1,37 @@ +# SPDX-FileCopyrightText: Copyright (c) 2026, NVIDIA CORPORATION & AFFILIATES. All rights reserved. +# SPDX-License-Identifier: Apache-2.0 + +"""Minimal LP demo for NVIDIA cuOpt gRPC remote execution. + +Set CUOPT_REMOTE_HOST and CUOPT_REMOTE_PORT on the client before running to forward +the solve to cuopt_grpc_server; unset them to solve locally (GPU required locally). + +The same LP is available as MPS in ``remote_lp_demo.mps`` for ``cuopt_cli``. 
+""" + +import numpy as np +from cuopt import linear_programming + +dm = linear_programming.DataModel() +A_values = np.array([3.0, 4.0, 2.7, 10.1], dtype=np.float64) +A_indices = np.array([0, 1, 0, 1], dtype=np.int32) +A_offsets = np.array([0, 2, 4], dtype=np.int32) +dm.set_csr_constraint_matrix(A_values, A_indices, A_offsets) + +b = np.array([5.4, 4.9], dtype=np.float64) +dm.set_constraint_bounds(b) + +c = np.array([0.2, 0.1], dtype=np.float64) +dm.set_objective_coefficients(c) + +dm.set_row_types(np.array(["L", "L"])) + +dm.set_variable_lower_bounds(np.array([0.0, 0.0], dtype=np.float64)) +dm.set_variable_upper_bounds(np.array([2.0, np.inf], dtype=np.float64)) + +settings = linear_programming.SolverSettings() +solution = linear_programming.Solve(dm, settings) + +print("Termination:", solution.get_termination_reason()) +print("Objective: ", solution.get_primal_objective()) +print("Primal x: ", solution.get_primal_solution()) diff --git a/docs/cuopt/source/cuopt-grpc/grpc-server-architecture.md b/docs/cuopt/source/cuopt-grpc/grpc-server-architecture.md new file mode 100644 index 0000000000..9162f8ad27 --- /dev/null +++ b/docs/cuopt/source/cuopt-grpc/grpc-server-architecture.md @@ -0,0 +1,78 @@ +# gRPC server behavior + +NVIDIA cuOpt's **`cuopt_grpc_server`** uses one **main process** (gRPC front end, job tracking, background threads) and **worker processes** that run GPU solves. That layout gives isolation between jobs, optional parallelism when you set multiple workers, and streaming for large problems and logs. + +Implementation details (IPC layout, C++ source map, chunked transfer internals) live in the contributor reference: **`cpp/docs/grpc-server-architecture.md`** in the NVIDIA cuOpt repository. 
+ +## Process model + +```text +┌──────────────────────────────────────────────────────────────────────┐ +│ Main Server Process │ +│ │ +│ ┌─────────────┐ ┌──────────────┐ ┌─────────────────────────────┐ │ +│ │ gRPC │ │ Job │ │ Background Threads │ │ +│ │ Service │ │ Tracker │ │ - Result retrieval │ │ +│ │ Handler │ │ (job status,│ │ - Incumbent retrieval │ │ +│ │ │ │ results) │ │ - Worker monitor │ │ +│ └─────────────┘ └──────────────┘ └─────────────────────────────┘ │ +│ │ ▲ │ +│ │ shared memory │ pipes │ +│ ▼ │ │ +│ ┌─────────────────────────────────────────────────────────────────┐ │ +│ │ Shared Memory Queues │ │ +│ │ │ │ +│ │ ┌─────────────────┐ ┌─────────────────────┐ │ │ +│ │ │ Job Queue │ │ Result Queue │ │ │ +│ │ │ (MAX_JOBS=100) │ │ (MAX_RESULTS=100) │ │ │ +│ │ └─────────────────┘ └─────────────────────┘ │ │ +│ └─────────────────────────────────────────────────────────────────┘ │ +└──────────────────────────────────────────────────────────────────────┘ + │ ▲ + │ fork() │ + ▼ │ + ┌─────────────────┐ ┌─────────────────┐ ┌─────────────────┐ + │ Worker 0 │ │ Worker 1 │ │ Worker N │ + │ ┌───────────┐ │ │ ┌───────────┐ │ │ ┌───────────┐ │ + │ │ GPU Solve │ │ │ │ GPU Solve │ │ │ │ GPU Solve │ │ + │ └───────────┘ │ │ └───────────┘ │ │ └───────────┘ │ + │ (separate proc)│ │ (separate proc)│ │ (separate proc)│ + └─────────────────┘ └─────────────────┘ └─────────────────┘ +``` + +## Job lifecycle (summary) + +**Submit** → the server assigns a job id and queues work. **Process** → a worker pulls the problem, solves on the GPU, and streams the result back. **Retrieve** → the client uses status and result RPCs (including chunked download when needed). See [gRPC API (reference)](api.rst) for RPC names. 
+ +## Job states + +```text +┌─────────┐ submit ┌───────────┐ claim ┌────────────┐ +│ QUEUED │──────────►│ PROCESSING│─────────►│ COMPLETED │ +└─────────┘ └───────────┘ └────────────┘ + │ │ + │ cancel │ error + ▼ ▼ +┌───────────┐ ┌─────────┐ +│ CANCELLED │ │ FAILED │ +└───────────┘ └─────────┘ +``` + +## Logs, capacity, and workers + +| Topic | Detail | +|-------|--------| +| Log files | Per-job solver logs under `/tmp/cuopt_logs/job_<id>.log` (used by log streaming). | +| Default caps | Up to **100** queued jobs and **100** stored results (server compile-time limits). | +| Workers | Recommended: **1 worker process per GPU**. Higher values are possible depending on the problems being solved but there is no specific guidance at this time. | + +## Fault tolerance and cancellation + +- If a **worker process crashes**, jobs it was running are marked **FAILED**; the server can spawn replacement workers (see contributor doc for details). +- **`CancelJob`** cancels **queued** jobs immediately (the worker skips them). If the solver has already started, the **worker process is killed** and the job is marked **CANCELLED**; a replacement worker is spawned automatically. + +## Further reading + +- [Advanced configuration](advanced.rst) — `cuopt_grpc_server` **command-line flags**, TLS, Docker (`CUOPT_SERVER_TYPE`, `CUOPT_GRPC_ARGS`), and **client** environment variables (authoritative for operators). +- [gRPC API (reference)](api.rst) — `CuOptRemoteService` RPC overview. +- **Contributor reference** — `cpp/docs/grpc-server-architecture.md` in the repository (IPC, source files, streaming, threading). diff --git a/docs/cuopt/source/cuopt-grpc/index.rst b/docs/cuopt/source/cuopt-grpc/index.rst new file mode 100644 index 0000000000..738180f877 --- /dev/null +++ b/docs/cuopt/source/cuopt-grpc/index.rst @@ -0,0 +1,30 @@ +.. + SPDX-FileCopyrightText: Copyright (c) 2026, NVIDIA CORPORATION & AFFILIATES. All rights reserved. 
+ SPDX-License-Identifier: Apache-2.0 + +========================== +gRPC remote execution +========================== + +**NVIDIA cuOpt gRPC remote execution** runs optimization solves on a remote GPU host. Clients can be the **Python** API, **C API**, **`cuopt_cli`**, or a **custom** program that speaks ``CuOptRemoteService`` over gRPC. For Python, the C API, and ``cuopt_cli``, set ``CUOPT_REMOTE_HOST`` and ``CUOPT_REMOTE_PORT`` to forward solves to ``cuopt_grpc_server``. + +.. note:: + + **Problem types (gRPC remote):** LP, MILP, and QP are supported today. **Routing** (VRP, TSP, PDP, and related APIs) over gRPC remote execution is **not** available yet; support is planned for an **upcoming** release. For routing against a remote service today, use the HTTP/JSON :doc:`REST self-hosted server <../cuopt-server/index>`. + +This is **not** the HTTP/JSON :doc:`REST self-hosted server <../cuopt-server/index>` (FastAPI). REST is for arbitrary HTTP clients; gRPC is for the bundled remote client in NVIDIA cuOpt's native APIs. + +Start with :doc:`quick-start` (install selector, how remote execution works, Docker, and a minimal LP example). Use :doc:`advanced` for TLS, tuning, limitations, and troubleshooting; :doc:`examples` for additional patterns. + +.. toctree:: + :maxdepth: 2 + :caption: In this section + :name: cuopt-grpc-contents + + quick-start.rst + advanced.rst + examples.rst + api.rst + grpc-server-architecture.md + +See :doc:`../system-requirements` for GPU, CUDA, and OS requirements. diff --git a/docs/cuopt/source/cuopt-grpc/quick-start.rst b/docs/cuopt/source/cuopt-grpc/quick-start.rst new file mode 100644 index 0000000000..b3c0ca2cc1 --- /dev/null +++ b/docs/cuopt/source/cuopt-grpc/quick-start.rst @@ -0,0 +1,158 @@ +.. + SPDX-FileCopyrightText: Copyright (c) 2026, NVIDIA CORPORATION & AFFILIATES. All rights reserved. 
+ SPDX-License-Identifier: Apache-2.0 + +=========== +Quick start +=========== + +**NVIDIA cuOpt gRPC remote execution** runs LP, MILP, and QP solves on a **GPU host** while your **Python** code, **C API** program, ``cuopt_cli``, or a **custom** client runs elsewhere. When you set ``CUOPT_REMOTE_HOST`` and ``CUOPT_REMOTE_PORT``, the bundled **Python**, **C API**, and **cuopt_cli** clients forward ``solve_lp`` / ``solve_mip`` to ``cuopt_grpc_server`` with **no code changes**. **Custom** clients call ``CuOptRemoteService`` directly (see :doc:`api`). + +.. note:: + + **Problem types (gRPC remote):** **LP**, **MILP**, and **QP** are supported today. **Routing** (VRP, TSP, PDP) over this path is **not** available; for remote routing, use the HTTP/JSON :doc:`REST self-hosted server <../cuopt-server/index>`. This guide is **not** the REST server—see :doc:`../cuopt-server/index` for HTTP/JSON. + +How remote execution works +========================== + +1. **GPU host** — Run ``cuopt_grpc_server`` (bare metal or in the official container) so it listens on a TCP port (default **5001**). +2. **Client** — Install the NVIDIA cuOpt client libraries on the machine where you invoke the solver. Set ``CUOPT_REMOTE_HOST`` to that GPU host’s address and ``CUOPT_REMOTE_PORT`` to the listen port. +3. **Solve** — Call the same APIs you would for a local solve. The client library opens a gRPC channel, streams the problem, and retrieves the result. Unset the two variables to solve **locally** again (local mode still needs a GPU on that machine where applicable). + +Install NVIDIA cuOpt +==================== + +Use the selector below on the **GPU server** and on **clients** that need Python, the C API, or ``cuopt_cli``. It is pre-set to **C (libcuopt)** because that bundle ships ``cuopt_grpc_server``, ``cuopt_cli``, and libraries together; switch to **Python** if you only need Python packages on a lightweight client. + +.. 
install-selector:: + :default-iface: c + +Verify the server binary after install: + +.. code-block:: bash + + cuopt_grpc_server --help + +For the same install selector with **Container** / registry choices (Docker Hub or NGC), see :doc:`../install`. + +Run the gRPC server (GPU host) +============================== + +**Bare metal** — after activating the same environment you used to install NVIDIA cuOpt: + +.. code-block:: bash + + cuopt_grpc_server --port 5001 --workers 1 + +Leave the process running. Default port **5001**; change ``--port`` if needed and expose the same port on the client side. + +**Docker** — requires `NVIDIA Container Toolkit <https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/latest/install-guide.html>`_ (or equivalent) on the host. Pull an image tag from :doc:`../install` or the **Container** row in the selector above; substitute ``<image>`` below. + +Entrypoint mode (recommended when you are not passing an explicit command): + +.. code-block:: bash + + docker run --gpus all -it --rm -p 5001:5001 \ + -e CUOPT_SERVER_TYPE=grpc \ + <image> + +Or invoke the binary explicitly: + +.. code-block:: bash + + docker run --gpus all -it --rm -p 5001:5001 \ + <image> \ + cuopt_grpc_server --port 5001 --workers 1 + +.. note:: + + The container image defaults to the Python **REST** server when ``CUOPT_SERVER_TYPE`` is unset and you do not override the command; setting ``CUOPT_SERVER_TYPE=grpc`` selects ``cuopt_grpc_server``. Extra environment variables (``CUOPT_SERVER_PORT``, ``CUOPT_GPU_COUNT``, ``CUOPT_GRPC_ARGS``) and TLS are documented in :doc:`Advanced configuration <advanced>`. + +Point the client at the server +============================== + +On the machine where you run Python, the C API, or ``cuopt_cli`` (use ``127.0.0.1`` if the server is on the same host): + +.. code-block:: bash + + export CUOPT_REMOTE_HOST=<server-host> + export CUOPT_REMOTE_PORT=5001 + +Optional TLS and tuning variables are in :doc:`advanced`. 
+ +Minimal Python example (LP) +============================ + +The script is the same for **local** or **remote** solves: with the exports above, the client library forwards to ``cuopt_grpc_server``; without them, the solve runs locally (where a GPU is available). +Please make sure the server is running before running the client. + +:download:`remote_lp_demo.py <examples/remote_lp_demo.py>` + +.. literalinclude:: examples/remote_lp_demo.py + :language: python + :linenos: + +Run the script from your NVIDIA cuOpt Python environment. From a **repository checkout** (repo root): + +.. code-block:: bash + + python docs/cuopt/source/cuopt-grpc/examples/remote_lp_demo.py + +Or, after :download:`downloading <examples/remote_lp_demo.py>` the file into your current directory: + +.. code-block:: bash + + python remote_lp_demo.py + +You should see an optimal termination. To solve **locally**, unset the remote variables and rerun with the **same** path you used above: + +.. code-block:: bash + + unset CUOPT_REMOTE_HOST CUOPT_REMOTE_PORT + python remote_lp_demo.py + +Minimal ``cuopt_cli`` example (LP) +================================== + +The same **LP** is available as MPS. With ``CUOPT_REMOTE_HOST`` and ``CUOPT_REMOTE_PORT`` set as above, ``cuopt_cli`` forwards the solve to the remote server; unset them for a **local** run (GPU on that machine). +Please make sure the server is running before running the client. + +:download:`remote_lp_demo.mps <examples/remote_lp_demo.mps>` + +.. literalinclude:: examples/remote_lp_demo.mps + :language: text + +From a **repository checkout** (repo root): + +.. code-block:: bash + + cuopt_cli docs/cuopt/source/cuopt-grpc/examples/remote_lp_demo.mps + +Or, after :download:`downloading <examples/remote_lp_demo.mps>` the MPS into your current directory: + +.. code-block:: bash + + cuopt_cli remote_lp_demo.mps + +To solve **locally** with the same file: + +.. code-block:: bash + + unset CUOPT_REMOTE_HOST CUOPT_REMOTE_PORT + cuopt_cli remote_lp_demo.mps + +More options (time limits, relaxation): :doc:`../cuopt-cli/quick-start` and :doc:`examples`. 
+ +**C API** — With the same environment variables set, call ``solve_lp`` / ``solve_mip`` as in :doc:`../cuopt-c/lp-qp-milp/lp-qp-milp-c-api`. + +More patterns (MPS variants, custom gRPC): :doc:`examples`. + +Next steps +========== + +* :doc:`../install` — Top-level install selector (all interfaces), including **Container** pulls. +* :doc:`advanced` — TLS / mTLS, Docker environment reference, tuning, limitations, troubleshooting. +* :doc:`examples` — Additional client examples and links to LP/MILP sample collections. +* :doc:`api` and :doc:`grpc-server-architecture` — RPC summary and server behavior overview. + +See :doc:`../system-requirements` for GPU, CUDA, and OS requirements. diff --git a/docs/cuopt/source/cuopt-server/index.rst b/docs/cuopt/source/cuopt-server/index.rst index 36ea1ad8c3..0d9c7a277f 100644 --- a/docs/cuopt/source/cuopt-server/index.rst +++ b/docs/cuopt/source/cuopt-server/index.rst @@ -1,14 +1,15 @@ Server ====== -NVIDIA cuOpt server is a REST API server that is built for the purpose of providing language agnostic access to the cuOpt optimization engine. Users can build their own clients in any language that supports HTTP requests or use cuopt-sh-client, a lightweight Python client, to communicate with the server. +The **NVIDIA cuOpt self-hosted server** is a **REST** (HTTP/JSON) service for integrations that speak HTTP. Use :doc:`quick-start` for deployment, :doc:`server-api/index` for the API, and :doc:`client-api/index` for clients (including cuopt-sh-client). + +For **gRPC remote execution** (Python, C API, ``cuopt_cli``, or custom clients to ``cuopt_grpc_server``), see :doc:`../cuopt-grpc/index` — it uses a different protocol and is not part of the HTTP REST surface. .. image:: images/cuOpt-self-hosted.png :width: 500 :align: center - -Please refer to following links for more information on API and examples: +Please refer to the following sections for REST deployment, API reference, and examples. .. 
toctree:: :caption: Quickstart diff --git a/docs/cuopt/source/faq.rst b/docs/cuopt/source/faq.rst index 2770e1b507..6fc218cb4e 100644 --- a/docs/cuopt/source/faq.rst +++ b/docs/cuopt/source/faq.rst @@ -156,6 +156,21 @@ General FAQ while openssl x509 -noout -text; do :; done < test.pem.txt +gRPC remote execution (``cuopt_grpc_server``) +----------------------------------------------- + +.. dropdown:: Where are log files for the gRPC server / StreamLogs? + + Workers write per-job solver logs under ``/tmp/cuopt_logs/job_<id>.log``. The ``StreamLogs`` RPC tails that file. Operational limits and behavior are summarized in :doc:`gRPC server behavior <cuopt-grpc/grpc-server-architecture>`. + +.. dropdown:: What happens if a ``cuopt_grpc_server`` worker crashes? + + Jobs that worker was running are marked **FAILED**. The server monitor can detect the crash and spawn a replacement worker; other workers keep running. For more detail, see :doc:`gRPC server behavior <cuopt-grpc/grpc-server-architecture>` and the contributor reference ``cpp/docs/grpc-server-architecture.md`` in the repository. + +.. dropdown:: Does ``CancelJob`` stop a solve immediately? + + ``CancelJob`` cancels **queued** jobs immediately (the worker skips them). If the solve has already begun, the **worker process is killed**, the job is marked **CANCELLED**, and a replacement worker is spawned automatically. See :doc:`gRPC server behavior <cuopt-grpc/grpc-server-architecture>`. + Routing FAQ ------------------------------ diff --git a/docs/cuopt/source/index.rst b/docs/cuopt/source/index.rst index e310c974ce..6f8ae3ba2c 100644 --- a/docs/cuopt/source/index.rst +++ b/docs/cuopt/source/index.rst @@ -42,6 +42,16 @@ Python (cuopt) Python Overview +==================================== +gRPC remote execution +==================================== +.. 
toctree:: + :maxdepth: 2 + :caption: gRPC remote execution + :name: gRPC remote execution + + gRPC overview <cuopt-grpc/index> + =============================== Server (cuopt-server) =============================== diff --git a/docs/cuopt/source/install.rst b/docs/cuopt/source/install.rst index 0b16bf606c..404d7361f8 100644 --- a/docs/cuopt/source/install.rst +++ b/docs/cuopt/source/install.rst @@ -16,6 +16,7 @@ If the selector does not load or you prefer step-by-step guides, use the quick-s * **Python (cuopt)** — :doc:`cuopt-python/quick-start` * **C (libcuopt)** — :doc:`cuopt-c/quick-start` (includes ``cuopt_cli``) +* **gRPC remote execution** — :doc:`cuopt-grpc/quick-start` (install, remote execution, Docker, minimal example) and :doc:`cuopt-grpc/advanced` (TLS and tuning; not the HTTP server) * **Server (cuopt-server)** — :doc:`cuopt-server/quick-start` * **CLI (cuopt_cli)** — Install via the C API; see :doc:`cuopt-cli/quick-start` diff --git a/docs/cuopt/source/introduction.rst b/docs/cuopt/source/introduction.rst index 2d39a26913..bea1a35159 100644 --- a/docs/cuopt/source/introduction.rst +++ b/docs/cuopt/source/introduction.rst @@ -119,6 +119,8 @@ cuOpt supports the following APIs: - Python support - :doc:`Routing (TSP, VRP, and PDP) - Python ` - :doc:`Linear Programming (LP) / Quadratic Programming (QP) and Mixed Integer Linear Programming (MILP) - Python ` +- gRPC remote execution + - :doc:`Linear Programming (LP) / Quadratic Programming (QP) and Mixed Integer Linear Programming (MILP) - gRPC remote <cuopt-grpc/quick-start>` - Server support - :doc:`Linear Programming (LP) - Server ` - :doc:`Mixed Integer Linear Programming (MILP) - Server ` diff --git a/python/cuopt/cuopt/linear_programming/data_model/data_model.py b/python/cuopt/cuopt/linear_programming/data_model/data_model.py index 39da5d6c47..648809eac1 100644 --- a/python/cuopt/cuopt/linear_programming/data_model/data_model.py +++ b/python/cuopt/cuopt/linear_programming/data_model/data_model.py @@ -1,4 +1,4 @@ -# SPDX-FileCopyrightText: 
Copyright (c) 2023-2025 NVIDIA CORPORATION & AFFILIATES. All rights reserved. +# SPDX-FileCopyrightText: Copyright (c) 2023-2026, NVIDIA CORPORATION & AFFILIATES. All rights reserved. # SPDX-License-Identifier: Apache-2.0 import os @@ -127,7 +127,7 @@ class DataModel(data_model_wrapper.DataModel): >>> >>> # Method 1: directly set bounds >>> # Set lower bounds to -infinity and upper bounds to b - >>> constraint_lower_bounds = np.array([np.NINF, np.NINF], + >>> constraint_lower_bounds = np.array([-np.inf, -np.inf], >>> dtype=np.float64) >>> constraint_upper_bounds = np.array(b, dtype=np.float64) >>> data_model.set_constraint_lower_bounds(constraint_lower_bounds) @@ -136,7 +136,7 @@ class DataModel(data_model_wrapper.DataModel): >>> >>> # Set variable lower and upper bounds >>> variable_lower_bounds = np.array([0.0, 0.0], dtype=np.float64) - >>> variable_upper_bounds = np.array([2.0, np.PINF], dtype=np.float64) + >>> variable_upper_bounds = np.array([2.0, np.inf], dtype=np.float64) >>> data_model.set_variable_lower_bounds(variable_lower_bounds) >>> data_model.set_variable_upper_bounds(variable_upper_bounds) """