Conversation
Pull request overview
This PR hardens the Nginx deployment to reduce connection resets under high concurrency (e.g., errno 104, "Connection reset by peer" errors) and introduces local tooling to mock and benchmark inference endpoints.
Changes:
- Increased Nginx connection capacity and listen backlog; added upstream keepalive/retry tuning for model-proxy and job-server routes.
- Updated Kubernetes deployment and startup script to attempt raising somaxconn.
- Added a mock OpenAI-compatible inference server and an async benchmarking script.
Reviewed changes
Copilot reviewed 6 out of 6 changed files in this pull request and generated 7 comments.
| File | Description |
|---|---|
| tools/test_inference.py | Adds an aiohttp-based concurrent benchmarking tool for inference endpoints. |
| tools/mock_inference_server.py | Adds a FastAPI mock server for OpenAI-style chat completions (streaming + non-streaming). |
| src/pylon/deploy/pylon.yaml.template | Grants additional Linux capabilities to support runtime networking/sysctl tuning. |
| src/pylon/deploy/pylon-config/run.sh.template | Attempts to raise net.core.somaxconn to match configured Nginx backlog. |
| src/pylon/deploy/pylon-config/nginx.conf.template | Increases worker connection limits, adds backlog tuning, and introduces a keepalive upstream for model-proxy. |
| src/pylon/deploy/pylon-config/location.conf.template | Updates proxy settings to use upstream keepalive and retries on connection-level errors. |
```python
parser.add_argument(
    "--models",
    type=lambda s: s.split(","),
    default="gpt-3,gpt-4,gpt-5,gpt-6,gpt-7,gpt-8,gpt-9",
```
Argparse does in fact apply type= conversion to a string default at parse time, so this happens to work after parse_args(); but that is a subtle corner of argparse, and anything that reads the default directly (e.g., parser.get_default("models")) or a future change to the type callable would see a single string, in which case for m in models would iterate characters. Making the default an explicit list (e.g., default=[...]) is clearer and removes the reliance on that conversion.
Suggested change:
```diff
-    default="gpt-3,gpt-4,gpt-5,gpt-6,gpt-7,gpt-8,gpt-9",
+    default=["gpt-3", "gpt-4", "gpt-5", "gpt-6", "gpt-7", "gpt-8", "gpt-9"],
```
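One way to sanity-check the behavior is to run argparse's string-default handling directly (model names here are placeholders, not the PR's list):

```python
import argparse

# Per the argparse docs, a string default is itself passed through the
# type= callable at parse time, just like a command-line value would be.
parser = argparse.ArgumentParser()
parser.add_argument(
    "--models",
    type=lambda s: s.split(","),
    default="gpt-3,gpt-4",
)
args = parser.parse_args([])
print(args.models)  # → ['gpt-3', 'gpt-4']
```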
```nginx
proxy_pass http://model_proxy_upstream/$1$is_args$args;
proxy_http_version 1.1;
proxy_set_header Connection "";
proxy_connect_timeout 60m;
proxy_read_timeout 60m;
proxy_send_timeout 60m;
```
This hard-codes http://model_proxy_upstream even when MODEL_PROXY_URI is https://... (the upstream definition strips both schemes). That will break TLS-to-upstream and can cause backend connection failures or unintended plaintext traffic. Preserve the upstream scheme (e.g., select http vs https via templating / map, and add the required proxy_ssl_* directives when using HTTPS).
Suggested change:
```diff
-proxy_pass http://model_proxy_upstream/$1$is_args$args;
+proxy_pass {{MODEL_PROXY_UPSTREAM_SCHEME}}://model_proxy_upstream/$1$is_args$args;
 proxy_http_version 1.1;
 proxy_set_header Connection "";
 proxy_connect_timeout 60m;
 proxy_read_timeout 60m;
 proxy_send_timeout 60m;
+proxy_ssl_server_name on;
+proxy_ssl_name model_proxy_upstream;
```
```diff
 worker_processes 1;

 events {
-    worker_connections 1024;
+    worker_connections 65535;
 }
```
Raising worker_connections to 65535 likely won’t be effective unless the process RLIMIT_NOFILE (and/or worker_rlimit_nofile) is also increased to cover worker_processes * worker_connections plus overhead. Without that, Nginx will warn and cap usable connections. Consider setting worker_rlimit_nofile (and aligning the container/pod ulimit) alongside this change.
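A hedged sketch of what this would look like (the limit value is an assumption, not taken from the PR):

```nginx
# Sketch: align the per-worker FD limit with the connection count. A proxied
# connection consumes roughly two descriptors (client side + upstream side),
# hence the 2x headroom.
worker_processes 1;
worker_rlimit_nofile 131072;  # >= worker_processes * worker_connections * 2

events {
    worker_connections 65535;
}
```

The container/pod `ulimit -n` must be at least as large, or the kernel will still cap the process.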
```shell
# Requires NET_ADMIN capability; if it fails, fall back to OS default silently.
sysctl -w net.core.somaxconn=4096 2>/dev/null || true
```
The comment states this requires NET_ADMIN, but changing net.core.somaxconn commonly requires sysctl permissions that aren’t granted by NET_ADMIN (and in Kubernetes is typically handled via pod-level securityContext.sysctls). Because failures are fully silenced, operators may believe backlog tuning is active when it isn’t. Consider (a) configuring this via pod securityContext.sysctls instead of container capabilities, and/or (b) logging a one-line warning when the sysctl write fails so it’s observable.
Suggested change:
```diff
-# Requires NET_ADMIN capability; if it fails, fall back to OS default silently.
-sysctl -w net.core.somaxconn=4096 2>/dev/null || true
+# Note: Changing net.core.somaxconn may require sysctl permissions (e.g. pod
+# securityContext.sysctls in Kubernetes), not just NET_ADMIN. If this fails,
+# log a warning and continue with the OS default.
+sysctl -w net.core.somaxconn=4096 >/dev/null 2>&1 || echo "Warning: Failed to set net.core.somaxconn=4096; ensure pod securityContext.sysctls is configured if backlog tuning is required." >&2
```
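For comparison, the Kubernetes-native route mentioned in the comment would look roughly like this hypothetical pod spec fragment (note that net.core.somaxconn is a namespaced sysctl but is commonly classed as *unsafe*, so many clusters require it to be allowed via the kubelet's --allowed-unsafe-sysctls before such a pod will schedule):

```yaml
spec:
  securityContext:
    sysctls:
      - name: net.core.somaxconn  # applied per-pod; no NET_ADMIN needed
        value: "4096"
```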
```python
data = {
    "id": f"chatcmpl-{random_text(12)}",
    "object": "chat.completion.chunk",
    "model": model,
    "choices": [
```
For OpenAI-style streaming responses, clients typically expect the id to remain stable across all chunks of a single completion. Generating a new id per chunk can break client-side correlation/assembling logic. Prefer generating one id per request and reusing it in all chunks (and optionally include consistent created metadata as well).
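A minimal sketch of the fix, assuming a helper named `random_text` like the one in the diff (the function name and chunk shape here are illustrative, not copied from the PR):

```python
import random
import string
import time

def random_text(n: int) -> str:
    # Stand-in for the mock server's helper of the same (assumed) name.
    return "".join(random.choices(string.ascii_lowercase + string.digits, k=n))

def stream_chunks(model: str, pieces: list[str]):
    """Yield OpenAI-style chunks that share one id/created per completion."""
    completion_id = f"chatcmpl-{random_text(12)}"  # generated once per request
    created = int(time.time())
    for piece in pieces:
        yield {
            "id": completion_id,  # stable across all chunks
            "object": "chat.completion.chunk",
            "created": created,   # stable across all chunks
            "model": model,
            "choices": [{"index": 0, "delta": {"content": piece}}],
        }

chunks = list(stream_chunks("gpt-4", ["Hel", "lo"]))
assert len({c["id"] for c in chunks}) == 1
```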
```python
import string
import time
import asyncio
from typing import List
```
Unused import (List). Removing it avoids lint noise and keeps dependencies tidy.
Suggested change:
```diff
-from typing import List
```
```python
async def run_for_model(model, args, file, headers):
    connector = aiohttp.TCPConnector(
        limit=args.concurrency,
        force_close=True
```
force_close=True disables connection reuse, which adds TCP/TLS handshake overhead and can skew results—especially relevant since this PR is tuning Nginx/upstream keepalive behavior. Consider defaulting to keepalive-enabled connections (force_close=False) and, if needed, expose a CLI flag to force-close for specific experiments.
Suggested change:
```diff
-        force_close=True
+        force_close=getattr(args, "force_close", False),
```
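The CLI flag mentioned above could be wired up as follows (a sketch; the flag name `--force-close` is an assumption):

```python
import argparse

# Keepalive stays the default; force-close becomes an explicit opt-in
# for experiments that want a fresh TCP/TLS handshake per request.
parser = argparse.ArgumentParser()
parser.add_argument(
    "--force-close",
    action="store_true",  # defaults to False, so connections are reused
    help="Open a fresh TCP connection per request (disables keepalive).",
)

assert parser.parse_args([]).force_close is False
assert parser.parse_args(["--force-close"]).force_close is True
```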
```python
app = FastAPI()

# ====== 全局配置 ======  (Chinese: "Global configuration")
```
Let's remove the comments in Chinese.
This pull request introduces several improvements to the Nginx deployment and adds tooling for testing and mocking inference servers. The main themes are: enhancing Nginx's robustness and scalability under high concurrency, and providing tools for local testing and benchmarking of inference endpoints.
Nginx configuration and deployment improvements:
- Increased `worker_connections` from 1024 to 65535 in `nginx.conf.template` to support more simultaneous connections, improving scalability.
- Added `backlog=4096` to both HTTP and HTTPS `listen` directives and set the system's `net.core.somaxconn` to 4096 in `run.sh.template` to match, reducing connection drops under heavy load. [1] [2] [3]
- Added a `model_proxy_upstream` block with keepalive settings for efficient connection reuse and automatic stale-connection retry, mitigating connection resets under high concurrency.
- Updated the `/model-proxy/` and `/job-server/` locations: set HTTP/1.1, cleared the `Connection` header, and enabled retry on connection-level errors for POST endpoints, increasing reliability. [1] [2]
- Granted the container the `NET_ADMIN` capability, allowing it to set the listen backlog at runtime.

Tooling for local testing and benchmarking:
- Added `tools/mock_inference_server.py`, a FastAPI-based mock inference server that simulates OpenAI-compatible chat completions with configurable delays and token counts for local/dev testing.
- Added `tools/test_inference.py`, a concurrent benchmarking script to stress-test inference endpoints, reporting per-model and overall statistics, and logging failed requests.
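As a rough illustration of the upstream keepalive and retry shape described above (directive values and the backend address are assumptions, not copied from the PR):

```nginx
upstream model_proxy_upstream {
    server 127.0.0.1:8080;   # hypothetical backend address
    keepalive 64;            # pool of idle upstream connections per worker
}

server {
    listen 80 backlog=4096;

    location /model-proxy/ {
        proxy_pass http://model_proxy_upstream;
        proxy_http_version 1.1;          # required for upstream keepalive
        proxy_set_header Connection "";  # don't forward "Connection: close"
        # Retry connection-level failures on the next attempt;
        # "non_idempotent" opts POST requests into retries as well.
        proxy_next_upstream error timeout non_idempotent;
        proxy_next_upstream_tries 2;
    }
}
```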