This repository was archived by the owner on Jul 3, 2026. It is now read-only.

Wire AiGateway telemetry to real changes #71

Merged
Pigbibi merged 1 commit into main from codex/aigateway-real-telemetry on Jul 3, 2026

Pigbibi commented on Jul 3, 2026

Summary

  • record async job runtime health metrics so the dashboard reflects real gateway work
  • register monthly remediation PRs in the AiGateway closed-loop change feed
  • preserve source repo, PR/issue numbers, and PR URL in change records
  • link recent dashboard changes to their PR when available
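The sketch below is a rough illustration of the record shape these bullets describe, modelling one change-feed entry. The `ChangeRecord` class, its field names (including `pr_number` and `issue_number`), and the `from_payload` and `dashboard_link` helpers are hypothetical and may not match the gateway's real schema. Only the `source_repository`, `external_url`, and `before_metrics` payload keys and their `str()`/`float()` coercion come from the diff excerpt quoted in the review.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Any, Optional


@dataclass(frozen=True)
class ChangeRecord:
    """Hypothetical shape of one entry in the closed-loop change feed."""

    title: str
    source_repo: str = ""
    pr_number: Optional[int] = None
    issue_number: Optional[int] = None
    external_url: str = ""  # PR URL; the dashboard links to it when non-empty
    before_metrics: dict[str, float] = field(default_factory=dict)

    @classmethod
    def from_payload(cls, payload: dict[str, Any]) -> ChangeRecord:
        # Coerce untrusted JSON values into the expected types.
        def opt_int(key: str) -> Optional[int]:
            value = payload.get(key)
            return int(value) if value not in (None, "") else None

        return cls(
            title=str(payload.get("title", "")),
            source_repo=str(payload.get("source_repository", "")),
            pr_number=opt_int("pr_number"),
            issue_number=opt_int("issue_number"),
            external_url=str(payload.get("external_url", "")),
            before_metrics={
                str(k): float(v) for k, v in payload.get("before_metrics", {}).items()
            },
        )

    def dashboard_link(self) -> Optional[str]:
        """Return the PR URL to link from the dashboard, if one was recorded."""
        return self.external_url or None
```

Keeping the PR and issue numbers optional lets records from non-PR producers share the same feed. The dashboard only renders a link when `external_url` is present.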

Why

The dashboard was reachable after deployment but showed no records. No producer was writing autonomous change events, and async job execution did not report runtime health. This PR connects the existing monthly remediation workflow to the dashboard's feedback endpoints. Telemetry registration is best-effort, so a failure there does not block the main remediation path.

Validation

  • python3 -m ruff check .
  • python3 -m pytest tests -q
  • npx -y node@22 --experimental-default-type=module --test cloudflare/codex-audit-proxy/tests/index.test.mjs
  • npx -y node@22 --experimental-default-type=module --test cloudflare/ai-gateway-dash/tests/index.test.mjs
  • git diff --check

Co-Authored-By: Codex <noreply@openai.com>

github-actions bot commented on Jul 3, 2026

🤖 Codex PR Review

⚠️ Review skipped: The Codex review could not be completed.

Codex service request failed: 401 {"status": "error", "error": "OIDC repository is not allowed"}

Please ensure a human reviewer checks this PR before merging.

Pigbibi merged commit d706519 into main on Jul 3, 2026
4 checks passed
Pigbibi deleted the codex/aigateway-real-telemetry branch on Jul 3, 2026 at 21:11

chatgpt-codex-connector bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: eee6dedc02

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you:

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment on lines +1015 to +1016
    except BridgeError as exc:
        return str(exc)

P2: Catch telemetry transport failures

This is intended to be best-effort, but request_codex_service_json() can raise exceptions other than BridgeError. Examples include urllib.error.URLError or timeouts from urlopen(), and JSONDecodeError when the gateway returns a malformed response. Suppose CODEX_AUDIT_SERVICE_URL is configured and the telemetry endpoint is unreachable or returns invalid JSON. The exception then escapes after the PR has been created and before the issue comment is posted, turning an otherwise successful remediation into a failed run. Catch these transport and parse failures here and return them as warnings as well.
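Below is a minimal sketch of the suggested fix, assuming the telemetry call is wrapped in a helper that returns a warning string on failure instead of raising. `register_change_best_effort` and the `send` callable are hypothetical stand-ins for the real call site and for `request_codex_service_json()`, whose actual signature is not shown here. `BridgeError` is redefined locally only so the example runs on its own.

```python
from __future__ import annotations

import json
import socket
import urllib.error
from typing import Any, Callable, Optional


class BridgeError(RuntimeError):
    """Local stand-in for the project's BridgeError so the sketch runs on its own."""


def register_change_best_effort(
    send: Callable[[dict[str, Any]], Any],
    payload: dict[str, Any],
) -> Optional[str]:
    """Send change telemetry; return a warning string instead of raising.

    `send` is a hypothetical stand-in for request_codex_service_json().
    Returns None when registration succeeded.
    """
    try:
        send(payload)
    except BridgeError as exc:
        return str(exc)
    except (urllib.error.URLError, socket.timeout, TimeoutError) as exc:
        # Unreachable host, DNS failure, refused connection, or read timeout.
        return f"telemetry transport failed: {exc}"
    except json.JSONDecodeError as exc:
        # The gateway answered, but the body was not valid JSON.
        return f"telemetry response was not valid JSON: {exc}"
    return None
```

`urllib.error.HTTPError` is a subclass of `URLError`, so a non-2xx response raised by `urlopen()` would also land in the transport clause. The caller can append any returned warning to the issue comment and carry on with the remediation.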

Comment on lines +515 to +518
    get_health_monitor().record(
        "/v1/ai/execute/jobs/run",
        time.time() - started,
        job["status"] == "succeeded",

P2: Keep async job runtime out of endpoint latency

When a normal async Codex job runs longer than the health monitor's latency thresholds, this records the entire background job duration as endpoint latency, so /v1/ai/health can mark the service degraded/unhealthy after successful long-running jobs even though the HTTP endpoint is healthy. The async execute contract permits long jobs, so this should be tracked as a separate job-duration metric or excluded from the health latency status.
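The sketch below shows one way to follow this suggestion, under stated assumptions. The `HealthMonitor` class, its `record_job()` and `job_summary()` methods, and the thresholds are hypothetical and are not the project's `get_health_monitor()` API. The point is that only HTTP request latency feeds `status()`, while background job durations go into a separate series that is reported but never degrades health.

```python
from __future__ import annotations

import statistics
from collections import defaultdict, deque


class HealthMonitor:
    """Hypothetical monitor that keeps background job durations out of endpoint latency.

    Only samples passed to record() (HTTP request latency) drive status();
    samples passed to record_job() are summarized separately and never degrade health.
    """

    def __init__(self, degraded_latency_s: float = 2.0, max_error_rate: float = 0.2,
                 window: int = 100) -> None:
        self.degraded_latency_s = degraded_latency_s
        self.max_error_rate = max_error_rate
        # Each series keeps the most recent `window` (seconds, ok) samples.
        self._requests: dict[str, deque[tuple[float, bool]]] = defaultdict(
            lambda: deque(maxlen=window)
        )
        self._jobs: dict[str, deque[tuple[float, bool]]] = defaultdict(
            lambda: deque(maxlen=window)
        )

    def record(self, endpoint: str, latency_s: float, ok: bool) -> None:
        """Record one HTTP request; this feeds the health status."""
        self._requests[endpoint].append((latency_s, ok))

    def record_job(self, kind: str, duration_s: float, ok: bool) -> None:
        """Record one background job run; this is reported, not health-scored."""
        self._jobs[kind].append((duration_s, ok))

    def status(self) -> str:
        for samples in self._requests.values():
            if not samples:
                continue
            median_latency = statistics.median(lat for lat, _ in samples)
            error_rate = sum(1 for _, ok in samples if not ok) / len(samples)
            if median_latency > self.degraded_latency_s or error_rate > self.max_error_rate:
                return "degraded"
        return "healthy"

    def job_summary(self, kind: str) -> dict[str, float]:
        samples = self._jobs.get(kind)  # .get() avoids creating an empty series
        if not samples:
            return {"count": 0}
        return {
            "count": len(samples),
            "median_s": statistics.median(d for d, _ in samples),
            "failures": sum(1 for _, ok in samples if not ok),
        }
```

With a split like this, the reviewed call could record the background run through something like `record_job("/v1/ai/execute/jobs/run", time.time() - started, job["status"] == "succeeded")`, again assuming such a method existed. A ten-minute successful job would then leave `/v1/ai/health` healthy.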

before_metrics={str(k): float(v) for k, v in payload.get("before_metrics", {}).items()},
source_repo=str(payload.get("source_repository", "")),
source_repo=source_repo,
external_url=str(payload.get("external_url", "")),

P2: Reject unsafe external change URLs

When an allowed workflow registers a change whose external_url uses a non-HTTP scheme (for example javascript:), this value is persisted and later rendered directly as an anchor href in the dashboard. Since the field is intended to link to a PR, validate it here as https:// (or at least HTTP(S)) before storing it. Otherwise the internal dashboard becomes a clickable sink for unsafe schemes or scripts.
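Here is a minimal validation sketch using only the standard library's `urllib.parse.urlsplit`. The `validate_external_url` name and the `allow_http` flag are hypothetical. The sketch rejects anything that is not an absolute `https://` URL with a host (optionally `http://`), and treats an empty value as "no link".

```python
from urllib.parse import urlsplit


def validate_external_url(raw: str, allow_http: bool = False) -> str:
    """Return a safe link target, or "" when no URL was supplied (hypothetical helper).

    Only absolute https:// URLs with a host are accepted (http:// too when
    allow_http is True), so values such as "javascript:alert(1)" never reach
    an href attribute in the dashboard.
    """
    url = raw.strip()
    if not url:
        return ""  # external_url is optional; an empty value means "no link"
    parts = urlsplit(url)
    allowed = {"https", "http"} if allow_http else {"https"}
    # urlsplit() lowercases the scheme; netloc is empty for "https:foo" style input.
    if parts.scheme not in allowed or not parts.netloc:
        raise ValueError(f"unsafe external_url: {raw!r}")
    return url
```

The registration handler could call this before persisting the record and turn the `ValueError` into a client error. Escaping the value when the dashboard renders it is still worthwhile as a second layer.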
