Persist AiGateway quota telemetry #67
Co-Authored-By: Codex <noreply@openai.com>
🤖 Codex PR Review
Please ensure a human reviewer checks this PR before merging.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 64c3918cfc
return
path.parent.mkdir(mode=0o700, parents=True, exist_ok=True)
payload = json.dumps(
    {"records": {repo: record.to_dict() for repo, record in self._records.items()}},
if not isinstance(records, dict):
    return
self._records = {
    repo: QuotaRecord.from_dict(item)
    reasoning_effort=reasoning_effort,
    timeout=req.timeout_seconds,
)
get_health_monitor().record("/v1/ai/execute", time.time() - started, result.success, result.error if not result.success else "")
Record sync Codex exceptions in health metrics
Because the new health sample is written only after adapter.execute() returns, exceptions raised before a CodexResult is produced — for example the RuntimeError path when the codex CLI is missing on the service host — skip this call and are handled by the outer generic 500 path. In that dependency-failure scenario /v1/ai/execute failures do not appear in the health dashboard at all, so record the failed sample in an exception/finally path as well.
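One way to address this is sketched below: wrap the call in try/except/finally so that exactly one health sample is written whether the adapter returns or raises. This is only an illustration under stated assumptions. `HealthMonitor`, `FakeResult`, and `execute_with_health` are hypothetical stand-ins; only the `record(endpoint, duration, success, error)` call shape and the `result.success` / `result.error` attributes are taken from the diff above.

```python
import time
from dataclasses import dataclass


@dataclass
class FakeResult:
    """Hypothetical stand-in for CodexResult (only the two fields used here)."""
    success: bool
    error: str = ""


class HealthMonitor:
    """Hypothetical stand-in for the object returned by get_health_monitor()."""

    def __init__(self):
        self.samples = []

    def record(self, endpoint, duration, success, error=""):
        self.samples.append((endpoint, duration, success, error))


def execute_with_health(monitor, run, endpoint="/v1/ai/execute"):
    """Call run() and always record one health sample, even if run() raises."""
    started = time.time()
    # Pessimistic defaults: overwritten only when a result is actually produced.
    success, error = False, "unknown error"
    try:
        result = run()
        success = result.success
        error = "" if result.success else result.error
        return result
    except Exception as exc:
        # Dependency failures (e.g. a missing CLI) land here before any result exists.
        error = f"{type(exc).__name__}: {exc}"
        raise
    finally:
        monitor.record(endpoint, time.time() - started, success, error)
```

Because the exception is re-raised, the outer generic 500 handling would still run unchanged; the only difference is that the failure now shows up as a failed sample.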
Summary
Why
The dashboard was operational but showed no quota/health activity after deploy because quota data was in-memory only and async job submissions were not represented in health endpoint metrics.
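To illustrate the in-memory problem, here is a minimal sketch of saving quota records to disk and reloading them after a restart. It follows the shape visible in the diff (a top-level `"records"` mapping, `to_dict` / `from_dict`, a `0o700` parent directory, and an `isinstance(records, dict)` guard), but the `QuotaRecord` fields, the function names, and the temp-file-plus-`os.replace` write are assumptions for illustration, not the PR's actual implementation.

```python
import json
import os
import tempfile
from dataclasses import asdict, dataclass
from pathlib import Path


@dataclass
class QuotaRecord:
    # Field names are illustrative assumptions, not the PR's schema.
    requests: int = 0
    tokens: int = 0

    def to_dict(self):
        return asdict(self)

    @classmethod
    def from_dict(cls, item):
        return cls(requests=int(item.get("requests", 0)), tokens=int(item.get("tokens", 0)))


def save_records(path: Path, records: dict) -> None:
    """Write records to a temp file, then rename it over the target path."""
    path.parent.mkdir(mode=0o700, parents=True, exist_ok=True)
    payload = json.dumps({"records": {repo: rec.to_dict() for repo, rec in records.items()}})
    fd, tmp = tempfile.mkstemp(dir=path.parent, prefix=path.name, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as fh:
            fh.write(payload)
        os.replace(tmp, path)  # readers see either the old file or the new one
    except BaseException:
        if os.path.exists(tmp):
            os.unlink(tmp)
        raise


def load_records(path: Path) -> dict:
    """Return persisted records, or an empty dict if the file is missing or malformed."""
    try:
        data = json.loads(path.read_text(encoding="utf-8"))
    except (OSError, ValueError):
        return {}
    records = data.get("records") if isinstance(data, dict) else None
    if not isinstance(records, dict):
        return {}
    return {
        repo: QuotaRecord.from_dict(item)
        for repo, item in records.items()
        if isinstance(item, dict)
    }
```

With something along these lines, a redeployed service could call `load_records` at startup so the dashboard has quota history immediately rather than starting from an empty in-memory state.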
Validation
python3 -m ruff check .
python3 -m pytest tests -q
bash -n scripts/deploy_codex_audit_service.sh
git diff --check