feat(checkpoints): FGAC checkpoint routes, enforce via parent task pe…#260
Merged
Merged
Conversation
rpatel-scale
approved these changes
Jun 2, 2026
rpatel-scale
approved these changes
Jun 2, 2026
This was referenced Jun 2, 2026
asherfink
added a commit
that referenced
this pull request
Jun 2, 2026
Enforce Fine-Grained Access Control on the agent_task_tracker routes by
delegating to the parent task. Trackers are not a SpiceDB resource (no
schema, no register_resource, no dual-write); authorization inherits from
the owning task, identical to the state/message/checkpoint paradigm:
| Operation | Route | Task permission |
|----------------|---------------------------|-----------------|
| Get tracker | GET /tracker/{tracker_id} | view (read op) |
| List trackers | GET /tracker | view (read op) |
| Update tracker | PUT /tracker/{tracker_id} | execute op |
All three routes had zero authorization before this change.
- GET /tracker/{tracker_id}: DAuthorizedId resolves tracker -> task_id and
checks task.read. A denied check OR a missing tracker collapses to 404
(not 403) so callers cannot probe cross-tenant existence by comparing
response codes. Behavior change: a missing tracker now returns 404 (was
400), because the parent-task resolution runs before the handler body and
raises ItemDoesNotExist.
- GET /tracker: DAuthorizedResourceIds(task) yields the set of tasks the
caller may view; trackers under unauthorized tasks are silently dropped
(never 403), and an explicit ?task_id= the caller cannot view yields an
empty list (never 404). agent_id remains a plain filter.
- PUT /tracker/{tracker_id}: DAuthorizedId resolves tracker -> task_id and
checks task.execute, mirroring the message/checkpoint write routes, with
the same 404 collapse on denial. Closes the cross-tenant write gap where
the read routes were locked down but the mutating route was not.
No SpiceDB type, no dual-write, no agentex-auth change, and no deploy gate:
read already maps to task.view and is foundational across all SGP envs.
Rollout note: PUT enforcement lands behind the same flag as the read routes.
Before enabling tracker FGAC, confirm the agent runtime that commits
cursor/status updates runs with an authz bypass (agent API key) or holds
task.execute on the parent task, so its own writes are not 404'd.
Coordination: bases on main (child-resource machinery landed via merged
#260, checkpoint FGAC). Shares an additive edit to _get_parent_task_id and
TaskChildResourceType with the in-flight #249 (AGX1-275 task routes) and the
future AGX1-301 (state); whichever lands second rebases trivially.
resolves AGX1-307
Parent: AGX1-306
asherfink
added a commit
that referenced
this pull request
Jun 2, 2026
Enforce Fine-Grained Access Control on the agent_task_tracker routes by
delegating to the parent task. Trackers are not a SpiceDB resource (no
schema, no register_resource, no dual-write); authorization inherits from
the owning task, identical to the state/message/checkpoint paradigm:
| Operation | Route | Task permission |
|----------------|---------------------------|-----------------|
| Get tracker | GET /tracker/{tracker_id} | view (read op) |
| List trackers | GET /tracker | view (read op) |
| Update tracker | PUT /tracker/{tracker_id} | execute op |
All three routes had zero authorization before this change.
- GET /tracker/{tracker_id}: DAuthorizedId resolves tracker -> task_id and
checks task.read. A denied check OR a missing tracker collapses to 404
(not 403) so callers cannot probe cross-tenant existence by comparing
response codes. Behavior change: a missing tracker now returns 404 (was
400), because the parent-task resolution runs before the handler body and
raises ItemDoesNotExist.
- GET /tracker: DAuthorizedResourceIds(task) yields the set of tasks the
caller may view; trackers under unauthorized tasks are silently dropped
(never 403), and an explicit ?task_id= the caller cannot view yields an
empty list (never 404). agent_id remains a plain filter.
- PUT /tracker/{tracker_id}: DAuthorizedId resolves tracker -> task_id and
checks task.execute, mirroring the message/checkpoint write routes, with
the same 404 collapse on denial. Closes the cross-tenant write gap where
the read routes were locked down but the mutating route was not.
No SpiceDB type, no dual-write, no agentex-auth change, and no deploy gate:
read already maps to task.view and is foundational across all SGP envs.
Rollout note: PUT enforcement lands behind the same flag as the read routes.
Before enabling tracker FGAC, confirm the agent runtime that commits
cursor/status updates runs with an authz bypass (agent API key) or holds
task.execute on the parent task, so its own writes are not 404'd.
Coordination: bases on main (child-resource machinery landed via merged
#260, checkpoint FGAC). Shares an additive edit to _get_parent_task_id and
TaskChildResourceType with the in-flight #249 (AGX1-275 task routes) and the
future AGX1-301 (state); whichever lands second rebases trivially.
resolves AGX1-307
Parent: AGX1-306
asherfink
added a commit
that referenced
this pull request
Jun 2, 2026
…246) ## Related work Parent epic: [AGX1-264](https://linear.app/scale-epd/issue/AGX1-264) — per-task FGAC. Follow-ups bundled in [AGX1-291](https://linear.app/scale-epd/issue/AGX1-291). This change is part of an 8-PR stack across 3 repos (4 merged, 4 open). | Repo | PR | Purpose | |---|---|---| | scaleapi/scaleapi | ~~scaleapi/scaleapi#144783~~ ✅ merged | sgp-authz — `Action.CANCEL` + `parent` on `register_resource` | | scaleapi/scaleapi | ~~scaleapi/scaleapi#145000~~ ✅ merged | register `FGAC_AGENTEX_AUTH_SPARK` routing flag | | scaleapi/scaleapi | ~~scaleapi/scaleapi#145044~~ ✅ merged | add `cancel` to SGP's `AgentexOperation` enum + role map | | scaleapi/agentex | ~~scaleapi/agentex#353~~ ✅ merged | agentex-auth per-account provider routing + `cancel` op | | scaleapi/scaleapi | scaleapi/scaleapi#145521 | register the `fgac-tasks-dual-write` per-account flag | | scaleapi/agentex | scaleapi/agentex#358 | agentex-auth: gate register/deregister by `fgac-<resource>-dual-write` | | **scaleapi/scale-agentex** | **this PR** | **register tasks in the FGAC authz graph on create/delete** | | scaleapi/scale-agentex | #249 | per-RPC task permission rewire (`update`/`cancel`) + 404/403 wrap | **Merge order:** flag (scaleapi/scaleapi#145521, anytime) → gate (scaleapi/agentex#358) → #246 → enable `fgac-tasks-dual-write` per account → #249 (after the `cancel` enum is live in every SGP env). ## Summary Wires task create/delete into the FGAC authorization graph (the egp-api-backend dual-write pattern). The generic register/deregister plumbing already landed via #260, so this PR is just the task call sites plus tests. Whether the calls actually write to Spark AuthZ is gated per-account in agentex-auth (scaleapi/agentex#358) by the `fgac-tasks-dual-write` flag (scaleapi/scaleapi#145521); this OSS repo always calls through. - **create:** `register_resource(task, parent=agent)` before persisting the row. If the persist fails, a compensating `deregister_resource` cleans up the tuple and the original error re-raises (register-first means a register failure aborts with no row written). - **delete:** delete the row first, then deregister best-effort — failures are logged and swallowed, since Postgres is the source of truth for existence and a tuple on an already-deleted row is invisible to reads. The task id is resolved before the delete so the deregister can't race a name lookup. - `AgentTaskService` now takes `AuthorizationService` via DI. No schema/migration, env var, or retry/metrics helper: those were part of an earlier iteration, dropped now that the plumbing lives in #260 and the gating in agentex-auth. ## Tests `test_task_fgac_dual_write.py`: register-before-persist ordering, compensation on persist failure, deregister-after-delete, deregister-failure swallow, and missing-name no-op. Fixtures updated to pass a no-op authorization service. (Integration tests run in CI; they need Docker testcontainers locally.) Ruff + ruff-format clean. <!-- greptile_comment --> <h3>Greptile Summary</h3> This PR wires task `create` and `delete` into the FGAC authorization graph using the dual-write pattern: `register_resource` fires before the Postgres insert (registration failure aborts cleanly, no row written), a compensating `deregister_resource` runs if the insert subsequently fails, and `delete_task` deregisters best-effort after the row is gone. - `AgentTaskService` now accepts `DAuthorizationService` via DI; all existing callers and test fixtures are updated with a shared `make_noop_authorization_service()` helper. - New integration test class `TestTaskDualWrite` covers register-before-persist ordering, compensation on persist failure, deregister-after-delete, deregister-failure swallow, and missing-name no-op. - Test files also carry a bulk reformat of `assert (expr), msg` → `assert expr, (msg)` for ruff compliance. <details><summary><h3>Confidence Score: 5/5</h3></summary> Safe to merge; the dual-write ordering is correct and all failure modes (register failure, persist failure, deregister failure) are intentionally handled with clear compensation logic. The create/delete authz wiring follows the documented dual-write contract. The compensation on persist failure is well-tested, and the best-effort deregister on delete correctly swallows errors. Observations raised (asyncio cancellation gap, test tier placement) are both acknowledged by the AGX1-291 runbook or are non-blocking style points — neither affects the correctness of the shipped code path. No files require special attention. </details> <h3>Important Files Changed</h3> | Filename | Overview | |----------|----------| | agentex/src/domain/services/task_service.py | Core FGAC dual-write wiring: register-before-persist in create_task with compensation, and best-effort deregister in delete_task; logic and comments are solid. | | agentex/tests/integration/use_cases/test_task_authz_dual_write.py | New integration tests covering register-before-persist, compensation on persist failure, deregister-after-delete, failure swallow, and missing-name no-op; one mock-only test is misclassified in the integration suite. | | agentex/tests/fixtures/services.py | Adds make_noop_authorization_service() shared helper and threads authorization_service into create_task_service factory; clean and consistent. | | agentex/tests/integration/fixtures/integration_client.py | Passes make_noop_authorization_service() to AgentTaskService in isolated_integration_app; minimal and correct change. | | agentex/tests/integration/test_task_stream.py | Adds noop authorization_service to two AgentTaskService constructions; also reformats assert messages to ruff-preferred style. | </details> <details><summary><h3>Sequence Diagram</h3></summary> ```mermaid sequenceDiagram participant C as Caller participant TS as AgentTaskService participant AZ as AuthorizationService participant DB as TaskRepository Note over C,DB: create_task (register-before-persist) C->>TS: create_task(agent, name, ...) TS->>AZ: "register_resource(task, parent=agent)" alt register fails AZ-->>TS: raise TS-->>C: re-raise (no row written) end AZ-->>TS: ok TS->>DB: create(agent_id, task) alt persist fails DB-->>TS: raise TS->>AZ: deregister_resource(task) [compensation] TS-->>C: re-raise original error end DB-->>TS: task_entity TS-->>C: task_entity Note over C,DB: delete_task (best-effort deregister) C->>TS: "delete_task(id | name)" opt name provided, id unknown TS->>DB: "get(name=name)" DB-->>TS: task.id end TS->>DB: delete(id, name) DB-->>TS: ok TS->>AZ: deregister_resource(task_id) [best-effort] alt deregister fails AZ-->>TS: raise TS->>TS: log + swallow end TS-->>C: void ``` </details> <a href="https://app.greptile.com/api/ide/cursor?prompt=Fix%20the%20following%202%20code%20review%20issues.%20Work%20through%20them%20one%20at%20a%20time%2C%20proposing%20concise%20fixes.%0A%0A---%0A%0A%23%23%23%20Issue%201%20of%202%0Aagentex%2Ftests%2Fintegration%2Fuse_cases%2Ftest_task_authz_dual_write.py%3A140-158%0A**Mock-only%20test%20misclassified%20as%20integration**%0A%0A%60test_create_compensates_with_deregister_when_persist_fails%60%20uses%20only%20%60Mock%28%29%60%20%2F%20%60AsyncMock%60%20%E2%80%94%20no%20%60isolated_repositories%60%2C%20no%20DB%20or%20Redis.%20Because%20it%20lives%20inside%20%60%40pytest.mark.integration%60%20%60TestTaskDualWrite%60%2C%20pytest%20includes%20it%20in%20the%20integration%20suite%2C%20requiring%20testcontainers%20to%20be%20running%20before%20it%20can%20execute.%20Moving%20it%20to%20%60agentex%2Ftests%2Funit%2Fuse_cases%2Ftest_task_fgac_dual_write.py%60%20%28or%20similar%29%20would%20let%20it%20run%20in%20the%20fast%20unit%20pass%20and%20give%20earlier%20feedback%20on%20the%20compensation%20logic.%0A%0A%23%23%23%20Issue%202%20of%202%0Aagentex%2Fsrc%2Fdomain%2Fservices%2Ftask_service.py%3A84-93%0A**%60asyncio.CancelledError%60%20bypasses%20compensation%20deregister**%0A%0AIn%20Python%203.8%2B%2C%20%60asyncio.CancelledError%60%20is%20a%20%60BaseException%60%2C%20not%20an%20%60Exception%60.%20If%20the%20incoming%20request%20is%20cancelled%20%28client%20disconnect%2C%20gateway%20timeout%29%20while%20%60task_repository.create%60%20is%20awaiting%20the%20DB%20write%2C%20the%20cancellation%20propagates%20straight%20through%20%60except%20Exception%60%20without%20triggering%20the%20compensation%20%60deregister_resource%60.%20The%20task%20is%20then%20registered%20in%20the%20authz%20graph%20but%20never%20written%20to%20Postgres.%20This%20is%20the%20same%20orphan%20scenario%20covered%20by%20the%20AGX1-291%20runbook%2C%20but%20it%20can%20also%20be%20triggered%20silently%20mid-flight%20%E2%80%94%20worth%20noting%20if%20the%20scan-by-%60creator_user_id%60%20cleanup%20is%20the%20intended%20recovery%20path.%0A%0A&pr=246&platform=github"><picture><source media="(prefers-color-scheme: dark)" srcset="https://greptile-static-assets.s3.amazonaws.com/badges/FixAllInCursorDark.svg?v=3"><source media="(prefers-color-scheme: light)" srcset="https://greptile-static-assets.s3.amazonaws.com/badges/FixAllInCursor.svg?v=3"><img alt="Fix All in Cursor" src="https://greptile-static-assets.s3.amazonaws.com/badges/FixAllInCursor.svg?v=3" height="20"></picture></a> <a href="https://app.greptile.com/ide/claude-code?prompt=Fix%20the%20following%202%20code%20review%20issues.%20Work%20through%20them%20one%20at%20a%20time%2C%20proposing%20concise%20fixes.%0A%0A---%0A%0A%23%23%23%20Issue%201%20of%202%0Aagentex%2Ftests%2Fintegration%2Fuse_cases%2Ftest_task_authz_dual_write.py%3A140-158%0A**Mock-only%20test%20misclassified%20as%20integration**%0A%0A%60test_create_compensates_with_deregister_when_persist_fails%60%20uses%20only%20%60Mock%28%29%60%20%2F%20%60AsyncMock%60%20%E2%80%94%20no%20%60isolated_repositories%60%2C%20no%20DB%20or%20Redis.%20Because%20it%20lives%20inside%20%60%40pytest.mark.integration%60%20%60TestTaskDualWrite%60%2C%20pytest%20includes%20it%20in%20the%20integration%20suite%2C%20requiring%20testcontainers%20to%20be%20running%20before%20it%20can%20execute.%20Moving%20it%20to%20%60agentex%2Ftests%2Funit%2Fuse_cases%2Ftest_task_fgac_dual_write.py%60%20%28or%20similar%29%20would%20let%20it%20run%20in%20the%20fast%20unit%20pass%20and%20give%20earlier%20feedback%20on%20the%20compensation%20logic.%0A%0A%23%23%23%20Issue%202%20of%202%0Aagentex%2Fsrc%2Fdomain%2Fservices%2Ftask_service.py%3A84-93%0A**%60asyncio.CancelledError%60%20bypasses%20compensation%20deregister**%0A%0AIn%20Python%203.8%2B%2C%20%60asyncio.CancelledError%60%20is%20a%20%60BaseException%60%2C%20not%20an%20%60Exception%60.%20If%20the%20incoming%20request%20is%20cancelled%20%28client%20disconnect%2C%20gateway%20timeout%29%20while%20%60task_repository.create%60%20is%20awaiting%20the%20DB%20write%2C%20the%20cancellation%20propagates%20straight%20through%20%60except%20Exception%60%20without%20triggering%20the%20compensation%20%60deregister_resource%60.%20The%20task%20is%20then%20registered%20in%20the%20authz%20graph%20but%20never%20written%20to%20Postgres.%20This%20is%20the%20same%20orphan%20scenario%20covered%20by%20the%20AGX1-291%20runbook%2C%20but%20it%20can%20also%20be%20triggered%20silently%20mid-flight%20%E2%80%94%20worth%20noting%20if%20the%20scan-by-%60creator_user_id%60%20cleanup%20is%20the%20intended%20recovery%20path.%0A%0A&repo=scaleapi%2Fscale-agentex&pr=246&platform=github"><picture><source media="(prefers-color-scheme: dark)" srcset="https://greptile-static-assets.s3.amazonaws.com/badges/FixAllInClaudeDark.svg?v=3"><source media="(prefers-color-scheme: light)" srcset="https://greptile-static-assets.s3.amazonaws.com/badges/FixAllInClaude.svg?v=3"><img alt="Fix All in Claude Code" src="https://greptile-static-assets.s3.amazonaws.com/badges/FixAllInClaude.svg?v=3" height="20"></picture></a> <a href="https://chatgpt.com/codex/deeplink?prompt=IMPORTANT%3A%20Work%20in%20the%20repository%20%22scaleapi%2Fscale-agentex%22%20on%20the%20existing%20branch%20%22asher.fink%2Fagx1-274-task-dual-write%22.%20Checkout%20that%20branch%20%E2%80%94%20do%20NOT%20create%20a%20new%20branch%20or%20open%20a%20new%20PR.%20Push%20your%20changes%20to%20%22asher.fink%2Fagx1-274-task-dual-write%22.%0A%0AFix%20the%20following%202%20code%20review%20issues.%20Work%20through%20them%20one%20at%20a%20time%2C%20proposing%20concise%20fixes.%0A%0A---%0A%0A%23%23%23%20Issue%201%20of%202%0Aagentex%2Ftests%2Fintegration%2Fuse_cases%2Ftest_task_authz_dual_write.py%3A140-158%0A**Mock-only%20test%20misclassified%20as%20integration**%0A%0A%60test_create_compensates_with_deregister_when_persist_fails%60%20uses%20only%20%60Mock%28%29%60%20%2F%20%60AsyncMock%60%20%E2%80%94%20no%20%60isolated_repositories%60%2C%20no%20DB%20or%20Redis.%20Because%20it%20lives%20inside%20%60%40pytest.mark.integration%60%20%60TestTaskDualWrite%60%2C%20pytest%20includes%20it%20in%20the%20integration%20suite%2C%20requiring%20testcontainers%20to%20be%20running%20before%20it%20can%20execute.%20Moving%20it%20to%20%60agentex%2Ftests%2Funit%2Fuse_cases%2Ftest_task_fgac_dual_write.py%60%20%28or%20similar%29%20would%20let%20it%20run%20in%20the%20fast%20unit%20pass%20and%20give%20earlier%20feedback%20on%20the%20compensation%20logic.%0A%0A%23%23%23%20Issue%202%20of%202%0Aagentex%2Fsrc%2Fdomain%2Fservices%2Ftask_service.py%3A84-93%0A**%60asyncio.CancelledError%60%20bypasses%20compensation%20deregister**%0A%0AIn%20Python%203.8%2B%2C%20%60asyncio.CancelledError%60%20is%20a%20%60BaseException%60%2C%20not%20an%20%60Exception%60.%20If%20the%20incoming%20request%20is%20cancelled%20%28client%20disconnect%2C%20gateway%20timeout%29%20while%20%60task_repository.create%60%20is%20awaiting%20the%20DB%20write%2C%20the%20cancellation%20propagates%20straight%20through%20%60except%20Exception%60%20without%20triggering%20the%20compensation%20%60deregister_resource%60.%20The%20task%20is%20then%20registered%20in%20the%20authz%20graph%20but%20never%20written%20to%20Postgres.%20This%20is%20the%20same%20orphan%20scenario%20covered%20by%20the%20AGX1-291%20runbook%2C%20but%20it%20can%20also%20be%20triggered%20silently%20mid-flight%20%E2%80%94%20worth%20noting%20if%20the%20scan-by-%60creator_user_id%60%20cleanup%20is%20the%20intended%20recovery%20path.%0A%0A"><picture><source media="(prefers-color-scheme: dark)" srcset="https://greptile-static-assets.s3.amazonaws.com/badges/FixAllInCodexDark.svg?v=3"><source media="(prefers-color-scheme: light)" srcset="https://greptile-static-assets.s3.amazonaws.com/badges/FixAllInCodex.svg?v=3"><img alt="Fix All in Codex" src="https://greptile-static-assets.s3.amazonaws.com/badges/FixAllInCodex.svg?v=3" height="20"></picture></a> <details><summary>Prompt To Fix All With AI</summary> `````markdown Fix the following 2 code review issues. Work through them one at a time, proposing concise fixes. --- ### Issue 1 of 2 agentex/tests/integration/use_cases/test_task_authz_dual_write.py:140-158 **Mock-only test misclassified as integration** `test_create_compensates_with_deregister_when_persist_fails` uses only `Mock()` / `AsyncMock` — no `isolated_repositories`, no DB or Redis. Because it lives inside `@pytest.mark.integration` `TestTaskDualWrite`, pytest includes it in the integration suite, requiring testcontainers to be running before it can execute. Moving it to `agentex/tests/unit/use_cases/test_task_fgac_dual_write.py` (or similar) would let it run in the fast unit pass and give earlier feedback on the compensation logic. ### Issue 2 of 2 agentex/src/domain/services/task_service.py:84-93 **`asyncio.CancelledError` bypasses compensation deregister** In Python 3.8+, `asyncio.CancelledError` is a `BaseException`, not an `Exception`. If the incoming request is cancelled (client disconnect, gateway timeout) while `task_repository.create` is awaiting the DB write, the cancellation propagates straight through `except Exception` without triggering the compensation `deregister_resource`. The task is then registered in the authz graph but never written to Postgres. This is the same orphan scenario covered by the AGX1-291 runbook, but it can also be triggered silently mid-flight — worth noting if the scan-by-`creator_user_id` cleanup is the intended recovery path. ````` </details> <sub>Reviews (8): Last reviewed commit: ["feat(tasks): register tasks in authoriza..."](6c869dc) | [Re-trigger Greptile](https://app.greptile.com/api/retrigger?id=34256125)</sub> <!-- /greptile_comment -->
asherfink
added a commit
that referenced
this pull request
Jun 3, 2026
## Related work Parent epic: [AGX1-264](https://linear.app/scale-epd/issue/AGX1-264) — per-task FGAC. Follow-ups bundled in [AGX1-291](https://linear.app/scale-epd/issue/AGX1-291). This change is part of an 8-PR stack across 3 repos (4 merged, 4 open). | Repo | PR | Purpose | |---|---|---| | scaleapi/scaleapi | ~~scaleapi/scaleapi#144783~~ ✅ merged | sgp-authz — `Action.CANCEL` + `parent` on `register_resource` | | scaleapi/scaleapi | ~~scaleapi/scaleapi#145000~~ ✅ merged | register `FGAC_AGENTEX_AUTH_SPARK` routing flag | | scaleapi/scaleapi | ~~scaleapi/scaleapi#145044~~ ✅ merged | add `cancel` to SGP's `AgentexOperation` enum + role map | | scaleapi/agentex | ~~scaleapi/agentex#353~~ ✅ merged | agentex-auth per-account provider routing + `cancel` op | | scaleapi/scaleapi | scaleapi/scaleapi#145521 | register the `fgac-tasks-dual-write` per-account flag | | scaleapi/agentex | scaleapi/agentex#358 | agentex-auth: gate register/deregister by `fgac-<resource>-dual-write` | | scaleapi/scale-agentex | #246 | register tasks in the FGAC authz graph on create/delete | | **scaleapi/scale-agentex** | **this PR** | **per-RPC task permission rewire (`update`/`cancel`) + 404/403 wrap** | **Merge order:** flag (scaleapi/scaleapi#145521, anytime) → gate (scaleapi/agentex#358) → #246 → enable `fgac-tasks-dual-write` per account → #249 (after the `cancel` enum is live in every SGP env). ## Summary Last in the stack: the route-side change that exercises the per-task permissions. Routes each task RPC to the right operation instead of `execute` everywhere, and collapses task authz denials to 404 so callers can't use 403-vs-404 to probe cross-tenant existence. - **Per-operation routing:** `MESSAGE_SEND`/`EVENT_SEND` → `update`, `TASK_CANCEL` → `cancel`, `TASK_CREATE` stays `create`. The `execute → update` swap also lands on the message routes so an editor (not just an owner) can do routine mutations, matching the task schema's `update = editor + owner`. - **404 collapse:** a shared `check_task_or_collapse_to_404` helper turns any task denial into `ItemDoesNotExist` (404) across the path / query / body / **name** routes — the name route most of all, since `tasks.name` is globally unique. (This is the canonical wrap #260 left a TODO to adopt; this PR adopts it.) - **Fail-closed:** an unknown `AgentRPCMethod` now raises `NotImplementedError` instead of dispatching authz-free. - The list surface (only return readable tasks) is already covered by existing `DAuthorizedResourceIds` wiring — no change here. **Merge gate:** the `execute → update` swap is already satisfied by SGP's schema, but `cancel` must be live in every SGP env first (scaleapi/scaleapi#145044). Held until that ships. Out of scope (AGX1-291): the agent name-route has the same 403/404 leak shape, and restoring the 403/404 split for same-tenant calls waits on task tenant-scoping (AGX1-290). ## Tests `test_tasks_authz.py`: per-RPC operation routing, 404-collapse, body-id / name task wraps, fail-closed unknown method, and wire-format pins for `update`/`cancel`. Ruff + ruff-format clean. <!-- greptile_comment --> <h3>Greptile Summary</h3> This PR completes the per-task FGAC wiring by routing each RPC operation to its correct authorization type (`MESSAGE_SEND`/`EVENT_SEND` → `update`, `TASK_CANCEL` → `cancel`, `TASK_CREATE` stays `create`) and introducing a shared `check_task_or_collapse_to_404` helper that uniformly folds any task authorization denial into `ItemDoesNotExist` (404) across path, query, body, and name routes. - **Per-operation routing:** All four `_authorize_rpc_request` cases and all four `messages.py` body-id guards are rewired from `execute` to the appropriate fine-grained operation; an unknown `AgentRPCMethod` now raises `NotImplementedError` instead of silently bypassing authz. - **404-collapse consolidation:** The inline `try/except AuthorizationError → ItemDoesNotExist` blocks previously duplicated in `DAuthorizedId`, `DAuthorizedQuery`, and `DAuthorizedBodyId` are replaced by the new `check_task_or_collapse_to_404` helper; `DAuthorizedName` and the RPC routes now use the same helper, closing the name-route existence-probe leak. - **Test coverage:** New `test_tasks_authz.py` pins operation routing per RPC method, denial-to-404 collapse, body-id/name wraps, fail-closed behavior for unknown methods, and wire-format strings for both `update` and `cancel`. <details><summary><h3>Confidence Score: 5/5</h3></summary> Safe to merge once the `cancel` operation is confirmed live in all SGP environments per the stated merge gate. The change is well-scoped: it replaces duplicated inline try/except blocks with a single tested helper, routes each RPC method to the right operation, and hardens the default case to fail-closed. The 404-collapse trade-off is explicitly documented with a tracked follow-up. Test coverage is thorough, including wire-format pins and denial-collapse assertions. The only finding is a missing ticket reference on the TODO, which does not affect runtime behavior. No files require special attention beyond confirming the `cancel` merge gate (scaleapi/scaleapi#145044 live in all envs) before deploying. </details> <h3>Important Files Changed</h3> | Filename | Overview | |----------|----------| | agentex/src/utils/task_authorization.py | New shared helper `check_task_or_collapse_to_404` that wraps `authorization.check` for task resources and collapses any `AuthorizationError` to `ItemDoesNotExist` (404); well-documented trade-off with TODO for future restoration of 403 split. | | agentex/src/api/routes/agents.py | Per-RPC operation rewire: MESSAGE_SEND/EVENT_SEND → `update`, TASK_CANCEL → `cancel`, TASK_CREATE unchanged; unknown methods now raise `NotImplementedError` (fail-closed) instead of silently bypassing authz. | | agentex/src/utils/authorization_shortcuts.py | Refactors DAuthorizedId/DAuthorizedQuery/DAuthorizedBodyId/DAuthorizedName to use the shared `check_task_or_collapse_to_404` helper; adds explicit `elif resource_type == AgentexResourceType.task` branches for direct task resources; removes now-redundant inline try/except blocks. | | agentex/src/api/routes/messages.py | All four message-route `DAuthorizedBodyId` calls updated from `execute` to `update` operation — consistent with the broader execute→update rewire for mutation paths. | | agentex/src/api/schemas/authorization_types.py | Adds `cancel = "cancel"` to `AuthorizedOperationType` enum; wire-format value pinned by tests. | | agentex/tests/unit/api/test_tasks_authz.py | New 446-line test file covering per-RPC operation routing, 404-collapse behavior, body-id/name wraps, fail-closed unknown methods, and wire-format pins for `update`/`cancel`. | </details> <details><summary><h3>Flowchart</h3></summary> ```mermaid %%{init: {'theme': 'neutral'}}%% flowchart TD A[Incoming RPC Request] --> B{request.method} B -->|TASK_CREATE| C[check: create on task/*] B -->|MESSAGE_SEND| D{task_id?} B -->|EVENT_SEND| G{task_id?} B -->|TASK_CANCEL| J{task_id?} B -->|unknown| M[raise NotImplementedError → 5xx] D -->|yes| E[check_task_or_collapse_to_404 operation=update] D -->|task_name| F{task exists?} F -->|yes| E2[check_task_or_collapse_to_404 operation=update] F -->|no / ItemDoesNotExist| C2[check: create on task/*] G -->|yes| H[check_task_or_collapse_to_404 operation=update] G -->|task_name| I[get_task → check_task_or_collapse_to_404 operation=update] J -->|yes| K[check_task_or_collapse_to_404 operation=cancel] J -->|task_name| L[get_task → check_task_or_collapse_to_404 operation=cancel] E --> N{AuthorizationError?} E2 --> N H --> N I --> N K --> N L --> N N -->|yes| O[raise ItemDoesNotExist → 404] N -->|no| P[Request proceeds] ``` </details> <a href="https://app.greptile.com/api/ide/cursor?prompt=Fix%20the%20following%201%20code%20review%20issue.%20Work%20through%20them%20one%20at%20a%20time%2C%20proposing%20concise%20fixes.%0A%0A---%0A%0A%23%23%23%20Issue%201%20of%201%0Aagentex%2Fsrc%2Futils%2Ftask_authorization.py%3A329-331%0A**TODO%20missing%20Linear%20ticket%20reference**%0A%0AThe%20TODO%20here%20describes%20deferring%20the%20403%2F404%20split%20restoration%2C%20which%20the%20PR%20description%20maps%20to%20AGX1-291.%20Without%20a%20ticket%20link%2C%20this%20TODO%20is%20hard%20to%20track%20down%20via%20issue-number%20search%20and%20easy%20to%20forget.%20Per%20project%20convention%2C%20each%20TODO%20should%20include%20the%20ticket%20number%20inline.%0A%0A&pr=249&platform=github"><picture><source media="(prefers-color-scheme: dark)" srcset="https://greptile-static-assets.s3.amazonaws.com/badges/FixAllInCursorDark.svg?v=3"><source media="(prefers-color-scheme: light)" srcset="https://greptile-static-assets.s3.amazonaws.com/badges/FixAllInCursor.svg?v=3"><img alt="Fix All in Cursor" src="https://greptile-static-assets.s3.amazonaws.com/badges/FixAllInCursor.svg?v=3" height="20"></picture></a> <a href="https://app.greptile.com/ide/claude-code?prompt=Fix%20the%20following%201%20code%20review%20issue.%20Work%20through%20them%20one%20at%20a%20time%2C%20proposing%20concise%20fixes.%0A%0A---%0A%0A%23%23%23%20Issue%201%20of%201%0Aagentex%2Fsrc%2Futils%2Ftask_authorization.py%3A329-331%0A**TODO%20missing%20Linear%20ticket%20reference**%0A%0AThe%20TODO%20here%20describes%20deferring%20the%20403%2F404%20split%20restoration%2C%20which%20the%20PR%20description%20maps%20to%20AGX1-291.%20Without%20a%20ticket%20link%2C%20this%20TODO%20is%20hard%20to%20track%20down%20via%20issue-number%20search%20and%20easy%20to%20forget.%20Per%20project%20convention%2C%20each%20TODO%20should%20include%20the%20ticket%20number%20inline.%0A%0A&repo=scaleapi%2Fscale-agentex&pr=249&platform=github"><picture><source media="(prefers-color-scheme: dark)" srcset="https://greptile-static-assets.s3.amazonaws.com/badges/FixAllInClaudeDark.svg?v=3"><source media="(prefers-color-scheme: light)" srcset="https://greptile-static-assets.s3.amazonaws.com/badges/FixAllInClaude.svg?v=3"><img alt="Fix All in Claude Code" src="https://greptile-static-assets.s3.amazonaws.com/badges/FixAllInClaude.svg?v=3" height="20"></picture></a> <a href="https://chatgpt.com/codex/deeplink?prompt=IMPORTANT%3A%20Work%20in%20the%20repository%20%22scaleapi%2Fscale-agentex%22%20on%20the%20existing%20branch%20%22asher.fink%2Fagx1-275-task-route-migration%22.%20Checkout%20that%20branch%20%E2%80%94%20do%20NOT%20create%20a%20new%20branch%20or%20open%20a%20new%20PR.%20Push%20your%20changes%20to%20%22asher.fink%2Fagx1-275-task-route-migration%22.%0A%0AFix%20the%20following%201%20code%20review%20issue.%20Work%20through%20them%20one%20at%20a%20time%2C%20proposing%20concise%20fixes.%0A%0A---%0A%0A%23%23%23%20Issue%201%20of%201%0Aagentex%2Fsrc%2Futils%2Ftask_authorization.py%3A329-331%0A**TODO%20missing%20Linear%20ticket%20reference**%0A%0AThe%20TODO%20here%20describes%20deferring%20the%20403%2F404%20split%20restoration%2C%20which%20the%20PR%20description%20maps%20to%20AGX1-291.%20Without%20a%20ticket%20link%2C%20this%20TODO%20is%20hard%20to%20track%20down%20via%20issue-number%20search%20and%20easy%20to%20forget.%20Per%20project%20convention%2C%20each%20TODO%20should%20include%20the%20ticket%20number%20inline.%0A%0A"><picture><source media="(prefers-color-scheme: dark)" srcset="https://greptile-static-assets.s3.amazonaws.com/badges/FixAllInCodexDark.svg?v=3"><source media="(prefers-color-scheme: light)" srcset="https://greptile-static-assets.s3.amazonaws.com/badges/FixAllInCodex.svg?v=3"><img alt="Fix All in Codex" src="https://greptile-static-assets.s3.amazonaws.com/badges/FixAllInCodex.svg?v=3" height="20"></picture></a> <details><summary>Prompt To Fix All With AI</summary> `````markdown Fix the following 1 code review issue. Work through them one at a time, proposing concise fixes. --- ### Issue 1 of 1 agentex/src/utils/task_authorization.py:329-331 **TODO missing Linear ticket reference** The TODO here describes deferring the 403/404 split restoration, which the PR description maps to AGX1-291. Without a ticket link, this TODO is hard to track down via issue-number search and easy to forget. Per project convention, each TODO should include the ticket number inline. ````` </details> <sub>Reviews (8): Last reviewed commit: ["feat(tasks): per-RPC task permission rew..."](b382dc3) | [Re-trigger Greptile](https://app.greptile.com/api/retrigger?id=34256131)</sub> **Context used:** - Rule used - Create Linear tasks for TODO comments in code and ... ([source](https://app.greptile.com/scale-ai/-/custom-context?memory=6328a8b3-afcc-487b-aac4-241716e05e94)) **Learned From** [scaleapi/scaleapi#127117](scaleapi/scaleapi#127117) <!-- /greptile_comment -->
asherfink
added a commit
that referenced
this pull request
Jun 3, 2026
…strings (#268) Scrubs internal references (Linear ticket IDs, internal service names, internal flag names) from comments and docstrings across already-landed authorization code. Fixes issues introduced by #248, #252, #255, #257, #259, and #260. No behavior changes. <!-- greptile_comment --> <h3>Greptile Summary</h3> This PR is a pure housekeeping change that scrubs internal references — Linear ticket IDs (e.g. `AGX1-263`, `AGX1-290`), internal service names (`SpiceDB`, `Spark`, `agentex-auth`), and internal flag/PR references — from comments and docstrings across 13 authorization-related source and test files. No executable code is touched. - Comments and docstrings across `port.py`, `authorization_service.py`, `agent_api_keys.py`, and related utilities have internal service names replaced with generic terms like \"authorization service\" or \"authorization schema\". - TODO comments in `agent_api_key_authorization.py`, `agent_authorization.py`, and `authorization_shortcuts.py` have internal ticket IDs stripped, with file-level context substituted where the ticket reference previously identified the work location. - Module docstrings in all six affected test files have ticket-prefixed deliverable descriptions replaced with plain behavioral descriptions. <details><summary><h3>Confidence Score: 5/5</h3></summary> All changes are confined to comments and docstrings; no executable code is modified. Every diff hunk touches only comment text, docstrings, or module-level strings. There are no logic, control-flow, or interface changes anywhere in the PR, making regression risk essentially zero. No files require special attention. </details> <h3>Important Files Changed</h3> | Filename | Overview | |----------|----------| | agentex/src/adapters/authorization/port.py | Docstring-only: replaced "SpiceDB" with "authorization" in register_resource docstring. | | agentex/src/api/routes/agent_api_keys.py | Comment-only: removed "SpiceDB" references from two inline comments in create/delete route handlers. | | agentex/src/domain/services/authorization_service.py | Docstring-only: replaced "SpiceDB schema" with "authorization schema" in register_resource docstring. | | agentex/src/domain/use_cases/agent_api_keys_use_case.py | Comment-only: removed unconditional-routing comment referencing internal service names and ticket numbers; also cleaned docstring referencing internal routing logic. | | agentex/src/utils/agent_api_key_authorization.py | Comment-only: replaced TODO(AGX1-290) with plain TODO, removing the Linear ticket reference. | | agentex/src/utils/agent_authorization.py | Comment-only: replaced TODO(AGX1-290) with plain TODO, removing the Linear ticket reference. | | agentex/src/utils/authorization_shortcuts.py | Comment-only: updated TODO to reference the file name instead of internal ticket/PR numbers. | | agentex/tests/integration/services/test_agent_api_key_service_dual_write.py | Docstring-only: removed references to "Spark AuthZ", internal ticket numbers, and internal service names throughout module docstring and inline comments. | | agentex/tests/integration/services/test_schedule_service_dual_write.py | Docstring-only: removed "Spark AuthZ" and "SpiceDB" references across the module docstring and inline comments. | | agentex/tests/integration/api/checkpoints/test_checkpoints_authz_api.py | Docstring-only: removed AGX1-302 ticket reference and internal terminology from module docstring. | | agentex/tests/integration/api/events/test_events_authz_api.py | Docstring-only: replaced "AGX1-244" ticket reference in module docstring with a plain description. | | agentex/tests/integration/api/messages/test_messages_authz_api.py | Docstring-only: removed AGX1-277 ticket reference from module docstring. | | agentex/tests/unit/api/test_agent_api_keys_authz.py | Docstring-only: removed AGX1-263 ticket reference and "Spark AuthZ" mention from module docstring. | </details> <details><summary><h3>Flowchart</h3></summary> ```mermaid %%{init: {'theme': 'neutral'}}%% flowchart TD A["PR #268: Scrub internal references"] --> B["Source files\n(port.py, agent_api_keys.py,\nauthorization_service.py,\nagent_api_keys_use_case.py,\n3 utils files)"] A --> C["Test files\n(6 integration + unit tests)"] B --> D["Replace: SpiceDB → authorization service/schema\nRemove: agentex-auth, Spark, internal PR refs\nStrip: AGX1-xxx ticket IDs from TODOs"] C --> E["Replace: AGX1-xxx deliverable labels → plain descriptions\nRemove: Spark AuthZ, scaleapi/agentex#353, etc."] D --> F["No executable code changed"] E --> F ``` </details> <sub>Reviews (2): Last reviewed commit: ["chore(authz): scrub internal project ref..."](62581f7) | [Re-trigger Greptile](https://app.greptile.com/api/retrigger?id=35250665)</sub> <!-- /greptile_comment -->
asherfink
added a commit
that referenced
this pull request
Jun 3, 2026
Enforce Fine-Grained Access Control on the agent_task_tracker routes by
delegating to the parent task. Trackers are not a SpiceDB resource (no
schema, no register_resource, no dual-write); authorization inherits from
the owning task, identical to the state/message/checkpoint paradigm:
| Operation | Route | Task permission |
|----------------|---------------------------|-----------------|
| Get tracker | GET /tracker/{tracker_id} | view (read op) |
| List trackers | GET /tracker | view (read op) |
| Update tracker | PUT /tracker/{tracker_id} | execute op |
All three routes had zero authorization before this change.
- GET /tracker/{tracker_id}: DAuthorizedId resolves tracker -> task_id and
checks task.read. A denied check OR a missing tracker collapses to 404
(not 403) so callers cannot probe cross-tenant existence by comparing
response codes. Behavior change: a missing tracker now returns 404 (was
400), because the parent-task resolution runs before the handler body and
raises ItemDoesNotExist.
- GET /tracker: DAuthorizedResourceIds(task) yields the set of tasks the
caller may view; trackers under unauthorized tasks are silently dropped
(never 403), and an explicit ?task_id= the caller cannot view yields an
empty list (never 404). agent_id remains a plain filter.
- PUT /tracker/{tracker_id}: DAuthorizedId resolves tracker -> task_id and
checks task.execute, mirroring the message/checkpoint write routes, with
the same 404 collapse on denial. Closes the cross-tenant write gap where
the read routes were locked down but the mutating route was not.
No SpiceDB type, no dual-write, no agentex-auth change, and no deploy gate:
read already maps to task.view and is foundational across all SGP envs.
Rollout note: PUT enforcement lands behind the same flag as the read routes.
Before enabling tracker FGAC, confirm the agent runtime that commits
cursor/status updates runs with an authz bypass (agent API key) or holds
task.execute on the parent task, so its own writes are not 404'd.
Coordination: bases on main (child-resource machinery landed via merged
#260, checkpoint FGAC). Shares an additive edit to _get_parent_task_id and
TaskChildResourceType with the in-flight #249 (AGX1-275 task routes) and the
future AGX1-301 (state); whichever lands second rebases trivially.
resolves AGX1-307
Parent: AGX1-306
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Enforce Spark AuthZ on the checkpoint routes by delegating to the owning task. Checkpoints have no SpiceDB type of their own —
thread_idis the owning task id (LangGraph checkpointer convention; the SDK templates setconfigurable.thread_id = task.id), so each route checks the parenttaskdirectly viaDAuthorizedBodyId(task, …, field_name="thread_id"). Noregister_resource, no dual-write — the authz request only ever carries resource typetask.Operation mapping (matches the task SpiceDB schema):
get-tuple,listreadviewer + editor + ownerput,put-writesupdateeditor + ownerdelete-threaddeleteownerA denied
taskcheck collapses to 404 (not 403) so callers cannot probe cross-tenant existence by comparing response codes.Switches
put/put-writesfromexecute→update:executemaps to no task permission in the schema (it would lock everyone out), whereastask.update = (editor + owner) & internal_tenant_gatelets the editor role write checkpoints without owner.Changes
src/utils/authorization_shortcuts.py—DAuthorizedBodyIdcollapses a deniedtaskcheck into 404 (AuthorizationError → ItemDoesNotExist); non-task resources keep surfacing 403. Carries aTODOto converge on the canonicalcheck_task_or_collapse_to_404helper from AGX1-275 / feat(tasks): per-RPC task permission rewire and 404/403 wrap #249.src/api/routes/checkpoints.py—put/put-writes:execute → update. (get-tuple/listalreadyread;delete-threadalreadydelete.)tests/integration/api/checkpoints/test_checkpoints_authz_api.py(new) — integration tests covering the auth matrix.Behavior change beyond the ticket
The 404 collapse lives in
DAuthorizedBodyId'staskbranch, so it also applies to the other body-id task routes (messagePOST/PUT, state create) — those now collapse a denied check to 404 as well. Same cross-tenant-leak rationale, and aligns with the canonical wrap landing in AGX1-275 (#249).Out of scope
request.state.agent_identity) still bypasses authz viaAuthorizationService._bypass()— same caveat as the message-route migration.task.update.Coordination
AGX1-275 (#249) introduces the shared
check_task_or_collapse_to_404helper insrc/utils/task_authorization.pyand the sameexecute → updateswap acrosscheckpoints.py/messages.py/states.py. This PR's inlineDAuthorizedBodyIdtask-wrap (flagged with aTODO) converges on that helper when the two land together — trivial merge regardless of order.Pre-merge wire dependency: the
execute → updateswap changes the operation literal hitting agentex-auth.updateis already in SGP'sAgentexOperationenum on master, so no additional gating is needed for this PR.Test Plan
get-tuple— authorized 200 (task/read); denied 404list— authorized 200 (task/read); denied 404put— authorized 200 (task/update); denied 404put-writes— authorized 204 (task/update); denied 404delete-thread— authorized 204 (task/delete); denied 404tests/integration/api/{messages,states}/, since the sharedDAuthorizedBodyIdtask-wrap changes their write-route deny code from 403 → 404. Both suites pass; no regressions from the DAuthorizedBodyId task-wrap change.Linear Issue
resolves https://linear.app/scale-epd/issue/AGX1-302, Parent: AGX1-299.
Greptile Summary
This PR enforces fine-grained access control on checkpoint routes by delegating authorization to the owning task (
thread_id= task id), collapsing denied checks into 404 to prevent cross-tenant existence probing, and correctsput/put-writesfrom the brokenexecuteoperation toupdate.authorization_shortcuts.py:DAuthorizedBodyIdgains a task-specific branch that catchesAuthorizationErrorand re-raises asItemDoesNotExist(→ 404), matching the existing pattern inDAuthorizedIdandDAuthorizedQuery; a TODO links the planned refactor to AGX1-275.checkpoints.py:putandput-writesswitch fromexecute→update, allowing editor + owner roles to write checkpoints without locking everyone out.test_checkpoints_authz_api.py: New integration test suite verifies authorized (200/204) and denied (404) outcomes for all five routes, including operation-literal assertions for the updated write endpoints.Confidence Score: 5/5
Safe to merge — the changes are well-scoped, consistently mirror the existing DAuthorizedId/DAuthorizedQuery patterns, and all five checkpoint routes now have full authorized + denied test coverage.
The execute to update fix is a correctness improvement (execute had no mapping in the task SpiceDB schema, locking everyone out), the 404-collapse follows an established pattern already used in two other shortcuts, and the new integration tests validate operation literals and HTTP status codes on all routes including the previously uncovered put-writes and delete-thread denied paths.
No files require special attention.
Important Files Changed
Reviews (4): Last reviewed commit: "Merge branch 'main' into jenniechung/AGX..." | Re-trigger Greptile