[agentrx-optimizer] Daily Workflow Optimization - 2026-06-12 #38804

@github-actions

Executive Summary

AgentRx normalized the 30 most recent gh-aw workflow runs (4.3h wall, 25.7M tokens, $6,987 AIC) into a canonical IR trajectory and combined it with per-run MCP audit telemetry. The fleet had exactly one hard failure, and its root cause is concrete and generalizable: the Daily Safe Outputs Git Simulator run was marked failure not because the agent failed — the agent succeeded (all 4 git configs PASS, read-only noop) — but because the downstream push_repo_memory job was rejected 4× by a repository ruleset (GH013: "Commits must have verified signatures") when pushing the first commit of the new orphan branch memory/git-simulator.

The push retry loop in push_repo_memory.cjs treats this permanent, non-retryable policy rejection as transient and burns 4 exponential-backoff attempts (~36s) before failing. The highest-impact fix is to detect rule-violation rejections and fail fast, plus seed new memory/* branches with a signed commit (or exempt memory/** from the verified-signatures ruleset) so repo-memory-pushing workflows succeed on first run.

AgentRx Evidence

  • Critical step: push_repo_memory job → step "Push repo-memory changes (default)" (deterministic post-agent step, IR step set #5 / run #4 in the fleet trajectory). The agent, detection, and safe_outputs jobs all concluded success; only push_repo_memory concluded failure.
  • Failure category: Reliability — non-retryable git push rejection misclassified as transient (retry/backoff misfire). Remote error: GH013: Repository rule violations found for refs/heads/memory/git-simulator ... Commits must have verified signatures ... push declined due to repository rule violations.
  • Frequency / impact: 1/1 = 100% failure rate for this workflow (AgentRx flagged it as the fleet's sole reliability hotspot, severity=high). The failed run still consumed 558,293 tokens / 152.6 AIC and 16 turns; the retry loop added ~36s of wasted CI before the inevitable failure. Identical remote rejected errors logged on attempts 1/4, 2/4, 3/4, 4/4.
  • Representative runs: [§27397597917](https://github.com/github/gh-aw/actions/runs/27397597917) (the failure).

Secondary findings (not the recommended fix)

  • Network friction: Documentation Noob Tester blocked 85/288 requests (30%) — all to Google domains (accounts.google.com=20, android.clients.google.com=13, www.google.com, clients2.google.com, safebrowsing...), i.e. Chromium/Playwright background telemetry. Wasteful but non-fatal; a candidate for a follow-up (suppress browser telemetry via launch flags).
  • Telemetry quality: the failed run's audit produced 14 garbage subagent_model_requests (f.write, os.path.getsize, len, ...) inferred from agent-stdio.log — a parser artifact, not a real model mismatch. The audit also mislabeled the run as "failed before agent activation," which is incorrect.
  • Config gap (worked around): the run's noop output notes that the config-simulator sub-agent type is not registered in this harness, so it fell back to general-purpose agents.
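
The launch-flag follow-up mentioned for the network friction finding could look roughly like the sketch below. It is an illustration only: the helper name quietLaunchOptions is hypothetical, the flag list is an assumption about which Chromium switches reduce background traffic, and whether these flags silence every blocked Google domain listed above has not been tested here.

```javascript
// Chromium switches assumed to reduce background/telemetry traffic.
// Assumption: not verified against the specific domains blocked in this report.
const QUIET_CHROMIUM_ARGS = [
  "--disable-background-networking",
  "--disable-component-update",
  "--disable-sync",
  "--disable-client-side-phishing-detection",
  "--disable-default-apps",
  "--no-first-run",
];

/**
 * Merges the quiet flags into an existing launch-options object
 * without duplicating flags the caller already passes.
 * (Hypothetical helper, not part of gh-aw.)
 */
function quietLaunchOptions(baseOptions = {}) {
  const existing = baseOptions.args || [];
  const merged = [...new Set([...existing, ...QUIET_CHROMIUM_ARGS])];
  return { ...baseOptions, args: merged };
}

// Usage with Playwright (requires the `playwright` package, not loaded here):
//   const { chromium } = require("playwright");
//   const browser = await chromium.launch(quietLaunchOptions({ headless: true }));
```

Keeping the flags in one merged list makes it easy to compare the firewall's blocked-request count before and after the change.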

AgentRx Artifacts

Run dir: /tmp/gh-aw/agent/agentrx/runs/gh-aw-daily

IR summary (Stage 1/6 — completed, deterministic): The MCP run fleet was converted to a single canonical IR trajectory gh-aw-daily-agent-fleet with 32 steps (1 instruction turn + 30 per-run turns + 1 fleet-insight turn), ir_used_llm_fallback=false, ir_from_markdown=true. Each step encodes a run's engine, status, conclusion, duration, turns, token_usage, error_count, and api calls. Validation: 1 trajectory loaded, 1 valid.

Invariant / checker / judge stages (Stages 2–5): Could not run in this environment. static, dynamic, check, and judge all require the GitHub Copilot CLI LLM endpoint (RuntimeError: 'copilot' CLI not found on PATH); Azure/TRAPI endpoints need env vars that are unset, and gh is unauthenticated. No check.json / judge.json were produced. Per the runbook's degraded-mode guidance, the recommendation below is grounded in the deterministic IR plus per-run MCP audit telemetry (which carries the exact remote error logs) rather than LLM-generated invariants.

Known limitations: AgentRx install required CPython (the default python is PyPy 7.3.23, for which the jiter dependency has no wheel and its Rust source build is firewall-blocked); the venv was rebuilt with CPython 3.12. Root-cause classification (judge) was done manually from audit logs in lieu of the LLM judge.

Failure-pattern classification (manual, in lieu of failure-pattern-classifier):

| violation | evidence | fix_type | rationale |
| --- | --- | --- | --- |
| Push to orphan memory/git-simulator rejected by ruleset | remote: GH013 ... Commits must have verified signatures ... push declined (run 27397597917, push_repo_memory step 7) | precondition check before expensive retry | First commit on a new orphan branch is unsigned; a verified-signatures ruleset will reject it deterministically. |
| Retry loop retries a permanent rejection 4× | Push failed (attempt 1/4...4/4), retrying; push_repo_memory.cjs:477-500 (MAX_RETRIES=3, exp backoff) | retry/backoff strategy | The catch-all retry can't distinguish a non-fast-forward (retryable) from a policy rejection (never retryable), wasting ~36s. |
| Chromium telemetry blocked (85/288) | documentation-noob-tester blocked=85 total=288; blocked Google domains | precondition / network config | Browser phones home to Google services the firewall denies; suppress via launch flags. |
| Parser-inferred sub-agent model mismatches (×14) | subagent_model_requests: f.write/os.path.getsize/len..., REQUESTED_MODEL_NOT_OBSERVED | add/clean telemetry attributes | stdio-log heuristic misreads Python snippets as model requests; use token_usage.jsonl. |

Recommended Optimization

One change: In actions/setup/js/push_repo_memory.cjs, make the push retry loop (lines ~481–500) fail fast on non-retryable repository-rule rejections instead of retrying. Before the setTimeout backoff, inspect errMsg and, if it matches a permanent policy rejection (GH013, push declined due to repository rule violations, or must have verified signatures), break out of the loop immediately and core.setFailed with the actionable seed-the-branch guidance that push_signed_commits.cjs:298-301 already emits.
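
A minimal sketch of the fail-fast check described above, assuming the loop can be expressed as a function that receives the push operation. The names isNonRetryablePushError and pushWithRetry, the result shape, and the injected sleep parameter are all hypothetical; the real loop in push_repo_memory.cjs may be structured differently. Only the three match strings come from this report.

```javascript
// Patterns taken from the rejection text quoted in this report.
const NON_RETRYABLE_PATTERNS = [
  /GH013/,
  /push declined due to repository rule violations/i,
  /must have verified signatures/i,
];

/** Returns true when the remote's message indicates a permanent policy rejection. */
function isNonRetryablePushError(errMsg) {
  return NON_RETRYABLE_PATTERNS.some((pattern) => pattern.test(errMsg));
}

/**
 * Runs `push` up to maxRetries + 1 times with exponential backoff,
 * but returns immediately on a non-retryable rejection.
 * `push` and `sleep` are injected so the sketch runs without git or real delays.
 * (Hypothetical helper; baseDelayMs is an assumed value.)
 */
async function pushWithRetry(push, { maxRetries = 3, baseDelayMs = 1000, sleep } = {}) {
  const wait = sleep || ((ms) => new Promise((resolve) => setTimeout(resolve, ms)));
  let attempts = 0;
  let lastError = "";
  for (let attempt = 0; attempt <= maxRetries; attempt++) {
    attempts++;
    try {
      await push();
      return { ok: true, attempts, failFast: false, error: "" };
    } catch (err) {
      lastError = err instanceof Error ? err.message : String(err);
      // Policy rejections never succeed on retry: stop before the backoff.
      if (isNonRetryablePushError(lastError)) {
        return { ok: false, attempts, failFast: true, error: lastError };
      }
      if (attempt < maxRetries) {
        await wait(baseDelayMs * 2 ** attempt);
      }
    }
  }
  return { ok: false, attempts, failFast: false, error: lastError };
}
```

In the action itself, the caller would presumably pass the error text to core.setFailed together with the seed-the-branch guidance whenever failFast is true, and keep the existing retry message for every other failure.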

Reliability remediation (to make the run actually pass): seed each new memory/* orphan branch with one signed commit before the first scheduled run (the code already prints the exact git switch --orphan ... commands), or add a ruleset bypass / exempt memory/** from the "Require verified signatures" rule for the gh-aw bot identity. The GraphQL createCommitOnBranch path produces server-signed commits, but the orphan first-push falls back to a plain git push (push_signed_commits.cjs:282-301), which the ruleset rejects.

Why highest impact: It targets the fleet's only hard failure and a class of failures (every workflow that pushes a brand-new memory/* branch under this ruleset hits the identical wall). The retry change is a few lines, removes wasted CI time, and converts a confusing "failed after 4 retries" into a single actionable error; the seeding/exemption converts a 100% failure into success.

Where to implement:

  • actions/setup/js/push_repo_memory.cjs (retry loop ~481–500) — non-retryable-error short-circuit.
  • actions/setup/js/push_signed_commits.cjs (orphan-branch path ~282–301) — surface a typed/non-retryable error so the caller can branch on it.
  • Repo ruleset settings (memory/**) — seed or exempt for reliability.
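
One way the "typed/non-retryable error" in the second item could be surfaced is sketched below. The class name NonRetryablePushError, its code and rule fields, and the helpers toPushError and shouldRetry are hypothetical illustrations, not existing gh-aw APIs.

```javascript
// Hypothetical error type; the name and field values are assumptions.
class NonRetryablePushError extends Error {
  constructor(message, { rule } = {}) {
    super(message);
    this.name = "NonRetryablePushError";
    this.code = "NON_RETRYABLE_PUSH";
    this.rule = rule || "unknown";
  }
}

/** Wraps raw git stderr in a typed error when it looks like a ruleset rejection. */
function toPushError(stderr) {
  const text = stderr.trim();
  if (/GH013|repository rule violations/i.test(text)) {
    const rule = /verified signatures/i.test(text) ? "verified-signatures" : "unknown";
    return new NonRetryablePushError(text, { rule });
  }
  return new Error(text);
}

/** Caller-side check: branch on the error type instead of re-parsing the message. */
function shouldRetry(err) {
  return !(err instanceof NonRetryablePushError);
}
```

With a typed error, the string matching lives in one place (the push path), and the retry loop only needs an instanceof check.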

Validation Plan

  • Confirm fast-fail: re-run Daily Safe Outputs Git Simulator on a still-unseeded memory/* branch; expect 1 push attempt (not 4) and a clear core.setFailed naming GH013 + seed instructions. Check the push_repo_memory step log for a single remote rejected line.
  • Confirm reliability fix: after seeding memory/git-simulator with a signed commit (or exempting memory/**), re-run and expect push_repo_memory to conclude success and the overall run conclusion=success.
  • Expected metric changes: Daily Safe Outputs Git Simulator failure rate 100% → 0%; push_repo_memory wasted retry time ~36s → ~0; fleet total_errors 1 → 0; no change to agent token spend (the agent was never the problem).
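
The fast-fail check in the first item could be automated with a small log scan, sketched here. The function name summarizePushAttempts is hypothetical, and the sketch assumes the step log contains one line with the text "remote rejected" per push attempt, as the evidence section describes; the exact log format is not confirmed by this report.

```javascript
/**
 * Counts push rejections in a step log and reports whether the run failed fast.
 * Assumption: each rejected attempt logs one line containing "remote rejected".
 */
function summarizePushAttempts(logText) {
  const lines = logText.split(/\r?\n/);
  const rejections = lines.filter((line) => /remote rejected/i.test(line)).length;
  const sawGh013 = lines.some((line) => /GH013/.test(line));
  return { rejections, sawGh013, fastFail: rejections === 1 && sawGh013 };
}
```

A result with fastFail set to true would correspond to the expected single attempt; a rejection count of 4 would indicate the old retry behavior is still in place.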

References

  • §27397597917 — Daily Safe Outputs Git Simulator (the failure; GH013 ruleset rejection in push_repo_memory)
  • §27396194767 — Documentation Noob Tester (secondary: 30% firewall block rate)
  • §27400039037 — this optimizer run

Warning

Firewall blocked 1 domain

The following domain was blocked by the firewall during workflow execution:

  • index.crates.io

To allow this domain, add it to the network.allowed list in your workflow frontmatter:

network:
  allowed:
    - defaults
    - "index.crates.io"

See Network Configuration for more information.

Generated by ⚡ Daily AgentRx Trace Optimizer · 363.1 AIC · ⌖ 12.8 AIC · ⊞ 5.9K

  • expires on Jun 18, 2026, 11:13 PM UTC-08:00
