From 1085269e9147413f6e961608d1f10ad0c16aae00 Mon Sep 17 00:00:00 2001 From: "copilot-swe-agent[bot]" <198982749+Copilot@users.noreply.github.com> Date: Sun, 7 Jun 2026 21:04:14 +0000 Subject: [PATCH 1/4] Initial plan From cdd1f0b194ef11f417a6ca90e0b4f92c036b9810 Mon Sep 17 00:00:00 2001 From: "copilot-swe-agent[bot]" <198982749+Copilot@users.noreply.github.com> Date: Sun, 7 Jun 2026 21:08:51 +0000 Subject: [PATCH 2/4] Plan ambient context fix Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com> --- .github/workflows/agentics-maintenance.yml | 1 - 1 file changed, 1 deletion(-) diff --git a/.github/workflows/agentics-maintenance.yml b/.github/workflows/agentics-maintenance.yml index ce3a10fcc68..099ebe24989 100644 --- a/.github/workflows/agentics-maintenance.yml +++ b/.github/workflows/agentics-maintenance.yml @@ -420,7 +420,6 @@ jobs: --repo "${{ github.repository }}" \ --start-date -1w \ --count 100 \ - --artifacts usage \ --output ./.cache/gh-aw/activity-report-logs \ --format markdown \ > ./.cache/gh-aw/activity-report-logs/report.md From 0691a7332429f15ce4f2ca6d31d6e3fdf3e6008c Mon Sep 17 00:00:00 2001 From: "copilot-swe-agent[bot]" <198982749+Copilot@users.noreply.github.com> Date: Sun, 7 Jun 2026 21:15:24 +0000 Subject: [PATCH 3/4] Use API proxy logs for ambient context analysis Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com> --- .../daily-ambient-context-optimizer.md | 28 +++++++++++++------ 1 file changed, 20 insertions(+), 8 deletions(-) diff --git a/.github/workflows/daily-ambient-context-optimizer.md b/.github/workflows/daily-ambient-context-optimizer.md index 7f822099f35..544201bf255 100644 --- a/.github/workflows/daily-ambient-context-optimizer.md +++ b/.github/workflows/daily-ambient-context-optimizer.md @@ -1,7 +1,7 @@ --- emoji: "🌫️" name: Daily Ambient Context Optimizer -description: Samples recent agentic workflow runs, inspects the first DLLM request text, and recommends prompt, skill, and agent changes to 
shrink ambient context +description: Samples recent agentic workflow runs, inspects the first DLLM request from API proxy event logs, and recommends prompt, skill, and agent changes to shrink ambient context on: schedule: daily workflow_dispatch: @@ -52,7 +52,7 @@ Your job is to inspect the **first request sent to the DLLM** for several recent ## Goals 1. Sample a small but representative set of agentic workflow runs from the last 24 hours. -2. Inspect the first DLLM request text for each sampled run. +2. Inspect the first DLLM request text actually sent to the DLLM for each sampled run. 3. Use deterministic Python analysis to measure prompt bloat and repetition. 4. Recommend the highest-leverage improvements to workflow `.md` files, skill usage, and the set of agents/sub-agents. 5. Create exactly one detailed issue report. @@ -79,13 +79,14 @@ Eligibility rules: - `status == "completed"` - exclude this workflow itself -- prefer successful runs, but include up to 2 failed runs when they have usable prompt artifacts +- prefer successful runs, but include up to 2 failed runs when they have usable request artifacts - prefer breadth: no more than 2 runs from the same workflow when alternatives exist - require a usable first-request source: - - preferred: `prompt.txt` - - fallback: the first `user.message` event in `events.jsonl` + - preferred: the first DLLM request payload in `sandbox/firewall/logs/api-proxy-logs/event-logs.jsonl` or `sandbox/firewall/logs/api-proxy-logs/events.jsonl` (including the matching `sandbox/firewall-audit-logs/...` fallback path when present) + - fallback: the first `user.message` event in `sandbox/agent/logs/copilot-session-state//events.jsonl` + - use `prompt.txt` only as a compilation-debug cross-check, never as the ambient-context source of truth -Prefer higher-cost runs first by using `aic`, then `effective_tokens`, `token_usage`, `turns`, or prompt size when available. 
+Prefer higher-cost runs first by using `aic`, then `effective_tokens`, `token_usage`, `turns`, or first-request size when available. ### Step 3 — Enrich a subset with audits @@ -95,8 +96,9 @@ Run the `audit` MCP tool for the **2 most expensive sampled runs** so you have r Treat the first DLLM request text as: -1. `prompt.txt` when present, because it is the generated prompt sent to the agent -2. otherwise, extract the first user-message payload from the run's `events.jsonl` +1. the first DLLM request payload captured in the API proxy event log at `sandbox/firewall/logs/api-proxy-logs/event-logs.jsonl` or `sandbox/firewall/logs/api-proxy-logs/events.jsonl` (or the same path under `sandbox/firewall-audit-logs/` when that artifact layout is present), because that is the text actually sent to the DLLM +2. otherwise, extract the first user-message payload from `sandbox/agent/logs/copilot-session-state//events.jsonl` +3. read `prompt.txt` only as a secondary compilation-debug artifact for cross-checking; do not use it as the primary request text For each sampled run, save the extracted text to: @@ -120,6 +122,9 @@ Include at least: - `request_chars` - `request_lines` - `request_source` +- `request_input_tokens` when a matching API proxy token-usage entry is available +- `prompt_chars` when `prompt.txt` exists +- `request_prompt_char_delta` ## Deterministic Analysis @@ -141,10 +146,12 @@ The script must compute deterministic metrics for each sampled first request: - HTML `
` count - table row count - inline agent count (`## agent:`) +- inline linter count (`## linter:`) - inline skill count (`## skill:`) - imported skill reference count (`SKILL.md`) - duplicate line ratio - duplicate paragraph ratio +- char-to-token ratio when `request_input_tokens` is available - longest 5 sections by heading - top repeated non-trivial lines or paragraphs - count of lines mentioning tools, skills, agents, safe outputs, and workflow instructions @@ -171,6 +178,11 @@ Assess whether the request size is likely driven by: - duplicated guardrails, examples, or formatting rules - context that should be moved to deterministic `steps:` or smaller sub-agents +Review `prompt.txt` only as a compiler cross-check artifact: + +- compare its size to the authoritative API proxy request text when both are present +- if `prompt.txt` contains inline agents or inline linters that do **not** appear in the API proxy request text, classify that as a likely compilation bug instead of ambient-context evidence against the workflow author + Also review proxy/CLI feature readiness for each sampled workflow: - GitHub gh-proxy enabled (`tools.github.mode: gh-proxy`) From da7b3e1e6cbd06bf80f325a514c06b1f054ba87e Mon Sep 17 00:00:00 2001 From: "copilot-swe-agent[bot]" <198982749+Copilot@users.noreply.github.com> Date: Sun, 7 Jun 2026 21:19:02 +0000 Subject: [PATCH 4/4] Clarify ambient context log sources Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com> --- .github/workflows/daily-ambient-context-optimizer.md | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/.github/workflows/daily-ambient-context-optimizer.md b/.github/workflows/daily-ambient-context-optimizer.md index 544201bf255..0d69fff2112 100644 --- a/.github/workflows/daily-ambient-context-optimizer.md +++ b/.github/workflows/daily-ambient-context-optimizer.md @@ -82,7 +82,7 @@ Eligibility rules: - prefer successful runs, but include up to 2 failed runs when they have usable request 
artifacts - prefer breadth: no more than 2 runs from the same workflow when alternatives exist - require a usable first-request source: - - preferred: the first DLLM request payload in `sandbox/firewall/logs/api-proxy-logs/event-logs.jsonl` or `sandbox/firewall/logs/api-proxy-logs/events.jsonl` (including the matching `sandbox/firewall-audit-logs/...` fallback path when present) + - preferred: the first DLLM request payload in the canonical `sandbox/firewall/logs/api-proxy-logs/event-logs.jsonl`, accepting the legacy `sandbox/firewall/logs/api-proxy-logs/events.jsonl` name too (including the matching `sandbox/firewall-audit-logs/...` fallback path when present) - fallback: the first `user.message` event in `sandbox/agent/logs/copilot-session-state//events.jsonl` - use `prompt.txt` only as a compilation-debug cross-check, never as the ambient-context source of truth @@ -96,7 +96,7 @@ Run the `audit` MCP tool for the **2 most expensive sampled runs** so you have r Treat the first DLLM request text as: -1. the first DLLM request payload captured in the API proxy event log at `sandbox/firewall/logs/api-proxy-logs/event-logs.jsonl` or `sandbox/firewall/logs/api-proxy-logs/events.jsonl` (or the same path under `sandbox/firewall-audit-logs/` when that artifact layout is present), because that is the text actually sent to the DLLM +1. the first DLLM request payload captured in the canonical API proxy event log `sandbox/firewall/logs/api-proxy-logs/event-logs.jsonl`, accepting the legacy `sandbox/firewall/logs/api-proxy-logs/events.jsonl` name too (or the same path under `sandbox/firewall-audit-logs/` when that artifact layout is present), because that is the text actually sent to the DLLM 2. otherwise, extract the first user-message payload from `sandbox/agent/logs/copilot-session-state//events.jsonl` 3. 
read `prompt.txt` only as a secondary compilation-debug artifact for cross-checking; do not use it as the primary request text @@ -124,7 +124,7 @@ Include at least: - `request_source` - `request_input_tokens` when a matching API proxy token-usage entry is available - `prompt_chars` when `prompt.txt` exists -- `request_prompt_char_delta` +- `request_prompt_char_delta` (`request_chars - prompt_chars` when both exist) ## Deterministic Analysis @@ -151,7 +151,7 @@ The script must compute deterministic metrics for each sampled first request: - imported skill reference count (`SKILL.md`) - duplicate line ratio - duplicate paragraph ratio -- char-to-token ratio when `request_input_tokens` is available +- per-request char-to-token ratio (`request_chars / request_input_tokens`) when `request_input_tokens` is available - longest 5 sections by heading - top repeated non-trivial lines or paragraphs - count of lines mentioning tools, skills, agents, safe outputs, and workflow instructions
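
Review note on the metrics added in this series: the new fields (`prompt_chars`, `request_prompt_char_delta`, the per-request char-to-token ratio, and the inline linter count) can be computed in a few lines. The following is a minimal sketch, not the script the workflow actually generates. The function names `duplicate_ratio` and `analyze_request`, the output key names other than those listed in the patch, and the definition of "duplicate ratio" (repeated non-blank items divided by all non-blank items) are assumptions made for illustration. Detecting inline sections by a line-start match on `## agent:`, `## linter:`, and `## skill:` is also an assumption.

```python
import re
from typing import Optional


def duplicate_ratio(items: list[str]) -> float:
    """Share of non-blank items that repeat an earlier item (assumed definition)."""
    cleaned = [item.strip() for item in items if item.strip()]
    if not cleaned:
        return 0.0
    return (len(cleaned) - len(set(cleaned))) / len(cleaned)


def analyze_request(
    request_text: str,
    prompt_text: Optional[str] = None,
    request_input_tokens: Optional[int] = None,
) -> dict:
    """Compute a subset of the per-request metrics named in the workflow (hypothetical helper)."""
    lines = request_text.splitlines()
    # Paragraphs are runs of text separated by one or more blank lines.
    paragraphs = re.split(r"\n\s*\n", request_text)

    metrics = {
        "request_chars": len(request_text),
        "request_lines": len(lines),
        "inline_agent_count": sum(line.startswith("## agent:") for line in lines),
        "inline_linter_count": sum(line.startswith("## linter:") for line in lines),
        "inline_skill_count": sum(line.startswith("## skill:") for line in lines),
        "skill_reference_count": request_text.count("SKILL.md"),
        "duplicate_line_ratio": duplicate_ratio(lines),
        "duplicate_paragraph_ratio": duplicate_ratio(paragraphs),
    }

    # prompt.txt is only a cross-check, so these fields are emitted only when it exists.
    if prompt_text is not None:
        metrics["prompt_chars"] = len(prompt_text)
        metrics["request_prompt_char_delta"] = len(request_text) - len(prompt_text)

    # Skip the ratio when no token-usage entry matched (None) or the count is zero.
    if request_input_tokens:
        metrics["char_to_token_ratio"] = len(request_text) / request_input_tokens

    return metrics
```

Both optional fields are omitted rather than set to zero when their inputs are missing, which keeps "no `prompt.txt`" distinguishable from "identical sizes" in the downstream report.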