fix: isolate copilot_sdk_driver test session state writes to prevent false-positive tool-denial issues #39940

Merged
pelikhan merged 5 commits into main from copilot/aw-fix-jsweep-tool-denial
Jun 18, 2026

Conversation

Copilot AI commented Jun 18, 2026

Contributor

The jsweep run that triggered this issue actually succeeded (created a PR), but handle_agent_failure.cjs reported a guard.tool_denials_exceeded event anyway. The cause: when the agent ran npm run test:js for validation, copilot_sdk_driver.test.cjs exercised runWithCopilotSDK with mocked sessions named session-max-tool-denials etc., which wrote real guard.tool_denials_exceeded entries into /tmp/gh-aw/sandbox/agent/logs/copilot-session-state/ — the exact directory handle_agent_failure.cjs scans post-agent, producing a false-positive issue.
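
To make the failure mode concrete, here is a minimal sketch of a post-agent scan. It assumes each session directory holds an events.jsonl file whose lines are JSON objects with a `type` field. The function name `countToolDenialEvents` and the event shape are hypothetical; this is not the actual logic in handle_agent_failure.cjs.

```javascript
const fs = require("node:fs");
const os = require("node:os");
const path = require("node:path");

// Hypothetical scanner: counts events of a given type across all
// {baseDir}/{sessionId}/events.jsonl files. The JSONL shape is an assumption.
function countToolDenialEvents(baseDir, eventType = "guard.tool_denials_exceeded") {
  if (!fs.existsSync(baseDir)) return 0;
  let count = 0;
  for (const entry of fs.readdirSync(baseDir, { withFileTypes: true })) {
    if (!entry.isDirectory()) continue;
    const file = path.join(baseDir, entry.name, "events.jsonl");
    if (!fs.existsSync(file)) continue;
    for (const line of fs.readFileSync(file, "utf8").split("\n")) {
      if (!line.trim()) continue;
      try {
        if (JSON.parse(line).type === eventType) count++;
      } catch {
        // Skip malformed lines rather than failing the whole scan.
      }
    }
  }
  return count;
}

// Demo: a test-style session writes into the directory that gets scanned.
const scanned = fs.mkdtempSync(path.join(os.tmpdir(), "scan-demo-"));
const sessionDir = path.join(scanned, "session-max-tool-denials");
fs.mkdirSync(sessionDir, { recursive: true });
fs.appendFileSync(
  path.join(sessionDir, "events.jsonl"),
  JSON.stringify({ type: "guard.tool_denials_exceeded" }) + "\n"
);

// The scanner cannot tell a mocked session from a real one.
const falsePositives = countToolDenialEvents(scanned);
fs.rmSync(scanned, { recursive: true, force: true });
const afterCleanup = countToolDenialEvents(scanned);
```

A scanner of this kind only sees files on disk, so any writer that targets the same directory, including a unit test, looks like a real agent session.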

Changes

  • copilot_sdk_session.cjs: runWithCopilotSDK now accepts an optional sessionStateBaseDir parameter and also checks the GH_AW_SESSION_STATE_BASE_DIR env var before falling back to the default path:

    const defaultSessionStateBase = path.join(os.tmpdir(), "gh-aw", "sandbox", "agent", "logs", "copilot-session-state");
    const sessionStateBase = sessionStateBaseDir ?? process.env.GH_AW_SESSION_STATE_BASE_DIR ?? defaultSessionStateBase;
  • copilot_sdk_driver.test.cjs — adds beforeAll/afterAll hooks that redirect all session state writes to an isolated mkdtemp directory for the test suite lifetime and clean up afterward, keeping test artifacts out of the path handle_agent_failure.cjs reads:

    beforeAll(() => {
      testSessionStateDir = fs.mkdtempSync(path.join(os.tmpdir(), "gh-aw-test-session-state-"));
      process.env.GH_AW_SESSION_STATE_BASE_DIR = testSessionStateDir;
    });
    afterAll(() => {
      delete process.env.GH_AW_SESSION_STATE_BASE_DIR;
      fs.rmSync(testSessionStateDir, { recursive: true, force: true });
    });
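
The precedence of the fallback chain can be checked in isolation. The sketch below wraps the two quoted lines in a helper; `resolveSessionStateBase` and its `env` argument are hypothetical names used only for illustration and do not appear in the PR.

```javascript
const os = require("node:os");
const path = require("node:path");

// Same chain as the quoted lines: explicit parameter, then env var, then default.
// The helper name and the injectable `env` argument are assumptions for this sketch.
function resolveSessionStateBase(sessionStateBaseDir, env = process.env) {
  const defaultSessionStateBase = path.join(os.tmpdir(), "gh-aw", "sandbox", "agent", "logs", "copilot-session-state");
  return sessionStateBaseDir ?? env.GH_AW_SESSION_STATE_BASE_DIR ?? defaultSessionStateBase;
}

// The explicit parameter wins over the env var.
const fromParam = resolveSessionStateBase("/explicit", { GH_AW_SESSION_STATE_BASE_DIR: "/from-env" });
// The env var wins over the default.
const fromEnv = resolveSessionStateBase(undefined, { GH_AW_SESSION_STATE_BASE_DIR: "/from-env" });
// With neither set, the default path under os.tmpdir() is used.
const fromDefault = resolveSessionStateBase(undefined, {});
// `??` only skips null and undefined, so an empty string counts as an override.
const fromEmpty = resolveSessionStateBase(undefined, { GH_AW_SESSION_STATE_BASE_DIR: "" });
```

Note the last case: because `??` does not treat an empty string as missing, an env var that is set but empty would not fall through to the default.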

Copilot AI and others added 2 commits June 18, 2026 01:12
Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>
…false-positive tool-denial issues

Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>
Copilot AI changed the title from "[WIP] Fix JavaScript Unbloater tool denial issue" to "fix: isolate copilot_sdk_driver test session state writes to prevent false-positive tool-denial issues" Jun 18, 2026
Copilot AI requested a review from pelikhan June 18, 2026 01:19
@pelikhan pelikhan marked this pull request as ready for review June 18, 2026 01:40
Copilot AI review requested due to automatic review settings June 18, 2026 01:40

Copilot AI left a comment

Contributor

Pull request overview

This PR addresses a false-positive guard.tool_denials_exceeded detection by isolating Copilot SDK session-state JSONL writes during JavaScript validation tests, so test-only “tool denial” events don’t get picked up by post-agent log scanners.

Changes:

  • Adds a configurable Copilot session state base directory in runWithCopilotSDK (parameter + GH_AW_SESSION_STATE_BASE_DIR env override).
  • Updates copilot_sdk_driver tests to redirect session-state writes into a temporary per-suite directory and clean it up afterward.
  • Adjusts the header comment in the setup-cli install.sh script (but the new wording appears incorrect/self-referential).

| File | Description |
| --- | --- |
| actions/setup/js/copilot_sdk_session.cjs | Allows redirecting Copilot session-state event log writes via option/env var to avoid polluting the default sandbox path. |
| actions/setup/js/copilot_sdk_driver.test.cjs | Sets GH_AW_SESSION_STATE_BASE_DIR to an isolated temp dir for the test suite to prevent false-positive tool-denial detection. |
| actions/setup-cli/install.sh | Changes the “kept in sync” header comment; current wording likely contradicts the documented generation flow. |

Copilot's findings

  • Files reviewed: 3/3 changed files
  • Comments generated: 2

Comment thread actions/setup-cli/install.sh Outdated
set +o histexpand

# Kept in sync with install-gh-aw.sh — edit that file, then copy to this path.
# Kept in sync with actions/setup-cli/install.sh — edit this file, then copy to that path.
Comment on lines +11 to +19
let testSessionStateDir;
beforeAll(() => {
  testSessionStateDir = fs.mkdtempSync(path.join(os.tmpdir(), "gh-aw-test-session-state-"));
  process.env.GH_AW_SESSION_STATE_BASE_DIR = testSessionStateDir;
});
afterAll(() => {
  delete process.env.GH_AW_SESSION_STATE_BASE_DIR;
  fs.rmSync(testSessionStateDir, { recursive: true, force: true });
});
@github-actions

Contributor

Hey @copilot-swe-agent 👋 — nice catch on the false-positive tool-denial issues caused by test session state leaking into the directory that handle_agent_failure.cjs scans. The beforeAll/afterAll isolation pattern is exactly the right approach, and the GH_AW_SESSION_STATE_BASE_DIR escape hatch keeps the fix clean and testable.

One thing to tighten up before merging:

  • Split out the install.sh comment fix — the change to actions/setup-cli/install.sh (correcting which file is the source of truth in the sync comment) is unrelated to the session state isolation fix. It's a one-liner, but mixing it here makes the PR's intent less focused. Consider either dropping it or landing it as a separate tiny PR.

If you'd like a hand splitting this up, here's a ready-to-use prompt:

In PR #39940, remove the unrelated change to actions/setup-cli/install.sh (the comment correction
from "edit that file, then copy to this path" to "edit this file, then copy to that path").
Open a separate, standalone PR for that one-line comment fix so the session state isolation
fix can be reviewed and merged on its own.

Warning

Firewall blocked 1 domain

The following domain was blocked by the firewall during workflow execution:

  • patchdiff.githubusercontent.com

To allow these domains, add them to the network.allowed list in your workflow frontmatter:

network:
  allowed:
    - defaults
    - "patchdiff.githubusercontent.com"

See Network Configuration for more information.

Generated by ✅ Contribution Check ·

@github-actions

github-actions Bot commented Jun 18, 2026

Contributor

PR Code Quality Reviewer completed the code quality review.

@github-actions

github-actions Bot commented Jun 18, 2026

Contributor

🧠 Matt Pocock Skills Reviewer has completed the skills-based review. ✅

@github-actions

github-actions Bot commented Jun 18, 2026

Contributor

Test Quality Sentinel completed test quality analysis.

@github-actions

github-actions Bot commented Jun 18, 2026

Contributor

Design Decision Gate 🏗️ completed the design decision gate check.

No ADR enforcement needed for PR #39940: PR does not have the 'implementation' label (has_implementation_label=false) and has 0 new lines of code in business logic directories (default_business_additions=0, threshold=100). Neither Condition A nor Condition B is met.

@github-actions github-actions Bot mentioned this pull request Jun 18, 2026
@github-actions

Contributor

🧪 Test Quality Sentinel Report

Test Quality Score: 90/100 — Excellent

Analyzed 0 new test functions in 1 modified JavaScript file: the PR adds beforeAll/afterAll test-isolation scaffolding (no new it() blocks); the test-to-production line ratio is 2.8:1 (>2:1 flag threshold), but all added lines are beforeAll/afterAll setup/teardown — not genuine test inflation.

📊 Metrics & Test Classification (0 tests analyzed)
| Metric | Value |
| --- | --- |
| New/modified tests analyzed | 0 |
| ✅ Design tests (behavioral contracts) | 0 (N/A) |
| ⚠️ Implementation tests (low value) | 0 (N/A) |
| Tests with error/edge cases | 0 (N/A) |
| Duplicate test clusters | 0 |
| Test inflation detected | YES (14:5 = 2.8:1); all added lines are setup/teardown, not test cases |
| 🚨 Coding-guideline violations | 0 |

Go: 0 (*_test.go); JavaScript: 1 (*.test.cjs). No new it() / test() blocks were added.

Nature of the change: copilot_sdk_driver.test.cjs gains three imports (fs, os, path), a beforeAll that creates an isolated tmp directory and sets GH_AW_SESSION_STATE_BASE_DIR, and an afterAll that removes the env var and the tmp directory. All ~30 existing it() blocks benefit from this isolation automatically.

Production counterpart (copilot_sdk_session.cjs, +5/−2): adds a sessionStateBaseDir parameter and respects the GH_AW_SESSION_STATE_BASE_DIR env override, making session-state writes redirectable to an isolated directory during tests.

Verdict

Check passed. 0% implementation tests (threshold: 30%). No coding-guideline violations. The PR is a focused test-quality improvement: it isolates session-state writes to a tmp directory so existing tests cannot produce false-positive tool-denial failures against real paths.

🧪 Test quality analysis by Test Quality Sentinel ·

@github-actions github-actions Bot left a comment

Contributor

✅ Test Quality Sentinel: 90/100. Test quality is acceptable — 0% of new tests are implementation tests (threshold: 30%). The change is test isolation infrastructure with no new test functions introduced.

@github-actions github-actions Bot left a comment

Contributor

Skills-Based Review 🧠

Applied /diagnose and /tdd — clean targeted fix with a few improvement opportunities.

📋 Key Themes & Highlights

Key Themes

  • Self-verification gap: The fix prevents test pollution via isolation hooks, but no assertion confirms the redirect actually fires — a future regression could silently undo the fix until handle_agent_failure.cjs fires again.
  • Untested/unreachable parameter: sessionStateBaseDir is added to the runWithCopilotSDK signature but isn't forwarded by the production call site (copilot_sdk_driver.cjs) and isn't tested directly. Consider removing it or adding a test that exercises it.
  • Comment narrows env-var intent to tests: GH_AW_SESSION_STATE_BASE_DIR is useful beyond tests (CI sandboxing, etc.); the comment should reflect that.

Positive Highlights

  • ✅ Root cause correctly identified and addressed at the source, not just symptom-patched
  • mkdtempSync guarantees a unique, fresh directory per run — no path collision risk
  • delete process.env.GH_AW_SESSION_STATE_BASE_DIR in afterAll keeps env clean for other test files
  • force: true on rmSync tolerates a missing directory gracefully
  • install.sh comment correction is a nice housekeeping touch

🧠 Reviewed using Matt Pocock's skills by Matt Pocock Skills Reviewer

});
afterAll(() => {
delete process.env.GH_AW_SESSION_STATE_BASE_DIR;
fs.rmSync(testSessionStateDir, { recursive: true, force: true });

Contributor

[/tdd] afterAll calls fs.rmSync(testSessionStateDir, ...) but if mkdtempSync in beforeAll throws, testSessionStateDir will be undefined, causing rmSync to throw with an unhelpful error that masks the original failure.

💡 Add a null guard
afterAll(() => {
  delete process.env.GH_AW_SESSION_STATE_BASE_DIR;
  if (testSessionStateDir) {
    fs.rmSync(testSessionStateDir, { recursive: true, force: true });
  }
});

This keeps cleanup errors from shadowing beforeAll failures during debugging.

// Session state directory — mirrors the target path used by unified_timeline.cjs.
// /tmp/gh-aw/sandbox/agent/logs/copilot-session-state/{sessionId}/events.jsonl
const sessionStateBase = path.join(os.tmpdir(), "gh-aw", "sandbox", "agent", "logs", "copilot-session-state");
// GH_AW_SESSION_STATE_BASE_DIR may be set in tests to redirect writes to an isolated directory.

Contributor

[/diagnose] The comment frames GH_AW_SESSION_STATE_BASE_DIR as a test-only mechanism, but the env var is a legitimate runtime override too — useful for CI job isolation, ephemeral sandboxes, or any scenario where the caller can't pass sessionStateBaseDir directly. Consider rewording to make its general-purpose intent clear.

💡 Suggested rewording
// Override session-state base directory (e.g. for test isolation or CI sandboxing).
const sessionStateBase = sessionStateBaseDir ?? process.env.GH_AW_SESSION_STATE_BASE_DIR ?? defaultSessionStateBase;

Coupling the comment to "tests" may mislead future readers who want to use it in other contexts.

* RuntimeConnection: typeof import("@github/copilot-sdk").RuntimeConnection,
* approveAll: typeof import("@github/copilot-sdk").approveAll
* },
* sessionStateBaseDir?: string,

Contributor

[/tdd] sessionStateBaseDir is added to the JSDoc and function signature but the production call site in copilot_sdk_driver.cjs (line 121) doesn't forward it — making the parameter unreachable via the normal CLI entrypoint. No test exercises it directly either; all tests rely on the env var path instead.

💡 Options to consider
  1. Remove the parameter and rely solely on the env var (simpler API; the env var already satisfies test isolation needs).
  2. Keep the parameter but add a unit test that passes sessionStateBaseDir explicitly and asserts that event files land there — ensuring it doesn't silently regress:
it('respects explicit sessionStateBaseDir over env var', async () => {
  const explicitDir = fs.mkdtempSync(path.join(os.tmpdir(), 'sdk-explicit-'));
  try {
    await runWithCopilotSDK({ ..., sessionStateBaseDir: explicitDir });
    // assert files written to explicitDir, not testSessionStateDir
  } finally {
    fs.rmSync(explicitDir, { recursive: true, force: true });
  }
});

Either way, the current state leaves the parameter untested and unused in production.


describe("copilot_sdk_driver.cjs", () => {
let testSessionStateDir;
beforeAll(() => {

Contributor

[/diagnose] The beforeAll/afterAll hooks correctly prevent test-generated events from polluting the production session-state path, but there's no assertion confirming the redirect actually works. Without a self-verifying test, a future refactor that breaks the isolation could go undetected until handle_agent_failure.cjs fires again.

💡 Consider a smoke-test assertion

After a runWithCopilotSDK call in an existing test, you could assert:

// At the end of an existing test that exercises a session with a known sessionId:
const entries = fs.readdirSync(testSessionStateDir, { recursive: true });
expect(entries.length).toBeGreaterThan(0); // confirms writes landed in the isolated dir

This turns the fix into a regression guard, not just an isolation helper.
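
A self-contained version of this idea is sketched below, using a stand-in writer instead of runWithCopilotSDK, whose internals are not shown in this thread. `writeSessionEvent` and the session id `session-smoke` are hypothetical; the `{sessionId}/events.jsonl` layout follows the path comment quoted in the diff.

```javascript
const fs = require("node:fs");
const os = require("node:os");
const path = require("node:path");

// Stand-in for the session-state writer (assumption: the real one lives in runWithCopilotSDK).
function writeSessionEvent(sessionId, event) {
  const defaultBase = path.join(os.tmpdir(), "gh-aw", "sandbox", "agent", "logs", "copilot-session-state");
  const base = process.env.GH_AW_SESSION_STATE_BASE_DIR ?? defaultBase;
  const dir = path.join(base, sessionId);
  fs.mkdirSync(dir, { recursive: true });
  fs.appendFileSync(path.join(dir, "events.jsonl"), JSON.stringify(event) + "\n");
}

const prev = process.env.GH_AW_SESSION_STATE_BASE_DIR;
const isolatedDir = fs.mkdtempSync(path.join(os.tmpdir(), "gh-aw-test-session-state-"));
process.env.GH_AW_SESSION_STATE_BASE_DIR = isolatedDir;

let entries = [];
let wroteToIsolatedDir = false;
try {
  writeSessionEvent("session-smoke", { type: "guard.tool_denials_exceeded" });
  // Regression guard: the write must be visible under the isolated directory.
  // The `recursive` option is only honored by recent Node.js releases.
  entries = fs.readdirSync(isolatedDir, { recursive: true }).map(String);
  wroteToIsolatedDir = fs.existsSync(path.join(isolatedDir, "session-smoke", "events.jsonl"));
} finally {
  // Restore the environment and remove the temporary directory.
  if (prev === undefined) delete process.env.GH_AW_SESSION_STATE_BASE_DIR;
  else process.env.GH_AW_SESSION_STATE_BASE_DIR = prev;
  fs.rmSync(isolatedDir, { recursive: true, force: true });
}
```

Checking for the specific events.jsonl file is a slightly stronger guard than a bare length check, since it fails if the writer changes its layout as well as its base directory.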

@pelikhan

Collaborator

@copilot run pr-finisher skill

@github-actions github-actions Bot left a comment

Contributor

The core isolation fix — mkdtemp + env-var redirect in beforeAll/afterAll — correctly solves the false-positive tool-denial issue. Four concerns worth addressing before merge:

### Issues found (4)

Medium — install.sh comment references itself (install.sh:4)
The new comment says Kept in sync with actions/setup-cli/install.sh — that is the file it's in. install-gh-aw.sh was accidentally replaced with the current file's own path, and "that path" now has no referent. Should be install-gh-aw.sh — edit this file, then copy to that path.

Medium-high — GH_AW_SESSION_STATE_BASE_DIR is a production env-var override with no observability (copilot_sdk_session.cjs:112)
The comment calls this a test mechanism, but it lives unconditionally in the production fallback chain. If set in any real agent environment, all session-state writes are silently redirected and handle_agent_failure.cjs gets zero events (false negatives). The clean fix is to remove the env-var from the production path and pass the override exclusively via the sessionStateBaseDir parameter from the test harness.

Medium — sessionStateBaseDir parameter is untested dead code (copilot_sdk_session.cjs:89)
The test never passes this parameter — it relies solely on the env var. The sessionStateBaseDir ?? branch in the fallback chain is uncovered and could be silently broken by a future refactor.

Low — afterAll crashes with TypeError if beforeAll throws (copilot_sdk_driver.test.cjs:18)
force: true suppresses ENOENT, not type errors. If mkdtempSync fails, testSessionStateDir is undefined and fs.rmSync(undefined, ...) throws, masking the original failure. Guard with if (testSessionStateDir).

🔎 Code quality review by PR Code Quality Reviewer

Comment thread actions/setup-cli/install.sh Outdated
set +o histexpand

# Kept in sync with install-gh-aw.sh — edit that file, then copy to this path.
# Kept in sync with actions/setup-cli/install.sh — edit this file, then copy to that path.

Contributor

Self-referential comment: this line says Kept in sync with actions/setup-cli/install.sh — but that is this file, making the sync target undefined and the install-gh-aw.sh reference silently dropped.

💡 Suggested fix

The new wording correctly flipped "edit that file" to "edit this file", but accidentally replaced install-gh-aw.sh with the current file's own path. Keep the sibling reference:

# Kept in sync with install-gh-aw.sh — edit this file, then copy to that path.

As written, no maintainer can tell which file is canonical or where to copy changes.

- const sessionStateBase = path.join(os.tmpdir(), "gh-aw", "sandbox", "agent", "logs", "copilot-session-state");
+ // GH_AW_SESSION_STATE_BASE_DIR may be set in tests to redirect writes to an isolated directory.
+ const defaultSessionStateBase = path.join(os.tmpdir(), "gh-aw", "sandbox", "agent", "logs", "copilot-session-state");
+ const sessionStateBase = sessionStateBaseDir ?? process.env.GH_AW_SESSION_STATE_BASE_DIR ?? defaultSessionStateBase;

Contributor

Production code now has an undocumented env-var override that can silently lose all audit events: GH_AW_SESSION_STATE_BASE_DIR is described as a test mechanism but lives in the unconditional production fallback chain.

💡 Detail and suggested fix

If this env var is set in any real agent environment — by accident, CI variable leakage, or a misconfigured workflow — all session-state JSONL writes are silently redirected to an arbitrary path. handle_agent_failure.cjs reads the default path and finds nothing, turning every real tool-denial event into a silent false-negative. No warning is emitted.

The cleaner design is to keep the env-var lookup only in tests, pass the override via the sessionStateBaseDir parameter, and remove process.env.GH_AW_SESSION_STATE_BASE_DIR from this fallback chain:

// Production code — no env var
const sessionStateBase = sessionStateBaseDir ?? defaultSessionStateBase;

In the test harness, pass the dir explicitly:

await runWithCopilotSDK({ ..., sessionStateBaseDir: testSessionStateDir });

If keeping the env var, at minimum emit a visible log line when it overrides the default:

if (process.env.GH_AW_SESSION_STATE_BASE_DIR && !sessionStateBaseDir) {
  logger?.warn?.(`[session] GH_AW_SESSION_STATE_BASE_DIR overrides default session state path`);
}

* @returns {Promise<{exitCode: number, output: string, hasOutput: boolean, durationMs: number}>}
*/
- async function runWithCopilotSDK({ sdkUri, prompt, logger, attempt = 0, model, connectionToken, provider, maxToolDenials, permissionConfig, coreLogger, sdkModule }) {
+ async function runWithCopilotSDK({ sdkUri, prompt, logger, attempt = 0, model, connectionToken, provider, maxToolDenials, permissionConfig, coreLogger, sdkModule, sessionStateBaseDir }) {

Contributor

sessionStateBaseDir parameter is added but never exercised in tests: the test suite bypasses it entirely, relying only on process.env.GH_AW_SESSION_STATE_BASE_DIR.

💡 Detail

The parameter was introduced specifically to give callers a clean injection point, but copilot_sdk_driver.test.cjs never passes it — beforeAll sets the env var instead. This means:

  1. The parameter override branch (sessionStateBaseDir ?? ...) is dead code from a test-coverage perspective.
  2. Any future breakage of the parameter path (e.g. a destructuring refactor that drops it) won't be caught.

The test should pass sessionStateBaseDir directly and stop relying on the env var, or the parameter should be removed if the env var is the intended permanent mechanism.

});
afterAll(() => {
delete process.env.GH_AW_SESSION_STATE_BASE_DIR;
fs.rmSync(testSessionStateDir, { recursive: true, force: true });

Contributor

fs.rmSync(undefined, ...) will throw a TypeError if beforeAll fails: { force: true } suppresses ENOENT but not type errors from an undefined path argument.

💡 Suggested fix

If fs.mkdtempSync throws (disk full, permissions), testSessionStateDir stays undefined. Vitest still runs afterAll, and fs.rmSync(undefined, ...) throws TypeError: The "path" argument must be of type string, masking the original failure.

Guard the cleanup:

afterAll(() => {
  delete process.env.GH_AW_SESSION_STATE_BASE_DIR;
  if (testSessionStateDir) fs.rmSync(testSessionStateDir, { recursive: true, force: true });
});
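
The difference between the two failure cases can be reproduced outside the test suite. This is a minimal sketch; the variable names are illustrative only, and the missing path is generated so that it should not exist on the machine running it.

```javascript
const fs = require("node:fs");
const os = require("node:os");
const path = require("node:path");

// Case 1: a path that does not exist. With `force: true` this is a no-op.
const missing = path.join(os.tmpdir(), `gh-aw-missing-${process.pid}-${Date.now()}`);
let missingError = null;
try {
  fs.rmSync(missing, { recursive: true, force: true });
} catch (err) {
  missingError = err;
}

// Case 2: an undefined path. Argument validation fails before `force` is considered.
let undefinedError = null;
try {
  fs.rmSync(undefined, { recursive: true, force: true });
} catch (err) {
  undefinedError = err;
}

console.log("missing path threw:", missingError !== null);
console.log("undefined path threw TypeError:", undefinedError instanceof TypeError);
```

Only the second case throws, which is why the guard has to check the variable itself and cannot rely on `force`.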

Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>

Copilot AI commented Jun 18, 2026

Contributor Author

@copilot run pr-finisher skill

Addressed the review feedback in fix: restore pre-existing GH_AW_SESSION_STATE_BASE_DIR in test teardown:

  • Captures prevSessionStateDir before overwriting in beforeAll, and restores it (or deletes if it was previously unset) in afterAll, matching the env save/restore pattern used elsewhere in the test suite
  • Guards the fs.rmSync call behind a truthiness check in case mkdtempSync threw
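
The updated hooks are not quoted in this thread. A minimal sketch of the save/restore pattern described in the two points above follows, written as plain functions so it runs without a test runner. `setup` and `teardown` stand in for the bodies of beforeAll and afterAll, and the demo value `/outer/value` is an assumption for illustration.

```javascript
const fs = require("node:fs");
const os = require("node:os");
const path = require("node:path");

let prevSessionStateDir;
let testSessionStateDir;

// Hypothetical beforeAll body: remember the old value, then redirect writes.
function setup() {
  prevSessionStateDir = process.env.GH_AW_SESSION_STATE_BASE_DIR;
  testSessionStateDir = fs.mkdtempSync(path.join(os.tmpdir(), "gh-aw-test-session-state-"));
  process.env.GH_AW_SESSION_STATE_BASE_DIR = testSessionStateDir;
}

// Hypothetical afterAll body: restore or delete the env var, then clean up
// only if the directory was actually created.
function teardown() {
  if (prevSessionStateDir === undefined) {
    delete process.env.GH_AW_SESSION_STATE_BASE_DIR;
  } else {
    process.env.GH_AW_SESSION_STATE_BASE_DIR = prevSessionStateDir;
  }
  if (testSessionStateDir) {
    fs.rmSync(testSessionStateDir, { recursive: true, force: true });
  }
}

// Demo with a pre-existing value, as an outer harness might set.
const outerValue = process.env.GH_AW_SESSION_STATE_BASE_DIR;
process.env.GH_AW_SESSION_STATE_BASE_DIR = "/outer/value";
setup();
const duringTests = process.env.GH_AW_SESSION_STATE_BASE_DIR;
const createdDir = testSessionStateDir;
teardown();
const afterTests = process.env.GH_AW_SESSION_STATE_BASE_DIR;

// Put the real environment back the way the demo found it.
if (outerValue === undefined) delete process.env.GH_AW_SESSION_STATE_BASE_DIR;
else process.env.GH_AW_SESSION_STATE_BASE_DIR = outerValue;
```

The explicit delete branch matters because assigning undefined to a process.env key stores the string "undefined" instead of unsetting it.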

Local validation passes (make agent-report-progress green, 41/41 JS tests pass). CI needs a human re-trigger to reflect the new HEAD.

@pelikhan

Collaborator

@copilot merge main and recompile

…-tool-denial

# Conflicts:
#	actions/setup-cli/install.sh

Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>

Copilot AI commented Jun 18, 2026

Contributor Author

@copilot merge main and recompile

Done. Merged main into the branch (resolved a conflict in actions/setup-cli/install.sh by taking main's corrected sync comment), then ran make recompile — 250/250 workflows compiled with no new changes. Local validation passes.

3 participants