E2E: Add health-aware navigation to reduce false-positive failures by Dev-iL · Pull Request #64366 · apache/airflow

Dev-iL · 2026-03-28T14:34:54Z

Context

E2E tests running with 2 parallel workers produce false-positive failures in CI. The test suite runs 88 tests across chromium, firefox, and webkit, and consistently shows 2-7 failures + 2-6 flaky tests per run -- all caused by server unresponsiveness or browser-specific interaction issues rather than actual bugs.

Observed Failure Pattern

From CI logs (2026-03-28 runs):

Chromium:

7 failed: home-dashboard.spec.ts, plugins.spec.ts x2, requiredAction.spec.ts, task-instances.spec.ts, xcoms.spec.ts x2
1 flaky: plugins.spec.ts
Root causes: health check (60s) exceeded test timeout (30s); toPass budgets consumed by health check; spec beforeAll API calls using 10s actionTimeout

Firefox:

3 failed: requiredAction.spec.ts (server not ready after 60s), xcoms.spec.ts x2
1 flaky: xcoms.spec.ts
Root causes: server genuinely down >60s during concurrent heavy tests; triggerDag retry budget consumed by health check

Webkit:

0 failed, all passed
Previously: 1 failed + 6 flaky (fixed by webkit-specific changes)

Failure Analysis

All failures trace back to infrastructure-level root causes, not test logic bugs:

1. plugins.spec.ts -- Health check exceeds test timeout (chromium)

page.waitForTimeout: Test timeout of 30000ms exceeded
> 60 |     await page.waitForTimeout(Math.min(interval, remaining));

The default test timeout (30s) is shorter than the health check timeout (60s). Playwright kills the test while the health check is still polling for recovery. The health check never gets a chance to complete.

2. requiredAction.spec.ts -- Health check consumes toPass budget

Server not ready after 60000ms — health endpoint did not return 200
Timeout 60000ms exceeded while waiting on the predicate

triggerDag uses toPass({timeout: 60_000}). A single health check attempt (60s) exhausts the entire retry budget, leaving zero room for toPass to retry after recovery.

3. xcoms.spec.ts -- Same toPass/health check conflict

Timeout 60000ms exceeded while waiting on the predicate

The triggerDag call in beforeAll fails because the health check consumes the full toPass timeout. Multiple xcoms tests then fail because setup never completed.

4. task-instances.spec.ts -- Unprotected API calls in beforeAll (chromium)

apiRequestContext.patch: Timeout 10000ms exceeded
apiRequestContext.get: Timeout 10000ms exceeded

The beforeAll makes direct API calls (POST dagRuns, GET taskInstances, PATCH state) using the global 10s actionTimeout. No health check, no extended timeouts, no retry logic. When the server is overloaded by concurrent heavy tests, these API calls time out.

5. dag-calendar-tab.spec.ts -- Tooltip interaction failure

DagCalendarTab.ts:73 - expect(locator).toBeVisible() failed - getByTestId('calendar-tooltip')

Calendar cell hover fails because the tooltip never appears. On webkit, hover events are unreliable and may not trigger on the first attempt.

6. connections.spec.ts -- Combobox click timeout (webkit)

ConnectionsPage.ts:218 - locator.click: Timeout 3000ms exceeded

The combobox is found, visible, enabled, and stable, but the click action can't complete within 3s on webkit. The explicit 3s timeout is far below the global actionTimeout of 10s.

7. dag-code-tab.spec.ts -- Monaco editor slow to initialize (webkit)

DagCodePage.ts:42 - expect(locator).toBeVisible() failed - locator('[role="code"]')

The navigation retry loop (2s intervals, 60s total) repeatedly navigates and checks for the editor. With only 5s per inner check, Monaco doesn't have enough time to initialize on webkit before the next retry fires.

Root Cause Analysis

Eight interconnected root causes identified during investigation:

| # | Root Cause | Impact | Affected Tests |
|---|------------|--------|----------------|
| 1 | No server health gating -- Tests navigate blindly regardless of server state | When server is overwhelmed by heavy tests (HITL, backfill), all concurrent tests fail | ALL failing tests |
| 2 | BasePage.navigateTo lacks resilience -- Simple `page.goto()` with no retry or health check | Navigation times out within test timeout if server is slow | XComs, TaskInstances, all `navigateTo` consumers |
| 3 | No inter-test recovery -- No mechanism to wait for server to recover between tests | Heavy test on worker 1 overwhelms server, test on worker 2 fails, retries fail too | Cross-worker failures |
| 4 | Page objects bypass BasePage with fragile `page.goto()` calls -- 12 direct `page.goto()` calls across 4 page objects bypass any protection in `navigateTo` | Even if `navigateTo` is hardened, these bypass calls remain vulnerable | RequiredActions (7), DagsPage (3), Calendar (1), Events (1) |
| 5 | Health check response time threshold too strict for Firefox -- Original health check required HTTP 200 AND response < 2s, but Firefox on CI regularly exceeds 2s | Health check condition is never satisfied, causing 60s timeout even though the server is healthy | ALL Firefox tests |
| 6 | Element visibility timeouts too tight -- After navigation (click-based or programmatic), page elements must fetch API data and render. Hardcoded 3-10s timeouts are insufficient on Firefox/webkit under CI load | Server IS responsive, but post-navigation rendering exceeds tight timeouts | RequiredActions, TaskInstances, Connections, XComs, DagCode |
| 7 | Webkit-specific interaction unreliability -- Webkit hover events don't reliably trigger tooltips on first attempt; click dispatch is slower than chromium/firefox | Single-attempt interactions fail intermittently on webkit | DagCalendarTab (hover/tooltip), Connections (combobox click) |
| 8 | Health check / test timeout imbalance -- Health check MAX_WAIT (60s) ≥ default test timeout (30s) and `toPass` budgets (60s), consuming entire time budgets with no room for retries or actual work | Tests die mid-health-check; `toPass` loops can't retry after health check timeout | plugins, requiredAction, xcoms, home-dashboard (all browsers) |

Approach

The fix is a six-layer approach across shared infrastructure, page objects, config, and one spec file:

Layer 1: Health Check Utility (`tests/e2e/utils/health.ts`)

New shared utility that polls /api/v2/monitor/health (unauthenticated endpoint already used by Breeze for startup checks). Checks HTTP 200 only -- no response time threshold, since Firefox on CI regularly exceeds 2s for health responses (making time-based checks unsatisfiable). Uses exponential backoff intervals [1s, 2s, 4s, 8s] with 30s max wait and 10s per-request timeout. When the server is healthy (the common case), returns immediately with negligible overhead.
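The polling behaviour described above can be sketched roughly as follows. This is a minimal illustration, not the contents of `health.ts`: the function name `waitForServerReadySketch`, the `HealthOptions` shape, and the injected `probe`, `sleep`, and `now` callbacks are all assumptions made so the example runs without Playwright. In the real utility the probe would presumably be a GET against /api/v2/monitor/health with a 10s per-request timeout.

```typescript
// Illustrative sketch only; names and option shapes are assumptions.
type Probe = () => Promise<number>; // resolves to an HTTP status code

interface HealthOptions {
  maxWaitMs?: number; // overall cap (30s in the description above)
  backoffMs?: number[]; // wait between attempts; last entry is reused
  sleep?: (ms: number) => Promise<void>;
  now?: () => number;
}

async function waitForServerReadySketch(
  probe: Probe,
  options: HealthOptions = {},
): Promise<number> {
  const maxWaitMs = options.maxWaitMs ?? 30_000;
  const backoffMs = options.backoffMs ?? [1_000, 2_000, 4_000, 8_000];
  const sleep =
    options.sleep ??
    ((ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms)));
  const now = options.now ?? (() => Date.now());

  const deadline = now() + maxWaitMs;
  for (let attempt = 0; ; attempt++) {
    try {
      // HTTP 200 is the only success criterion; response time is ignored.
      if ((await probe()) === 200) return attempt + 1; // probes used
    } catch {
      // A network error or per-request timeout counts as "not ready yet".
    }
    const remaining = deadline - now();
    if (remaining <= 0) {
      throw new Error(
        `Server not ready after ${maxWaitMs}ms: health endpoint did not return 200`,
      );
    }
    // Never sleep past the deadline.
    const interval = backoffMs[Math.min(attempt, backoffMs.length - 1)];
    await sleep(Math.min(interval, remaining));
  }
}
```

When the first probe returns 200 the function returns without sleeping, which is the fast path. Injecting `sleep` and `now` is purely a testing convenience in this sketch.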

Layer 2: Health-Aware BasePage Navigation (`BasePage.safeGoto()`)

Added a protected safeGoto() method to BasePage that wraps waitForServerReady() + page.goto(). All page object subclasses use this.safeGoto() instead of this.page.goto() directly. The existing navigateTo() delegates to this.safeGoto(). Named safeGoto (not goto) to avoid colliding with AssetDetailPage.goto(), which would cause infinite recursion via polymorphic dispatch through navigateTo().
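To make the naming decision concrete, here is a small sketch of the class layout. The `PageLike` interface is a stand-in for the part of Playwright's `Page` used here, the `Sketch` class names are hypothetical, and the `/assets/...` path is invented for illustration. It is meant to show the dispatch order, not to mirror the actual `BasePage` code.

```typescript
// Stand-in for the subset of Playwright's Page used below (assumption).
interface PageLike {
  goto(url: string, options?: { waitUntil?: string }): Promise<void>;
}

class BasePageSketch {
  protected readonly page: PageLike;
  private readonly waitForServerReady: () => Promise<void>;

  constructor(page: PageLike, waitForServerReady: () => Promise<void>) {
    this.page = page;
    this.waitForServerReady = waitForServerReady;
  }

  // Health gate first, then the real navigation.
  protected async safeGoto(
    path: string,
    options?: { waitUntil?: string },
  ): Promise<void> {
    await this.waitForServerReady();
    await this.page.goto(path, options);
  }

  // Delegates to safeGoto(). If this called this.goto() instead, a subclass
  // that defines goto() and calls navigateTo() would recurse forever.
  async navigateTo(path: string): Promise<void> {
    await this.safeGoto(path, { waitUntil: "domcontentloaded" });
  }
}

class AssetDetailPageSketch extends BasePageSketch {
  // Public goto() that calls navigateTo(), as described for AssetDetailPage.
  // The URL shape is hypothetical.
  async goto(assetId: string): Promise<void> {
    await this.navigateTo(`/assets/${assetId}`);
  }
}
```

Because `navigateTo()` never dispatches to a method named `goto`, the subclass override cannot re-enter it.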

Layer 3: Consistent Element Visibility Timeouts

After click-based navigation (e.g., clicking a "required action" link), the health check does not apply -- the server IS responsive, but React components must still fetch API data and render. On Firefox/webkit under CI load, this regularly exceeds 10s. Increased element visibility timeouts from 10s to 30s across RequiredActionsPage (4 handle* methods), TaskInstancesPage.navigate(), and XComsPage.navigate(). Also increased ConnectionsPage combobox click timeout from 3s to 10s, DagCodePage inner Monaco editor check from 5s to 10s per retry attempt, and RequiredActionsPage wait_for_default_option task completion timeout from 30s to 60s.

Layer 4: Webkit Interaction Reliability

Webkit hover events don't reliably trigger tooltips on the first attempt. Wrapped DagCalendarTab.getManualRunStates() hover+tooltip check in a toPass retry loop (500ms intervals, 20s total) with force: true on hover. This retries the hover if the tooltip doesn't appear, without changing the overall timeout budget.
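The retry idea can be illustrated with a plain loop. The sketch below does not use Playwright's `toPass`; it is a hand-rolled stand-in with the same 500ms interval and 20s cap, and `HoverTarget`, `TooltipLike`, and `hoverUntilTooltipSketch` are hypothetical names introduced only for this example.

```typescript
// Stand-ins for the locator methods involved (assumptions, not Playwright types).
interface HoverTarget {
  hover(options: { force: boolean }): Promise<void>;
}
interface TooltipLike {
  isVisible(): Promise<boolean>;
}
interface RetryOptions {
  intervalMs?: number;
  timeoutMs?: number;
  sleep?: (ms: number) => Promise<void>;
  now?: () => number;
}

async function hoverUntilTooltipSketch(
  cell: HoverTarget,
  tooltip: TooltipLike,
  options: RetryOptions = {},
): Promise<number> {
  const intervalMs = options.intervalMs ?? 500;
  const timeoutMs = options.timeoutMs ?? 20_000;
  const sleep =
    options.sleep ??
    ((ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms)));
  const now = options.now ?? (() => Date.now());

  const deadline = now() + timeoutMs;
  for (let attempt = 1; ; attempt++) {
    // Re-issue the hover on every attempt, not just the visibility check.
    await cell.hover({ force: true });
    if (await tooltip.isVisible()) return attempt;
    if (now() + intervalMs > deadline) {
      throw new Error(`Tooltip not visible after ${timeoutMs}ms`);
    }
    await sleep(intervalMs);
  }
}
```

The point of the pattern is that the hover itself sits inside the retried block, so a dropped first hover is repeated rather than waited on.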

Layer 5: Health Check / Test Timeout Rebalancing

The 60s health check exceeded the 30s default test timeout -- tests were killed while the health check was still polling. It also consumed the entire toPass budget in triggerDag (60s), leaving no room for retries.

Reduced health check MAX_WAIT_MS from 60s to 30s
Increased default test timeout from 30s to 60s in playwright.config.ts

This ensures the health check always fits within the test's time budget (30s < 60s), leaving 30s for navigation and assertions. toPass loops with 60s budgets now have room for at least one retry after a failed health check.
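The budget arithmetic behind this rebalancing is simple enough to write out. The helper name below is made up for illustration; the numbers are the ones quoted in this section.

```typescript
// Time left for navigation, assertions, or retries if a single health check
// runs all the way to its cap. A value <= 0 means no time is left at all.
function remainingAfterHealthCheck(
  budgetMs: number,
  healthCheckMaxWaitMs: number,
): number {
  return budgetMs - healthCheckMaxWaitMs;
}

// Before: 30s test timeout vs 60s health check cap.
const testBefore = remainingAfterHealthCheck(30_000, 60_000);
// Before: 60s toPass budget in triggerDag vs 60s health check cap.
const toPassBefore = remainingAfterHealthCheck(60_000, 60_000);
// After: 60s test timeout (and 60s toPass budget) vs 30s health check cap.
const testAfter = remainingAfterHealthCheck(60_000, 30_000);

console.log({ testBefore, toPassBefore, testAfter });
// testBefore = -30000, toPassBefore = 0, testAfter = 30000
```

A negative or zero result corresponds to the two failure modes described earlier: the test is killed mid-poll, or toPass has nothing left to retry with.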

Layer 6: Spec-Level API Resilience (`task-instances.spec.ts`)

The beforeAll setup makes direct API calls (POST dagRuns, GET taskInstances, PATCH state) that bypass the health check. These used the global 10s actionTimeout and had no health gating. Under server load, they time out.

Added waitForServerReady(page) before API calls
Increased beforeAll timeout to 120s (was inheriting 60s)
Added explicit 30s timeout to all 8 API calls (3x the actionTimeout)
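A rough sketch of what a health-gated setup with explicit per-request timeouts might look like is shown below. `RequestLike` is a stand-in for the subset of Playwright's `APIRequestContext` used here, and the function name, URL paths, and payload fields are illustrative assumptions rather than the spec's actual calls. Only three of the calls are shown.

```typescript
// Stand-ins for the request/response objects (assumptions for this sketch).
interface ResponseLike {
  ok(): boolean;
}
interface RequestOptions {
  data?: unknown;
  timeout: number;
}
interface RequestLike {
  post(url: string, options: RequestOptions): Promise<ResponseLike>;
  get(url: string, options: RequestOptions): Promise<ResponseLike>;
  patch(url: string, options: RequestOptions): Promise<ResponseLike>;
}

const API_TIMEOUT_MS = 30_000; // 3x the global 10s actionTimeout

async function setUpTaskInstancesSketch(
  request: RequestLike,
  waitForServerReady: () => Promise<void>,
  dagId: string,
  runId: string,
  taskId: string,
): Promise<void> {
  // Gate the whole setup on server health before the first API call.
  await waitForServerReady();

  // Hypothetical endpoint layout and payloads, for illustration only.
  const base = `/api/v2/dags/${dagId}/dagRuns`;

  const created = await request.post(base, {
    data: { dag_run_id: runId },
    timeout: API_TIMEOUT_MS,
  });
  if (!created.ok()) throw new Error("POST dagRuns failed");

  const listed = await request.get(`${base}/${runId}/taskInstances`, {
    timeout: API_TIMEOUT_MS,
  });
  if (!listed.ok()) throw new Error("GET taskInstances failed");

  const patched = await request.patch(
    `${base}/${runId}/taskInstances/${taskId}`,
    { data: { new_state: "failed" }, timeout: API_TIMEOUT_MS },
  );
  if (!patched.ok()) throw new Error("PATCH state failed");
}
```

The health check only covers the start of the setup; the per-request timeout is what absorbs slowness that appears while the calls are in flight.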

Design Decisions

| Decision | Choice | Rationale |
|----------|--------|-----------|
| Health check timeout | 30s (reduced from 60s) | Must fit within default test timeout (60s) and `toPass` budgets (60s), leaving room for actual navigation and retries. |
| Default test timeout | 60s (increased from 30s) | Tests average ~14s; 60s is generous without masking real failures. Prevents test death during health check polling. |
| Health check criteria | HTTP 200 only, no response time threshold | Firefox on CI regularly exceeds 2s for health responses. A time threshold makes the check unsatisfiable. |
| Pattern consolidation | Single `BasePage.safeGoto()` wrapper | Eliminates 12 scattered manual calls. Future page objects inherit protection automatically. |
| Method name | `safeGoto()` not `goto()` | `AssetDetailPage` has a public `goto()` that calls `navigateTo()`. If `navigateTo()` dispatched to `this.goto()`, polymorphism would resolve to the subclass method, creating infinite recursion. |
| Post-click timeouts | 30s for first element after link navigation | Matches existing `handleWaitForMultipleOptionsTask` threshold. 10s is insufficient for Firefox/webkit under CI load. |
| Hover retry pattern | `toPass` loop with `force: true` | Webkit hover events are unreliable. Retrying the hover handles cases where the first hover didn't register. |
| Spec API timeouts | 30s per request (3x actionTimeout) | Enough for a slow but responsive server. The health check gates the start, timeouts handle in-flight slowness. |

Changes

New File

tests/e2e/utils/health.ts -- waitForServerReady(page) function with backoff polling (HTTP 200 check, 10s per-request timeout, 30s overall)

Modified Files (page objects)

tests/e2e/pages/BasePage.ts -- Added protected safeGoto() method; navigateTo() delegates to it
tests/e2e/pages/DagsPage.ts -- 3 direct page.goto() calls replaced with this.safeGoto()
tests/e2e/pages/RequiredActionsPage.ts -- 7 direct page.goto() calls replaced with this.safeGoto(); first element visibility timeouts after link clicks 10s→30s in 4 handle* methods; wait_for_default_option task timeout 30s→60s
tests/e2e/pages/DagCalendarTab.ts -- 1 direct page.goto() call replaced with this.safeGoto(); hover+tooltip wrapped in retry loop for webkit reliability
tests/e2e/pages/EventsPage.ts -- 1 direct page.goto() call replaced with this.safeGoto()
tests/e2e/pages/TaskInstancesPage.ts -- Table visibility timeout 10s→30s
tests/e2e/pages/ConnectionsPage.ts -- Combobox click timeout 3s→10s
tests/e2e/pages/DagCodePage.ts -- Inner Monaco editor visibility check 5s→10s per retry attempt
tests/e2e/pages/XComsPage.ts -- Table visibility timeout 10s→30s

Modified Files (config and specs)

playwright.config.ts -- Default test timeout 30s→60s
tests/e2e/specs/task-instances.spec.ts -- beforeAll: added waitForServerReady, 120s timeout, 30s per API call

All paths relative to airflow-core/src/airflow/ui/

Not Modified

BackfillPage.ts -- Already uses navigateTo() exclusively (only page.request.* for API calls)
LoginPage.ts -- Runs before health infra; already uses navigateTo()

Known Remaining Issues

Three categories of flakiness remain. All pass on retry (0 hard failures); none are addressable with further timeout tuning.

requiredAction.spec.ts (firefox/webkit) -- The HITL workflow (verifyFinalTaskStates → waitForTaskState) sometimes exceeds the test timeout under 2-worker parallel load. The Airflow scheduler is overwhelmed when heavy DAG operations overlap across workers. Passes on retry once the server recovers. The only fix is reducing to workers: 1 (doubling test time) or increasing server capacity.
Webkit click dispatch (xcoms.spec.ts) -- Buttons are found, visible, enabled, and stable, but click() hangs at the 10s actionTimeout. Affects expandAllButton and addFilterButton. Passes on retry, suggesting a transient webkit click interception issue (possibly an overlay or loading state). Increasing the global actionTimeout would affect all tests across all browsers.
dag-calendar-tab.spec.ts (webkit) -- "failed filter shows only failed runs" expects both success and failed runs in the calendar, but only sees success. This is a data timing issue: the failed DAG run created in beforeAll hasn't been reflected in the calendar data when the test reads tooltip states. Not a timeout issue — the test reads stale data, not missing UI elements.

How It Works

Test calls this.navigateTo("/dags")
  -> BasePage.navigateTo() calls this.safeGoto("/dags", { waitUntil: "domcontentloaded" })
    -> BasePage.safeGoto() calls waitForServerReady(this.page)
      -> GET /api/v2/monitor/health (10s timeout per request)
        -> 200? Return immediately (fast path, ~0ms overhead)
        -> Non-200 or timeout? Backoff [1s, 2s, 4s, 8s], retry up to 30s
        -> Still failing after 30s? Throw descriptive error
    -> this.page.goto(path, options)

Test calls this.safeGoto("/dags/my_dag/runs/abc123")  (subclass direct call)
  -> Same flow as above -- health check + goto

When the server is healthy (the normal case), waitForServerReady completes on the first attempt with negligible overhead. When the server is overloaded (the flaky CI case), the health check waits with backoff until the server recovers, then proceeds with navigation. The 30s health check fits within the 60s default test timeout, leaving room for the actual navigation and assertions. For toPass loops, the 30s health check leaves budget for at least one retry.

Was generative AI tooling used to co-author this PR?

Yes (please specify the tool below)

Generated-by: Claude Opus 4.6 following the guidelines

Read the Pull Request Guidelines for more information. Note: commit author/co-author name and email in commits become permanently public when merged.
For fundamental code changes, an Airflow Improvement Proposal (AIP) is needed.
When adding dependency, check compliance with the ASF 3rd Party License Policy.
For significant user-facing changes create newsfragment: {pr_number}.significant.rst, in airflow-core/newsfragments. You can add this file in a follow-up commit after the PR is created so you know the PR number.

vatsrahul1001 · 2026-03-28T18:10:35Z

@Dev-iL This is great work for sure.

vatsrahul1001 · 2026-03-28T18:11:51Z

Hopefully all tests go green

…lures Tests running with 2 parallel workers produce cascading failures when one heavy test (HITL, backfill) overwhelms the shared Airflow server, causing all concurrent navigations to timeout. Retries also fail because the server hasn't recovered. Add a health-check utility that polls /api/v2/monitor/health before every navigation, using backoff intervals [1s, 2s, 4s, 8s] with a 60s cap. When the server is responsive (the common case), the check returns immediately with negligible overhead. When overloaded, it waits for recovery instead of blindly navigating into timeouts. The health check is centralized in a new `BasePage.goto()` method that all page objects inherit. 12 direct `page.goto()` calls across DagsPage, RequiredActionsPage, DagCalendarTab, and EventsPage now use `this.goto()` instead. No spec files modified, no timeouts increased.

AssetDetailPage defines its own public goto() method that calls this.navigateTo(). When BasePage.navigateTo() dispatched to this.goto(), polymorphism resolved to AssetDetailPage.goto() instead of BasePage.goto(), creating infinite recursion. Rename to safeGoto() which doesn't collide with any subclass method names.

Remove the 2000ms response time threshold from health endpoint checks. The threshold was too strict for Firefox on CI, causing tests to timeout while waiting for server readiness even though the server was healthy (returning 200). The health check should verify the server responds, not enforce a specific response time. Increases REQUEST_TIMEOUT_MS to 10000ms for individual request attempts to give slower environments time to respond, while keeping the overall MAX_WAIT_MS at 60s.

Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>

The health-aware safeGoto() protects against an unresponsive server, but two Firefox-only flaky failures occur after navigation succeeds: click-based navigation (RequiredActionsPage link clicks) and post-navigation React rendering (TaskInstancesPage table) can exceed 10s on Firefox under CI load. Increase these element visibility timeouts from 10s to 30s, matching the threshold already used by handleWaitForMultipleOptionsTask.

- ConnectionsPage: combobox click timeout 3s→10s (matching actionTimeout)
- DagCalendarTab: wrap hover+tooltip in retry loop; webkit hover events are unreliable and may not trigger tooltips on first attempt
- DagCodePage: inner Monaco editor visibility check 5s→10s per retry attempt, giving the editor more time to initialize on webkit
- RequiredActionsPage: wait_for_default_option toPass timeout 30s→60s, too tight on webkit under 3-browser CI load
- XComsPage: table visibility wait 10s→30s, same pattern as TaskInstancesPage fix

The 60s health check exceeded the 30s default test timeout, causing tests to die while the health check was still polling (plugins.spec.ts). It also consumed the entire toPass budget (triggerDag has toPass 60s), leaving no room for retries after the health check exhausted the timeout.

- Reduce health check MAX_WAIT_MS from 60s to 30s
- Increase default test timeout from 30s to 60s

This ensures the health check fits within any test's time budget (30s < 60s), leaving 30s for navigation and assertions. Heavy tests with custom timeouts (120s+) are unaffected.

The beforeAll makes many direct API calls (POST dagRuns, GET taskInstances, PATCH state) that use the global 10s actionTimeout. Under CI load these time out. Also, the setup had no health check and inherited the 60s describe-level timeout for a multi-step setup.

- Add waitForServerReady before API calls
- Increase beforeAll timeout to 120s
- Add explicit 30s timeout to all API calls (3x the actionTimeout)

choo121600 · 2026-03-30T04:31:25Z

It seems like the flaky issue might not be fully resolved yet 🥲

…pache#64366)

Squashed commits (each body repeats the corresponding commit message above):

* E2E: Add health-aware navigation to eliminate false-positive test failures
* E2E: Rename BasePage.goto() to safeGoto() to fix stack overflow
* E2E: Relax health check to fix Firefox test timeouts
* E2E: Increase element visibility timeouts for Firefox CI flakiness
* E2E: Fix webkit-specific flakiness across 5 page objects
* E2E: Rebalance health check and test timeouts
* E2E: Robustify task-instances spec setup against server overload

Co-authored-by: Claude Haiku 4.5 <noreply@anthropic.com>

Dev-iL · 2026-03-30T04:39:01Z

> It seems like the flaky issue might not be fully resolved yet 🥲

I know... But the question is - is it rarer? Also, one of the AI's suggestions was to use 1 concurrent worker instead of 2. This will make the tests run longer, but should be safer. Thoughts?

choo121600 · 2026-03-30T14:50:03Z

The main cause of the current test failures is related to data isolation and handling race conditions.
Additionally, issues arise during retries when timeouts occur due to resource constraints in the CI environment.
Previously, using networkidle masked these underlying issues, which is why they didn’t surface before.
Therefore, I believe it’s better to address the root cause rather than reducing the number of workers :)

boring-cyborg bot added the area:UI Related to UI/UX. For Frontend Developers. label Mar 28, 2026

Dev-iL force-pushed the 2603/e2e_deflake branch from decb6f4 to 2837758 Compare March 28, 2026 18:08

Dev-iL marked this pull request as ready for review March 28, 2026 18:08

Dev-iL requested review from choo121600 and vatsrahul1001 as code owners March 28, 2026 18:08

vatsrahul1001 approved these changes Mar 28, 2026

Dev-iL force-pushed the 2603/e2e_deflake branch from 2837758 to a5ce75c Compare March 28, 2026 19:32

Dev-iL and others added 7 commits March 29, 2026 08:57

Dev-iL force-pushed the 2603/e2e_deflake branch from a5ce75c to 018d96b Compare March 29, 2026 06:07

Dev-iL requested review from bbovenzi, guan404ming, pierrejeambrun, ryanahamilton and shubhamraj-git as code owners March 29, 2026 06:07

Dev-iL added the area:CI Airflow's tests and continious integration label Mar 29, 2026

vatsrahul1001 merged commit 28f7cf8 into apache:main Mar 29, 2026
84 checks passed

Dev-iL deleted the 2603/e2e_deflake branch March 29, 2026 14:35
