Skip to content

test(e2e): cover heatmap drag-select persistence (HDX-4147)#2209

Open
alex-fedotyev wants to merge 11 commits into
alex/HDX-4147-heatmap-select-persistfrom
alex/HDX-4147-heatmap-select-e2e
Open

test(e2e): cover heatmap drag-select persistence (HDX-4147)#2209
alex-fedotyev wants to merge 11 commits into
alex/HDX-4147-heatmap-select-persistfrom
alex/HDX-4147-heatmap-select-e2e

Conversation

@alex-fedotyev
Copy link
Copy Markdown
Contributor

@alex-fedotyev alex-fedotyev commented May 6, 2026

Summary

Adds Playwright e2e regression coverage for HDX-4147 (the heatmap drag-select rectangle bug fixed in #2189). Stacked on alex/HDX-4147-heatmap-select-persist; rebases on main once that PR merges and the diff drops to just the test commit.

The bug was that the dashed selection rectangle on the Event Deltas heatmap collapsed to a 2x2 px residue after mouseup. There was no automated test exercising the drag-select interaction, which is why a wrong fix was mergeable. This test locks in the fix and catches any future regression on the same surface.

What it tests

One spec at packages/app/tests/e2e/features/search/event-deltas-heatmap.spec.ts, three test cases:

  1. Drag-select draws a persistent rectangle and writes URL state. Asserts URL gains xMin/xMax/yMin/yMax and .u-select has width AND height > 20 px (the 2x2 px collapse signature is well below this).
  2. Reloading the page restores the rectangle from URL state. Drag, capture rectangle dimensions, reload, assert the rectangle returns with the same dimensions. This is what specifically catches the ready-vs-onCreate timing issue in fix(heatmap): persist drag-select rectangle across re-renders #2189: at onCreate, u.scales.y.min/max aren't populated for mode-2 facet data, so the rectangle restore must run from uPlot's ready hook.
  3. Clicking off the rectangle clears both URL state and the rectangle. URL params drop, .u-select collapses below 5 px on each axis.

All assertions use expect.poll for URL and bounding-box reads. nuqs's setFields flushes the URL update asynchronously and uPlot's ready hook fires after first draw, so one-shot reads race with both.

Page object additions

SearchPage:

  • openEventDeltasMode(): switches to the Event Deltas analysis tab and waits for .uplot.
  • getHeatmap(), getHeatmapSelectionRect(): locators for the chart and its selection element.
  • dragHeatmapSelection(startX, startY, endX, endY): spans both axes so the resulting selection has non-zero width AND height (uPlot treats single-axis drags as zero-size and skips the hook).

Verification that the test catches the bug

I cherry-picked the two test commits onto plain main (which does NOT have #2189's fix) and ran CI on that branch (PR #2217, since closed). All three test cases failed, all retries (#1 and #2) also failed:

  • Test 1 failed at line 41 (the .u-select width poll).
  • Test 2 failed at line 61 (the .u-select width poll before reload).
  • Test 3 failed at line 119 (the click setup, because the persisted rectangle never existed to begin with).

CI run on bug-state code: https://github.com/hyperdxio/hyperdx/actions/runs/25450398876. CI run on fix-state code (this PR): https://github.com/hyperdxio/hyperdx/actions/runs/25449203405. Both runs ran the same spec; the only difference is whether selectionBounds plumbing and the ready-hook restore exist in DBHeatmapChart.tsx.

Test plan

  • yarn workspace @hyperdx/app run tsc --noEmit clean.
  • yarn workspace @hyperdx/app run ci:lint clean.
  • CI e2e shards 1-4 pass on this branch with the fix (ff2157d9).
  • CI e2e shard 3 fails on plain main without the fix; all three test cases fail with retries (verified via PR DO NOT MERGE: prove e2e tests fail without #2189 fix #2217, now closed).

Why this is a separate PR

#2189 is the fix; this is the regression guard. Splitting keeps each at a manageable size. Bundling the test on top of the fix would have pushed #2189's classifier from tier-3 to tier-4. Per notes/lessons/2026-05-06-pr-2189-verify-on-fresh-build.md, perf optimizations and bug fixes shouldn't ride together; tests for a bug also fit cleanly in their own PR.

The dashed selection rectangle on the Event Deltas heatmap collapses to
a 2x2 px box the instant the user releases the mouse. The filter is
applied correctly (URL params, comparison legend, bar charts all
update), but visually you can no longer see what region you picked.

Root cause: uplot-react's dataMatch deep-compares the data prop and
calls chart.setData(data, true) when refs differ. setData resets
uPlot's internal u.select rectangle. HeatmapContainer rebuilt the
[time, bucket, count] arrays in its render body, so on every parent
re-render those entries had new references with identical values, and
dataMatch returned false.

Memoize the data array with stable refs so dataMatch returns true,
setData is skipped, and u.select survives the re-render that follows
a drag (the URL param write triggers).

Also memoize generatedTsBuckets so its fresh-Date-array each render
doesn't propagate into the heatmapData useMemo deps.

HDX-4147
Lint feedback from CI:
- timestampColumn?.name dep is wrong since the closure uses the full
  timestampColumn object; pass the object instead. The ref is stable
  when data is stable, so this doesn't reintroduce the original bug.
- Stop destructuring bucket/count when only time is needed for the
  not-enough-data short-circuit.

Add changeset (patch on @hyperdx/app).
Bot review feedback: timestampColumn was a non-primitive useMemo dep
derived entirely from data?.meta. Move the call inside the useMemo so
data is the only relevant dep, which is the cleanest way to satisfy
both exhaustive-deps and the ref-stability the fix relies on.

Also drop the // x values / // y value series 1 / // y value series 2
labels — they describe what the line is rather than why.
prose-lint flags U+2014 anywhere; tighten the comment instead.
Drop the dataMatch description (it was element-wise === per series,
not top-level === as written) and consolidate to the actionable
mechanism: fresh data ref drives uplot-react to call setData(data, true)
which wipes u.select. Add HDX-4147 ticket reference.
The previous memoization fix was insufficient. The actual cause is that
uplot-react destroys and recreates the chart whenever its `options` prop
ref changes. `tickFormatter` was rebuilt on every render because
`numberFormat` is a fresh object reference from callers (e.g.
DBSearchHeatmapChart constructs `{output: 'duration', factor: 0.001}`
inline). Rebuilt tickFormatter cascaded into the `options` useMemo,
optionsUpdateState detected new top-level keys (series, axes, cursor,
plugins are new array/object literals on each recompute), classified the
change as 'create', and the chart was destroyed and recreated, wiping
`u.select`.

Two layers:

1. Stabilize `tickFormatter`'s dep on `numberFormat` content via
   JSON.stringify, so callers passing a fresh-each-render `numberFormat`
   no longer cause chart recreation.

2. Pass the URL-backed selection into `Heatmap` as a `selectionBounds`
   prop and reapply via `setSelect` on chart create + on the prop
   changing. This makes the URL the source of truth and survives any
   future cause of chart recreation (theme switch, resize bounce, etc.).
Page reload with a URL-encoded selection wasn't drawing the rectangle.
Root cause: at onCreate time, uPlot has constructed but not yet completed
its initial layout for mode-2 facet data. u.scales.y.min/max can still
be undefined, so applySelectionToChart's early-out triggered and setSelect
never got called.

Move the apply call to uPlot's `ready` hook, which fires once after the
initial draw completes. Hold selectionBounds and scaleType in refs so
the hook (captured inside the options useMemo) reads the latest values
without requiring memo dep churn.
Adds Playwright regression coverage for the bug fixed in PR #2189:
the dashed drag-select rectangle on the Event Deltas heatmap collapsed
to a 2x2 px residue after mouseup unless u.select is mirrored from the
URL state.

Three scenarios in one spec:

1. Drag-select draws a persistent rectangle and writes URL state.
   Asserts URL gains xMin/xMax/yMin/yMax AND the .u-select element
   has width/height > 20 px, well above the 2x2 px collapse signature.

2. Reloading the page restores the rectangle from URL state. This is
   what specifically catches the ready-vs-onCreate timing issue: at
   onCreate u.scales.y.min/max aren't populated for mode-2 facet data,
   so the apply must run from uPlot's ready hook. The test reloads
   after a drag and asserts the rectangle returns with the same
   dimensions.

3. Clicking off the rectangle clears both URL state and the
   rectangle (collapses below 5 px on each axis).

New SearchPage methods: openEventDeltasMode, getHeatmap,
getHeatmapSelectionRect, dragHeatmapSelection. The drag helper spans
both axes so the resulting selection has non-zero width AND height
(uPlot treats single-axis drags as zero-size and skips the hook).
@changeset-bot
Copy link
Copy Markdown

changeset-bot Bot commented May 6, 2026

🦋 Changeset detected

Latest commit: ff2157d

The changes in this PR will be included in the next version bump.

This PR includes changesets to release 3 packages
Name Type
@hyperdx/app Patch
@hyperdx/api Patch
@hyperdx/otel-collector Patch

Not sure what this means? Click here to learn what changesets are.

Click here if you're a maintainer who wants to add another changeset to this PR

@vercel
Copy link
Copy Markdown

vercel Bot commented May 6, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

Project Deployment Actions Updated (UTC)
hyperdx-oss Ready Ready Preview, Comment May 6, 2026 4:59pm

Request Review

@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented May 6, 2026

{"review":"\n## PR Review\n\n✅ No critical issues found.\n\nTest-only PR adding Playwright regression coverage for HDX-4147. Page object additions follow existing patterns in SearchPage.ts, async state reads use expect.poll appropriately for nuqs/uPlot's deferred flushes, and the three test cases cover the bug's full surface (drag persistence, URL round-trip through the ready hook, and clear-on-click).\n\nMinor observations (non-blocking):\n- ⚠️ Selectors .uplot / .u-select depend on uPlot's internal class names — acceptable trade-off given they're stable, but worth noting if uPlot is upgraded.\n- ⚠️ JSDoc on the new page-object methods is more verbose than the project's general comment guidance prefers; fine for test scaffolding, trim if tightening.\n- ⚠️ Tests assume the seeded e2e dataset produces a non-empty Event Deltas heatmap on DEFAULT_TRACES_SOURCE_NAME; PR description already flags that local make dev-e2e couldn't be run end-to-end here, so CI is the source of truth.\n"}

@alex-fedotyev alex-fedotyev changed the base branch from alex/HDX-4147-heatmap-select-persist to main May 6, 2026 16:34
@github-actions github-actions Bot added the review/tier-2 Low risk — AI review + quick human skim label May 6, 2026
@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented May 6, 2026

🔵 Tier 2 — Low Risk

Small, isolated change with no API route or data model modifications.

Why this tier:

  • Standard feature/fix — introduces new logic or modifies core functionality

Review process: AI review + quick human skim (target: 5–15 min). Reviewer validates AI assessment and checks for domain-specific concerns.
SLA: Resolve within 4 business hours.

Stats
  • Production files changed: 0
  • Production lines changed: 0 (+ 216 in test files, excluded from tier calculation)
  • Branch: alex/HDX-4147-heatmap-select-e2e
  • Author: alex-fedotyev

To override this classification, remove the review/tier-2 label and apply a different review/tier-* label. Manual overrides are preserved on subsequent pushes.

@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented May 6, 2026

E2E Test Results

All tests passed • 162 passed • 3 skipped • 1153s

Status Count
✅ Passed 162
❌ Failed 0
⚠️ Flaky 4
⏭️ Skipped 3

Tests ran across 4 shards in parallel.

View full report →

nuqs setFields and uPlot's ready hook both flush asynchronously after
mouseup and after page reload. The synchronous reads at line 27 (URL
params) and line 35/64 (post-reload boundingBox) raced with the flush
and failed in CI shard 2 across all 3 retries: the URL never carried
xMin= within the test window.

Wrap each URL read in expect.poll(...) per param and gate the rect
dimension reads on a polled width > 20 (or < 5 for the post-clear
collapse). Capture rect once after polling settles, then assert the
remaining dimensions and the cross-check round-trip.

Also fixes lint: simple-import-sort/imports, prettier multi-line
arg formatting on assertion messages.
@alex-fedotyev alex-fedotyev changed the base branch from main to alex/HDX-4147-heatmap-select-persist May 6, 2026 16:56
alex-fedotyev added a commit that referenced this pull request May 6, 2026
The three failing dashboard-container tests in CI shard 1 came down to
two distinct races:

1. Tests #3 (line 112) and #5 (line 194) read getActiveTabsParam()[id]
   synchronously after Add Tab and after tab switches. nuqs flushes URL
   state asynchronously, so the read fires before the param is written.
   Wrap each sync read in expect.poll, mirroring the fix in PR #2209.

2. Test #6 (line 231) paired toggleGroupBordered(idA) with
   addTabToGroup(idB) back-to-back. Both setDashboard calls produce()
   from the same pre-mutation snapshot of the React Query cache, so
   the second PATCH overwrites the first; the toggle is silently
   dropped. The save-and-reload assertion then sees the wrong state
   (or, when goto fires before the PATCH lands, no state at all).

   Narrow the round-trip to a single mutation per step (addTabToGroup
   on group B), wait for networkidle before navigating away, capture
   the dashboard id from the URL while we are still on the page, and
   use expect.poll for the post-reload getGroupOrder assertion. The
   bordered toggle stays covered by the dedicated in-page test at
   line 78. The underlying back-pressure race is tracked separately
   in #2216.
@alex-fedotyev alex-fedotyev force-pushed the alex/HDX-4147-heatmap-select-persist branch from 287183d to 41b87cc Compare May 8, 2026 00:18
alex-fedotyev added a commit that referenced this pull request May 8, 2026
The three failing dashboard-container tests in CI shard 1 came down to
two distinct races:

1. Tests #3 (line 112) and #5 (line 194) read getActiveTabsParam()[id]
   synchronously after Add Tab and after tab switches. nuqs flushes URL
   state asynchronously, so the read fires before the param is written.
   Wrap each sync read in expect.poll, mirroring the fix in PR #2209.

2. Test #6 (line 231) paired toggleGroupBordered(idA) with
   addTabToGroup(idB) back-to-back. Both setDashboard calls produce()
   from the same pre-mutation snapshot of the React Query cache, so
   the second PATCH overwrites the first; the toggle is silently
   dropped. The save-and-reload assertion then sees the wrong state
   (or, when goto fires before the PATCH lands, no state at all).

   Narrow the round-trip to a single mutation per step (addTabToGroup
   on group B), wait for networkidle before navigating away, capture
   the dashboard id from the URL while we are still on the page, and
   use expect.poll for the post-reload getGroupOrder assertion. The
   bordered toggle stays covered by the dedicated in-page test at
   line 78. The underlying back-pressure race is tracked separately
   in #2216.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

review/tier-2 Low risk — AI review + quick human skim

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant