ci: agent-driven UI smoke test on Vercel preview #2238
Adds a workflow that runs an agent (claude-code-action + Playwright MCP) against the Vercel preview build of any PR touching packages/app or packages/common-utils. The agent reads the "How to test on Vercel preview" section of the PR body and executes the listed routes and numbered steps verbatim, treating Verify/Confirm/Assert steps as assertions and posting a single ✅/❌ summary comment on the PR.
Tightens the existing PR template's "How to test" section so authors write a parseable plan: an explicit "**Preview routes:**" line and a numbered "**Steps:**" list. PRs without a plan, or with the section left as the template placeholder, get a one-line skip comment from the agent rather than speculative testing.
The preview is LOCAL_MODE with a pre-configured demo ClickHouse, so the agent does not need to register a user or add a connection; it just opens the listed routes and runs the author's steps.
Run shape: ~30-90s, single PR comment, no failing status check (start observe-only; promote to required once the false-positive rate is known).
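To make the "parseable plan" concrete, here is a minimal Python sketch of how the section could be read mechanically. It is not the workflow's implementation (the PR delegates parsing to the agent via its prompt); the function and class names are hypothetical, and two rules are assumptions of this sketch: routes are comma-separated on the `**Preview routes:**` line, and only entries starting with `/` count as routes, so a leftover template placeholder yields no plan.

```python
import re
from dataclasses import dataclass, field
from typing import List, Optional

SECTION_HEADING = "### How to test on Vercel preview"
ASSERT_VERBS = ("verify", "confirm", "assert")


@dataclass
class Step:
    text: str
    is_assertion: bool  # True when the step starts with Verify/Confirm/Assert


@dataclass
class SmokePlan:
    routes: List[str] = field(default_factory=list)
    steps: List[Step] = field(default_factory=list)


def extract_section(body: str) -> str:
    """Return the text under SECTION_HEADING, up to the next level 1-3 heading."""
    lines = body.splitlines()
    start = next((i for i, l in enumerate(lines) if l.strip() == SECTION_HEADING), None)
    if start is None:
        return ""
    section = []
    for line in lines[start + 1:]:
        if re.match(r"#{1,3} ", line):
            break
        section.append(line)
    return "\n".join(section)


def parse_plan(body: str) -> Optional[SmokePlan]:
    """Parse routes and numbered steps; return None when there is nothing to run."""
    plan = SmokePlan()
    in_steps = False
    for raw in extract_section(body).splitlines():
        line = raw.strip()
        if line.startswith("**Preview routes:**"):
            rest = line[len("**Preview routes:**"):]
            candidates = [r.strip(" `") for r in rest.split(",")]
            # Assumption: a real route starts with "/"; placeholders do not.
            plan.routes = [r for r in candidates if r.startswith("/")]
            in_steps = False
        elif line.startswith("**Steps:**"):
            in_steps = True
        elif in_steps:
            m = re.match(r"\d+\.\s+(.*)", line)
            if m:
                text = m.group(1)
                plan.steps.append(Step(text, text.lower().startswith(ASSERT_VERBS)))
    if not plan.routes or not plan.steps:
        return None  # caller would post the one-line skip comment
    return plan
```

A caller would treat a `None` result as the skip case and otherwise walk `plan.steps` in order, checking the ones flagged `is_assertion`.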
🟢 Tier 1: Trivial
Docs, images, lock files, or a dependency bump. No functional code changes detected.
Review process: Auto-merge once CI passes. No human review required.
E2E Test Results
✅ All tests passed • 166 passed • 3 skipped • 1213s
Tests ran across 4 shards in parallel.
Deep Review
Scope: 2 files (192 lines)
🔴 P0/P1 -- must fix
🟡 P2 -- recommended
🔵 P3 nitpicks (10)
Reviewers (9): correctness, security, adversarial, reliability, maintainability, project-standards, testing, agent-native, learnings.
Summary
Adds a workflow that runs an agent (`claude-code-action` + Playwright MCP) against the Vercel preview build of any PR touching `packages/app` or `packages/common-utils`. The agent reads the "How to test on Vercel preview" section of the PR body, parses the listed routes and numbered steps, and executes them verbatim against the preview, treating `Verify`/`Confirm`/`Assert` steps as assertions and posting a single ✅ / ❌ summary comment.
Tightens the existing PR template's test section into a parseable shape: an explicit `**Preview routes:**` line and a numbered `**Steps:**` list. PRs with no plan, or that leave the section as the template placeholder, get a one-line skip comment instead of speculative testing.
The preview is `LOCAL_MODE` with a pre-configured demo ClickHouse, so the agent doesn't need to register a user or add a connection; it just opens the listed routes and runs the author's steps.
Why
Today the PR description tells humans what to verify, but nothing executes it. This pipes that same plan to an agent and surfaces results back as a comment. It's deliberately observe-only for now — no failing status check — so we can watch the false-positive rate before promoting it to a required check.
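As an illustration of "surfaces results back as a comment", the sketch below builds a single summary body and decides whether to create a new comment or update a previous one. It relies on the hidden `<!-- ui-preview-smoke -->` marker described under "How it works"; the function names and the header wording are hypothetical, and the comment dictionaries are assumed to carry `id` and `body` fields as GitHub issue-comment objects do. No network call is made here; the actual posting is left to the agent.

```python
from typing import Dict, List, Optional, Tuple

MARKER = "<!-- ui-preview-smoke -->"


def render_comment(results: List[Tuple[str, bool]]) -> str:
    """Build the summary comment body, tagged with the hidden marker."""
    passed = all(ok for _, ok in results)
    # Header wording is an assumption of this sketch, not taken from the PR.
    header = "✅ UI preview smoke test passed" if passed else "❌ UI preview smoke test failed"
    lines = [MARKER, header, ""]
    for i, (step, ok) in enumerate(results, start=1):
        lines.append(f"{i}. {'✅' if ok else '❌'} {step}")
    return "\n".join(lines)


def find_existing(comments: List[Dict]) -> Optional[int]:
    """Return the id of a previous smoke-test comment, if one exists."""
    for comment in comments:
        if MARKER in comment.get("body", ""):
            return comment["id"]
    return None


def plan_upsert(comments: List[Dict]) -> Tuple[str, Optional[int]]:
    """Decide between creating a new comment and replacing the old one."""
    existing = find_existing(comments)
    if existing is not None:
        return ("update", existing)
    return ("create", None)
```

Keying on a marker rather than on the comment author keeps the PR at one smoke-test comment even when other bots post from the same account.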
How it works
- Triggers on `pull_request_target` for paths under `packages/app/**` or `packages/common-utils/**`. Also `workflow_dispatch` so we can manually re-run on a PR for testing.
- Waits for the preview deployment with `patrickedqvist/wait-for-vercel-preview`.
- Runs `anthropics/claude-code-action@v1` with the `@playwright/mcp` server. Prompt instructs the agent to:
  - read the PR body from `/tmp/pr-body.md`,
  - find the `### How to test on Vercel preview` section,
  - parse `**Preview routes:**` and `**Steps:**`,
  - treat `Verify`/`Confirm`/`Assert` as assertions,
  - post a single comment tagged `<!-- ui-preview-smoke -->` so subsequent runs replace it.
Test plan
This workflow can't smoke-test itself, so:
- Confirm `ANTHROPIC_API_KEY` is set in repo secrets (already in use by `claude-code-review.yml`).
- Confirm `NEXT_PUBLIC_IS_LOCAL_MODE`, `NEXT_PUBLIC_HDX_LOCAL_DEFAULT_CONNECTIONS`, `NEXT_PUBLIC_HDX_LOCAL_DEFAULT_SOURCES` are set on the Vercel Preview environment (not just Production).
- Run `gh workflow run ui-preview-smoke.yml -f pr_number=<some-recent-ui-pr>` to validate end-to-end without waiting for a new PR.
How to test on Vercel preview
N/A — this PR only changes CI workflow + PR template.
References
`claude-code-review.yml` action setup (chore(ci): Add deep review on PR open/modify #2228, ci(deep-review): restructure review output for scannability #2230)