Skip to content

OperatorStateMetadata with HDFSMetadataLog#10

Closed
ericm-db wants to merge 9 commits into
stack/state-metadata-changesfrom
stack/operator-state-metadata-v2
Closed

OperatorStateMetadata with HDFSMetadataLog#10
ericm-db wants to merge 9 commits into
stack/state-metadata-changesfrom
stack/operator-state-metadata-v2

Conversation

@ericm-db
Copy link
Copy Markdown
Owner

What changes were proposed in this pull request?

Why are the changes needed?

Does this PR introduce any user-facing change?

How was this patch tested?

Was this patch authored or co-authored using generative AI tooling?

@ericm-db ericm-db closed this Jul 23, 2024
ericm-db pushed a commit that referenced this pull request May 5, 2026
### What changes were proposed in this pull request?

Address the open follow-ups from [SPARK-56681](https://issues.apache.org/jira/browse/SPARK-56681) (umbrella for PATH / SPARK-56605 cleanup) in a single cleanup PR. Items #1 and #2 were already wired by SPARK-56639; this PR covers the remainder.

| # | Item | Resolution |
|---|---|---|
| #1 | `FunctionResolution.resolveProcedure` was dead code | Already wired by SPARK-56639 (no action). |
| #2 | Frozen view / SQL-function PATH wiring unfinished | Already done by SPARK-56639 (no action). |
| #3 | `AnalysisContext.resolutionPathEntries` threadlocal | Audit only: confirmed `withNewAnalysisContext` / `reset()` correctly clear it. Full removal needs a coordinated refactor to plumb the path through `RelationResolution` / `FunctionResolution` method calls; flagged as a follow-up. |
| #4 | `Analyzer.executeAndCheck` clobbers outer `SQLConf.withExistingConf` | Extracted `runWithSessionConf` helper, added `SQLConf.getExistingConfIfSet`. `executeAndCheck` and `executeSameContext` now share one path that yields to any outer scope. |
| #5 | `VariableResolution.allowUnqualifiedSessionTempVariableLookup` force-loads default catalog | Replaced the hot-path catalog read with `CatalogManager.isSystemSessionOnPath`, which inspects stored session-path entries directly. No catalog load on column resolution. |
| #6 | `DROP VARIABLE` PATH gate asymmetric with `DECLARE` / `CREATE` | Removed the gate. DDL on session variables (`DECLARE` / `CREATE` / `DROP`) always targets `system.session` directly; only DML (`SET VAR`, `SELECT x`) goes through PATH. |
| #7 | `lookupFunctionType` exception swallow too broad | Narrowed from `NonFatal` to the explicit not-found list (`NoSuchFunctionException`, `NoSuchNamespaceException`, `CatalogNotFoundException`, `FORBIDDEN_OPERATION`). Other exceptions propagate. |
| #8 | `lookupFunctionType` fan-out had wasteful `system.*` candidates | Filtered them out — `system.session`, `system.builtin`, `system.ai` are already resolved earlier in the same method. |
| #9 | Three near-duplicate path-resolution helpers | Lifted into `CatalogManager.resolutionPathEntriesForAnalysis(pinnedEntries, viewCatalogAndNamespace)`. Relation, routine, and procedure resolution all route through it. |
| #10 | Tests for the new error paths and gates | Added a DECLARE / SET VAR / DROP cycle test under non-default PATH and a struct-variable field-vs-qualified ambiguity test in `sql-session-variables.sql`. |
| #11 | `ProtoToParsedPlanTestSuite.analyzerIsolationConf` was a bare `SQLConf` | Clone `spark.sessionState.conf` and only override `PATH_ENABLED=false`, so all `sparkConf` overrides (ANSI, alias config, ...) propagate automatically. |
| Bonus | `ResolveSetVariable` hardcoded `SYSTEM.SESSION` regardless of actual PATH | `unresolvedVariableError` now takes `Seq[Seq[String]]` path entries with **required** `Origin` (no overloads). DML lookup failures (`SET VAR`, `FETCH ... INTO`) report the full SQL path as a bracketed list, byte-for-byte consistent with `UNRESOLVED_ROUTINE` and `TABLE_OR_VIEW_NOT_FOUND`. DDL name validation in `ResolveCatalogs` continues to report `[system.session]` since PATH does not apply there. Origin is plumbed through `VariableManager.set` so all error sites carry a `queryContext` pointing at the offending variable identifier (parser opt-ins via `withOrigin(identifierReference)` so the highlight is the variable name, not the whole statement). |

### Why are the changes needed?

These are the cleanup items called out on SPARK-56681 from the post-merge source review of SPARK-56605. They eliminate dead code paths, plug user-visible bugs (force-loading a misconfigured default catalog on column resolution; clobbering pinned session configs; swallowing real catalog errors as `UNRESOLVED_ROUTINE`), remove the asymmetry between DDL and DML on session variables, and make `UNRESOLVED_VARIABLE` self-consistent with the other "not found" errors.

### Does this PR introduce _any_ user-facing change?

Yes.

- **`UNRESOLVED_VARIABLE.searchPath`** is now rendered as a bracketed list. For DML lookups (`SET VAR`, `FETCH ... INTO`), the list reflects the actual SQL PATH that was consulted instead of a hardcoded `SYSTEM.SESSION`. For DDL name validation (`DECLARE` / `DROP` with a non-session namespace), the list is `[`` `system`.`session` ``]` since PATH does not apply.
- **`UNRESOLVED_VARIABLE`** now always carries a `queryContext` that highlights just the offending variable identifier (e.g. `"builtin.var1"`, `"ses.var1"`), not the whole `DECLARE` / `SET VAR` statement.
- **`DROP TEMPORARY VARIABLE`** no longer raises `UNRESOLVED_VARIABLE` when the SQL PATH does not contain `system.session`. DDL on session variables ignores PATH, matching the existing behaviour of `DECLARE OR REPLACE VARIABLE`.
- **`lookupFunctionType`** no longer swallows non–`NotFound` errors. A catalog reporting `PERMISSION_DENIED` (or similar) for a function lookup now propagates instead of silently producing `UNRESOLVED_ROUTINE`.

### How was this patch tested?

- Added `sql-session-variables.sql` regression test for the struct-variable field-vs-qualified ambiguity (`DECLARE VARIABLE session STRUCT<a INT>` → `SELECT session.a` succeeds → `DROP` → `SELECT session.a` falls through to `UNRESOLVED_COLUMN`).
- Updated `SetPathSuite`: DECLARE / SET VAR / DROP cycle under a non-default PATH; bonus test asserts the actual rendered search path and the variable-identifier `queryContext`.
- Updated `SqlScriptingExecutionSuite` for the new bracketed `searchPath` and identifier-pinned `queryContext`.
- Regenerated `sql-session-variables.sql.out` for the new error shape.
- Added `resolutionPathEntriesForAnalysis` stubs to mocked `CatalogManager` instances in `PlanResolutionSuite`, `AlignAssignmentsSuiteBase`, and `TableLookupCacheSuite`.
- Ran focused suites locally; all pass:
  - `build/sbt 'sql/testOnly *SetPathSuite *SqlScriptingExecutionSuite *ExecuteImmediateEndToEndSuite'`
  - `build/sbt 'sql/testOnly *SimpleSQLViewSuite *SQLFunctionSuite'`
  - `build/sbt 'sql/testOnly *PlanResolutionSuite *UpdateTableAlignAssignmentsSuite *MergeIntoTableAlignAssignmentsSuite'`
  - `build/sbt 'catalyst/testOnly *TableLookupCacheSuite *AnalysisSuite *AnalysisErrorSuite *LookupFunctionsSuite'`
  - `build/sbt 'sql/testOnly *FunctionQualificationSuite *RelationQualificationSuite *DataSourceV2FunctionSuite'`
  - `build/sbt 'sql/testOnly *SQLQuerySuite'`
  - `build/sbt 'connect/testOnly *ProtoToParsedPlanTestSuite'`
  - `build/sbt 'sql/testOnly *SQLQueryTestSuite -- -z sql-session-variables.sql'`
  - Full `org.apache.spark.sql.catalyst.analysis.*`, `org.apache.spark.sql.catalyst.parser.*`, and `org.apache.spark.sql.analysis.resolver.*` suites.
- `scalastyle` and `scalafmt` clean across catalyst, sql, and connect modules.

### Was this patch authored or co-authored using generative AI tooling?

Generated-by: Cursor Claude Opus 4.7

Closes apache#55647 from srielau/SPARK-56681-patch-clean-up.

Authored-by: Serge Rielau <serge@rielau.com>
Signed-off-by: Daniel Tenedorio <daniel.tenedorio@databricks.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant