fix(validate): case-insensitive aesthetic-clause column lookup #444

Closed
dataders wants to merge 0 commits into posit-dev:main from dataders:case-insensitive-aesthetic-binding
dataders commented May 8, 2026

Summary

VISUALISE clauses in .gg.sql files currently fail validation against case-preserving warehouses (Snowflake, Databricks, BigQuery quoted IDs) when the user writes lowercase column references. The columns come back from the warehouse in their declared case (typically UPPERCASE), and ggsql's column-name resolver does exact-case comparison, producing:

Validation error: Column 'table_name' referenced in aesthetic 'pos1' does not exist.
Available columns: TABLE_CATALOG, TABLE_SCHEMA, TABLE_NAME, ROW_COUNT

DuckDB folds identifiers to lowercase by default, so the bug only bites case-preserving sources.

Fix

Case-insensitive column resolution with an exact-match-first preference:

  1. execute::canonicalize_column_references runs after the source schema is fetched and rewrites layer mappings, remappings, and partition_by columns to the schema's canonical case. This is what actually unblocks the SQL: without canonicalization, the generated SQL emits "table_name" AS "__ggsql_aes_pos1__", which a case-preserving warehouse rejects because the real column is TABLE_NAME. The user's original casing is preserved on AestheticValue::Column.original_name so axis labels render the user's intent.

  2. writer::vegalite::layer::validate_layer_columns also uses a tolerant case-insensitive lookup as defense in depth, in case any code path slips past canonicalization.
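
The canonicalization step in item 1 can be illustrated with a minimal sketch. The `ColumnRef` struct, `canonicalize`, and `select_fragment` below are hypothetical stand-ins written for this description, not ggsql's actual types or functions; only the `original_name` field and the `__ggsql_aes_pos1__` alias shape are taken from the text above. Matching here is ASCII-only, which is an assumption, and ambiguity handling is left out for brevity.

```rust
/// Hypothetical stand-in for a column-backed aesthetic value (not ggsql's real type).
#[derive(Debug, Clone, PartialEq)]
pub struct ColumnRef {
    /// Name used when generating SQL; rewritten to the schema's casing.
    pub name: String,
    /// Name exactly as the user wrote it, kept so axis labels show the user's intent.
    pub original_name: String,
}

impl ColumnRef {
    pub fn new(user_written: &str) -> Self {
        Self {
            name: user_written.to_string(),
            original_name: user_written.to_string(),
        }
    }
}

/// Rewrite `col.name` to the schema's casing when exactly one schema column
/// matches ignoring ASCII case. An exact match, no match, or several matches
/// all leave the reference untouched in this simplified sketch.
pub fn canonicalize(col: &mut ColumnRef, schema: &[&str]) {
    // Exact-case match: nothing to rewrite.
    if schema.iter().any(|c| *c == col.name) {
        return;
    }
    let candidates: Vec<&str> = schema
        .iter()
        .copied()
        .filter(|c| c.eq_ignore_ascii_case(&col.name))
        .collect();
    if candidates.len() == 1 {
        col.name = candidates[0].to_string();
    }
}

/// Build the quoted SELECT fragment for one aesthetic, mirroring the alias
/// shape quoted in the description above.
pub fn select_fragment(col: &ColumnRef, aesthetic: &str) -> String {
    format!("\"{}\" AS \"__ggsql_aes_{}__\"", col.name, aesthetic)
}
```

With the schema from the error message above, a reference written as table_name ends up with name TABLE_NAME while original_name stays table_name, so the emitted fragment quotes the identifier the warehouse actually knows.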

Resolution rules in both layers:

  • Exact-case match wins outright (no behavior change for DuckDB / lowercase schemas).
  • Otherwise, case-insensitive lookup; a single match is accepted, multiple matches raise a clear ambiguity error so we never silently guess.
  • No match → the existing "column does not exist" error is preserved.
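
The three rules above could be sketched roughly as follows. This is an illustration under stated assumptions, not the code in this PR: `resolve_column`, `ResolveError`, and the error message wording are hypothetical names chosen for the example, and the comparison uses ASCII case folding, which the description does not specify.

```rust
use std::fmt;

/// Hypothetical error type for the sketch; the real ggsql errors may differ.
#[derive(Debug, PartialEq)]
pub enum ResolveError {
    NotFound { requested: String, available: Vec<String> },
    Ambiguous { requested: String, candidates: Vec<String> },
}

impl fmt::Display for ResolveError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            ResolveError::NotFound { requested, available } => write!(
                f,
                "Column '{}' does not exist. Available columns: {}",
                requested,
                available.join(", ")
            ),
            ResolveError::Ambiguous { requested, candidates } => write!(
                f,
                "Column '{}' is ambiguous: it matches {} when case is ignored",
                requested,
                candidates.join(", ")
            ),
        }
    }
}

/// Resolve `requested` against `available`, returning the schema's spelling.
pub fn resolve_column<'a>(
    requested: &str,
    available: &[&'a str],
) -> Result<&'a str, ResolveError> {
    // Rule 1: an exact-case match wins outright.
    if let Some(exact) = available.iter().copied().find(|c| *c == requested) {
        return Ok(exact);
    }
    // Rule 2: otherwise collect every case-insensitive (ASCII) match.
    let matches: Vec<&'a str> = available
        .iter()
        .copied()
        .filter(|c| c.eq_ignore_ascii_case(requested))
        .collect();
    match matches.as_slice() {
        // A single match is accepted.
        [only] => Ok(*only),
        // Rule 3: no match keeps the "does not exist" error path.
        [] => Err(ResolveError::NotFound {
            requested: requested.to_string(),
            available: available.iter().map(|c| c.to_string()).collect(),
        }),
        // Several matches: report ambiguity instead of guessing.
        _ => Err(ResolveError::Ambiguous {
            requested: requested.to_string(),
            candidates: matches.iter().map(|c| c.to_string()).collect(),
        }),
    }
}
```

Checking the exact match before folding case is what keeps a schema containing both Foo and FOO usable: a reference to FOO resolves directly, and only foo, which matches neither exactly, is reported as ambiguous.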

Evidence

This bug was independently reproduced from two different reader stacks against real Snowflake — confirming the root cause is ggsql-internal, not reader-side conversion:

  • XdbcReader (fusion-stack via dbt-xdbc): hit the same Validation error: Column 'table_name' … against analytics_dev.information_schema.tables.
  • AdbcReader (clean-room Apache ADBC, no fusion dependencies): hit the identical symptom from the same query.

Both readers correctly preserve the warehouse's casing; the validator was the only mismatched component.

Test plan

  • New unit test test_case_insensitive_column_lookup — DataFrame columns TABLE_NAME, ROW_COUNT; aesthetics reference table_name (lowercase) and ROW_COUNT (exact case) — write succeeds.
  • New unit test test_case_insensitive_lookup_ambiguity — DataFrame has both Foo and FOO; aesthetic references foo — error includes "ambiguous".
  • Existing test_missing_column_error still passes (no regression to genuinely-missing-column path).
  • Full cargo test --lib green: 1365 passed, 0 failed, 1 ignored.
  • Manual verification against a Snowflake-backed .gg.sql (left to maintainers — needs warehouse credentials).

🤖 Generated with Claude Code

dataders closed this May 9, 2026
dataders force-pushed the case-insensitive-aesthetic-binding branch from ec87290 to ac78cd4 May 9, 2026 00:52