Skip to content

Spatial phase I: Layer#370

Merged
teunbrand merged 34 commits intoposit-dev:mainfrom
teunbrand:spatial_start
May 7, 2026
Merged

Spatial phase I: Layer#370
teunbrand merged 34 commits intoposit-dev:mainfrom
teunbrand:spatial_start

Conversation

@teunbrand
Copy link
Copy Markdown
Collaborator

This PR advances #336.

It introduces the spatial layer with draws the simple feature geometry.
Internally, the geometry column is converted to WKB, which is handled via a dialect.
The vegalite writer converts the WKB to GeoJSON.
Like ggplot2, detection of the geometry column is automated for common cases.
This PR also adds the ggsql:world built-in dataset that contains the Natural Earth 110m dataset with selected columns.

Noteably absent in this PR is the whole projection ordeal.

teunbrand and others added 26 commits April 22, 2026 14:05
Registers GeomType::Spatial across the parser, AST, and builder.
The spatial geom requires a `geometry` aesthetic and supports
fill, stroke, opacity, linewidth, and linetype.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
… SpatialRenderer

- Feature-gated `spatial` (default-on) with geozero + hex dependencies
- Auto-detect GEOMETRY columns via DESCRIBE for `DRAW spatial` layers
- SpatialRenderer converts WKB hex / GeoJSON strings to GeoJSON Features
- Vega-Lite geoshape mark with proper encoding channel handling

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Skip geometry aesthetic in encoding builder (structural, not visual)
- Remove SpatialRenderer::modify_encoding stub (default suffices)
- Remove geometry column auto-detection (revisit after Arrow migration)
- Add end-to-end tests: GeoJSON features, WKB hex, mixed layers
- Add execute test for explicit geometry mapping

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Relax the grammar's other_sql_statement rule to accept any non-delimiter
tokens, so statements like INSTALL/LOAD/SET/ATTACH parse without error.
Execute these setup statements before the main query in the pipeline.
Flip DDL detection in DuckDB and SQLite readers to a returns_rows
whitelist, so unknown statement types are handled gracefully.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…query_arrow

DuckDB's query_arrow API exports results directly as Arrow RecordBatches,
eliminating the manual row-by-row type mapping. Extension types like
GEOMETRY now flow through as native Binary columns instead of hitting a
lossy string fallback. Decimal128 columns are normalized to Float64 at
the boundary for downstream compatibility.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
… materialisation

Replaces the execute_sql() → register() two-step with direct
CREATE TEMP TABLE AS DDL. This keeps data inside the database engine
and preserves native types (e.g. DuckDB GEOMETRY) that were lost
during the Arrow materialisation round-trip.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…Y tests

Only apply ST_AsWKB when the geometry column is Binary (native DuckDB
GEOMETRY via Arrow). String columns (GeoJSON, WKB hex) pass through
to the writer unchanged.

Adds tests using INSTALL spatial; LOAD spatial with ST_GeomFromText
to verify the full native GEOMETRY pipeline end-to-end.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Remove GeoJSON/WKB hex string handling from the writer and stat
transform. Spatial data should come from native GEOMETRY columns
(via ST_Read, ST_GeomFromText, etc.), not string columns.

Drops hex crate dependency from the spatial feature.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add sql_geometry_to_wkb() to SqlDialect trait with ST_AsBinary
default (OGC standard). DuckDB overrides with ST_AsWKB. The spatial
stat transform uses the dialect instead of hardcoding.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Natural Earth 110m country boundaries as geoparquet. Columns: name,
iso_a3, continent, subregion, income_group, population, gdp, geom.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The spatial stat transform now calls dialect.sql_spatial_setup() before
emitting ST_AsWKB, so users no longer need manual LOAD spatial statements.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
When a geom declares a geometry aesthetic and the user hasn't mapped it
explicitly, scan the schema for a column with a conventional geometry
name (geom, geometry, wkb_geometry, the_geom, shape) and binary type.
Requires exactly one match to avoid ambiguity.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
DuckDB reads geoparquet geometry columns as BLOB when the spatial
extension is not loaded, making ST_AsWKB fail later. Pre-load spatial
when the world dataset is referenced so the column is read as GEOMETRY.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@teunbrand teunbrand marked this pull request as ready for review May 5, 2026 12:48
@teunbrand teunbrand requested a review from thomasp85 May 5, 2026 12:49
Copy link
Copy Markdown
Collaborator

@thomasp85 thomasp85 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This in general looks good. Most comments are cosmetic.

One thing we kinda ignore here, and maybe that is for the future projection PR, but do we need a geometry scale? Or do you scale latitude and longitude (defined by the projection), and the geometry mapping just sort of follows?

Comment thread doc/syntax/layer/type/spatial.qmd Outdated
Comment thread doc/syntax/layer/type/spatial.qmd Outdated
Comment thread doc/syntax/layer/type/spatial.qmd Outdated
Comment thread doc/syntax/layer/type/spatial.qmd Outdated
Comment thread doc/syntax/layer/type/spatial.qmd Outdated
Comment thread doc/syntax/layer/type/spatial.qmd Outdated
Comment thread src/execute/mod.rs
fn looks_like_geometry(name: &str) -> bool {
matches!(
name.to_lowercase().as_str(),
"geom" | "geometry" | "wkb_geometry" | "the_geom" | "shape"
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are these based on use in the wild? "the_geom" sounds made up 🫠

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These are 100% based on what claude decided might be reasonable. I didn't ask it for receipts, but it does seem to be out there in the wild:
https://gis.stackexchange.com/questions/49455/geometry-column-naming-convention-geom-or-the-geom

Comment thread src/execute/mod.rs Outdated
return Some(candidates[0].name.clone());
}

// Fall back to name-only match (e.g. extension types we don't recognise)
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what if there is a geometry type with another name? Shouldn't that win over the "standard" name?

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd actually say that name only should never "win" since this could map to a column that doesn't support ST_asWKB() and end up with a super confusing error.

The logic should be: Any column holding valid geometry. If multiple of these chose the one with standard name, if no standard naming then the first

Copy link
Copy Markdown
Collaborator Author

@teunbrand teunbrand May 6, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So I think the logic you propose itself is sound, but the problem with this is that every DB has its own interpretation of what constitutes 'valid geometry'. WKB is the 'standard' but DuckDB has its own flavour of geometry and I imagine different databases store it in different formats yet again (i.e. the 'extension types we don't recognise'). The homogenisation of data is the main reason we cast the geometry column as WKB.

I'm happy to remove this fallback though, it'll force the user to declare a geometry column if we can't guess it.

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can't we rely on the dialect to know what is "valid geometry"?

Copy link
Copy Markdown
Collaborator Author

@teunbrand teunbrand May 6, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Have a look at 0f42b58 for how this could work. I personally think all the extra plumbing is not worth the slightly better heuristics and it should be reverted. Let me know if you think otherwise.

Comment thread src/reader/data.rs Outdated
Copy link
Copy Markdown
Collaborator

@thomasp85 thomasp85 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, a comment from Claude to consider:

GeoJSON reserved-key collisions in SpatialRenderer::prepare_data (src/writer/vegalite/layer.rs:2239):

properties.insert(col_name.clone(), value.clone()); 
feature.insert(col_name.clone(), value); 

Every non-geometry column is also splatted into the feature top level. If a user has a column named type, this overwrites "Feature" and Vega-Lite stops treating it as a feature. properties, bbox, id collide too. At minimum, skip those reserved keys when copying to the top level — or just drop the top-level copy (see next point).

Worth fixing while you're here

Properties stored twice in the inline data (same lines as above). Each non-geometry column ends up both at feature.X and feature.properties.X. Vega-Lite resolves bare field: "X" references against datum.X, which is the top-level form — so the properties map is unused output. Drop the properties map (simpler, matches how the rest of the codebase emits row data), or drop the top-level copy and use from: { data: ..., fields: ["properties.X"] } style. Either way, half the geometry-row JSON goes away.

Comment thread Cargo.toml Outdated
Comment thread src/reader/duckdb.rs
}

fn sql_spatial_setup(&self) -> Vec<String> {
vec!["LOAD spatial".into()]
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What happens if spatial is not installed? IS it installed on first load?

Copy link
Copy Markdown
Collaborator Author

@teunbrand teunbrand May 6, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, the user should install it. We ourselves only ever try loading it.
If it isn't installed the error message returned by the execution of LOAD spatial surfaces to the user.

The one exception is that we try to install it in our own tests, but these are all contingent of the spatial feature gate and we can't really run the tests without DuckDB's spatial extension.

Comment thread src/execute/mod.rs Outdated
return Some(candidates[0].name.clone());
}

// Fall back to name-only match (e.g. extension types we don't recognise)
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd actually say that name only should never "win" since this could map to a column that doesn't support ST_asWKB() and end up with a super confusing error.

The logic should be: Any column holding valid geometry. If multiple of these chose the one with standard name, if no standard naming then the first

Comment thread doc/syntax/layer/type/spatial.qmd Outdated
Co-authored-by: Thomas Lin Pedersen <thomasp85@gmail.com>
@teunbrand
Copy link
Copy Markdown
Collaborator Author

teunbrand commented May 6, 2026

do we need a geometry scale

At some point maybe, but we'd have to have graticules first before it becomes meaningful.
It also isn't clear to me why we wouldn't just use SCALE lon ... and SCALE lat ... for this.
Having to invent how a multidimensional scale should act is a bigger challenge.

Also, a comment from Claude to consider:

Issue 1 about the reserved names is moot, because our naming scheme prevents collisions. I'm fine with being performatively defensive though.
Issue 2 about the properties is a good catch.

teunbrand and others added 6 commits May 6, 2026 11:07
Geometry columns lose their native type during Arrow materialisation
(DuckDB GEOMETRY becomes plain Binary). Add Reader::geometry_columns()
that queries the backend's type system (DESCRIBE for DuckDB) to find
actual geometry columns, with a name+type heuristic fallback.

The detection is implemented as GeomTrait::detect_aesthetics(), which
spatial overrides. This runs after global mapping merge so user-declared
geometry mappings take precedence. Multiple native geometry columns are
treated as ambiguous (user must declare explicitly).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@teunbrand teunbrand merged commit 4712868 into posit-dev:main May 7, 2026
2 checks passed
@teunbrand teunbrand deleted the spatial_start branch May 7, 2026 07:58
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants