Merged
38 commits
a12e198
first pass
thomasp85 Apr 27, 2026
bb16c9c
Merged upstream/main into issue160-aggregate
thomasp85 Apr 27, 2026
778b6ac
support numeric axis geoms
thomasp85 Apr 27, 2026
0a1b214
support range geoms
thomasp85 Apr 27, 2026
218f302
reformat
thomasp85 Apr 27, 2026
f14a017
Merge commit 'c3e234b942094f05ddefac1ae6d9b407c54771c3'
thomasp85 Apr 28, 2026
8c5845f
support aggregation in segment
thomasp85 Apr 28, 2026
2cb0216
allow orientation in range and ribbon for aggregation case
thomasp85 Apr 28, 2026
cc390bd
rename to percentile
thomasp85 Apr 28, 2026
4476005
make aggregates parametric
thomasp85 Apr 28, 2026
3f1a433
reformat
thomasp85 Apr 28, 2026
6147ccc
clippy be happy
thomasp85 Apr 28, 2026
1c613e4
ensure multiple aggregates give rise to multiple groups
thomasp85 Apr 28, 2026
f3081a3
begin to document
thomasp85 Apr 28, 2026
56780b0
polygon and path doesn't allow aggregation
thomasp85 Apr 28, 2026
802f1f1
Add documentation for non-range layers
thomasp85 Apr 28, 2026
c6dd4a9
rethink aggregation
thomasp85 May 4, 2026
caf0a8e
add back long-form aggregation
thomasp85 May 4, 2026
564673c
reformat
thomasp85 May 4, 2026
88a707b
fix aggregation of time-dependent layers
thomasp85 May 4, 2026
b1938d8
add additional aggregations + examples
thomasp85 May 6, 2026
c40ea31
Apply suggestions from code review
thomasp85 May 6, 2026
d76825d
apply doc changes to all layers
thomasp85 May 6, 2026
bcbedba
support first and last in ANSI, add diff
thomasp85 May 7, 2026
905063d
support tile
thomasp85 May 7, 2026
840fc6e
defer scaling of aggregated columns
thomasp85 May 7, 2026
bdcd700
update SKILL
thomasp85 May 7, 2026
65c504c
reformat
thomasp85 May 7, 2026
a2b24b9
Merge aggregate_domain_aesthetics and supports_aggregate into one
thomasp85 May 7, 2026
7014e93
avoid twice parsing
thomasp85 May 7, 2026
5ce2ddc
refactor aggregate parsing
thomasp85 May 7, 2026
4aa4159
better warning
thomasp85 May 7, 2026
95d4df1
add finer test
thomasp85 May 7, 2026
134f847
Merge commit '23c50f1a67872808838933f2a7a287871e82c446'
thomasp85 May 7, 2026
ee49998
appease our dear lord and master clippy
thomasp85 May 7, 2026
c574e45
Apply suggestions from code review
thomasp85 May 7, 2026
fe8fae0
improve docs
thomasp85 May 7, 2026
ba0bf3f
implement suggestions from review
thomasp85 May 8, 2026
1 change: 1 addition & 0 deletions .gitignore
Original file line number Diff line number Diff line change
@@ -96,6 +96,7 @@ criterion/

# Claude Code specific
.claude/
memory

# R specific
*.Rproj.user
6 changes: 5 additions & 1 deletion CHANGELOG.md
@@ -2,6 +2,10 @@

### Added

- New `aggregate` SETTING on Identity-stat layers (point, line, area, bar, ribbon,
range, segment, arrow, rule, text). By default it collapses each group to a
single row by replacing every numeric mapping in place with its aggregated
value. See the `DRAW` documentation for details.
- Added panel decorations (grid lines, axes, background) for polar coordinates (#156).
- Added `radar` setting to polar coordinates for making radar plots (#418).

@@ -11,7 +15,7 @@

- Side effects like `CREATE TEMP TABLE` before the `VISUALISE` statement are now
separated from directly feeding into the visualisation data (#415)
- Fixed bug where panel axes were unintentionally anchored to zero when using
- Fixed bug where panel axes were unintentionally anchored to zero when using
`FACET ... SETTING free => 'x'/'y'` (#410).
- Fixed bug where faceted data were matched to the incorrect panels (#409)

42 changes: 42 additions & 0 deletions doc/syntax/clause/draw.qmd
@@ -76,6 +76,48 @@ The `SETTING` clause can be used for two different things:
#### Position
A special setting is `position`, which controls how overlapping objects are repositioned to avoid overlap. Position adjustments have special mapping requirements, so not every position adjustment is relevant for every layer type. Different layers have different defaults, as detailed in their documentation. You can read about each position adjustment at [their own documentation sites](../index.qmd#position-adjustments).

#### Aggregate
Some layers support aggregation of their data through the `aggregate` setting. Their documentation will state this. `aggregate` collapses each group to a single row, replacing every numeric mapping in place with its aggregated value. Groups are defined by `PARTITION BY` together with all discrete mappings.

The `aggregate` setting takes a single string or an array of strings. Each string is one of:

* **Untargeted**: `'<func>'` (no prefix). With one untargeted aggregation, the function applies to every numeric mapping that doesn't have a targeted aggregation. With two untargeted aggregations, the first is used for the lower side of range layers (e.g. `x`/`xmin`) plus all non-range layers, and the second is used for the upper side of range layers (e.g. `xend`/`xmax`). More than two untargeted aggregations are not allowed.
* **Targeted**: `'<aes>:<func>'`. Applies `func` to the named aesthetic only (`<aes>` is a name like `x`, `y`, `xmin`, `xmax`, `xend`, `yend`, `color`, `size`, …). A target overrides any untargeted aggregation for that aesthetic.

A numeric mapping that has neither a target nor an applicable default is dropped from the layer with a warning.
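The resolution rules above can be sketched in Python. This is a hypothetical illustration of the documented behaviour, not ggsql's implementation: the function name `resolve_aggregates`, its return shape, and the `upper_side` set (taken from the aesthetics named in the examples above) are all assumptions.

```python
from typing import Dict, List, Tuple

# Assumption: these are the "upper side" aesthetics of range layers,
# inferred from the examples in the text (xend/xmax and their y twins).
UPPER_SIDE = ("xmax", "ymax", "xend", "yend")


def resolve_aggregates(
    setting: List[str],
    numeric_aesthetics: List[str],
    upper_side: Tuple[str, ...] = UPPER_SIDE,
) -> Tuple[Dict[str, List[str]], List[str]]:
    """Return (functions per aesthetic, aesthetics dropped with a warning)."""
    targeted: Dict[str, List[str]] = {}
    untargeted: List[str] = []
    for entry in setting:
        if ":" in entry:
            # Targeted form: '<aes>:<func>'
            aes, func = entry.split(":", 1)
            targeted.setdefault(aes, []).append(func)
        else:
            # Untargeted form: '<func>'
            untargeted.append(entry)
    if len(untargeted) > 2:
        raise ValueError("at most two untargeted aggregations are allowed")

    resolved: Dict[str, List[str]] = {}
    dropped: List[str] = []
    for aes in numeric_aesthetics:
        if aes in targeted:
            # A target overrides any untargeted aggregation.
            resolved[aes] = targeted[aes]
        elif len(untargeted) == 1:
            resolved[aes] = [untargeted[0]]
        elif len(untargeted) == 2:
            # First default: lower side and everything else; second: upper side.
            pick = untargeted[1] if aes in upper_side else untargeted[0]
            resolved[aes] = [pick]
        else:
            # Neither a target nor an applicable default.
            dropped.append(aes)
    return resolved, dropped


if __name__ == "__main__":
    print(resolve_aggregates(["min", "max"], ["ymin", "ymax"]))
    print(resolve_aggregates(["y:median"], ["y", "size"]))
```

In the second call `size` has no target and there is no default, so it ends up in the dropped list, mirroring the warning case described above.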

##### Aggregate functions
An aggregation is either a simple function or a band function. The simple functions are:

* `'count'`: Non-null tally of the bound column.
* `'sum'` and `'prod'`: The sum or product.
* `'min'` and `'max'`: The extremes.
* `'range'` and `'mid'`: The range (`max - min`) and the midrange (`(min + max) / 2`).
* `'mean'` and `'median'`: Central tendency.
* `'geomean'`, `'harmean'`, and `'rms'`: Geometric mean, harmonic mean, and root mean square.
* `'sdev'`, `'var'`, `'iqr'`, and `'se'`: Standard deviation, variance, interquartile range, and standard error.
* `'p05'`, `'p10'`, `'p25'`, `'p50'`, `'p75'`, `'p90'`, and `'p95'`: Percentiles.
* `'first'` and `'last'`: The first or last value in the group, in row order. The row order within a group is engine-defined unless the source query has an `ORDER BY`, so these are most useful when the upstream SQL provides an explicit ordering.
* `'diff'`: `last - first`, the change between the first and last value in row order. The same ordering caveat applies.
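As a rough reference for the less common names in the list above, here is a minimal Python sketch of the functions whose definitions are unambiguous. It is not how ggsql computes them (that happens in the SQL engine); the `SIMPLE` table is a hypothetical name, and functions such as `'sdev'`, `'se'`, and the percentiles are left out because their exact conventions are not stated here.

```python
import math
import statistics
from typing import Callable, Dict, Sequence

# Each entry maps a function name to a plain-Python equivalent.
# Inputs are assumed to be non-empty and free of nulls.
SIMPLE: Dict[str, Callable[[Sequence[float]], float]] = {
    "range": lambda v: max(v) - min(v),
    "mid": lambda v: (min(v) + max(v)) / 2,
    "geomean": statistics.geometric_mean,
    "harmean": statistics.harmonic_mean,
    "rms": lambda v: math.sqrt(sum(x * x for x in v) / len(v)),
    "first": lambda v: v[0],
    "last": lambda v: v[-1],
    "diff": lambda v: v[-1] - v[0],  # depends on row order, like first/last
}

if __name__ == "__main__":
    values = [3.0, 1.0, 4.0, 2.0]
    for name, func in SIMPLE.items():
        print(name, func(values))
```

Note that `first`, `last`, and `diff` give different results if the same values arrive in a different order, which is the ordering caveat mentioned above.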

A band function combines an offset with an expansion, optionally scaled by a multiplier. For example, `'mean-1.96sdev'` is the group mean minus 1.96 times the standard deviation. The general form is `<offset>±<multiplier><expansion>`, where `±` is either `+` or `-` and `<multiplier>` is optional (defaults to `1`).

Allowed offsets are: `'mean'`, `'median'`, `'geomean'`, `'harmean'`, `'rms'`, `'sum'`, `'prod'`, `'min'`, `'max'`, `'mid'`, and `'p05'`–`'p95'`.

Allowed expansions are: `'sdev'`, `'se'`, `'var'`, `'iqr'`, and `'range'`.
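To make the band grammar concrete, the following Python sketch parses a band string into its parts and evaluates it for a small subset of functions. It is an illustration under assumptions, not ggsql's parser: `parse_band`, `evaluate_band`, and the regular expression are hypothetical, the percentile offsets are assumed to be the seven listed among the simple functions, and only `'range'` is evaluated as an expansion because the conventions behind `'sdev'`, `'se'`, `'var'`, and `'iqr'` are not spelled out here.

```python
import re
import statistics
from typing import Callable, Dict, Sequence, Tuple

OFFSETS = {
    "mean", "median", "geomean", "harmean", "rms", "sum", "prod",
    "min", "max", "mid",
    # Assumption: 'p05'-'p95' means the seven listed percentiles.
    "p05", "p10", "p25", "p50", "p75", "p90", "p95",
}
EXPANSIONS = {"sdev", "se", "var", "iqr", "range"}

# <offset> (+|-) [<multiplier>] <expansion>
BAND = re.compile(
    r"(?P<offset>[a-z]+\d*)(?P<sign>[+-])"
    r"(?P<mult>\d+(?:\.\d+)?)?(?P<expansion>[a-z]+)"
)


def parse_band(spec: str) -> Tuple[str, float, str]:
    """Split a band string into (offset, signed multiplier, expansion)."""
    m = BAND.fullmatch(spec)
    if m is None or m["offset"] not in OFFSETS or m["expansion"] not in EXPANSIONS:
        raise ValueError(f"not a band function: {spec!r}")
    multiplier = float(m["mult"]) if m["mult"] else 1.0  # defaults to 1
    if m["sign"] == "-":
        multiplier = -multiplier
    return m["offset"], multiplier, m["expansion"]


# Only a subset is implemented for evaluation in this sketch.
FUNCS: Dict[str, Callable[[Sequence[float]], float]] = {
    "mean": statistics.fmean,
    "median": statistics.median,
    "min": min,
    "max": max,
    "mid": lambda v: (min(v) + max(v)) / 2,
    "range": lambda v: max(v) - min(v),
}


def evaluate_band(spec: str, values: Sequence[float]) -> float:
    """Compute offset + multiplier * expansion over one group's values."""
    offset, multiplier, expansion = parse_band(spec)
    return FUNCS[offset](values) + multiplier * FUNCS[expansion](values)


if __name__ == "__main__":
    print(parse_band("mean-1.96sdev"))
    print(evaluate_band("min+range", [1.0, 2.0, 4.0]))
```

A string such as `'sdev+mean'` is rejected by `parse_band` because `'sdev'` is not an allowed offset and `'mean'` is not an allowed expansion.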
Comment on lines +105 to +107
Collaborator

For overview purposes a table could be nice, but it is not at all necessary

| function             | simple | offset | expansion | description                                          |
|----------------------|--------|--------|-----------|------------------------------------------------------|
| `'mean'`, `'median'` | v      | v      | x         | Central tendency.                                    |
| `'sdev'` , `'var'`   | v      | x      | v         | Standard deviation, variance                         |
| `'first'`,`'last'`   | v      | x      | x         | The first or last value in the group ^[**footnote**] |


##### Exploded aggregation
You can also target the same aesthetic more than once to produce *multiple rows per group* — one for each function. We call that *exploded aggregation*. For example `aggregate => ('y:min', 'y:max')` emits a min row and a max row per group, so a single `DRAW line` produces two summary lines that connect within each group rather than across them. When multiple rows are created, a synthetic `aggregate` column is made that tags each row with the name of the aggregation function. You can use this with a `REMAPPING` to drive another aesthetic — e.g. `REMAPPING aggregate AS stroke` to colour the two lines differently. The column's value is built from the per-row function names of the *exploded* targets, deduplicated, and joined with `/`:

* `aggregate => ('y:min', 'y:max')` → rows tagged `'min'`, `'max'`.
* `aggregate => ('y:min', 'y:max', 'color:median')` → rows tagged `'min'`, `'max'` (the single-function `color` target is recycled across rows and is not part of the label).
* `aggregate => ('y:min', 'y:max', 'color:sum', 'color:prod')` → rows tagged `'min/sum'`, `'max/prod'`.
* `aggregate => ('y:mean', 'y:max', 'color:mean', 'color:prod')` → rows tagged `'mean'`, `'max/prod'` (the duplicate `'mean'` collapses).

When several aesthetics are targeted with the same number of functions, they explode in lockstep: row 1 uses each aesthetic's first function, row 2 the second, and so on. Aesthetics with a single function, as well as the unprefixed defaults, are reused unchanged across every row. All aesthetics that are targeted more than once must use the same number of functions; mixing different counts above 1 is not allowed.

In the single-row (reduction) case aggregation applies in place — no `REMAPPING` is needed and no synthetic column is added. Only the multi-row (explosion) case described above introduces the synthetic `aggregate` column.
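The lockstep and labelling rules of exploded aggregation can be sketched in Python as follows. This is a hypothetical model of the documented behaviour, not ggsql's code: the function `explode` and its return shape are assumptions, and it handles targeted entries only.

```python
from typing import Dict, List, Tuple


def explode(targets: List[str]) -> List[Tuple[Dict[str, str], str]]:
    """For each output row, return (function per aesthetic, aggregate label)."""
    per_aes: Dict[str, List[str]] = {}
    for entry in targets:
        aes, func = entry.split(":", 1)
        per_aes.setdefault(aes, []).append(func)

    # Every aesthetic targeted more than once must use the same count.
    counts = {len(funcs) for funcs in per_aes.values() if len(funcs) > 1}
    if len(counts) > 1:
        raise ValueError("exploded aesthetics must use the same number of functions")
    n_rows = counts.pop() if counts else 1

    rows = []
    for i in range(n_rows):
        # Exploded aesthetics advance in lockstep; single functions are recycled.
        assignment = {
            aes: funcs[i] if len(funcs) > 1 else funcs[0]
            for aes, funcs in per_aes.items()
        }
        exploded = [funcs[i] for funcs in per_aes.values() if len(funcs) > 1]
        # Deduplicate while keeping order, then join with '/'.
        label = "/".join(dict.fromkeys(exploded))
        rows.append((assignment, label))
    return rows


if __name__ == "__main__":
    for assignment, label in explode(["y:mean", "y:max", "color:mean", "color:prod"]):
        print(label, assignment)
```

In the reduction case (no aesthetic targeted more than once) the sketch returns a single row with an empty label, which corresponds to no synthetic `aggregate` column being added.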

### `FILTER`
```ggsql
FILTER <condition>
7 changes: 6 additions & 1 deletion doc/syntax/layer/type/area.qmd
@@ -25,9 +25,14 @@ The following aesthetics are recognised by the area layer.
* `orientation`: The orientation of the layer, see the [Orientation section](#orientation). One of the following:
* `'aligned'` to align the layer's primary axis with the coordinate system's first axis.
* `'transposed'` to align the layer's primary axis with the coordinate system's second axis.
* `aggregate`: Aggregation functions to apply per group. One of the following:
* `null` to apply no group aggregation (default).
* A single string or an array of strings. See an overview of aggregation functions in [the `DRAW` documentation](../../clause/draw.qmd#aggregate) and more information in the *Data transformation* section below.

## Data transformation
The area layer sorts the data along its primary axis
This layer supports aggregation through the `aggregate` setting. Aggregation groups are defined by `PARTITION BY`, all discrete mappings, and the primary axis. Within each group, every numeric mapping is replaced in place by its aggregated value. Use a default like `'mean'` or target individual aesthetics with `'<aes>:<func>'`. See [the `DRAW` documentation](../../clause/draw.qmd#aggregate) for the full setting shape.

Further, the area layer sorts the data along its primary axis before returning it.

## Orientation
Area plots are sorted and connected along their primary axis. Since the primary axis cannot be deduced from the mapping it must be specified using the `orientation` setting. E.g. if you wish to create a vertical area plot you need to set `orientation => 'transposed'` to indicate that the primary layer axis follows the second axis of the coordinate system.
17 changes: 17 additions & 0 deletions doc/syntax/layer/type/bar.qmd
@@ -25,10 +25,15 @@ The bar layer has no required aesthetics
## Settings
* `position`: Position adjustment. One of `'identity'`, `'stack'` (default), `'dodge'`, or `'jitter'`
* `width`: The width of the bars as a proportion of the available width (0 to 1)
* `aggregate`: Aggregation functions to apply per group. One of the following:
* `null` to apply no group aggregation (default).
* A single string or an array of strings. See an overview of aggregation functions in [the `DRAW` documentation](../../clause/draw.qmd#aggregate) and more information in the *Data transformation* section below.

## Data transformation
If the secondary axis has not been mapped the layer will calculate counts for you and display these as the secondary axis.

This layer supports aggregation through the `aggregate` setting. Aggregation groups are defined by `PARTITION BY` and all discrete mappings. Within each group, every numeric mapping is replaced in place by its aggregated value. Use a default like `'mean'` or target individual aesthetics with `'<aes>:<func>'`. See [the `DRAW` documentation](../../clause/draw.qmd#aggregate) for the full setting shape.

### Properties

* `weight`: If mapped, the sum of the weights within each group is calculated instead of the count in each group
@@ -116,3 +121,15 @@ DRAW bar
MAPPING species AS fill
PROJECT TO polar
```

Use a different type of aggregation for the bars through the `aggregate` setting. The `range` layer needs both `ymin` and `ymax` mapped; with two defaults, the first is applied to the lower bound and the second to the upper bound.

```{ggsql}
VISUALISE species AS x FROM ggsql:penguins
DRAW bar
MAPPING body_mass AS y
SETTING aggregate => 'mean', fill => 'steelblue'
DRAW range
MAPPING body_mass AS ymin, body_mass AS ymax
SETTING aggregate => ('mean-1.96sdev', 'mean+1.96sdev')
```
20 changes: 18 additions & 2 deletions doc/syntax/layer/type/line.qmd
@@ -24,9 +24,15 @@ The following aesthetics are recognised by the line layer.
* `orientation`: The orientation of the layer, see the [Orientation section](#orientation). One of the following:
* `'aligned'` to align the layer's primary axis with the coordinate system's first axis.
* `'transposed'` to align the layer's primary axis with the coordinate system's second axis.
* `aggregate`: Aggregation functions to apply per group. One of the following:
* `null` to apply no group aggregation (default).
* A single string or an array of strings. See an overview of aggregation functions in [the `DRAW` documentation](../../clause/draw.qmd#aggregate) and more information in the *Data transformation* section below.

## Data transformation
The line layer sorts the data along its primary axis.
This layer supports aggregation through the `aggregate` setting. Aggregation groups are defined by `PARTITION BY`, all discrete mappings, and the primary axis. Within each group, every numeric mapping is replaced in place by its aggregated value. Use a default like `'mean'` or target individual aesthetics with `'<aes>:<func>'`. See [the `DRAW` documentation](../../clause/draw.qmd#aggregate) for the full setting shape.

Further, the line layer sorts the data along its primary axis before returning it.

If the line has a variable `stroke` or `opacity` aesthetic within groups, the line is broken into segments.
Each segment gets the property of the preceding datapoint, so the last datapoint in a group does not transfer these properties.

@@ -89,4 +95,14 @@ VISUALISE x, y FROM data
DRAW line
MAPPING z AS linewidth
SCALE linewidth TO (0, 30)
```
```

Use aggregation to draw min and max lines from a set of observations on a single layer. Targeting `y` twice produces one summary row per function within the same group. A synthetic `aggregate` column tags each row with its function name, which you can remap to colour the lines distinctly:

```{ggsql}
VISUALISE Day AS x, Temp AS y FROM ggsql:airquality
DRAW line
REMAPPING aggregate AS stroke
SETTING aggregate => ('y:min', 'y:max')
DRAW point
```
15 changes: 14 additions & 1 deletion doc/syntax/layer/type/point.qmd
@@ -23,9 +23,12 @@ The following aesthetics are recognised by the point layer.

## Settings
* `position`: Position adjustment. One of `'identity'` (default), `'stack'`, `'dodge'`, or `'jitter'`
* `aggregate`: Aggregation functions to apply per group. One of the following:
* `null` to apply no group aggregation (default).
* A single string or an array of strings. See an overview of aggregation functions in [the `DRAW` documentation](../../clause/draw.qmd#aggregate) and more information in the *Data transformation* section below.

## Data transformation
The point layer does not transform its data but passes it through unchanged
This layer supports aggregation through the `aggregate` setting. Aggregation groups are defined by `PARTITION BY` and all discrete mappings. Within each group, every numeric mapping is replaced in place by its aggregated value. Use a default like `'mean'` or target individual aesthetics with `'<aes>:<func>'`. See [the `DRAW` documentation](../../clause/draw.qmd#aggregate) for the full setting shape.

## Orientation
The point layer has no orientation. The axes are treated symmetrically.
@@ -72,3 +75,13 @@ VISUALISE species AS x, bill_dep AS y FROM ggsql:penguins
DRAW point
SETTING position => 'jitter', distribution => 'density'
```

Use aggregation to show a single point per group:

```{ggsql}
VISUALISE species AS x, island AS y, body_mass AS fill, body_mass AS size
FROM ggsql:penguins
DRAW point
SETTING aggregate => ('fill:mean', 'size:count')
SCALE size TO (5, 20)
```
27 changes: 26 additions & 1 deletion doc/syntax/layer/type/range.qmd
@@ -22,9 +22,12 @@ The following aesthetics are recognised by the range layer.

## Settings
* `width`: The width of the hinges in points (must be >= 0). Defaults to 10. Can be set to `null` to not display hinges.
* `aggregate`: Aggregation functions to apply per group. One of the following:
* `null` to apply no group aggregation (default).
* A single string or an array of strings. See an overview of aggregation functions in [the `DRAW` documentation](../../clause/draw.qmd#aggregate) and more information in the *Data transformation* section below.

## Data transformation
The range layer does not transform its data but passes it through unchanged.
This layer supports aggregation through the `aggregate` setting. Within each group, defined by `PARTITION BY` and all discrete mappings, every numeric mapping is replaced in place by its aggregated value, producing one range per group. As a range layer, it accepts two defaults: the first applies to the start point (`xmin`/`ymin`) and the second applies to the end point (`xmax`/`ymax`). Use a single default like `'mean'` to apply the same function to all values, or target individual aesthetics with `'<aes>:<func>'`. See [the `DRAW` documentation](../../clause/draw.qmd#aggregate) for the full setting shape.

## Orientation
The orientation of range layers is deduced directly from the mapping, because the interval is mapped to the secondary axis. To create a horizontal range layer, you map the independent variable to `y` instead of `x` and the interval to `xmin` and `xmax` (assuming a default Cartesian coordinate system).
@@ -108,3 +111,25 @@ DRAW range
MAPPING low AS ymin, high AS ymax
SETTING width => null
```

Rather than precomputing the values and plotting them, you can use the aggregate functionality to calculate the relevant statistics dynamically:

```{ggsql}
VISUALISE Date AS x, Temp AS ymin, Temp AS ymax, Temp AS color
FROM ggsql:airquality
DRAW range
REMAPPING aggregate AS linewidth
SETTING
aggregate => (
'x:first',
'ymin:first', 'ymin:min',
'ymax:last', 'ymax:max',
'color:diff'
),
width => null
PARTITION BY Week
SCALE linewidth TO (5, 1)
SCALE BINNED color TO ('steelblue', 'firebrick')
SETTING breaks => (-20, 0, 20)
```

13 changes: 12 additions & 1 deletion doc/syntax/layer/type/ribbon.qmd
@@ -23,9 +23,12 @@ The following aesthetics are recognised by the ribbon layer.

## Settings
* `position`: Position adjustment. One of `'identity'` (default), `'stack'`, `'dodge'`, or `'jitter'`
* `aggregate`: Aggregation functions to apply per group. One of the following:
* `null` to apply no group aggregation (default).
* A single string or an array of strings. See an overview of aggregation functions in [the `DRAW` documentation](../../clause/draw.qmd#aggregate) and more information in the *Data transformation* section below.

## Data transformation
The ribbon layer sorts the data along its primary axis
This layer supports aggregation through the `aggregate` setting. Within each group, defined by `PARTITION BY` and all discrete mappings, every numeric mapping is replaced in place by its aggregated value, producing one ribbon per group. As a range layer, ribbon accepts two defaults: the first applies to the start point (`xmin`/`ymin`) and the second applies to the end point (`xmax`/`ymax`). Use a single default like `'mean'` to apply the same function to all values, or target individual aesthetics with `'<aes>:<func>'`. See [the `DRAW` documentation](../../clause/draw.qmd#aggregate) for the full setting shape.

## Orientation
Ribbon layers are sorted and connected along their primary axis. The orientation is deduced directly from the mapping, because the interval is mapped to the secondary axis. To create a vertical ribbon layer you map the independent variable to `y` instead of `x` and the interval to `xmin` and `xmax` (assuming a default Cartesian coordinate system).
@@ -59,3 +62,11 @@ DRAW ribbon
DRAW line
MAPPING MeanTemp AS y
```

Use aggregation to calculate bounds on the fly. The two untargeted aggregation functions target the `ymin` and `ymax` aesthetics automatically.

```{ggsql}
VISUALISE Day AS x, Temp AS ymin, Temp AS ymax FROM ggsql:airquality
DRAW ribbon
SETTING aggregate => ('min', 'max')
```
16 changes: 15 additions & 1 deletion doc/syntax/layer/type/rule.qmd
@@ -25,8 +25,12 @@ The following aesthetics are recognised by the rule layer.

## Settings
* `position`: Position adjustment. One of `'identity'` (default), `'stack'`, `'dodge'`, or `'jitter'`
* `aggregate`: Aggregation functions to apply per group. One of the following:
* `null` to apply no group aggregation (default).
* A single string or an array of strings. See an overview of aggregation functions in [the `DRAW` documentation](../../clause/draw.qmd#aggregate) and more information in the *Data transformation* section below.

## Data transformation
This layer supports aggregation through the `aggregate` setting. Aggregation groups are defined by `PARTITION BY` and all discrete mappings. Within each group, every numeric mapping is replaced in place by its aggregated value. Use a default like `'mean'` or target individual aesthetics with `'<aes>:<func>'`. See [the `DRAW` documentation](../../clause/draw.qmd#aggregate) for the full setting shape.

For diagonal lines, the position aesthetic determines the intercept:

@@ -110,4 +114,14 @@ VISUALISE FROM ggsql:penguins
intercept AS y,
label AS colour
FROM lines
```
```

Show a max rule for a time series:

```{ggsql}
VISUALISE Temp AS y FROM ggsql:airquality
DRAW line
MAPPING Date AS x
DRAW rule
SETTING aggregate => 'max'
```
5 changes: 4 additions & 1 deletion doc/syntax/layer/type/segment.qmd
@@ -25,9 +25,12 @@ For axis-aligned intervals where one coordinate is shared between the start and

## Settings
* `position`: Position adjustment. One of `'identity'` (default), `'stack'`, `'dodge'`, or `'jitter'`
* `aggregate`: Aggregation functions to apply per group. One of the following:
* `null` to apply no group aggregation (default).
* A single string or an array of strings. See an overview of aggregation functions in [the `DRAW` documentation](../../clause/draw.qmd#aggregate) and more information in the *Data transformation* section below.

## Data transformation
The segment layer does not transform its data but passes it through unchanged.
This layer supports aggregation through the `aggregate` setting. Within each group, defined by `PARTITION BY` and all discrete mappings, every numeric mapping is replaced in place by its aggregated value, producing one segment per group. As a range layer, segment accepts two defaults: the first applies to the start point (`x`/`y`) and the second applies to the end point (`xend`/`yend`). Use a single default like `'mean'` to apply the same function to all four endpoints, or target individual aesthetics with `'<aes>:<func>'`. See [the `DRAW` documentation](../../clause/draw.qmd#aggregate) for the full setting shape.

## Orientation
The segment layer has no orientations. The axes are treated symmetrically.