Skip to content

fix: ensure sorts are always done on columns of the right table#5338

Merged
max-sixty merged 6 commits intoPRQL:mainfrom
lukapeschke:fix-cid-redirects-for-projection-select
Jun 25, 2025
Merged

fix: ensure sorts are always done on columns of the right table#5338
max-sixty merged 6 commits intoPRQL:mainfrom
lukapeschke:fix-cid-redirects-for-projection-select

Conversation

@lukapeschke
Copy link
Copy Markdown
Contributor

closes #5302

This fixes #5302 with two changes:

  1. First, it recomputes cid_redirects after compiling a relation (i.e. a pipeline), because we rely on them in the postprocess step for sorting. This is needed because CTEs get modified during compile_pipeline: some are merged or removed, and some are added
  2. Then, rather than reverting sort columns when iterating over every CTE, we do it once when processing the main pipeline: the sort columns are reverted back to their very first form via cid_mappings, and are then reverted "back" to the front, but we prefer using their alias when they are renamed. This allows to pass named columns around rather expressions, and ensure that the sort columns are available in the main pipeline, even with intermediary select steps. An example where this is needed is the new group_sort_filter_derive_select_join integration test: if you remove the aliasing part, we will attempt sorting on a column that has been de-selected.

The two changes feel a bit clunky (especially alias_last_sorting), but I'm not really sure how to fix this without a proper lineage for columns 😕 Something ideal would be an object for every column, with information such as "this column comes from table X /expression Y, becomes FOO in table T, then BAR in table T2". It could avoid having to keep ctes and relation_instances in sync.

@kgutwin @max-sixty thanks a lot for your help on this one, it has made it much easier to dive into the code!

Signed-off-by: Luka Peschke <mail@lukapeschke.com>
Signed-off-by: Luka Peschke <mail@lukapeschke.com>
Signed-off-by: Luka Peschke <mail@lukapeschke.com>
Signed-off-by: Luka Peschke <mail@lukapeschke.com>
Signed-off-by: Luka Peschke <mail@lukapeschke.com>
…r sqlite and mysql

Signed-off-by: Luka Peschke <mail@lukapeschke.com>
@max-sixty
Copy link
Copy Markdown
Member

this looks great!

(I agree it's a bit clunky, but unless I'm misunderstanding, the result is very good! and we can re-architect one day™)

I'll merge; @kgutwin if you have any feedback please add for a post-review!

@max-sixty max-sixty merged commit 5cbeec6 into PRQL:main Jun 25, 2025
34 checks passed
@kgutwin
Copy link
Copy Markdown
Collaborator

kgutwin commented Jun 25, 2025

Glad for the merge!

I agree that proper lineage tracking would be very useful here. In general, the SQL processing code is tough to follow and reason through (hence why @lukapeschke and I had so much trouble fixing these bugs) so any refactoring we could do for modularization or just better data structures would be a really good idea.

Thanks!

@vanillajonathan
Copy link
Copy Markdown
Collaborator

Is this something that should be mentioned in the changelog?

@max-sixty
Copy link
Copy Markdown
Member

Glad for the merge!

I agree that proper lineage tracking would be very useful here. In general, the SQL processing code is tough to follow and reason through (hence why @lukapeschke and I had so much trouble fixing these bugs) so any refactoring we could do for modularization or just better data structures would be a really good idea.

Thanks!

yes, strongly agree...

@lukapeschke
Copy link
Copy Markdown
Contributor Author

thanks for the quick merge 🎉

@lukapeschke lukapeschke deleted the fix-cid-redirects-for-projection-select branch October 2, 2025 09:28
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

select in joined pipeline causes invalid expressions to be referenced

4 participants