Skip to content

fix(prqlc): deduplicate selected items in gen_projection#5305

Merged
max-sixty merged 3 commits intoPRQL:mainfrom
lukapeschke:solve-5302
Jun 11, 2025
Merged

fix(prqlc): deduplicate selected items in gen_projection#5305
max-sixty merged 3 commits intoPRQL:mainfrom
lukapeschke:solve-5302

Conversation

@lukapeschke
Copy link
Copy Markdown
Contributor

fixes #5302

@lukapeschke
Copy link
Copy Markdown
Contributor Author

After spending a while digging into the intermediary steps, this is my naive attempt at solving #5302 .

However I'm not satisfied with it: Deduplicating what we have in SELECT statements seems kinda magic, and might not be what we want in some cases. Also, I believe this should be solved earlier in the compilation process.

I see a few things in the SQL output that could be solved:

  1. table_4 could be completely removed if FIRST_VALUE(my_concatenated_col) was directly selected as "new_name"
  2. CONCAT(col, ' ', other_col) AS my_concatenated_col and _expr_0 are not needed in table_1. I guess a solution I see here would be to process S-string tables with sqlparser to see if there is an explicit column selection. This would allow to not pass Wildcard around, and avoid the SELECT table_0.* in the final query.
  3. table_1 could just reference my_concatenated_col and _expr_0 by alias. Not sure how to pass the information "this expression has been evaluated in a previous CTE and does not need to be re-evaluated" around though.

@max-sixty do you have any ideas/hints on how I could move forward with this ? Thanks in advance!

@max-sixty
Copy link
Copy Markdown
Member

hi @lukapeschke, and thanks for the contribution!

is there an example of the proposed code behaving worse than the existing code? otherwise this seems like an upgrade...

(I agree it's not perfect, but making incremental improvements is still good...)

@lukapeschke
Copy link
Copy Markdown
Contributor Author

@max-sixty I've no example of this behaving worse than existing code.

You're right, we can proceed with incremental improvements. I simplified deduplicate_select_items to ensure the generated SQL query is valid, and added an extra alias to the test case to ensure it is taken into account in the final query.

I'll try to address the other issues in different PRs if that's okay with you ?

My next step will be to try to change FIRST_VALUE(name) AS _expr_0 to just _expr_0 in table_0, so the SQL query is valid

@lukapeschke lukapeschke marked this pull request as ready for review May 27, 2025 08:52
@lukapeschke
Copy link
Copy Markdown
Contributor Author

lukapeschke commented May 27, 2025

#5310 adresses the second point in this comment. I guess this needs to be done earlier in the process to prevent _expr_0 from being selected, but I wasn't quite sure where exactly table declarations get created from s-strings

fixes PRQL#5302

Signed-off-by: Luka Peschke <mail@lukapeschke.com>
…ple aliases work as expected in the test

Signed-off-by: Luka Peschke <mail@lukapeschke.com>
Signed-off-by: Luka Peschke <mail@lukapeschke.com>
@max-sixty
Copy link
Copy Markdown
Member

sorry I missed this @lukapeschke ! I missed a couple of others too, but am back & merging things!

I'll merge this. I'm a tiiiiny bit hesitant that we're just removing any duplicate columns — there may be times that people expect duplicate columns to persist, and ideally we'd solve this at a different stage in the compiler

but given where we're at, the result does seem better, and endlessly pushing off small improvements in the hope of fixing underlying compiler issues doesn't seem like a successful strategy :)

thank you for the contribution!

@max-sixty max-sixty merged commit 3f2e21b into PRQL:main Jun 11, 2025
36 checks passed
@lukapeschke lukapeschke deleted the solve-5302 branch June 12, 2025 08:15
@lukapeschke
Copy link
Copy Markdown
Contributor Author

Thanks a lot for the merge @max-sixty :)

I think I've finally found the underlying issue, details here: #5302 (comment)

I'm not really sure about how to proceed to fix it though, I'm not sure what the condition should be to determine if the sort columns should be reverted or not... Maybe you have an idea ?

Also, could you please re-open #5302 since it's not completely fixed yet ? I don't have the rights to do it

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

select in joined pipeline causes invalid expressions to be referenced

2 participants