Improve push down limit (logical optimizer rule) #15744

Closed
xudong963 wants to merge 4 commits into apache:main from xudong963:improve_push_down_limit

@xudong963
Member

Which issue does this PR close?

  • Closes #.

Rationale for this change

If skip is zero, we can remove the limit directly once its fetch has been pushed down. Currently the limit is only removed in the second round of optimization.
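The rationale above can be illustrated with a minimal sketch. It uses a toy plan type, not DataFusion's `LogicalPlan`; the names `Plan` and `push_down_limit` are hypothetical and chosen only for illustration. The sketch assumes the limit sits directly above a sort: the sort's fetch is tightened to `skip + fetch`, and when `skip` is zero the limit node is dropped in the same pass.

```rust
/// Toy logical plan; these types are hypothetical, not DataFusion's `LogicalPlan`.
#[derive(Debug, Clone, PartialEq)]
enum Plan {
    Scan,
    Sort { fetch: Option<usize>, input: Box<Plan> },
    Limit { skip: usize, fetch: Option<usize>, input: Box<Plan> },
}

/// Push a `Limit` into the `Sort` directly below it (illustrative only).
fn push_down_limit(plan: Plan) -> Plan {
    match plan {
        Plan::Limit { skip, fetch, input } => match *input {
            Plan::Sort { fetch: sort_fetch, input: sort_input } => {
                // The sort has to produce at least `skip + fetch` rows.
                let needed = fetch.map(|f| f.saturating_add(skip));
                // Keep the tighter of the existing sort fetch and the new bound.
                let new_fetch = match (sort_fetch, needed) {
                    (Some(a), Some(b)) => Some(a.min(b)),
                    (a, b) => a.or(b),
                };
                let sort = Plan::Sort { fetch: new_fetch, input: sort_input };
                if skip == 0 {
                    // Nothing to skip: the sort's fetch already enforces the limit,
                    // so the limit node can be removed in this same pass.
                    sort
                } else {
                    Plan::Limit { skip, fetch, input: Box::new(sort) }
                }
            }
            other => Plan::Limit { skip, fetch, input: Box::new(other) },
        },
        other => other,
    }
}
```

With `skip = 0` the result is a bare `Sort` carrying the fetch; with a non-zero skip the `Limit` stays on top because the skipped rows still have to be discarded.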

What changes are included in this PR?

Are these changes tested?

Yes

Are there any user-facing changes?

@github-actions bot added the optimizer (Optimizer rules) and core (Core DataFusion crate) labels on Apr 17, 2025
@xudong963
Member Author

xudong963 commented Apr 17, 2025

    user_defined::user_defined_plan::topk_invariants
    user_defined::user_defined_plan::topk_invariants_after_invalid_mutation
    user_defined::user_defined_plan::topk_plan

The failing tests are related to topk (in user_defined_plan.rs).

Because the PR removes the limit during the first round, TopKOptimizerRule doesn't get a chance to replace limit + sort with Topk.
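A minimal sketch of why the rewrite stops firing, assuming a toy plan type rather than DataFusion's `LogicalPlan`. The `Plan` enum and `topk_rule` function are hypothetical stand-ins for the test's rule, which is only assumed here to match a limit directly above a sort:

```rust
/// Toy logical plan; hypothetical types, not DataFusion's `LogicalPlan`.
#[derive(Debug, Clone, PartialEq)]
enum Plan {
    Scan,
    Sort { fetch: Option<usize>, input: Box<Plan> },
    Limit { skip: usize, fetch: Option<usize>, input: Box<Plan> },
    TopK { k: usize, input: Box<Plan> },
}

/// Stand-in for a user-defined rule that rewrites `Limit(Sort(..))` into `TopK`.
fn topk_rule(plan: Plan) -> Plan {
    match plan {
        // The rule only fires when it sees a Limit node (skip 0, known fetch)...
        Plan::Limit { skip: 0, fetch: Some(k), input } => match *input {
            // ...sitting directly on top of a Sort node.
            Plan::Sort { fetch: sort_fetch, input: sort_input } => {
                // Respect a tighter fetch already stored on the sort.
                let k = sort_fetch.map_or(k, |f| f.min(k));
                Plan::TopK { k, input: sort_input }
            }
            other => Plan::Limit { skip: 0, fetch: Some(k), input: Box::new(other) },
        },
        // A bare Sort with a fetch has no Limit parent, so nothing matches.
        other => other,
    }
}
```

If an earlier pass has already folded the limit into the sort's fetch and dropped the limit node, the input to this rule is a bare `Sort { fetch: Some(k), .. }` and the plan passes through unchanged.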

I have a question: what's the difference between Sort(Topk) and Topk?

@xudong963 force-pushed the improve_push_down_limit branch from d8bdbec to a8b64b8 on April 17, 2025 06:24
@xudong963 changed the title from "Improve push down limit" to "Improve push down limit (logical optimizer rule)" on Apr 17, 2025
@xudong963
Member Author

Topk

IIUC, the topk in https://github.com/apache/datafusion/blob/main/datafusion/core/tests/user_defined/user_defined_plan.rs is only used for tests.

@2010YOUY01
Contributor

> Topk
>
> IIUC, the topk in https://github.com/apache/datafusion/blob/main/datafusion/core/tests/user_defined/user_defined_plan.rs is only used for test.

Yes, DataFusion currently doesn't have a TopK execution plan. Instead, it uses an inner struct inside SortExec for topk queries, and I think it's represented by Sort(topk) in EXPLAIN output.
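As a rough illustration of what a sort with a fetch can do instead of a full sort, here is a generic bounded-heap sketch. It is not DataFusion's actual SortExec code; the function name `top_k_smallest` is hypothetical, and the sketch assumes plain `i64` values sorted ascending.

```rust
use std::collections::BinaryHeap;

/// Return the `k` smallest values in ascending order while holding
/// at most `k + 1` values in memory at any time (illustrative only).
fn top_k_smallest(values: impl IntoIterator<Item = i64>, k: usize) -> Vec<i64> {
    // `BinaryHeap` is a max-heap, so the largest retained value is on top.
    let mut heap = BinaryHeap::new();
    for v in values {
        heap.push(v);
        if heap.len() > k {
            // Evict the current largest so only the k smallest remain.
            heap.pop();
        }
    }
    heap.into_sorted_vec()
}
```

The point of the sketch is that only `k` rows need to be retained at a time, which is why a sort that knows its fetch can be cheaper than sorting everything and then applying a limit.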

}
} else {
sort.fetch = new_fetch;
if skip == 0 && original_sort_fetch.is_none() {
Contributor

It would be great to add a comment explaining why && original_sort_fetch.is_none() is needed.

Member Author

Your comment prompted me to think about this more deeply. I now think we don't need the condition check.

@xudong963 force-pushed the improve_push_down_limit branch from 80722ff to d831173 on April 30, 2025 14:07
use async_trait::async_trait;
use futures::{Stream, StreamExt};

/// Execute the specified sql and return the resulting record batches
Member Author

Given that we'll move the code about "how to write the user defined plan" to the docs, I removed the now-unneeded tests in this PR.

Member Author

See the issue: #15774

Contributor

Can we please remove the tests in some other PR so it is clear what behavior the code is changing, if any? I found it hard to find the actual code / behavior change in this PR with several different changes mixed in.

Member Author

Good idea

@xudong963 force-pushed the improve_push_down_limit branch from d831173 to 30c78ff on April 30, 2025 14:30
Contributor

@alamb alamb left a comment

Thanks @xudong963

// ------ The implementation of the TopK code follows -----

-#[derive(Debug)]
+#[derive(Debug, Default)]
Contributor

If we are removing all the tests that refer to this structure, I think we should remove the rest of the code too rather than marking it as "allow unused".

Member Author

Makes sense, let's wait for PR #15832.

Contributor

> Makes sense, let's wait for PR #15832.

Let's finish it up as soon as possible. I think I was missing some things and couldn't understand them properly. It would be a great help if you could collaborate on it, @xudong963. You can add your suggestions to the PR.

@github-actions

github-actions bot commented Jul 8, 2025

Thank you for your contribution. Unfortunately, this pull request is stale because it has been open 60 days with no activity. Please remove the stale label or comment or this will be closed in 7 days.

@github-actions bot added the Stale (PR has not had any activity for some time) label on Jul 8, 2025
@github-actions bot closed this on Jul 16, 2025

Labels: core (Core DataFusion crate), optimizer (Optimizer rules), sqllogictest (SQL Logic Tests (.slt)), Stale (PR has not had any activity for some time)