Skip to content

Comments

Update extending-operators.md#15832

Closed
Adez017 wants to merge 33 commits intoapache:mainfrom
Adez017:patch-1
Closed

Update extending-operators.md#15832
Adez017 wants to merge 33 commits intoapache:mainfrom
Adez017:patch-1

Conversation

@Adez017
Copy link
Contributor

@Adez017 Adez017 commented Apr 23, 2025

Which issue does this PR close?

Rationale for this change

updated the extending-operators.md file

@github-actions github-actions bot added the documentation Improvements or additions to documentation label Apr 23, 2025
@Adez017 Adez017 marked this pull request as draft April 23, 2025 17:04
@Adez017
Copy link
Contributor Author

Adez017 commented Apr 23, 2025

hi @xudong963 , i want to ask that did we had to rewrite the part of code https://github.com/apache/datafusion/blob/main/datafusion/core/tests/user_defined/user_defined_plan.rs#L18-L24 afterwards or just had to migrate the whole code into extending-operators .
Also please review the changes that I made and provide feedback based on it

@xudong963
Copy link
Member

i want to ask that did we had to rewrite the part of code https://github.com/apache/datafusion/blob/main/datafusion/core/tests/user_defined/user_defined_plan.rs#L18-L24 afterwards or just had to migrate the whole code into extending-operators .

I think after migrating them, we don't need to retain the code

@Adez017
Copy link
Contributor Author

Adez017 commented Apr 24, 2025

i want to ask that did we had to rewrite the part of code https://github.com/apache/datafusion/blob/main/datafusion/core/tests/user_defined/user_defined_plan.rs#L18-L24 afterwards or just had to migrate the whole code into extending-operators .

I think after migrating them, we don't need to retain the code

does that mean that I need to migrate all the code from user_defined_plan.rs to extending-operators with the code also ?

@xudong963
Copy link
Member

i want to ask that did we had to rewrite the part of code https://github.com/apache/datafusion/blob/main/datafusion/core/tests/user_defined/user_defined_plan.rs#L18-L24 afterwards or just had to migrate the whole code into extending-operators .

I think after migrating them, we don't need to retain the code

does that mean that I need to migrate all the code from user_defined_plan.rs to extending-operators with the code also ?

Yes, except tests.

@Adez017
Copy link
Contributor Author

Adez017 commented Apr 24, 2025

i want to ask that did we had to rewrite the part of code https://github.com/apache/datafusion/blob/main/datafusion/core/tests/user_defined/user_defined_plan.rs#L18-L24 afterwards or just had to migrate the whole code into extending-operators .

I think after migrating them, we don't need to retain the code

does that mean that I need to migrate all the code from user_defined_plan.rs to extending-operators with the code also ?

Yes, except tests.

that means above it :

@Adez017 Adez017 marked this pull request as ready for review April 25, 2025 04:55
@Adez017
Copy link
Contributor Author

Adez017 commented Apr 25, 2025

Hi @xudong963 , i think it is ready , give it a check

@xudong963
Copy link
Member

xudong963 commented Apr 25, 2025

You can refer to the doc: https://datafusion.apache.org/library-user-guide/custom-table-providers.html.

It should contain the real code https://github.com/apache/datafusion/blob/main/datafusion/core/tests/user_defined/user_defined_plan.rs#L458-L916 to describe the process of defining an extending operator, not only an SQL example.

@Adez017
Copy link
Contributor Author

Adez017 commented Apr 25, 2025

hey @xudong963 , check it out now .
also could you help in the failing check in workflow?

@xudong963
Copy link
Member

You can rebase with main

@xudong963
Copy link
Member

Would anyone happen to know how to preview the HTML format for the PR changes?

@Adez017
Copy link
Contributor Author

Adez017 commented Apr 27, 2025

You can rebase with main

doe this solve the issue ?

@xudong963
Copy link
Member

You can rebase with main

doe this solve the issue ?

You can open the failed CI and see what's wrong:

error[E0599]: no method named `unwrap` found for struct `CoalescePartitionsExec` in the current scope
   --> datafusion/physical-optimizer/src/enforce_sorting/replace_with_order_preserving_variants.rs:210:14
    |
208 |           let coalesce = CoalescePartitionsExec::new(child)
    |  ________________________-
209 | |             .with_fetch(plan.fetch())
210 | |             .unwrap();
    | |             -^^^^^^ method not found in `CoalescePartitionsExec`
    | |_____________|
    |

warning: unused import: `ExecutionPlan`
  --> datafusion/physical-optimizer/src/enforce_sorting/replace_with_order_preserving_variants.rs:37:32
   |
37 | use datafusion_physical_plan::{ExecutionPlan, ExecutionPlanProperties};
   |                                ^^^^^^^^^^^^^
   |
   = note: `#[warn(unused_imports)]` on by default

The error is fixed in main, so rebasing your branch with main will fix the error

@Adez017
Copy link
Contributor Author

Adez017 commented May 1, 2025

You can rebase with main

doe this solve the issue ?

You can open the failed CI and see what's wrong:

error[E0599]: no method named `unwrap` found for struct `CoalescePartitionsExec` in the current scope
   --> datafusion/physical-optimizer/src/enforce_sorting/replace_with_order_preserving_variants.rs:210:14
    |
208 |           let coalesce = CoalescePartitionsExec::new(child)
    |  ________________________-
209 | |             .with_fetch(plan.fetch())
210 | |             .unwrap();
    | |             -^^^^^^ method not found in `CoalescePartitionsExec`
    | |_____________|
    |

warning: unused import: `ExecutionPlan`
  --> datafusion/physical-optimizer/src/enforce_sorting/replace_with_order_preserving_variants.rs:37:32
   |
37 | use datafusion_physical_plan::{ExecutionPlan, ExecutionPlanProperties};
   |                                ^^^^^^^^^^^^^
   |
   = note: `#[warn(unused_imports)]` on by default

The error is fixed in main, so rebasing your branch with main will fix the error

Thank for your help @xudong963 but I think it didn't work for the failing workflow

@xudong963
Copy link
Member

You can rebase with main

doe this solve the issue ?

You can open the failed CI and see what's wrong:

error[E0599]: no method named `unwrap` found for struct `CoalescePartitionsExec` in the current scope
   --> datafusion/physical-optimizer/src/enforce_sorting/replace_with_order_preserving_variants.rs:210:14
    |
208 |           let coalesce = CoalescePartitionsExec::new(child)
    |  ________________________-
209 | |             .with_fetch(plan.fetch())
210 | |             .unwrap();
    | |             -^^^^^^ method not found in `CoalescePartitionsExec`
    | |_____________|
    |

warning: unused import: `ExecutionPlan`
  --> datafusion/physical-optimizer/src/enforce_sorting/replace_with_order_preserving_variants.rs:37:32
   |
37 | use datafusion_physical_plan::{ExecutionPlan, ExecutionPlanProperties};
   |                                ^^^^^^^^^^^^^
   |
   = note: `#[warn(unused_imports)]` on by default

The error is fixed in main, so rebasing your branch with main will fix the error

Thank for your help @xudong963 but I think it didn't work for the failing workflow

Sorry I am on vacation , you can try to fix by the error hints in ci

@Adez017
Copy link
Contributor Author

Adez017 commented May 2, 2025

i Think we need @alamb help now . could you help ?

@Adez017
Copy link
Contributor Author

Adez017 commented May 3, 2025

@alamb , please take a look

@Adez017
Copy link
Contributor Author

Adez017 commented May 3, 2025

could anyone please help here ?

@xudong963
Copy link
Member

Could you please refer to the error lints in CI? Such as

error[E0433]: failed to resolve: use of undeclared type `Statistics`
  --> datafusion/core/src/lib.rs:1365:12
   |
98 |         Ok(Statistics::new_unknown(&self.schema()))
   |            ^^^^^^^^^^ use of undeclared type `Statistics`
   |
help: consider importing one of these items
   |
2  + use datafusion::physical_plan::Statistics;
   |
2  + use datafusion_common::Statistics;
   |
2  + use datafusion_physical_plan::Statistics;
   |
2  + use parquet::file::statistics::Statistics;

It's saying, you need to import Statistics for the example in the doc, such as
image

Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @Adez017 -- I took a brief look at this and I pushed some small changes.

I think the code will need some more work

@Adez017 Adez017 requested a review from alamb May 14, 2025 17:30
@Adez017
Copy link
Contributor Author

Adez017 commented May 15, 2025

@alamb i think it perfect now and working . please take a look

@Adez017 Adez017 requested a review from xudong963 May 15, 2025 03:37
@Adez017
Copy link
Contributor Author

Adez017 commented May 16, 2025

@alamb @xudong963 gentleman's tis need your attention now

Copy link
Member

@xudong963 xudong963 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thank you.

It's very close to the finish line. The last thing that we need is to preview the HTML. cc @alamb , could you please show me how to do the preview?


Note: DataFusion contains a highly optimized version of the `TopK` operator, but we present a simplified version in this section for explanatory purposes. For more information, see the full implementation in the [DataFusion repository].

[DataFusion repository]: https://docs.rs/datafusion/latest/datafusion/physical_plan/struct.TopK.html
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[DataFusion Doc]?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or maybe it should be a link to the source code - I was trying to avoid some link that would get out of date but maybe that in inevitable

@alamb
Copy link
Contributor

alamb commented May 16, 2025

LGTM, thank you.

It's very close to the finish line. The last thing that we need is to preview the HTML. cc @alamb , could you please show me how to do the preview?

I believe you can follow the steps here: https://github.com/apache/datafusion/tree/main/docs#build--preview

Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @Adez017 and @xudong963 -- I agree this is very close.

This current PR also adds a copy of adding a user defined plan. Maybe we can also remove the old copy?


Note: DataFusion contains a highly optimized version of the `TopK` operator, but we present a simplified version in this section for explanatory purposes. For more information, see the full implementation in the [DataFusion repository].

[DataFusion repository]: https://docs.rs/datafusion/latest/datafusion/physical_plan/struct.TopK.html
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or maybe it should be a link to the source code - I was trying to avoid some link that would get out of date but maybe that in inevitable

}
```

DataFusion supports extension of operators by transforming logical plan and execution plan through customized [optimizer rules](https://docs.rs/datafusion/latest/datafusion/optimizer/trait.OptimizerRule.html). This section will use the µWheel project to illustrate such capabilities.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This section now seems somewhat to duplicate the new sections

Copy link
Contributor Author

@Adez017 Adez017 May 16, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i think we can rationalize it

Copy link
Contributor Author

@Adez017 Adez017 May 16, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

how about this ? we can write a separate guide or something for the µWheel , removing the content from this PR and make it as a separate . if you are agree @alamb , I would love to do the job if possible

@Adez017 Adez017 requested review from alamb and xudong963 May 16, 2025 13:51
@Adez017
Copy link
Contributor Author

Adez017 commented May 17, 2025

@alamb cc: @xudong963 check it now

@Adez017
Copy link
Contributor Author

Adez017 commented May 19, 2025

hey @alamb @xudong963 since your last visit I had made some changes . please take a look

@github-actions
Copy link

Thank you for your contribution. Unfortunately, this pull request is stale because it has been open 60 days with no activity. Please remove the stale label or comment or this will be closed in 7 days.

@github-actions github-actions bot added the Stale PR has not had any activity for some time label Jul 19, 2025
@Adez017
Copy link
Contributor Author

Adez017 commented Jul 19, 2025

hi @alamb please have a look on this

Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @Adez017

Thank you for your patience.

I think there are a few outstanding issues we should to resolve prior to merging this:

  1. Remove the (now) redundant example, as specified in the original ticket #15774
  2. Try and add a bit more detail to the examples.

Removing the existing example may be harder as it seems the test has morphed to also be testing user defined invariants.

I will give it a shot

@Adez017
Copy link
Contributor Author

Adez017 commented Jul 22, 2025

Thanks @Adez017

Thank you for your patience.

I think there are a few outstanding issues we should to resolve prior to merging this:

  1. Remove the (now) redundant example, as specified in the original ticket Move code in user_defined_plan.rs to the extending-operators doc #15774
  2. Try and add a bit more detail to the examples.

Removing the existing example may be harder as it seems the test has morphed to also be testing user defined invariants.

I will give it a shot

Hi @alamb, thank you for your time and efforts. i think there might be some updates that had happened since my last visit . i would love it if someone help me out with that

@github-actions github-actions bot removed the Stale PR has not had any activity for some time label Jul 23, 2025
@Adez017 Adez017 closed this by deleting the head repository Jul 23, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

documentation Improvements or additions to documentation

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Move code in user_defined_plan.rs to the extending-operators doc

3 participants