Encapsulate EquivalenceClass into a struct#8034
Conversation
d611a07 to
59c1f5c
Compare
| /// equality predicates (e.g. `a = b`), typically equi-join conditions and | ||
| /// equality conditions in filters. | ||
| #[derive(Debug, Clone)] | ||
| pub struct EquivalenceClass { |
There was a problem hiding this comment.
@ozankabak and @mustafasrepo I am curious what you think about this change to encapsulate free functions into a struct in the equivalence code? If you like it, I can pull most of the rest of the code in datafusion/physical-expr/src/physical_expr.rs to this structure instead of free functions
There was a problem hiding this comment.
I like this style, EquivalenceClass should be its own type. When comparing two equivalence classes, calling eq on this type is much more readable than calling a utility that operates on PhysicalExpr slices directly.
You seem to be keeping the lower-level functional primitives around, which is also good -- we can reuse them to create encapsulations like EquivalenceClass as we build more.
There was a problem hiding this comment.
You seem to be keeping the lower-level functional primitives around, which is also good -- we can reuse them to create encapsulations like EquivalenceClass as we build more.
I did this mostly to keep the size of the initial diff down to make the proposal easier to review.
It seems to me like EquivalenceClass has very few functions actually related to equivalence calculations -- it is mostly a container of PhysicalExprs -- maybe it would be better named something like PhysicalExprList 🤔 But then the equivalence calculations would be less readable perhaps.
There was a problem hiding this comment.
Exactly. I see two options:
- Following your approach where we encapsulate free functions in structs and methods whenever we can define a coherent struct for them, and we also keep these free functions so that similar structs can share them.
- Create agnostic structs (like
PhysicalExprList) and convert the free functions to their methods, and use type aliases likeEquivalenceClass=PhysicalExprListto increase readability depending on the context.
There was a problem hiding this comment.
I would like to propose with starting with EquivalenceClass into its own struct and we can keep the utility functions around for a while to see if they are reusable
EquivalenceClassEquivalenceClass into a struct
|
I believe this PR is ready for reivew @ozankabak and @mustafasrepo -- however, I believe it may cause non trivial merge conflicts with #8107. However, I think the style in this PR would make stuff like #8107 easier to implement 🤔 |
| fn eq(&self, other: &Self) -> bool { | ||
| physical_exprs_equal(&self.inner, &other.inner) | ||
| } |
There was a problem hiding this comment.
Mathematically, an equivalence class is a set of expressions. Therefore, eq implementation should use set equality. AFAIK we only check for list-equality during testing (and if we don't, that could be sign of a bug or a bad design!). Therefore, I suggest changing this eq_strict and implementing eq with set equality (and getting rid of eq_bag).
| /// equality conditions in filters. | ||
| #[derive(Debug, Clone)] | ||
| pub struct EquivalenceClass { | ||
| inner: Vec<Arc<dyn PhysicalExpr>>, |
There was a problem hiding this comment.
Maybe exprs or items is a better name here as this is the vector containing elements (expressions) that belong to this equivalence class.
| } | ||
|
|
||
| /// Removes all duplicated exprs in this class | ||
| // TODO should we deduplicate on insert? |
There was a problem hiding this comment.
We can do a quick contains check on push, and avoid adding the expression if the former check returns true. And you can call deduplicate_physical_exprs in new, and there would be no need for a deduplicate function anymore.
There was a problem hiding this comment.
This fits well with the definition of this being a set (rather than an ordered check). I will try it
| } | ||
|
|
||
| // Create a new equivalence class from a pre-existing `Vec` | ||
| pub fn new_from_vec(inner: Vec<Arc<dyn PhysicalExpr>>) -> Self { |
There was a problem hiding this comment.
If we have a new_empty, why don't we simply call this new(exprs: Vec<Arc<dyn PhysicalExpr>>)?
Co-authored-by: Mehmet Ozan Kabak <ozankabak@gmail.com>
…b/arrow-datafusion into alamb/encapsulate_equivalence_class
|
I implemented @ozankabak 's suggestion to treat |
|
Any concerns with merging this @mustafasrepo ? |
Not at all. Thanks @alamb as always for this PR. |
Which issue does this PR close?
related to #8006
Rationale for this change
As I proposed in #8006 (comment) I think making stucts rather than typedefs leads to easier to maintain code as the functionality is easier to find and it allows / encourages more documentation and encapsulation
This also was a good excuse for me to try and work with the code some
What changes are included in this PR?
EquivalenceClassa real structure rather than a typedefI actually think maybe the right thing to do is to call this thing
PhysicalExprSetor something as it is more general than just equivalence classesI plan to move all the functions into a new module rather than calling functions in
utilsas a follow on PRAre these changes tested?
Existing tests
Are there any user-facing changes?
yes, since the APIs in question are
pubthis is technically a breaking API change. However, given they are all quite new (within this week) and not yet released, I don't think there will be as much direct impact.