Upstream is continuing its migration to UDFs. Ref apache/datafusion#10098 Ref apache/datafusion#10372
…ters_pushdown Deprecated function removed in apache/datafusion#9923
These relied on upstream BuiltinScalarFunction, which is now removed. Ref apache/datafusion#10098
`null_count` was fixed upstream. Ref apache/datafusion#10260
DFField was removed upstream. Ref: apache/datafusion#9595
f311d66 to abe09a2
}
}

impl PyExpr {
@jdye64 you may want to review this PR since it removes code that I believe you originally added
@jdye64 - I had removed the method because it relied on DFField which was removed in datafusion.
The last commit attempts to re-implement the method using arrow's Field.
I'd still much appreciate any feedback / context!
It looks like Dask SQL is using a pinned version of this repo from more than six months ago, so we likely won't get a review from the team right away. The new functionality based on Field looks good to me, so I will go ahead and merge this PR.
Yeah, this is fine. Honestly, we need to come up with a better way to get the column name anyway and, as you mentioned, we are using a pinned older version for now.
"a": [3.0, 0.0, 2.0, 1.0, 1.0, 3.0, 2.0],
"b": [3.0, 0.0, 5.0, 1.0, 4.0, 6.0, 5.0],
"c": [3.0, 0.0, 7.0, 1.7320508075688772, 5.0, 8.0, 8.0],
Why are these changes needed?
null_count was fixed upstream in apache/datafusion#10260
The underlying data being described:
>>> print(df)
DataFrame()
+---+---+---+
| a | b | c |
+---+---+---+
| 1 | 4 | 8 |
| 2 | 5 | 5 |
| 3 | 6 | 8 |
+---+---+---+
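For reference, the expected values in the updated test can be reproduced by hand from this table. The following is a minimal sketch using only the Python standard library, not DataFusion itself. The row names and their order (`count`, `null_count`, `mean`, `std`, `min`, `max`, `median`) are an assumption inferred from the expected values in the diff, and `describe_column` is a hypothetical helper written for illustration.

```python
import math
import statistics

# Column values copied from the printed DataFrame above.
columns = {
    "a": [1, 2, 3],
    "b": [4, 5, 6],
    "c": [8, 5, 8],
}


def describe_column(values):
    """Summarize one column; a None entry would count as a null.

    The statistic names and ordering are assumed for illustration and
    are not taken from the DataFusion source.
    """
    present = [v for v in values if v is not None]
    return {
        "count": float(len(present)),
        "null_count": float(len(values) - len(present)),
        "mean": float(statistics.mean(present)),
        "std": statistics.stdev(present),  # sample standard deviation
        "min": float(min(present)),
        "max": float(max(present)),
        "median": float(statistics.median(present)),
    }


summary = {name: describe_column(vals) for name, vals in columns.items()}
for name, stats in summary.items():
    print(name, list(stats.values()))
```

None of the columns contain nulls, so the second entry of each list is 0.0, which is the value the updated test now expects.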
The previous implementation relied on `DFField` which was removed upstream. Ref: apache/datafusion#9595
andygrove left a comment
LGTM. Thank you @Michael-J-Ward. It is great to see this project keeping up with DataFusion core.
Which issue does this PR close?
Closes #690.
Are there any user-facing changes?
- `DFField` and related methods were removed
- `PyScalarFunction` and `PyBuiltinScalarFunction` were removed
- `null_count` was fixed upstream so the behavior has changed