Expose ExecutionContext.register_csv to the python bindings#524
alamb merged 8 commits into apache:master
Conversation
python/src/context.rs
    name: &str,
    path: &str,
    has_header: bool,
    delimiter: &[u8],
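Since `delimiter` arrives from Python as `bytes` but DataFusion's CSV reader takes a single byte, the binding needs to reject multi-byte values. A minimal sketch of that check in plain Python (the function name `validate_delimiter` is hypothetical, not part of the actual bindings):

```python
def validate_delimiter(delimiter: bytes) -> int:
    # The Rust side receives `delimiter: &[u8]`, but a CSV delimiter
    # must be exactly one byte, so reject anything else up front.
    if len(delimiter) != 1:
        raise ValueError(
            f"delimiter must be a single byte, got {len(delimiter)} bytes"
        )
    return delimiter[0]

# Example: a comma delimiter passed as Python bytes.
print(validate_delimiter(b","))  # -> 44 (ASCII code of ',')
```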
Here we should have a Schema argument exposed as well, but I noticed that FFI hasn't been implemented for Schema and DataType objects in arrow-rs. We should probably expose all of the ArrowSchema based structs there first, then convert pyarrow objects using the C interface rather than calling out to python functions (like the datatype python bindings are currently implemented).
I agree. I only recently learnt that Schema and DataType have a C data interface. This likely requires some refactoring in arrow-rs, which assumes that metadata does not require a specific in-memory alignment, yet the C data interface makes that requirement.
We should be able to accept pyarrow.Schema objects once apache/arrow-rs#439 gets merged.
Codecov Report
@@ Coverage Diff @@
## master #524 +/- ##
==========================================
- Coverage 76.03% 76.00% -0.04%
==========================================
Files 157 157
Lines 26990 27001 +11
==========================================
Hits 20521 20521
- Misses 6469 6480 +11
alamb
left a comment
Is this one ready to go @jorgecarleitao?
ping @jorgecarleitao
@jorgecarleitao / @kszucs -- what is the plan for this PR?
Since apache/arrow-rs#439 has been merged, I can now expose the schema argument.
Cool -- I am just trying to shepherd PRs that look like they have gone stale.
The tests appear to be failing due to #818.
    })
}

pub fn to_rust_schema(ob: &PyAny) -> PyResult<Schema> {
Copied from https://github.com/apache/arrow-rs/blob/master/arrow-pyarrow-integration-testing/src/lib.rs#L136
Eventually we could add an optional module to arrow-rs where we implement the PyO3 conversion traits for arrow-rs <-> pyarrow interoperability for easier downstream integration.
@alamb this should be good to go, though we should revisit the FFI bindings in arrow-rs and a potential …
for table in ["csv", "csv1", "csv2"]:
    result = ctx.sql(f"SELECT COUNT(int) FROM {table}").collect()
    result = pa.Table.from_batches(result)
    assert result.to_pydict() == {"COUNT(int)": [4]}
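For reference, the `COUNT(int)` result the test asserts can be reproduced with the standard library alone. The CSV content below is a hypothetical stand-in for the test fixture (a header row plus four data rows), not the actual file from the repository:

```python
import csv
import io

# Hypothetical fixture mirroring the shape the test assumes:
# an `int` column with four non-null values.
data = "int,str\n1,a\n2,b\n3,c\n4,d\n"

reader = csv.DictReader(io.StringIO(data))
# COUNT(int) counts non-null (here: non-empty) values in the column.
count = sum(1 for row in reader if row["int"] != "")
print(count)  # -> 4, matching {"COUNT(int)": [4]}
```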
Which issue does this PR close?
Depends on #493
Rationale for this change
What changes are included in this PR?
Are there any user-facing changes?