Expose ExecutionContext.register_csv to the python bindings#524
alamb merged 8 commits into apache:master
Conversation
python/src/context.rs
    name: &str,
    path: &str,
    has_header: bool,
    delimiter: &[u8],
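Since `delimiter` arrives from Python as `bytes` but DataFusion's CSV reader takes a single byte, the binding needs to reject multi-byte values. A minimal sketch of that check in plain Python (the function name `validate_delimiter` is hypothetical, not part of the actual bindings):

```python
def validate_delimiter(delimiter: bytes) -> int:
    # The Rust side receives `delimiter: &[u8]`, but a CSV delimiter
    # must be exactly one byte, so reject anything else up front.
    if len(delimiter) != 1:
        raise ValueError(
            f"delimiter must be a single byte, got {len(delimiter)} bytes"
        )
    return delimiter[0]

# Example: a comma delimiter passed as Python bytes.
print(validate_delimiter(b","))  # -> 44 (ASCII code of ',')
```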
Here we should have a Schema argument exposed as well, but I noticed that FFI hasn't been implemented for Schema and DataType objects in arrow-rs. We should probably expose all of the ArrowSchema based structs there first, then convert pyarrow objects using the C interface rather than calling out to python functions (like the datatype python bindings are currently implemented).
I agree. I only recently learnt that Schema and DataType have a C data interface. This likely requires some refactoring in arrow-rs, which assumes that metadata does not require a specific in-memory alignment, yet the C data interface makes that requirement.
We should be able to accept pyarrow.Schema objects once apache/arrow-rs#439 gets merged.
Codecov Report
@@ Coverage Diff @@
## master #524 +/- ##
==========================================
- Coverage 76.03% 76.00% -0.04%
==========================================
Files 157 157
Lines 26990 27001 +11
==========================================
Hits 20521 20521
- Misses 6469 6480 +11
alamb
left a comment
Is this one ready to go @jorgecarleitao?
ping @jorgecarleitao
@jorgecarleitao / @kszucs -- what is the plan for this PR?
Since apache/arrow-rs#439 has been merged, I can now expose the schema argument.
Cool -- I am just trying to shepherd PRs that look like they have gone stale.
The tests appear to be failing due to #818.
    })
}

pub fn to_rust_schema(ob: &PyAny) -> PyResult<Schema> {
Copied from https://github.com/apache/arrow-rs/blob/master/arrow-pyarrow-integration-testing/src/lib.rs#L136
Eventually we could add an optional module to arrow-rs where we implement the PyO3 conversion traits for arrow-rs <-> pyarrow interoperability for easier downstream integration.
@alamb this should be good to go, though we should revisit the FFI bindings in arrow-rs and a potential …
for table in ["csv", "csv1", "csv2"]:
    result = ctx.sql(f"SELECT COUNT(int) FROM {table}").collect()
    result = pa.Table.from_batches(result)
    assert result.to_pydict() == {"COUNT(int)": [4]}
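For reference, the `COUNT(int)` result the test asserts can be reproduced with the standard library alone. The CSV content below is a hypothetical stand-in for the test fixture (a header row plus four data rows), not the actual file from the repository:

```python
import csv
import io

# Hypothetical fixture mirroring the shape the test assumes:
# an `int` column with four non-null values.
data = "int,str\n1,a\n2,b\n3,c\n4,d\n"

reader = csv.DictReader(io.StringIO(data))
# COUNT(int) counts non-null (here: non-empty) values in the column.
count = sum(1 for row in reader if row["int"] != "")
print(count)  # -> 4, matching {"COUNT(int)": [4]}
```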
Which issue does this PR close?
Depends on #493
Rationale for this change
What changes are included in this PR?
Are there any user-facing changes?