feat(substrait): remove dependency on datafusion default features#13594
alamb merged 2 commits into apache:main
alamb
left a comment
Thanks @notfilippo -- this looks like a good change to me.
I wonder if there is some way we can add a CI test for this (to avoid breaking it in the future) -- perhaps something like
datafusion/.github/workflows/rust.yml
Line 81 in ddee471
I think we need to
index 50bebc5b4..0b2338cd3 100644
--- a/.github/workflows/rust.yml
+++ b/.github/workflows/rust.yml
@@ -80,9 +80,12 @@ jobs:
- name: Check datafusion-common without default features
run: cargo check --all-targets --no-default-features -p datafusion-common
- - name: Check datafusion-functions
+ - name: Check datafusion-functions without default features
run: cargo check --all-targets --no-default-features -p datafusion-functions
+ - name: Check datafusion-substrait without default features
+ run: cargo check --all-targets --no-default-features -p datafusion-substrait
+
- name: Check workspace in debug mode
run: cargo check --all-targets --workspace
I tested with
andrewlamb@Andrews-MacBook-Pro-2:~/Software/datafusion$ cargo check --all-targets --no-default-features -p datafusion-substrait
warning: function `coerce_file_schema_to_view_type` is never used
--> datafusion/core/src/datasource/file_format/mod.rs:428:15
|
428 | pub(crate) fn coerce_file_schema_to_view_type(
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
|
= note: `#[warn(dead_code)]` on by default
warning: function `coerce_file_schema_to_string_type` is never used
--> datafusion/core/src/datasource/file_format/mod.rs:492:15
|
492 | pub(crate) fn coerce_file_schema_to_string_type(
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
warning: `datafusion` (lib) generated 2 warnings
Checking datafusion-substrait v43.0.0 (/Users/andrewlamb/Software/datafusion/datafusion/substrait)
error[E0432]: unresolved import `datafusion::datasource::physical_plan::ParquetExec`
--> datafusion/substrait/tests/cases/roundtrip_physical_plan.rs:25:61
|
25 | use datafusion::datasource::physical_plan::{FileScanConfig, ParquetExec};
| ^^^^^^^^^^^ no `ParquetExec` in `datasource::physical_plan`
|
note: found an item that was configured out
--> /Users/andrewlamb/Software/datafusion/datafusion/core/src/datasource/physical_plan/mod.rs:34:25
|
34 | pub use self::parquet::{ParquetExec, ParquetFileMetrics, ParquetFileReaderFactory};
| ^^^^^^^^^^^
note: the item is gated behind the `parquet` feature
--> /Users/andrewlamb/Software/datafusion/datafusion/core/src/datasource/physical_plan/mod.rs:33:7
|
33 | #[cfg(feature = "parquet")]
| ^^^^^^^^^^^^^^^^^^^
error[E0432]: unresolved import `datafusion_substrait::physical_plan`
--> datafusion/substrait/tests/cases/roundtrip_physical_plan.rs:29:27
|
29 | use datafusion_substrait::physical_plan::{consumer, producer};
| ^^^^^^^^^^^^^ could not find `physical_plan` in `datafusion_substrait`
|
note: found an item that was configured out
--> /Users/andrewlamb/Software/datafusion/datafusion/substrait/src/lib.rs:79:9
|
79 | pub mod physical_plan;
| ^^^^^^^^^^^^^
note: the item is gated behind the `physical` feature
--> /Users/andrewlamb/Software/datafusion/datafusion/substrait/src/lib.rs:78:7
|
78 | #[cfg(feature = "physical")]
| ^^^^^^^^^^^^^^^^^^^^
error[E0425]: cannot find function `parquet_test_data` in module `datafusion::test_util`
--> datafusion/substrait/tests/cases/roundtrip_physical_plan.rs:154:43
|
154 | let testdata = datafusion::test_util::parquet_test_data();
| ^^^^^^^^^^^^^^^^^ help: a function with a similar name exists: `arrow_test_data`
|
::: /Users/andrewlamb/Software/datafusion/datafusion/common/src/test_util.rs:201:1
|
201 | pub fn arrow_test_data() -> String {
| ---------------------------------- similarly named function `arrow_test_data` defined here
|
note: found an item that was configured out
--> /Users/andrewlamb/Software/datafusion/datafusion/core/src/test_util/mod.rs:60:39
|
60 | pub use datafusion_common::test_util::parquet_test_data;
| ^^^^^^^^^^^^^^^^^
note: the item is gated behind the `parquet` feature
--> /Users/andrewlamb/Software/datafusion/datafusion/core/src/test_util/mod.rs:59:7
|
59 | #[cfg(feature = "parquet")]
| ^^^^^^^^^^^^^^^^^^^
error[E0599]: no method named `register_parquet` found for struct `datafusion::prelude::SessionContext` in the current scope
--> datafusion/substrait/tests/cases/roundtrip_physical_plan.rs:145:9
|
145 | ctx.register_parquet("data", "tests/testdata/data.parquet", explicit_options)
| ^^^^^^^^^^^^^^^^
|
help: there is a method `register_arrow` with a similar name
|
145 | ctx.register_arrow("data", "tests/testdata/data.parquet", explicit_options)
| ~~~~~~~~~~~~~~
error[E0599]: no method named `register_parquet` found for struct `datafusion::prelude::SessionContext` in the current scope
--> datafusion/substrait/tests/cases/roundtrip_physical_plan.rs:155:9
|
155 | ctx.register_parquet(
| ----^^^^^^^^^^^^^^^^
|
help: there is a method `register_arrow` with a similar name
|
155 | ctx.register_arrow(
| ~~~~~~~~~~~~~~
Some errors have detailed explanations: E0425, E0432, E0599.
For more information about an error, try `rustc --explain E0425`.
error: could not compile `datafusion-substrait` (test "substrait_integration") due to 5 previous errors
andrewlamb@Andrews-MacBook-Pro-2:~/Software/datafusion$
Marking as draft as I think this PR is no longer waiting on feedback. Please mark it as ready for review when it is ready for another look.
Thanks @alamb. Weirdly enough, I didn't receive any pings from GitHub about activity on this issue / PR. Anyway, thanks for pointing out the issue with the roundtrip test. The fix I posted should be enough.
alamb
left a comment
Thanks @notfilippo -- makes sense to me
Thanks again @notfilippo
Which issue does this PR close?
Closes #13593
What changes are included in this PR?
Adds the physical feature, which enables production and consumption of physical Substrait plans.
Are these changes tested?
Yes. Running cargo tree -p datafusion-substrait with and without --no-default-features shows that parquet and many other dependencies are no longer included when default features are disabled.
Are there any user-facing changes?
No, the new feature is enabled by default. Users who want to benefit from this change must set default-features = false when including the datafusion-substrait crate.
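For readers unfamiliar with the gating pattern, the compiler output in the conversation shows lib.rs declaring pub mod physical_plan; behind #[cfg(feature = "physical")]. Below is a minimal standalone sketch of that pattern. Only the feature name and module name come from this PR; the module body, the describe functions, and the fallback are hypothetical and are not the crate's actual code.

```rust
// Standalone sketch of Cargo feature gating. The feature name `physical`
// and module name `physical_plan` mirror this PR; everything else here
// is a hypothetical illustration.

// Compiled only when the crate is built with the `physical` feature enabled
// (explicitly, or through the crate's default features).
#[cfg(feature = "physical")]
pub mod physical_plan {
    pub fn describe() -> &'static str {
        "physical plan support compiled in"
    }
}

// Code that uses the gated module must be gated the same way. Otherwise it
// fails to compile when the feature is off, as the roundtrip test did.
#[cfg(feature = "physical")]
pub fn describe() -> &'static str {
    physical_plan::describe()
}

// Fallback compiled when the feature is disabled.
#[cfg(not(feature = "physical"))]
pub fn describe() -> &'static str {
    "physical plan support not compiled in"
}

fn main() {
    println!("{}", describe());
}
```

With this layout, a build without the feature never sees the gated module, which is why tests that import it need a matching cfg attribute (or some equivalent mechanism) to keep cargo check --no-default-features passing.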