Add CsvExecBuilder for creating CsvExec#11633
Conversation
This adds the `CsvExecBuilder` struct for building a `CsvExec` instance, and deprecates the `CsvExec::new` method which has grown too large. There are some `TODO`s related to the duplication of formatting options and their defaults coming from multiple places. Uses of the deprecated `new` method have not been updated yet.
| use tokio::task::JoinSet; | ||
|
|
||
| /// Execution plan for scanning a CSV file | ||
| /// Execution plan for scanning a CSV file. |
|
|
||
| impl CsvExec { | ||
| /// Create a new CSV reader execution plan provided base and specific configurations | ||
| #[deprecated(since = "41.0.0", note = "use `CsvExec::builder` or `CsvExecBuilder`")] |
There was a problem hiding this comment.
💯 for keeping (and deprecating the old API) to make migration / upgrade easier.
| false, | ||
| file_compression_type.to_owned(), | ||
| ); | ||
| let csv = CsvExec::builder(config) |
| pub fn new(file_scan_config: FileScanConfig) -> Self { | ||
| Self { | ||
| file_scan_config, | ||
| // TODO: these defaults are duplicated from `CsvOptions` - should they be computed? |
There was a problem hiding this comment.
The other thing we could do is to add a test that ensures the options are the same rather than having to compute them each time
There was a problem hiding this comment.
As in, a test that makes sure that the default settings in a CsvExec matches the default CsvOptions? I'm happy to add that. I likely won't get to it in the next ~24h so happy to either hold this PR or follow-up after.
There was a problem hiding this comment.
As in, a test that makes sure that the default settings in a CsvExec matches the default CsvOptions? I'm happy to add that. I likely won't get to it in the next ~24h so happy to either hold this PR or follow-up after.
Yes, that is what I was thinking -- that way if the default CsvOptions are ever changed the test will fail until we also update the deafult values in CsvExecBuilder
There was a problem hiding this comment.
As in, a test that makes sure that the default settings in a CsvExec matches the default CsvOptions? I'm happy to add that. I likely won't get to it in the next ~24h so happy to either hold this PR or follow-up after.
Thanks @connec -- no worries! I actually had this PR open locally so I took the liberty of writing the test in 99d158d and pushing that to the branch
|
Thank you @alamb! There's one other datafusion/datafusion/core/src/datasource/physical_plan/csv.rs Lines 97 to 103 in debfd19 I imagine this may be at least somewhat desirable/intentional as part of ensuring crate boundaries? An alternative could be to, e.g., use |
| pub struct CsvExecBuilder { | ||
| file_scan_config: FileScanConfig, | ||
| file_compression_type: FileCompressionType, | ||
| // TODO: it seems like these format options could be reused across all the various CSV config |
There was a problem hiding this comment.
We could possibly use https://docs.rs/datafusion/latest/datafusion/config/struct.CsvOptions.html directly
That seems to be what the parquet writer does https://docs.rs/datafusion/latest/datafusion/datasource/physical_plan/parquet/struct.ParquetExec.html#method.table_parquet_options (aka the ParquetExec directly uses the options struct)
Perhaps as a follow on PR
|
Thanks again @connec |

Which issue does this PR close?
Closes #11565.
Rationale for this change
As part of #11533, the number of arguments required for
CsvExec::newgrew long enough (8) to trigger a clippy lint. In the interests of getting the PR merged we just suppressed the lint, with the intention of replacing it with a builder in a follow-up PR (👋).What changes are included in this PR?
This PR introduces a
CsvExecBuilder, along with plumbing similar to theParquetExecBuilder.CsvExec::newhas been marked as deprecated, and all usages of it in the repo have been replaced withCsvExec::builder.Are these changes tested?
A document test has been added to
CsvExecwhich exercises (some of) the builder public API. This is analogous to the doc test forParquetExec.Additionally,
CsvExec::newwas used itself in several tests and these have all been migrated to the builder.Are there any user-facing changes?
datafusion::datasource::physical_plan::CsvExecBuilder.datafusion::datasource::physical_plan::CsvExec::builder.datafusion::datasource::physical_plan::CsvExec::new.