
Commit 482f32c

Enable parquet page level skipping (page index pruning) by default (#5099)
* Enable parquet page level skipping (page index pruning) by default
* update
1 parent 2c1161a commit 482f32c

3 files changed

Lines changed: 3 additions & 3 deletions


datafusion/common/src/config.rs

Lines changed: 1 addition & 1 deletion
```diff
@@ -243,7 +243,7 @@ config_namespace! {
 pub struct ParquetOptions {
 /// If true, uses parquet data page level metadata (Page Index) statistics
 /// to reduce the number of rows decoded.
-pub enable_page_index: bool, default = false
+pub enable_page_index: bool, default = true
 
 /// If true, the parquet reader attempts to skip entire row groups based
 /// on the predicate in the query and the metadata (min/max values) stored in
```
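The `enable_page_index` option controls whether per-page statistics from the Page Index are used to cut down the rows that get decoded. The std-only Rust sketch below illustrates the general idea for one column and a range predicate `lo <= v <= hi`: a page whose `[min, max]` interval cannot overlap the predicate is skipped, and the surviving pages become row ranges to decode. `PageStats` and `select_rows` are hypothetical names, and this is a simplified model, not the actual pruning code in DataFusion or the `parquet` crate.

```rust
use std::ops::Range;

/// Hypothetical per-page statistics for one column chunk, loosely modeled on
/// what a Parquet Page Index records (row offsets plus min/max values).
#[derive(Debug)]
struct PageStats {
    first_row: usize, // index of the first row stored in this page
    row_count: usize, // number of rows stored in this page
    min: i64,
    max: i64,
}

/// Returns the row ranges whose pages might contain a value with
/// `lo <= v <= hi`; every other page can be skipped without decoding.
fn select_rows(pages: &[PageStats], lo: i64, hi: i64) -> Vec<Range<usize>> {
    let mut ranges: Vec<Range<usize>> = Vec::new();
    for p in pages {
        // Skip the page only if [min, max] cannot overlap [lo, hi].
        if p.max < lo || p.min > hi {
            continue;
        }
        let r = p.first_row..p.first_row + p.row_count;
        // Merge with the previous range when the selected pages are adjacent.
        if let Some(last) = ranges.last_mut() {
            if last.end == r.start {
                last.end = r.end;
                continue;
            }
        }
        ranges.push(r);
    }
    ranges
}

fn main() {
    let pages = [
        PageStats { first_row: 0, row_count: 100, min: 0, max: 99 },
        PageStats { first_row: 100, row_count: 100, min: 100, max: 199 },
        PageStats { first_row: 200, row_count: 100, min: 200, max: 299 },
        PageStats { first_row: 300, row_count: 100, min: 300, max: 399 },
    ];
    // Predicate: 150 <= v <= 250, so only the middle two pages can match.
    let selected = select_rows(&pages, 150, 250);
    let decoded: usize = selected.iter().map(|r| r.len()).sum();
    println!("{:?} -> {} of 400 rows decoded", selected, decoded);
}
```

With sorted data, half of the pages are skipped in this example. The benefit shrinks when values are scattered, since most page intervals then overlap the predicate.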

datafusion/core/tests/sqllogictests/test_files/information_schema.slt

Lines changed: 1 addition & 1 deletion
```diff
@@ -146,7 +146,7 @@ datafusion.execution.aggregate.scalar_update_factor 10
 datafusion.execution.batch_size 8192
 datafusion.execution.coalesce_batches true
 datafusion.execution.collect_statistics false
-datafusion.execution.parquet.enable_page_index false
+datafusion.execution.parquet.enable_page_index true
 datafusion.execution.parquet.metadata_size_hint NULL
 datafusion.execution.parquet.pruning true
 datafusion.execution.parquet.pushdown_filters false
```
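With the default flipped to `true`, the previous behavior now requires an explicit opt-out. The sketch below models how a dotted key such as `datafusion.execution.parquet.enable_page_index` could be mapped onto a boolean field whose default matches the listing above. `ParquetOptionsSketch` and its `set` method are hypothetical, hand-written stand-ins with only three fields; they are not what DataFusion's `config_namespace!` macro expands to. In DataFusion itself, assuming your version supports it, the equivalent opt-out is a SQL statement such as `SET datafusion.execution.parquet.enable_page_index = false`.

```rust
/// Hypothetical stand-in for a few parquet settings (not DataFusion's real type).
#[derive(Debug)]
struct ParquetOptionsSketch {
    enable_page_index: bool,
    pruning: bool,
    pushdown_filters: bool,
}

impl Default for ParquetOptionsSketch {
    fn default() -> Self {
        // Defaults mirror the information_schema listing after this commit.
        Self { enable_page_index: true, pruning: true, pushdown_filters: false }
    }
}

impl ParquetOptionsSketch {
    const PREFIX: &'static str = "datafusion.execution.parquet.";

    /// Applies a `key = value` override; rejects unknown keys and non-bool values.
    fn set(&mut self, key: &str, value: &str) -> Result<(), String> {
        let field = key
            .strip_prefix(Self::PREFIX)
            .ok_or_else(|| format!("unknown key: {key}"))?;
        // `str::parse::<bool>` accepts exactly "true" or "false".
        let v: bool = value
            .parse()
            .map_err(|_| format!("invalid bool for {key}: {value}"))?;
        match field {
            "enable_page_index" => self.enable_page_index = v,
            "pruning" => self.pruning = v,
            "pushdown_filters" => self.pushdown_filters = v,
            _ => return Err(format!("unknown key: {key}")),
        }
        Ok(())
    }
}

fn main() {
    let mut opts = ParquetOptionsSketch::default();
    println!("default: {:?}", opts);
    // Opt back out of page index pruning, e.g. to compare plans or timings.
    opts.set("datafusion.execution.parquet.enable_page_index", "false")
        .expect("valid key and value");
    println!("after override: {:?}", opts);
}
```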

docs/source/user-guide/configs.md

Lines changed: 1 addition & 1 deletion
```diff
@@ -49,7 +49,7 @@ Environment variables are read during `SessionConfig` initialisation so they mus
 | datafusion.execution.collect_statistics | false | Should DataFusion collect statistics after listing files |
 | datafusion.execution.target_partitions | 0 | Number of partitions for query execution. Increasing partitions can increase concurrency. Defaults to the number of CPU cores on the system |
 | datafusion.execution.time_zone | +00:00 | The default time zone Some functions, e.g. `EXTRACT(HOUR from SOME_TIME)`, shift the underlying datetime according to this time zone, and then extract the hour |
-| datafusion.execution.parquet.enable_page_index | false | If true, uses parquet data page level metadata (Page Index) statistics to reduce the number of rows decoded. |
+| datafusion.execution.parquet.enable_page_index | true | If true, uses parquet data page level metadata (Page Index) statistics to reduce the number of rows decoded. |
 | datafusion.execution.parquet.pruning | true | If true, the parquet reader attempts to skip entire row groups based on the predicate in the query and the metadata (min/max values) stored in the parquet file |
 | datafusion.execution.parquet.skip_metadata | true | If true, the parquet reader skip the optional embedded metadata that may be in the file Schema. This setting can help avoid schema conflicts when querying multiple parquet files with schemas containing compatible types but different metadata |
 | datafusion.execution.parquet.metadata_size_hint | NULL | If specified, the parquet reader will try and fetch the last `size_hint` bytes of the parquet file optimistically. If not specified, two reads are required: One read to fetch the 8-byte parquet footer and another to fetch the metadata length encoded in the footer |
```
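The `metadata_size_hint` row describes a read-count trade-off. A Parquet file ends with an 8-byte footer (a 4-byte little-endian metadata length followed by the `PAR1` magic), so without a hint the reader first fetches the footer and then the metadata whose length the footer encodes. The sketch below computes the byte ranges fetched in each case. `footer_reads` is a hypothetical helper that takes the metadata length as an input for illustration (a real reader only learns it from the footer), and the way it fetches the remainder when the hint is too small is one plausible strategy, not necessarily DataFusion's.

```rust
use std::ops::Range;

/// Parquet footer size: 4-byte little-endian metadata length + 4-byte "PAR1".
const FOOTER_LEN: u64 = 8;

/// Hypothetical helper: byte ranges fetched from the end of a `file_len`-byte
/// file to obtain the footer plus `metadata_len` bytes of metadata.
fn footer_reads(file_len: u64, metadata_len: u64, size_hint: Option<u64>) -> Vec<Range<u64>> {
    let needed = metadata_len + FOOTER_LEN; // bytes required, counted from the end
    assert!(file_len >= needed, "file too short for its metadata");
    match size_hint {
        // No hint: read the 8-byte footer, then the metadata it points to.
        None => vec![
            file_len - FOOTER_LEN..file_len,
            file_len - needed..file_len - FOOTER_LEN,
        ],
        Some(hint) => {
            // Clamp so we read at least the footer and never before byte 0.
            let hint = hint.max(FOOTER_LEN).min(file_len);
            let first = file_len - hint..file_len;
            if hint >= needed {
                // The optimistic read already covers footer and metadata.
                vec![first]
            } else {
                // Hint too small: fetch only the missing front of the metadata.
                vec![first, file_len - needed..file_len - hint]
            }
        }
    }
}

fn main() {
    let (file_len, metadata_len) = (1_000_000, 4_096);
    for hint in [None, Some(64 * 1024), Some(1024)] {
        let reads = footer_reads(file_len, metadata_len, hint);
        println!("size_hint {:?}: {} read(s) {:?}", hint, reads.len(), reads);
    }
}
```

A hint at least as large as the metadata plus footer saves one round trip, which matters most on object stores where each request has high latency; an undersized hint still costs two reads.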
