-
Notifications
You must be signed in to change notification settings - Fork 1.9k
Open
Description
This is a dump of a conversation with Claude Code
Problem
When a predicate like b IS NOT NULL OR a = 24 is rewritten for a file schema where column b doesn't exist, the b IS NOT NULL becomes IS NOT NULL(NULL literal). This should simplify to false, allowing the predicate to become a = 24 for proper row group pruning.
Currently:
PhysicalExprSimplifierdoesn't simplifyIS NULL(literal)orIS NOT NULL(literal)build_predicate_expressiontreatsIS NOT NULL(literal)as "unhandled" → returnstrue- Result:
true OR (pruning_expr)=true→ no pruning possible
Solution
Implement both fixes so that IS NOT NULL(lit(1)) and similar expressions benefit from simplification at multiple levels.
Change 1: Add IS NULL/IS NOT NULL Simplification to PhysicalExprSimplifier
File: datafusion/physical-expr/src/simplifier/mod.rs
Modifications:
-
Add new module import:
pub mod is_null;
-
Update
f_upmethod to chain the new simplification:fn f_up(&mut self, node: Self::Node) -> Result<Transformed<Self::Node>> { let rewritten = simplify_not_expr(&node, self.schema)? .transform_data(|node| is_null::simplify_is_null_expr(&node))? .transform_data(|node| { unwrap_cast::unwrap_cast_in_comparison(node, self.schema) })?; Ok(rewritten) }
New File: datafusion/physical-expr/src/simplifier/is_null.rs
Create a new module following the pattern of not.rs:
//! Simplify IS NULL and IS NOT NULL expressions on literals
//!
//! - IS NULL(literal) -> true/false based on literal.value().is_null()
//! - IS NOT NULL(literal) -> true/false based on !literal.value().is_null()
use std::sync::Arc;
use datafusion_common::{tree_node::Transformed, Result, ScalarValue};
use crate::expressions::{lit, IsNullExpr, IsNotNullExpr, Literal};
use crate::PhysicalExpr;
pub fn simplify_is_null_expr(
expr: &Arc<dyn PhysicalExpr>,
) -> Result<Transformed<Arc<dyn PhysicalExpr>>> {
// Handle IS NULL(literal)
if let Some(is_null_expr) = expr.as_any().downcast_ref::<IsNullExpr>() {
if let Some(literal) = is_null_expr.arg().as_any().downcast_ref::<Literal>() {
let result = literal.value().is_null();
return Ok(Transformed::yes(lit(ScalarValue::Boolean(Some(result)))));
}
}
// Handle IS NOT NULL(literal)
if let Some(is_not_null_expr) = expr.as_any().downcast_ref::<IsNotNullExpr>() {
if let Some(literal) = is_not_null_expr.arg().as_any().downcast_ref::<Literal>() {
let result = !literal.value().is_null();
return Ok(Transformed::yes(lit(ScalarValue::Boolean(Some(result)))));
}
}
Ok(Transformed::no(Arc::clone(expr)))
}Tests to Add (in mod.rs)
#[test]
fn test_simplify_is_null_literal() -> Result<()> {
let schema = test_schema();
let mut simplifier = PhysicalExprSimplifier::new(&schema);
// IS NULL(1) -> false
let is_null_expr = Arc::new(IsNullExpr::new(lit(ScalarValue::Int32(Some(1)))));
let result = simplifier.simplify(is_null_expr)?;
assert_eq!(as_literal(&result).value(), &ScalarValue::Boolean(Some(false)));
// IS NULL(NULL) -> true
let is_null_expr = Arc::new(IsNullExpr::new(lit(ScalarValue::Int32(None))));
let result = simplifier.simplify(is_null_expr)?;
assert_eq!(as_literal(&result).value(), &ScalarValue::Boolean(Some(true)));
Ok(())
}
#[test]
fn test_simplify_is_not_null_literal() -> Result<()> {
let schema = test_schema();
let mut simplifier = PhysicalExprSimplifier::new(&schema);
// IS NOT NULL(1) -> true
let expr = Arc::new(IsNotNullExpr::new(lit(ScalarValue::Int32(Some(1)))));
let result = simplifier.simplify(expr)?;
assert_eq!(as_literal(&result).value(), &ScalarValue::Boolean(Some(true)));
// IS NOT NULL(NULL) -> false
let expr = Arc::new(IsNotNullExpr::new(lit(ScalarValue::Int32(None))));
let result = simplifier.simplify(expr)?;
assert_eq!(as_literal(&result).value(), &ScalarValue::Boolean(Some(false)));
Ok(())
}Change 2: Handle Literal Arguments in build_predicate_expression
File: datafusion/pruning/src/pruning_predicate.rs
Modify build_predicate_expression function (~line 1415-1427):
Before:
if let Some(is_null) = expr_any.downcast_ref::<phys_expr::IsNullExpr>() {
return build_is_null_column_expr(is_null.arg(), schema, required_columns, false)
.unwrap_or_else(|| unhandled_hook.handle(expr));
}
if let Some(is_not_null) = expr_any.downcast_ref::<phys_expr::IsNotNullExpr>() {
return build_is_null_column_expr(
is_not_null.arg(),
schema,
required_columns,
true,
)
.unwrap_or_else(|| unhandled_hook.handle(expr));
}After:
if let Some(is_null) = expr_any.downcast_ref::<phys_expr::IsNullExpr>() {
// If argument is a literal, evaluate directly
if let Some(literal) = is_null.arg().as_any().downcast_ref::<phys_expr::Literal>() {
let result = literal.value().is_null();
return Arc::new(phys_expr::Literal::new(ScalarValue::Boolean(Some(result))));
}
return build_is_null_column_expr(is_null.arg(), schema, required_columns, false)
.unwrap_or_else(|| unhandled_hook.handle(expr));
}
if let Some(is_not_null) = expr_any.downcast_ref::<phys_expr::IsNotNullExpr>() {
// If argument is a literal, evaluate directly
if let Some(literal) = is_not_null.arg().as_any().downcast_ref::<phys_expr::Literal>() {
let result = !literal.value().is_null();
return Arc::new(phys_expr::Literal::new(ScalarValue::Boolean(Some(result))));
}
return build_is_null_column_expr(
is_not_null.arg(),
schema,
required_columns,
true,
)
.unwrap_or_else(|| unhandled_hook.handle(expr));
}Files to Modify
datafusion/physical-expr/src/simplifier/mod.rs- Add is_null module import and chain simplificationdatafusion/physical-expr/src/simplifier/is_null.rs- New file for IS NULL/IS NOT NULL simplificationdatafusion/pruning/src/pruning_predicate.rs- Handle literal arguments in build_predicate_expression
Testing
-
Run the original failing test:
cargo test --package datafusion --test core_integration --all-features -- schema_adapter::schema_adapter_integration_tests::test_parquet_missing_column --exact --nocapture -
Run simplifier tests:
cargo test --package datafusion-physical-expr --lib simplifier -
Run pruning predicate tests:
cargo test --package datafusion-pruning --lib
Reactions are currently unavailable
Metadata
Metadata
Assignees
Labels
No labels