refactor: Optimize required_columns from BTreeSet to Vec in struct PushdownChecker#19678
Conversation
|
cc @kosiew |
9e5d7ba to
8c2462e
Compare
kosiew
left a comment
There was a problem hiding this comment.
Thanks @kumarUjjawal for contributing.
I left some comments for your consideration.
| if !self.required_columns.contains(&idx) { | ||
| self.required_columns.push(idx); | ||
| } |
There was a problem hiding this comment.
I think it would help future contributors to
- Add comment explaining linear search is acceptable for small
n- OR switch to
HashSetfor O(1) deduplication ifnmight grow
- OR switch to
| let prevents_pushdown = checker.prevents_pushdown(); | ||
| let nested = checker.nested_behavior; | ||
| let mut required_columns = checker.required_columns; | ||
| required_columns.sort_unstable(); |
There was a problem hiding this comment.
how about adding:
impl PushdownChecker {
fn into_sorted_columns(mut self) -> PushdownColumns {
self.required_columns.sort_unstable();
self.required_columns.dedup(); // this removes the need for contains check
PushdownColumns {
required_columns: self.required_columns,
nested: self.nested_behavior,
}
}| Ok((!prevents_pushdown).then_some(PushdownColumns { | ||
| required_columns, | ||
| nested, | ||
| })) |
There was a problem hiding this comment.
and then rewriting this to:
Ok((!checker.prevents_pushdown())
.then_some(checker.into_sorted_columns()))| if !self.required_columns.contains(&idx) { | ||
| self.required_columns.push(idx); | ||
| } |
There was a problem hiding this comment.
see suggested
fn into_sorted_columns
below
Thanks for the feedback. Incorporated the changes. |
alamb
left a comment
There was a problem hiding this comment.
Thanks @kumarUjjawal and @kosiew -- this is really nice
| projected_columns: bool, | ||
| /// Indices into the file schema of columns required to evaluate the expression. | ||
| required_columns: BTreeSet<usize>, | ||
| required_columns: Vec<usize>, |
| #[derive(Debug)] | ||
| struct PushdownColumns { | ||
| required_columns: BTreeSet<usize>, | ||
| required_columns: Vec<usize>, |
There was a problem hiding this comment.
maybe it is worth a comment here explaining the assumption that required_columns are sorted and unique (non duplicate)
Which issue does this PR close?
required_columnsfromBTreeSet<usize>toVec<usize>in struct PushdownChecker<'schema> #19673.Rationale for this change
What changes are included in this PR?
Are these changes tested?
Yes
Are there any user-facing changes?