perf: optimize process_selectors and compute_formula_selector#556
Closed
perf: optimize process_selectors and compute_formula_selector#556
Conversation
Add fast path for bare symbol quosures in process_selectors.data.frame() and compute_formula_selector() that skips tidyselect::eval_select() when the expression is a simple column name matching the data. Replace rev()+Negate(duplicated) dedup with duplicated(fromLast=TRUE). Merge imap+assign into a single loop in process_selectors.data.frame(). Add benchmark CI workflow (.github/workflows/benchmark.yaml) triggered on perf-prefixed PRs, with a script that compares PR vs main timings. Co-authored-by: Ona <no-reply@ona.com>
Contributor
Code Coverage SummaryDiff against mainResults for commit: b9098cd Minimum allowed coverage is ♻️ This comment has been updated with latest results |
Contributor
Unit Tests Summary 1 files 255 suites 1m 0s ⏱️ Results for commit b9098cd. ♻️ This comment has been updated with latest results. |
Contributor
Unit Test Performance Difference
Additional test case details
Results for commit eb04c24 ♻️ This comment has been updated with latest results. |
Co-authored-by: Ona <no-reply@ona.com>
Use only R release and older versions. The reusable workflows in check.yaml already run against ghcr.io/insightsengineering/rstudio:latest which ships R release (4.5.2). Co-authored-by: Ona <no-reply@ona.com>
The S3 method is registered but not directly exported, so cards::process_selectors.data.frame fails. Use getFromNamespace() to retrieve it from the namespace instead. Co-authored-by: Ona <no-reply@ona.com>
Contributor
Performance BenchmarkComparing main ( Each benchmark runs 5 independent rounds. The change column shows the mean % difference (negative = faster). ard_summary()
|
Replace the bare-symbol fast path with one that handles character
vectors: single string literals ("AGE") and c() calls with all
string arguments (c("AGE", "ARM")). Applies to both
process_selectors.data.frame() and compute_formula_selector() LHS.
Falls through to tidyselect when any name is not found in the data,
preserving existing error behavior.
Co-authored-by: Ona <no-reply@ona.com>
Evaluate the quosure and check if the result is a character vector
with all names present in the data. This covers variable references
like `cols <- c("AGE", "ARM"); process_selectors(data, variables = cols)`
in addition to string literals.
Co-authored-by: Ona <no-reply@ona.com>
…htsengineering/cards into perf/selector-optimization
Only attempt evaluation for expressions that are likely character vectors (string literals, c() calls, plain symbols). Tidyselect helpers like starts_with() skip straight to cards_select() with no overhead. Co-authored-by: Ona <no-reply@ona.com>
Restore process_selectors.data.frame and compute_formula_selector to their original main implementations. All selector evaluation goes through tidyselect unconditionally. Co-authored-by: Ona <no-reply@ona.com>
Co-authored-by: Ona <no-reply@ona.com>
Co-authored-by: Ona <no-reply@ona.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to subscribe to this conversation on GitHub.
Already have an account?
Sign in.
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
What changes are proposed in this pull request?
Optimize
process_selectors.data.frame()andcompute_formula_selector()by adding a fast path for bare symbol quosures that match column names, bypassingtidyselect::eval_select(). Also replacerev() |> Negate(duplicated)() |> rev()with!duplicated(x, fromLast = TRUE)and merge theimap()+assignloop into a single pass.Add a benchmark CI workflow (
.github/workflows/benchmark.yaml+.github/scripts/benchmark.R) that runs onperf-prefixed PRs and posts a comparison table as a PR comment.Benchmark results (R-devel, local)
process_selectors(2 bare symbols)process_selectors(5 bare symbols)compute_formula_selector(2 formulas, symbol LHS)compute_formula_selector(5 formulas, symbol LHS)process_selectors(tidyselect helpers)compute_formula_selector(named list)The fast path only activates for bare symbol expressions (e.g.,
AGE,ARM) that match a column name. All other tidyselect expressions (starts_with(),where(),everything(),c(),-, etc.) fall through totidyselect::eval_select()unchanged. No changes to public API or error behavior.Reference GitHub issue associated with pull request. N/A
Pre-review Checklist (if item does not apply, mark is as complete)
usethis::pr_merge_main()devtools::test_coverage()Reviewer Checklist (if item does not apply, mark is as complete)
pkgdown::build_site(). Check the R console for errors, and review the rendered website.devtools::test_coverage()When the branch is ready to be merged:
NEWS.mdwith the changes from this pull request under the heading "# cards (development version)". If there is an issue associated with the pull request, reference it in parentheses at the end update (seeNEWS.mdfor examples).