
perf: optimize process_selectors and compute_formula_selector#556

Closed
ddsjoberg wants to merge 15 commits into main from perf/selector-optimization

Conversation

@ddsjoberg
Collaborator

What changes are proposed in this pull request?

  • Optimize process_selectors.data.frame() and compute_formula_selector() by adding a fast path for bare symbol quosures that match column names, bypassing tidyselect::eval_select(). Also replace rev() |> Negate(duplicated)() |> rev() with !duplicated(x, fromLast = TRUE), and merge the imap() call and the subsequent assign() loop into a single loop.

  • Add a benchmark CI workflow (.github/workflows/benchmark.yaml + .github/scripts/benchmark.R) that runs on perf-prefixed PRs and posts a comparison table as a PR comment.
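The deduplication change in the first bullet works because both forms flag the last occurrence of each value as the one to keep. A minimal base-R check of that equivalence, written without the native pipe; the vector `x` is a hypothetical example, not data from the PR:

```r
# Hypothetical selector names with repeats; both expressions should flag
# the *last* occurrence of each value as the one to keep.
x <- c("AGE", "ARM", "AGE", "SEX", "ARM")

# Original form: reverse, flag first occurrences, reverse back
keep_old <- rev(Negate(duplicated)(rev(x)))

# Replacement: one call, no intermediate reversed copies
keep_new <- !duplicated(x, fromLast = TRUE)

identical(keep_old, keep_new)
#> [1] TRUE
x[keep_new]
#> [1] "AGE" "SEX" "ARM"
```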

Benchmark results (R-devel, local)

Function                                            Before   After   Speedup
--------------------------------------------------  -------  ------  -------------
process_selectors (2 bare symbols)                  860µs    58µs    14.7×
process_selectors (5 bare symbols)                  1.9ms    70µs    27×
compute_formula_selector (2 formulas, symbol LHS)   998µs    138µs   7.2×
compute_formula_selector (5 formulas, symbol LHS)   2.18ms   266µs   8.2×
process_selectors (tidyselect helpers)              860µs    952µs   no regression
compute_formula_selector (named list)               56µs     51µs    no regression

The fast path activates only for bare symbol expressions (e.g., AGE, ARM) that match a column name. All other tidyselect expressions (starts_with(), where(), everything(), c(), -, etc.) fall through to tidyselect::eval_select() unchanged. There are no changes to the public API or to error behavior.
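The bare-symbol fast path described above can be sketched as follows. This is a minimal illustration assuming rlang and tidyselect are installed; `select_with_fast_path()` is a hypothetical helper, not the cards implementation, and it returns column names rather than whatever structure process_selectors() builds internally.

```r
# Hypothetical sketch of a bare-symbol fast path in front of tidyselect
# (not the cards implementation).
select_with_fast_path <- function(data, quo) {
  expr <- rlang::quo_get_expr(quo)

  # Fast path: a bare symbol that names a column needs no tidyselect machinery
  if (rlang::is_symbol(expr) && rlang::as_string(expr) %in% names(data)) {
    return(rlang::as_string(expr))
  }

  # Everything else (helpers, c(), negation, unknown names) goes through
  # tidyselect, so its results and error messages are unchanged
  names(tidyselect::eval_select(quo, data))
}

select_with_fast_path(mtcars, rlang::quo(mpg))
#> [1] "mpg"
select_with_fast_path(mtcars, rlang::quo(tidyselect::starts_with("c")))
#> [1] "cyl"  "carb"
```

Because an unknown bare name such as `AGEX` fails the `%in% names(data)` check, it still reaches tidyselect::eval_select() and produces tidyselect's usual error.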

Reference GitHub issue associated with pull request: N/A


Pre-review Checklist (if item does not apply, mark it as complete)

  • All GitHub Action workflows pass with a ✅
  • PR branch has pulled the most recent updates from the main branch: usethis::pr_merge_main()
  • If a bug was fixed, a unit test was added.
  • Code coverage is suitable for any new functions/features (generally, 100% coverage for new code): devtools::test_coverage()
  • Request a reviewer

Reviewer Checklist (if item does not apply, mark it as complete)

  • If a bug was fixed, a unit test was added.
  • Run pkgdown::build_site(). Check the R console for errors, and review the rendered website.
  • Code coverage is suitable for any new functions/features: devtools::test_coverage()

When the branch is ready to be merged:

  • Update NEWS.md with the changes from this pull request under the heading "# cards (development version)". If there is an issue associated with the pull request, reference it in parentheses at the end of the update (see NEWS.md for examples).
  • All GitHub Action workflows pass with a ✅
  • Approve Pull Request
  • Merge the PR. Please use "Squash and merge" or "Rebase and merge".

ddsjoberg and others added 2 commits May 4, 2026 23:04
Add fast path for bare symbol quosures in process_selectors.data.frame()
and compute_formula_selector() that skips tidyselect::eval_select() when
the expression is a simple column name matching the data. Replace
rev()+Negate(duplicated) dedup with duplicated(fromLast=TRUE). Merge
imap+assign into a single loop in process_selectors.data.frame().

Add benchmark CI workflow (.github/workflows/benchmark.yaml) triggered
on perf-prefixed PRs, with a script that compares PR vs main timings.

Co-authored-by: Ona <no-reply@ona.com>
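The "imap+assign" merge in this commit can be pictured as follows: instead of first mapping over the selectors to build a list and then looping again to assign each element, each result is assigned as soon as it is computed. In this sketch the selectors are plain functions of the data and `select_and_assign()` is hypothetical; the real code evaluates tidyselect quosures.

```r
# Hypothetical sketch: evaluate each selector and assign its result into
# `env` in the same loop, rather than building an intermediate list first.
select_and_assign <- function(data, selectors, env = parent.frame()) {
  for (nm in names(selectors)) {
    assign(nm, selectors[[nm]](data), envir = env)
  }
  invisible(env)
}

env <- new.env()
select_and_assign(
  mtcars,
  list(variables = function(d) names(d)[1:2], by = function(d) "am"),
  env = env
)
env$variables
#> [1] "mpg" "cyl"
```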
@ddsjoberg ddsjoberg marked this pull request as ready for review May 4, 2026 23:15
@github-actions
Contributor

github-actions Bot commented May 4, 2026


Code Coverage Summary

Filename                       Stmts    Miss  Cover    Missing
---------------------------  -------  ------  -------  ----------------------------------------------
R/add_calculated_row.R            53       6  88.68%   51-56
R/apply_fmt_fun.R                116       0  100.00%
R/ard_attributes.R                46       1  97.83%   57
R/ard_formals.R                   13       0  100.00%
R/ard_hierarchical.R              94      12  87.23%   88-93, 179-184
R/ard_identity.R                  13       0  100.00%
R/ard_missing.R                   65       7  89.23%   45-50, 61
R/ard_mvsummary.R                 43       7  83.72%   76-81, 94
R/ard_pairwise.R                  46       0  100.00%
R/ard_stack_hierarchical.R       274       0  100.00%
R/ard_stack.R                     97       0  100.00%
R/ard_strata.R                    33       0  100.00%
R/ard_summary.R                  194       8  95.88%   87-92, 218-219
R/ard_tabulate_rows.R             12       0  100.00%
R/ard_tabulate_value.R            72       7  90.28%   51-56, 73
R/ard_tabulate.R                 417      16  96.16%   108-113, 255, 478-482, 489-490, 667, 700
R/ard_total_n.R                   14       0  100.00%
R/as_card_fn.R                     8       0  100.00%
R/as_card.R                       12       0  100.00%
R/as_nested_list.R                41       0  100.00%
R/bind_ard.R                      47      11  76.60%   74-85
R/cards-package.R                  1       1  0.00%    14
R/check_ard_structure.R           55       5  90.91%   35-39
R/compare_ard_helpers.R          126      32  74.60%   44-50, 66-69, 227, 230, 277, 282, 287-313, 321
R/compare_ard.R                   26       0  100.00%
R/default_stat_labels.R           20       0  100.00%
R/deprecated.R                    34      30  11.76%   38-39, 54-71, 86-132
R/eval_capture_conditions.R       30       0  100.00%
R/filter_ard_hierarchical.R      208       0  100.00%
R/get_ard_statistics.R            16       0  100.00%
R/maximum_variable_value.R        15       0  100.00%
R/mock.R                         137       2  98.54%   116, 245
R/nest_for_ard.R                  74       1  98.65%   64
R/print_ard_conditions.R          83       0  100.00%
R/print.R                        101       2  98.02%   164-165
R/process_selectors.R            126       1  99.21%   337
R/rename_ard_columns.R            81       0  100.00%
R/rename_ard_groups.R             60       0  100.00%
R/replace_null_statistic.R        11       0  100.00%
R/round5.R                         1       0  100.00%
R/selectors.R                     23       0  100.00%
R/shuffle_ard.R                  167       0  100.00%
R/sort_ard_hierarchical.R        163       0  100.00%
R/summary_functions.R             25       1  96.00%   59
R/tidy_ard_order.R                34       0  100.00%
R/tidy_as_ard.R                   40       0  100.00%
R/unlist_ard_columns.R            27       0  100.00%
R/update_ard.R                    60       6  90.00%   54-59
R/utils.R                         30       0  100.00%
TOTAL                           3484     156  95.52%

Diff against main

Filename      Stmts    Miss  Cover
----------  -------  ------  --------
TOTAL             0       0  +100.00%

Results for commit: b9098cd

Minimum allowed coverage is 80%

♻️ This comment has been updated with latest results

@github-actions
Contributor

github-actions Bot commented May 4, 2026

Unit Tests Summary

  1 files  255 suites   1m 0s ⏱️
249 tests 163 ✅  86 💤 0 ❌
570 runs  452 ✅ 118 💤 0 ❌

Results for commit b9098cd.

♻️ This comment has been updated with latest results.

@github-actions
Contributor

github-actions Bot commented May 4, 2026

Unit Test Performance Difference

Test Suite               Status  Time on main  ±Time   ±Tests  ±Skipped  ±Failures  ±Errors
-----------------------  ------  ------------  ------  ------  --------  ---------  -------
filter_ard_hierarchical  👶                    +0.01   +9      0         0          0
sort_ard_hierarchical    👶                    +0.00   +3      0         0          0

Additional test case details

Test Suite               Status  Time on main  ±Time   Test Case
-----------------------  ------  ------------  ------  ----------------------------------------------------------------------------
add_calculated_row       💔      0.04          +1.18   add_calculated_row_x_
apply_fmt_fun            💚      1.29          -1.24   apply_fmt_fun_works
eval_capture_conditions  💀      0.05          -0.05   unnamed
filter_ard_hierarchical  👶                    +0.01   filter_ard_hierarchical_error_messaging_works
filter_ard_hierarchical  👶                    +0.01   filter_ard_hierarchical_keep_empty_works
filter_ard_hierarchical  👶                    +0.00   filter_ard_hierarchical_returns_only_summary_rows_when_all_rows_filtered_out
filter_ard_hierarchical  👶                    +0.01   filter_ard_hierarchical_var_works
filter_ard_hierarchical  👶                    +10.59  filter_ard_hierarchical_works
filter_ard_hierarchical  👶                    +0.00   filter_ard_hierarchical_works_when_some_variables_not_included_in_x
filter_ard_hierarchical  👶                    +0.00   filter_ard_hierarchical_works_with_ard_stack_hierarchical_count_results
filter_ard_hierarchical  👶                    +0.02   filter_ard_hierarchical_works_with_column_specific_filters
filter_ard_hierarchical  👶                    +0.01   filter_ard_hierarchical_works_with_non_standard_filters
filter_ard_hierarchical  👶                    +0.00   filter_ard_hierarchical_works_with_only_one_variable_in_x
filter_ard_hierarchical  👶                    +0.00   filter_ard_hierarchical_works_with_overall_data
shuffle_ard              💀      0.04          -0.04   unnamed
shuffle_ard              💚      6.42          -6.39   shuffle_trim_works
sort_ard_hierarchical    👶                    +0.00   sort_ard_hierarchical_error_messaging_works
sort_ard_hierarchical    👶                    +0.00   sort_ard_hierarchical_sort_alphanumeric_works
sort_ard_hierarchical    👶                    +0.01   sort_ard_hierarchical_sort_descending_works
sort_ard_hierarchical    👶                    +0.00   sort_ard_hierarchical_sort_works_with_different_sorting_methods_for_each_variable
sort_ard_hierarchical    👶                    +5.21   sort_ard_hierarchical_works
sort_ard_hierarchical    👶                    +0.00   sort_ard_hierarchical_works_when_some_variables_not_included_in_x
sort_ard_hierarchical    👶                    +0.00   sort_ard_hierarchical_works_when_sorting_using_p_instead_of_n
sort_ard_hierarchical    👶                    +0.00   sort_ard_hierarchical_works_when_there_is_no_overall_row_in_x
sort_ard_hierarchical    👶                    +0.01   sort_ard_hierarchical_works_with_only_one_variable_in_x
sort_ard_hierarchical    👶                    +0.01   sort_ard_hierarchical_works_with_overall_data

Results for commit eb04c24

♻️ This comment has been updated with latest results.

ddsjoberg and others added 4 commits May 4, 2026 23:26
Co-authored-by: Ona <no-reply@ona.com>
Use only R release and older versions. The reusable workflows in
check.yaml already run against ghcr.io/insightsengineering/rstudio:latest
which ships R release (4.5.2).

Co-authored-by: Ona <no-reply@ona.com>
The S3 method is registered but not directly exported, so
cards::process_selectors.data.frame fails. Use getFromNamespace()
to retrieve it from the namespace instead.

Co-authored-by: Ona <no-reply@ona.com>
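The namespace issue in this commit can be reproduced with base R. Here stats' print.lm, a registered but unexported S3 method, stands in for cards' process_selectors.data.frame; the substitution is an assumption made so the sketch runs without cards installed.

```r
# `::` only resolves exported objects, so a registered-but-unexported S3
# method cannot be reached that way.
exported <- tryCatch({ stats::print.lm; TRUE }, error = function(e) FALSE)
exported   # FALSE

# getFromNamespace() (or `:::`) reads the object directly from the namespace.
print_lm <- utils::getFromNamespace("print.lm", "stats")
is.function(print_lm)   # TRUE

# The benchmark script would do the equivalent for cards, e.g.
# utils::getFromNamespace("process_selectors.data.frame", "cards")
```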
@github-actions
Contributor

github-actions Bot commented May 4, 2026

Performance Benchmark

Comparing main (0.7.1.9013) vs PR (0.7.1.9013)

Each benchmark runs 5 independent rounds. The change column shows the mean % difference (negative = faster).
The 95% CI column shows the confidence interval on the change. If the CI excludes 0%, the result is flagged as a real improvement (✅) or regression (❌).

ard_summary()

expression   main     pr       change    ci
-----------  -------  -------  --------  -------------
ard_summary  56.98ms  56.65ms  ➖ -0.6%  [-1.7%, 0.5%]
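The summary rule described in that comment (mean % change across 5 rounds, a 95% CI, and a flag only when the interval excludes 0%) can be sketched as below. This assumes paired main/PR rounds and a t-based interval; the actual .github/scripts/benchmark.R may compute it differently, and `summarise_change()` and the timings shown are hypothetical.

```r
# Hypothetical sketch: summarise paired main-vs-PR timings from independent
# rounds as a mean % change with a t-based confidence interval.
summarise_change <- function(main, pr, level = 0.95) {
  stopifnot(length(main) == length(pr), length(main) >= 2)
  pct <- 100 * (pr - main) / main   # per-round % change; negative = faster
  n <- length(pct)
  est <- mean(pct)
  half <- qt(1 - (1 - level) / 2, df = n - 1) * sd(pct) / sqrt(n)
  ci <- c(est - half, est + half)
  flag <- if (ci[2] < 0) {
    "improvement"   # whole interval below 0%
  } else if (ci[1] > 0) {
    "regression"    # whole interval above 0%
  } else {
    "no change"     # interval includes 0%
  }
  list(change = est, ci = ci, flag = flag)
}

# Illustrative timings in ms (not results from this PR)
summarise_change(main = c(57, 56, 58, 57, 56), pr = c(56, 57, 57, 56, 57))$flag
#> [1] "no change"
```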

ddsjoberg and others added 9 commits May 4, 2026 16:56
Replace the bare-symbol fast path with one that handles character
vectors: single string literals ("AGE") and c() calls with all
string arguments (c("AGE", "ARM")). Applies to both
process_selectors.data.frame() and compute_formula_selector() LHS.

Falls through to tidyselect when any name is not found in the data,
preserving existing error behavior.

Co-authored-by: Ona <no-reply@ona.com>
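The character-vector fast path in this commit can be sketched in base R. `chr_literal_fast_path()` is a hypothetical helper that takes an unevaluated expression; returning NULL stands for "fall through to tidyselect", which is what keeps tidyselect's error messages for unknown names.

```r
# Hypothetical sketch: accept only a single string literal ("AGE") or a c()
# call whose arguments are all string literals, and only when every name
# exists in the data. NULL means "fall through to tidyselect".
chr_literal_fast_path <- function(data, expr) {
  is_chr_literal <- is.character(expr) ||
    (is.call(expr) && identical(expr[[1]], as.name("c")) && length(expr) > 1 &&
       all(vapply(as.list(expr)[-1], is.character, logical(1))))
  if (!is_chr_literal) return(NULL)
  cols <- eval(expr, baseenv())   # only literals and base::c() are evaluated
  if (all(cols %in% names(data))) cols else NULL
}

chr_literal_fast_path(mtcars, quote(c("mpg", "cyl")))
#> [1] "mpg" "cyl"
chr_literal_fast_path(mtcars, quote(c("mpg", "not_a_column")))
#> NULL
```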
Evaluate the quosure and check if the result is a character vector
with all names present in the data. This covers variable references
like `cols <- c("AGE", "ARM"); process_selectors(data, variables = cols)`
in addition to string literals.

Co-authored-by: Ona <no-reply@ona.com>
Only attempt evaluation for expressions that are likely character
vectors (string literals, c() calls, plain symbols). Tidyselect
helpers like starts_with() skip straight to cards_select() with
no overhead.

Co-authored-by: Ona <no-reply@ona.com>
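The guarded-evaluation variant from the two commits above can be sketched as follows. `eval_chr_fast_path()` is hypothetical and works on plain expressions rather than quosures; anything that is not a string literal, a c() call, or a plain symbol skips evaluation entirely and returns NULL, leaving the work to tidyselect.

```r
# Hypothetical sketch: evaluate only expressions that could plausibly yield a
# character vector, and use the result only if it is a character vector of
# existing column names. NULL means "fall through to tidyselect".
eval_chr_fast_path <- function(data, expr, env = parent.frame()) {
  likely_chr <- is.character(expr) || is.symbol(expr) ||
    (is.call(expr) && identical(expr[[1]], as.name("c")))
  if (!likely_chr) return(NULL)   # e.g. starts_with(): no evaluation at all
  val <- tryCatch(eval(expr, env), error = function(e) NULL)
  if (is.character(val) && length(val) > 0 && all(val %in% names(data))) val else NULL
}

cols <- c("mpg", "cyl")
eval_chr_fast_path(mtcars, quote(cols))
#> [1] "mpg" "cyl"
```

One subtlety of evaluating plain symbols: tidyselect resolves a bare name against the data's columns first, whereas eval() here looks it up in `env`, so a symbol that is both a column name and a character object in the caller's environment could select different columns than tidyselect would.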
Restore process_selectors.data.frame and compute_formula_selector
to their original main implementations. All selector evaluation
goes through tidyselect unconditionally.

Co-authored-by: Ona <no-reply@ona.com>
Co-authored-by: Ona <no-reply@ona.com>
Co-authored-by: Ona <no-reply@ona.com>
@ddsjoberg ddsjoberg closed this May 5, 2026
@github-actions github-actions Bot locked and limited conversation to collaborators May 5, 2026

Labels

None yet

Projects

None yet

1 participant