perf: optimize process_selectors and compute_formula_selector by ddsjoberg · Pull Request #556 · insightsengineering/cards

ddsjoberg · 2026-05-04T23:05:03Z

What changes are proposed in this pull request?

Optimize process_selectors.data.frame() and compute_formula_selector() by adding a fast path for bare symbol quosures that match column names, bypassing tidyselect::eval_select(). Also replace rev() |> Negate(duplicated)() |> rev() with !duplicated(x, fromLast = TRUE) and merge the imap() + assign loop into a single pass.
Add a benchmark CI workflow (.github/workflows/benchmark.yaml + .github/scripts/benchmark.R) that runs on perf-prefixed PRs and posts a comparison table as a PR comment.
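The deduplication change described in the first item keeps the last occurrence of each repeated value. A minimal base R sketch, using a made-up vector rather than anything from the package, suggests the two spellings flag the same elements. The old idiom is written in nested form here instead of as a pipe.

```r
# Illustrative input only: a selection with repeated names
x <- c("AGE", "ARM", "AGE", "SEX", "ARM")

# Old idiom: reverse, flag non-duplicates, reverse back
keep_old <- rev(!duplicated(rev(x)))

# Replacement: scan from the end in a single call
keep_new <- !duplicated(x, fromLast = TRUE)

# Both keep the last occurrence of each value
deduped <- x[keep_new]
```

The `fromLast = TRUE` form avoids two `rev()` calls and the closure created by `Negate()`.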

Benchmark results (R-devel, local)

| Function | Before | After | Speedup |
|---|---|---|---|
| `process_selectors` (2 bare symbols) | 860µs | 58µs | 14.7× |
| `process_selectors` (5 bare symbols) | 1.9ms | 70µs | 27× |
| `compute_formula_selector` (2 formulas, symbol LHS) | 998µs | 138µs | 7.2× |
| `compute_formula_selector` (5 formulas, symbol LHS) | 2.18ms | 266µs | 8.2× |
| `process_selectors` (tidyselect helpers) | 860µs | 952µs | no regression |
| `compute_formula_selector` (named list) | 56µs | 51µs | no regression |

The fast path only activates for bare symbol expressions (e.g., AGE, ARM) that match a column name. All other tidyselect expressions (starts_with(), where(), everything(), c(), -, etc.) fall through to tidyselect::eval_select() unchanged. No changes to public API or error behavior.
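The bare-symbol check described above can be sketched in base R. This is an illustration of the idea, not the code from this PR: `fast_path_symbol()` and the small data frame are hypothetical, and a real implementation would work on quosures and call `tidyselect::eval_select()` in the fallback branch.

```r
# Hypothetical helper: returns the column name when `expr` is a bare symbol
# naming a column of `data`; returns NULL to signal that the caller should
# fall through to tidyselect::eval_select().
fast_path_symbol <- function(expr, data) {
  if (is.symbol(expr) && as.character(expr) %in% names(data)) {
    return(as.character(expr))
  }
  NULL
}

# Made-up data for illustration
adsl_small <- data.frame(AGE = c(34, 51), ARM = c("A", "B"))

hit <- fast_path_symbol(quote(AGE), adsl_small) # bare symbol, column exists
miss_call <- fast_path_symbol(quote(starts_with("A")), adsl_small) # a call
miss_name <- fast_path_symbol(quote(HEIGHT), adsl_small) # not a column
```

Returning `NULL` for anything that is not an exact column match is what would leave tidyselect in charge of error messages for misspelled or missing columns.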

Reference GitHub issue associated with pull request. N/A

Pre-review Checklist (if item does not apply, mark it as complete)

All GitHub Action workflows pass with a ✅
PR branch has pulled the most recent updates from master branch: usethis::pr_merge_main()
If a bug was fixed, a unit test was added.
Code coverage is suitable for any new functions/features (generally, 100% coverage for new code): devtools::test_coverage()
Request a reviewer

Reviewer Checklist (if item does not apply, mark it as complete)

If a bug was fixed, a unit test was added.
Run pkgdown::build_site(). Check the R console for errors, and review the rendered website.
Code coverage is suitable for any new functions/features: devtools::test_coverage()

When the branch is ready to be merged:

Update NEWS.md with the changes from this pull request under the heading "# cards (development version)". If there is an issue associated with the pull request, reference it in parentheses at the end of the update (see NEWS.md for examples).
All GitHub Action workflows pass with a ✅
Approve Pull Request
Merge the PR. Please use "Squash and merge" or "Rebase and merge".

Add fast path for bare symbol quosures in process_selectors.data.frame() and compute_formula_selector() that skips tidyselect::eval_select() when the expression is a simple column name matching the data. Replace rev()+Negate(duplicated) dedup with duplicated(fromLast=TRUE). Merge imap+assign into a single loop in process_selectors.data.frame(). Add benchmark CI workflow (.github/workflows/benchmark.yaml) triggered on perf-prefixed PRs, with a script that compares PR vs main timings. Co-authored-by: Ona <no-reply@ona.com>

github-actions · 2026-05-04T23:20:16Z

Code Coverage Summary

Filename                       Stmts    Miss  Cover    Missing
---------------------------  -------  ------  -------  ----------------------------------------------
R/add_calculated_row.R            53       6  88.68%   51-56
R/apply_fmt_fun.R                116       0  100.00%
R/ard_attributes.R                46       1  97.83%   57
R/ard_formals.R                   13       0  100.00%
R/ard_hierarchical.R              94      12  87.23%   88-93, 179-184
R/ard_identity.R                  13       0  100.00%
R/ard_missing.R                   65       7  89.23%   45-50, 61
R/ard_mvsummary.R                 43       7  83.72%   76-81, 94
R/ard_pairwise.R                  46       0  100.00%
R/ard_stack_hierarchical.R       274       0  100.00%
R/ard_stack.R                     97       0  100.00%
R/ard_strata.R                    33       0  100.00%
R/ard_summary.R                  194       8  95.88%   87-92, 218-219
R/ard_tabulate_rows.R             12       0  100.00%
R/ard_tabulate_value.R            72       7  90.28%   51-56, 73
R/ard_tabulate.R                 417      16  96.16%   108-113, 255, 478-482, 489-490, 667, 700
R/ard_total_n.R                   14       0  100.00%
R/as_card_fn.R                     8       0  100.00%
R/as_card.R                       12       0  100.00%
R/as_nested_list.R                41       0  100.00%
R/bind_ard.R                      47      11  76.60%   74-85
R/cards-package.R                  1       1  0.00%    14
R/check_ard_structure.R           55       5  90.91%   35-39
R/compare_ard_helpers.R          126      32  74.60%   44-50, 66-69, 227, 230, 277, 282, 287-313, 321
R/compare_ard.R                   26       0  100.00%
R/default_stat_labels.R           20       0  100.00%
R/deprecated.R                    34      30  11.76%   38-39, 54-71, 86-132
R/eval_capture_conditions.R       30       0  100.00%
R/filter_ard_hierarchical.R      208       0  100.00%
R/get_ard_statistics.R            16       0  100.00%
R/maximum_variable_value.R        15       0  100.00%
R/mock.R                         137       2  98.54%   116, 245
R/nest_for_ard.R                  74       1  98.65%   64
R/print_ard_conditions.R          83       0  100.00%
R/print.R                        101       2  98.02%   164-165
R/process_selectors.R            126       1  99.21%   337
R/rename_ard_columns.R            81       0  100.00%
R/rename_ard_groups.R             60       0  100.00%
R/replace_null_statistic.R        11       0  100.00%
R/round5.R                         1       0  100.00%
R/selectors.R                     23       0  100.00%
R/shuffle_ard.R                  167       0  100.00%
R/sort_ard_hierarchical.R        163       0  100.00%
R/summary_functions.R             25       1  96.00%   59
R/tidy_ard_order.R                34       0  100.00%
R/tidy_as_ard.R                   40       0  100.00%
R/unlist_ard_columns.R            27       0  100.00%
R/update_ard.R                    60       6  90.00%   54-59
R/utils.R                         30       0  100.00%
TOTAL                           3484     156  95.52%

Diff against main

Filename      Stmts    Miss  Cover
----------  -------  ------  --------
TOTAL             0       0  +100.00%

Results for commit: b9098cd

Minimum allowed coverage is 80%

♻️ This comment has been updated with latest results

github-actions · 2026-05-04T23:21:42Z

Unit Tests Summary

1 files 255 suites 1m 0s ⏱️
249 tests 163 ✅ 86 💤 0 ❌
570 runs 452 ✅ 118 💤 0 ❌

Results for commit b9098cd.

♻️ This comment has been updated with latest results.

github-actions · 2026-05-04T23:21:46Z

Unit Test Performance Difference

| Test Suite | $Status$ | Time on `main` | $±Time$ | $±Tests$ | $±Skipped$ | $±Failures$ | $±Errors$ |
|---|---|---|---|---|---|---|---|
| filter_ard_hierarchical | 👶 | | $+0.01$ | $+9$ | $0$ | $0$ | $0$ |
| sort_ard_hierarchical | 👶 | | $+0.00$ | $+3$ | $0$ | $0$ | $0$ |

Additional test case details

| Test Suite | $Status$ | Time on `main` | $±Time$ | Test Case |
|---|---|---|---|---|
| add_calculated_row | 💔 | $0.04$ | $+1.18$ | add_calculated_row_x_ |
| apply_fmt_fun | 💚 | $1.29$ | $-1.24$ | apply_fmt_fun_works |
| eval_capture_conditions | 💀 | $0.05$ | $-0.05$ | unnamed |
| filter_ard_hierarchical | 👶 | | $+0.01$ | filter_ard_hierarchical_error_messaging_works |
| filter_ard_hierarchical | 👶 | | $+0.01$ | filter_ard_hierarchical_keep_empty_works |
| filter_ard_hierarchical | 👶 | | $+0.00$ | filter_ard_hierarchical_returns_only_summary_rows_when_all_rows_filtered_out |
| filter_ard_hierarchical | 👶 | | $+0.01$ | filter_ard_hierarchical_var_works |
| filter_ard_hierarchical | 👶 | | $+10.59$ | filter_ard_hierarchical_works |
| filter_ard_hierarchical | 👶 | | $+0.00$ | filter_ard_hierarchical_works_when_some_variables_not_included_in_x |
| filter_ard_hierarchical | 👶 | | $+0.00$ | filter_ard_hierarchical_works_with_ard_stack_hierarchical_count_results |
| filter_ard_hierarchical | 👶 | | $+0.02$ | filter_ard_hierarchical_works_with_column_specific_filters |
| filter_ard_hierarchical | 👶 | | $+0.01$ | filter_ard_hierarchical_works_with_non_standard_filters |
| filter_ard_hierarchical | 👶 | | $+0.00$ | filter_ard_hierarchical_works_with_only_one_variable_in_x |
| filter_ard_hierarchical | 👶 | | $+0.00$ | filter_ard_hierarchical_works_with_overall_data |
| shuffle_ard | 💀 | $0.04$ | $-0.04$ | unnamed |
| shuffle_ard | 💚 | $6.42$ | $-6.39$ | shuffle_trim_works |
| sort_ard_hierarchical | 👶 | | $+0.00$ | sort_ard_hierarchical_error_messaging_works |
| sort_ard_hierarchical | 👶 | | $+0.00$ | sort_ard_hierarchical_sort_alphanumeric_works |
| sort_ard_hierarchical | 👶 | | $+0.01$ | sort_ard_hierarchical_sort_descending_works |
| sort_ard_hierarchical | 👶 | | $+0.00$ | sort_ard_hierarchical_sort_works_with_different_sorting_methods_for_each_variable |
| sort_ard_hierarchical | 👶 | | $+5.21$ | sort_ard_hierarchical_works |
| sort_ard_hierarchical | 👶 | | $+0.00$ | sort_ard_hierarchical_works_when_some_variables_not_included_in_x |
| sort_ard_hierarchical | 👶 | | $+0.00$ | sort_ard_hierarchical_works_when_sorting_using_p_instead_of_n |
| sort_ard_hierarchical | 👶 | | $+0.00$ | sort_ard_hierarchical_works_when_there_is_no_overall_row_in_x |
| sort_ard_hierarchical | 👶 | | $+0.01$ | sort_ard_hierarchical_works_with_only_one_variable_in_x |
| sort_ard_hierarchical | 👶 | | $+0.01$ | sort_ard_hierarchical_works_with_overall_data |

Results for commit eb04c24

♻️ This comment has been updated with latest results.

github-actions · 2026-05-04T23:51:25Z

Performance Benchmark

Comparing main (0.7.1.9013) vs PR (0.7.1.9013)

Each benchmark runs 5 independent rounds. The change column shows the mean % difference (negative = faster).
The 95% CI column shows the confidence interval on the change. If the CI excludes 0%, the result is flagged as a real improvement (✅) or regression (❌).
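One way such a comparison could be computed is sketched below in base R. The source does not say how the benchmark script derives its interval, so the paired per-round percentage change and the t-based interval are assumptions, `summarise_change()` is a hypothetical helper, and the timings are made up for illustration.

```r
# Hypothetical helper: mean % change across rounds with a t-based CI.
# Pairing rounds and using a t-interval are assumptions, not the
# documented behaviour of the benchmark script.
summarise_change <- function(main, pr, level = 0.95) {
  stopifnot(length(main) == length(pr), length(main) >= 2)
  pct <- (pr - main) / main * 100 # negative = PR is faster
  n <- length(pct)
  half_width <- qt(1 - (1 - level) / 2, df = n - 1) * sd(pct) / sqrt(n)
  ci <- mean(pct) + c(-1, 1) * half_width
  flag <- if (ci[2] < 0) {
    "improvement"
  } else if (ci[1] > 0) {
    "regression"
  } else {
    "no clear change"
  }
  list(change = mean(pct), ci = ci, flag = flag)
}

# Made-up timings in milliseconds for five rounds
main_ms <- c(57.1, 56.8, 57.3, 56.9, 56.8)
pr_ms <- c(56.7, 56.9, 56.5, 56.4, 56.8)
res <- summarise_change(main_ms, pr_ms)
```

With only five rounds the interval is wide, so small differences will usually be reported as no clear change.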

ard_summary()

| expression | main | pr | change | ci |
|---|---|---|---|---|
| ard_summary | 56.98ms | 56.65ms | ➖ -0.6% | [-1.7%, 0.5%] |

Replace the bare-symbol fast path with one that handles character vectors: single string literals ("AGE") and c() calls with all string arguments (c("AGE", "ARM")). Applies to both process_selectors.data.frame() and compute_formula_selector() LHS. Falls through to tidyselect when any name is not found in the data, preserving existing error behavior. Co-authored-by: Ona <no-reply@ona.com>

Evaluate the quosure and check if the result is a character vector with all names present in the data. This covers variable references like `cols <- c("AGE", "ARM"); process_selectors(data, variables = cols)` in addition to string literals. Co-authored-by: Ona <no-reply@ona.com>

Only attempt evaluation for expressions that are likely character vectors (string literals, c() calls, plain symbols). Tidyselect helpers like starts_with() skip straight to cards_select() with no overhead. Co-authored-by: Ona <no-reply@ona.com>
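The character-vector variant described in these commit messages can be sketched in base R as follows. Both helpers are hypothetical and operate on plain expressions rather than quosures; the `NULL` return stands in for "hand the expression to tidyselect".

```r
# Hypothetical gate: TRUE only for expressions that could plausibly evaluate
# to a character vector (string literal, c() call, or plain symbol).
looks_like_chr <- function(expr) {
  is.character(expr) || is.symbol(expr) ||
    (is.call(expr) && identical(expr[[1]], quote(c)))
}

# Hypothetical fast path: evaluate gated expressions and accept the result
# only when it is a character vector whose names all exist in `data`.
chr_fast_path <- function(expr, data, env = parent.frame()) {
  if (!looks_like_chr(expr)) {
    return(NULL) # e.g. starts_with(): skip evaluation entirely
  }
  value <- tryCatch(eval(expr, env), error = function(e) NULL)
  if (is.character(value) && length(value) > 0 &&
    all(value %in% names(data))) {
    return(value)
  }
  NULL # unknown names or non-character result: defer to tidyselect
}

# Made-up data for illustration
adsl_small <- data.frame(AGE = c(34, 51), ARM = c("A", "B"))
cols <- c("AGE", "ARM")

from_literal <- chr_fast_path(quote(c("AGE", "ARM")), adsl_small)
from_variable <- chr_fast_path(quote(cols), adsl_small)
from_helper <- chr_fast_path(quote(starts_with("A")), adsl_small)
from_missing <- chr_fast_path(quote(c("AGE", "HEIGHT")), adsl_small)
```

Deferring whenever a name is missing is what would keep the existing error behavior, since tidyselect still produces the message.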

ddsjoberg and others added 2 commits May 4, 2026 23:04

Merge branch 'main' into perf/selector-optimization

68347db

ddsjoberg marked this pull request as ready for review May 4, 2026 23:15

ddsjoberg and others added 4 commits May 4, 2026 23:26

style: apply styler formatting to benchmark.R

eb1dceb

Co-authored-by: Ona <no-reply@ona.com>

ci: remove R-devel from R-CMD-check matrix

950bc01

Use only R release and older versions. The reusable workflows in check.yaml already run against ghcr.io/insightsengineering/rstudio:latest which ships R release (4.5.2). Co-authored-by: Ona <no-reply@ona.com>

Merge branch 'main' into perf/selector-optimization

865f83d

fix: use getFromNamespace to access process_selectors.data.frame

044017c

The S3 method is registered but not directly exported, so cards::process_selectors.data.frame fails. Use getFromNamespace() to retrieve it from the namespace instead. Co-authored-by: Ona <no-reply@ona.com>
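The same situation can be reproduced with base R alone. The sketch below uses `print.lm` from stats as a stand-in, since it is likewise a registered S3 method that is not exported; it is an analogy, not the benchmark script's code.

```r
# print.lm is registered as an S3 method in stats but is not exported,
# so `stats::print.lm` would raise an error
is_exported <- "print.lm" %in% getNamespaceExports("stats")

# getFromNamespace() retrieves the object from the namespace regardless
print_lm <- utils::getFromNamespace("print.lm", "stats")

# getS3method() is an alternative that looks up by generic and class
print_lm_alt <- utils::getS3method("print", "lm")
```

For a benchmark that needs to time one specific method, retrieving it explicitly also avoids measuring S3 dispatch overhead.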

ddsjoberg and others added 9 commits May 4, 2026 16:56

Update R-CMD-check.yaml

1a1d85b

Update R-CMD-check.yaml

dede154

Merge branch 'perf/selector-optimization' of https://github.com/insig…

9e8153b

…htsengineering/cards into perf/selector-optimization

revert: remove all selector fast paths

b9098cd

Restore process_selectors.data.frame and compute_formula_selector to their original main implementations. All selector evaluation goes through tidyselect unconditionally. Co-authored-by: Ona <no-reply@ona.com>

bench: simplify to single ard_summary(data = ADSL) test

eda1b2e

Co-authored-by: Ona <no-reply@ona.com>

bench: use ard_summary(data = ADSL, variables = AGE)

b23b9a9

Co-authored-by: Ona <no-reply@ona.com>

ddsjoberg closed this May 5, 2026

github-actions Bot locked and limited conversation to collaborators May 5, 2026

ddsjoberg wants to merge 15 commits into main from perf/selector-optimization