Skip to content

Commit 1822d35

Browse files
add formula support for split (#5393)
* add formula support for split * change terms dispatch * fix grammer * add info about upstream improvement * move NEWS * grammar * cite PR author for already-merged PR * Cite the base function we want to match * Update man/split.Rd Co-authored-by: Michael Chirico <michaelchirico4@gmail.com> * add suggestions to tests * remove new example --------- Co-authored-by: Michael Chirico <michaelchirico4@gmail.com>
1 parent 07fd933 commit 1822d35

File tree

4 files changed

+17
-2
lines changed

4 files changed

+17
-2
lines changed

NEWS.md

Lines changed: 3 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -14,7 +14,9 @@
1414
# 2:
1515
```
1616

17-
2. `cedta()` now returns `FALSE` if `.datatable.aware = FALSE` is set in the calling environment, [#5654](https://github.com/Rdatatable/data.table/issues/5654).
17+
2. `cedta()` now returns `FALSE` if `.datatable.aware = FALSE` is set in the calling environment, [#5654](https://github.com/Rdatatable/data.table/issues/5654). Thanks @dvg-p4 for the request and PR.
18+
19+
3. `split.data.table` also accepts a formula for `f`, [#5392](https://github.com/Rdatatable/data.table/issues/5392), mirroring the same in `base::split.data.frame` since R 4.1.0 (May 2021). Thanks to @XiangyunHuang for the request, and @ben-schwen for the PR.
1820

1921
3. Namespace-qualifying `data.table::shift()`, `data.table::first()`, or `data.table::last()` will not deactivate GForce, [#5942](https://github.com/Rdatatable/data.table/issues/5942). Thanks @MichaelChirico for the proposal and fix. Namespace-qualifying other calls like `stats::sum()`, `base::prod()`, etc., continue to work as an escape valve to avoid GForce, e.g. to ensure S3 method dispatch.
2022

R/data.table.R

Lines changed: 3 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -2401,6 +2401,9 @@ split.data.table = function(x, f, drop = FALSE, by, sorted = FALSE, keep.by = TR
24012401
if (!missing(by))
24022402
stopf("passing 'f' argument together with 'by' is not allowed, use 'by' when split by column in data.table and 'f' when split by external factor")
24032403
# same as split.data.frame - handling all exceptions, factor orders etc, in a single stream of processing was a nightmare in factor and drop consistency
2404+
# evaluate formula mirroring split.data.frame #5392. Mimics base::.formula2varlist.
2405+
if (inherits(f, "formula"))
2406+
f <- eval(attr(terms(f), "variables"), x, environment(f))
24042407
# be sure to use x[ind, , drop = FALSE], not x[ind], in case downstream methods don't follow the same subsetting semantics (#5365)
24052408
return(lapply(split(x = seq_len(nrow(x)), f = f, drop = drop, ...), function(ind) x[ind, , drop = FALSE]))
24062409
}

inst/tests/tests.Rraw

Lines changed: 10 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -18294,3 +18294,13 @@ test(2246.1, DT[, data.table::shift(b), by=a], DT[, shift(b), by=a], output="GFo
1829418294
test(2246.2, DT[, data.table::first(b), by=a], DT[, first(b), by=a], output="GForce TRUE")
1829518295
test(2246.3, DT[, data.table::last(b), by=a], DT[, last(b), by=a], output="GForce TRUE")
1829618296
options(old)
18297+
18298+
# 5392 split(x,f) works with formula f
18299+
dt = data.table(x=1:4, y=factor(letters[1:2]))
18300+
test(2247.1, split(dt, ~y), split(dt, dt$y))
18301+
dt = data.table(x=1:4, y=1:2)
18302+
test(2247.2, split(dt, ~y), list(`1`=data.table(x=c(1L,3L), y=1L), `2`=data.table(x=c(2L, 4L), y=2L)))
18303+
# Closely match the original MRE from the issue
18304+
test(2247.3, do.call(rbind, split(dt, ~y)), setDT(do.call(rbind, split(as.data.frame(dt), ~y))))
18305+
dt = data.table(x=1:4, y=factor(letters[1:2]), z=factor(c(1,1,2,2), labels=c("c", "d")))
18306+
test(2247.4, split(dt, ~y+z), list("a.c"=dt[1], "b.c"=dt[2], "a.d"=dt[3], "b.d"=dt[4]))

man/split.Rd

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -12,7 +12,7 @@
1212
}
1313
\arguments{
1414
\item{x}{data.table }
15-
\item{f}{factor or list of factors. Same as \code{\link[base:split]{split.data.frame}}. Use \code{by} argument instead, this is just for consistency with data.frame method.}
15+
\item{f}{Same as \code{\link[base:split]{split.data.frame}}. Use \code{by} argument instead, this is just for consistency with data.frame method.}
1616
\item{drop}{logical. Default \code{FALSE} will not drop empty list elements caused by factor levels not referred by that factors. Works also with new arguments of split data.table method.}
1717
\item{by}{character vector. Column names on which split should be made. For \code{length(by) > 1L} and \code{flatten} FALSE it will result nested lists with data.tables on leafs.}
1818
\item{sorted}{When default \code{FALSE} it will retain the order of groups we are splitting on. When \code{TRUE} then sorted list(s) are returned. Does not have effect for \code{f} argument.}

0 commit comments

Comments
 (0)