NAMESPACE: 2 changes (1 addition, 1 deletion)
@@ -201,5 +201,5 @@ S3method(format_col, expression)
 export(format_list_item)
 S3method(format_list_item, default)
 
-export(fdroplevels)
+export(fdroplevels, setdroplevels)
 S3method(droplevels, data.table)
NEWS.md: 8 changes (6 additions, 2 deletions)
@@ -2,9 +2,11 @@
 
 # data.table [v1.15.99](https://github.com/Rdatatable/data.table/milestone/30) (in development)
 
-## BREAKING CHANGE
+## BREAKING CHANGES
 
+1. `droplevels(in.place=TRUE)` is deprecated in favor of calling `setdroplevels()`, [#6014](https://github.com/Rdatatable/data.table/issues/6014). Given the associated risks/pain points, we strongly prefer all in-place/by-reference behavior within data.table come from functions `set*` (and `:=`) to make it as clear as possible that inputs are mutable. See below and `?setdroplevels` for more.
+
-1. `` `[.data.table` `` is un-exported again. This was exported to support an experimental feature (`DT()` functional form of `[`) that never made it to release, but we forgot to claw back this export in the NAMESPACE; sorry about that. We didn't find anyone calling the method directly (which is inadvisable to begin with).
+2. `` `[.data.table` `` is un-exported again. This was exported to support an experimental feature (`DT()` functional form of `[`) that never made it to release, but we forgot to claw back this export in the NAMESPACE; sorry about that. We didn't find anyone calling the method directly (which is inadvisable to begin with).
 
 ## NEW FEATURES
 
@@ -52,6 +54,8 @@
 
 17. `[.data.table` gains `showProgress`, allowing users to toggle progress printing for large "by" operations, [#3060](https://github.com/Rdatatable/data.table/issues/3060). Reports information such as number of groups processed, total groups, total time elapsed and estimated time until completion. This feature doesn't apply for `GForce` optimized operations. Thanks to @eatonya, @zachmayer for filing FRs, and to everyone else that up-voted/chimed in on the issue. Thanks to @joshhwuu for the PR.
 
+18. New `setdroplevels()` as a by-reference version of the `droplevels()` method, which returns a copy of its input, [#6014](https://github.com/Rdatatable/data.table/issues/6014). Thanks @MichaelChirico for the suggestion and implementation.
+
 ## BUG FIXES
 
 1. `unique()` returns a copy the case when `nrows(x) <= 1` instead of a mutable alias, [#5932](https://github.com/Rdatatable/data.table/pull/5932). This is consistent with existing `unique()` behavior when the input has no duplicates but more than one row. Thanks to @brookslogan for the report and @dshemetov for the fix.
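For users, the NEWS entries above amount to a one-line migration. The sketch below is illustrative only. It assumes a data.table build that exports `setdroplevels()` (as this PR does), and the tables `DT`, `DT2`, `DT3` and their columns are hypothetical.

```r
library(data.table)  # assumes a data.table version that exports setdroplevels()

# Hypothetical table; subsetting keeps the now-unused level "a"
DT = data.table(f = factor(c("a", "b", "c")), v = 1:3)[f != "a"]
levels(DT$f)      # "a" "b" "c"

# Before this PR: droplevels(DT, in.place = TRUE)
# (with this PR it still works, but warns that in.place=TRUE is deprecated)
# After: the set* function makes the by-reference update explicit
setdroplevels(DT)
levels(DT$f)      # "b" "c"

# When the input must stay untouched, droplevels() still returns a copy
DT2 = data.table(f = factor(c("x", "y")))[f == "x"]
DT3 = droplevels(DT2)
levels(DT2$f)     # "x" "y" (input unchanged)
levels(DT3$f)     # "x"
```

This follows the convention the NEWS entry describes: functions named `set*` mutate their argument, and `droplevels()` keeps copy semantics.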
R/fdroplevels.R: 35 changes (20 additions, 15 deletions)
@@ -8,19 +8,24 @@ fdroplevels = function(x, exclude = if (anyNA(levels(x))) NULL else NA, ...) {
   return(ans)
 }
 
-droplevels.data.table = function(x, except = NULL, exclude, in.place = FALSE, ...){
-  stopifnot(is.logical(in.place))
-  if (nrow(x)==0L) return(x)
-  ix = vapply(x, is.factor, NA)
-  if(!is.null(except)){
-    stopifnot(is.numeric(except), except <= length(x))
-    ix[except] = FALSE
-  }
-  if(!sum(ix)) return(x)
-  if(!in.place) x = copy(x)
-  for(nx in names(ix)[ix==TRUE]){
-    if (missing(exclude)) set(x, i = NULL, j = nx, value = fdroplevels(x[[nx]]))
-    else set(x, i = NULL, j = nx, value = fdroplevels(x[[nx]], exclude = exclude))
-  }
-  return(x)
+droplevels.data.table = function(x, except=NULL, exclude, in.place=FALSE, ...){
+  stopifnot(is.logical(in.place))
+  if (isTRUE(in.place)) warningf("droplevels() with in.place=TRUE is deprecated. Use setdroplevels() instead.")
+  if (!in.place) x = copy(x)
+  if (missing(exclude)) exclude = NULL
+  setdroplevels(x, except, exclude)[]
+}
+
+setdroplevels = function(x, except=NULL, exclude=NULL) {
+  if (!nrow(x)) return(invisible(x))
+  ix = vapply_1b(x, is.factor)
+  if (!is.null(except)) {
+    stopifnot(is.numeric(except), except >= 1, except <= length(x))
+    ix[except] = FALSE
+  }
+  if (!any(ix)) return(invisible(x))
+  for (nx in names(ix)[ix]) {
+    set(x, i=NULL, j=nx, value=fdroplevels(x[[nx]], exclude=exclude))
+  }
+  invisible(x)
 }
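Two details of the new `setdroplevels()` are easy to miss. First, `except` must be given as column numbers: the code checks `is.numeric(except)` and the bounds `except >= 1` and `except <= length(x)`. Second, the function returns `x` invisibly, which is why `droplevels.data.table()` appends `[]`. The following is a minimal sketch, assuming a data.table version that exports `setdroplevels()`; the columns `keep` and `drop` are made up for illustration.

```r
library(data.table)  # assumes a data.table version that exports setdroplevels()

# Hypothetical table: dropping row 3 leaves "z" and "r" as unused levels
DT = data.table(
  keep = factor(c("a", "b", "z")),
  drop = factor(c("p", "q", "r"))
)[1:2]

# except takes column numbers, so column 1 ("keep") is skipped
setdroplevels(DT, except = 1L)
levels(DT$keep)   # "a" "b" "z" (untouched)
levels(DT$drop)   # "p" "q"

# setdroplevels() returns DT invisibly; appending [] prints the result
setdroplevels(DT)[]
levels(DT$keep)   # "a" "b"
```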
inst/tests/tests.Rraw: 11 changes (9 additions, 2 deletions)
@@ -1682,7 +1682,7 @@ test(529.4, set(DT1, i=NULL, j=7L, value=5L), error="Item 1 of column numbers in
 
 # Test that data.frame incompability is fixed, came to light in Feb 2012
 DT = data.table(name=c('a','b','c'), value=1:3)
-test(530, base::droplevels(DT[ name != 'a' ]), data.table(name=c('b','c'),value=2:3)) # base:: because we'll implement a fast droplevels, too.
+test(530, droplevels(DT[ name != 'a' ]), data.table(name=c('b','c'),value=2:3))
 
 # Test that .set_row_names() is maintained on .SD for each group
 DT = data.table(a=INT(1,1,2,2,2,3,3,3,3),b=1:9)
@@ -17732,14 +17732,21 @@ if (base::getRversion() >= "3.4.0") {
 }
 test(2214.06, droplevels(DT)[["a"]], droplevels(DT[1:5,a]))
 test(2214.07, droplevels(DT, 1)[["a"]], x[1:5])
-test(2214.08, droplevels(DT, in.place=TRUE), DT)
+test(2214.08, droplevels(DT, in.place=TRUE), DT, warning="droplevels() with in.place=TRUE is deprecated.")
 # support ordered factors in fdroplevels
 o = factor(letters[1:10], ordered=TRUE)
 test(2214.09, fdroplevels(o[1:5]), droplevels(o[1:5]))
 # edge case for empty table #5184
 test(2214.10, droplevels(DT[0]), DT[0])
 test(2214.11, droplevels(data.table()), data.table())
 
+# setdroplevels() for in-place operations #6014
+x = factor(letters[1:10])
+DT = data.table(a = x)[1:5]
+test(2214.12, setdroplevels(DT, except=1L), DT) # don't do anything
+test(2214.13, setdroplevels(DT, except=0L), error="except >= 1")
+test(2214.14, setdroplevels(DT, except=2L), error="except <= length(x)")
+test(2214.15, setdroplevels(DT), DT)
 
 # factor i should be just like character i and work, #1632
 DT = data.table(A=letters[1:3], B=4:6, key="A")
man/fdroplevels.Rd: 2 changes (2 additions, 0 deletions)
@@ -2,13 +2,15 @@
 \alias{fdroplevels}
 \alias{droplevels}
 \alias{droplevels.data.table}
+\alias{setdroplevels}
 \title{Fast droplevels}
 \description{
 Similar to \code{base::droplevels} but \emph{much faster}.
 }
 
 \usage{
 fdroplevels(x, exclude = if (anyNA(levels(x))) NULL else NA, \dots)
+setdroplevels(x, except = NULL, exclude = NULL)
 
 \method{droplevels}{data.table}(x, except = NULL, exclude, in.place = FALSE, \dots)
 }