Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
2 changes: 2 additions & 0 deletions NEWS.md
Original file line number Diff line number Diff line change
Expand Up @@ -162,6 +162,8 @@

23. Added a `data.table` method for `utils::edit` to ensure a `data.table` is returned, for convenience, [#593](https://github.com/Rdatatable/data.table/issues/593).

24. More efficient optimization of many columns in `j` (e.g. from `.SD`), [#1470](https://github.com/Rdatatable/data.table/issues/1470). Thanks @Jorges1000 for the report.

#### BUG FIXES

1. `first`, `last`, `head` and `tail` by group no longer error in some cases, [#2030](https://github.com/Rdatatable/data.table/issues/2030) [#3462](https://github.com/Rdatatable/data.table/issues/3462). Thanks to @franknarf1 for reporting.
Expand Down
25 changes: 15 additions & 10 deletions R/data.table.R
Original file line number Diff line number Diff line change
Expand Up @@ -1611,11 +1611,11 @@ replace_order = function(isub, verbose, env) {
}
if (verbose) {
if (!identical(oldjsub, jsub))
cat("lapply optimization changed j from '",deparse(oldjsub),"' to '",deparse(jsub,width.cutoff=200L),"'\n",sep="")
cat("lapply optimization changed j from '",deparse(oldjsub),"' to '",deparse(jsub,width.cutoff=200L, nlines=1L),"'\n",sep="")
else
cat("lapply optimization is on, j unchanged as '",deparse(jsub,width.cutoff=200L),"'\n",sep="")
cat("lapply optimization is on, j unchanged as '",deparse(jsub,width.cutoff=200L, nlines=1L),"'\n",sep="")
}
dotN = function(x) is.name(x) && x == ".N" # For #5760
dotN = function(x) is.name(x) && x==".N" # For #5760. TODO: Rprof() showed dotN() may be the culprit if iterated (#1470)?; avoid the == which converts each x to character?
Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

x == quote(.N) works, is it any faster?

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

trying ...

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess not somewhat surprisingly (?):

> microbenchmark::microbenchmark(times = 1e5,
+                                quote(N) == quote(N),
+                                quote(N) == 'N')
Unit: nanoseconds
                 expr min  lq     mean median  uq     max neval
 quote(N) == quote(N) 190 226 309.0476    234 244 3892045 1e+05
      quote(N) == "N" 153 182 213.9202    192 198   39180 1e+05

Ditto if we store qN = quote(N) beforehand for the RHS. identical much worse.

# FR #971, GForce kicks in on all subsets, no joins yet. Although joins could work with
# nomatch=0L even now.. but not switching it on yet, will deal it separately.
if (getOption("datatable.optimize")>=2 && !is.data.table(i) && !byjoin && length(f__) && !length(lhs)) {
Expand Down Expand Up @@ -1643,7 +1643,9 @@ replace_order = function(isub, verbose, env) {
}
if (jsub[[1L]]=="list") {
GForce = TRUE
for (ii in seq_along(jsub)[-1L]) if (!.ok(jsub[[ii]])) GForce = FALSE
for (ii in seq.int(from=2L, length.out=length(jsub)-1L)) {
if (!.ok(jsub[[ii]])) {GForce = FALSE; break}
}
} else GForce = .ok(jsub)
if (GForce) {
if (jsub[[1L]]=="list")
Expand All @@ -1665,12 +1667,15 @@ replace_order = function(isub, verbose, env) {
nomeanopt=FALSE # to be set by .optmean() using <<- inside it
oldjsub = jsub
if (jsub[[1L]]=="list") {
for (ii in seq_along(jsub)[-1L]) {
this_jsub = jsub[[ii]]
if (dotN(this_jsub)) next; # For #5760
Copy link
Copy Markdown
Member Author

@MichaelChirico MichaelChirico Aug 19, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this dotN thing isn't doing anything? since this loop only affects is.call elements & dotN specifically checks is.name. So is.call and && will accomplish the same.

Copy link
Copy Markdown
Member

@mattdowle mattdowle Aug 28, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1. And all the time was being spent in dotN too. The slowdown wasn't to do with the .optmean part, per se. Rprof output here: #1470 (comment)

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch! Maybe we should revert to the for loop approach then as well (though timings are pretty small in both cases)?

Copy link
Copy Markdown
Member

@mattdowle mattdowle Aug 28, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep good thought. I tried to revert to the for() loop approach but it was still 15s. Down from 30s but not 0.5s as it should be. So now I'm not not sure what's going on. Let's keep the sapply way then and revisit in the future.
Actually, this is consistent with the Rprof result. If I read it correctly, 50% was in the dotN, not "all" as I wrote above.

# Addressing #1369, #2949 and #1974. Added is.symbol() check to handle cases where expanded function definition is used instead of function names. #1369 results in (function(x) sum(x)) as jsub[[.]] from dcast.data.table.
if (is.call(this_jsub) && is.symbol(this_jsub[[1L]]) && this_jsub[[1L]]=="mean")
jsub[[ii]] = .optmean(this_jsub)
# Addressing #1369, #2949 and #1974. This used to be 30s (vs 0.5s) with 30K elements items in j, #1470. Could have been dotN() and/or the for-looped if()
todo = sapply(jsub, function(x) {
is.call(x) && is.symbol(x[[1L]]) && x[[1L]]=="mean"
# jsub[[1]]=="list" so the first item will always be FALSE
# is.symbol() for when expanded function definition is used instead of function names; #1369 results in (function(x) sum(x)) as jsub[[.]] from dcast.data.table
})
if (any(todo)) {
w = which(todo)
jsub[w] = lapply(jsub[w], .optmean)
}
} else if (jsub[[1L]]=="mean") {
jsub = .optmean(jsub)
Expand Down