Codecov Report
```
@@            Coverage Diff             @@
##           master    #3777      +/-   ##
==========================================
+ Coverage   99.41%   99.41%    +<.01%
==========================================
  Files          71       71
  Lines      13241    13242        +1
==========================================
+ Hits       13164    13165        +1
  Misses        77       77
==========================================
```
Continue to review full report at Codecov.
```r
if (jsub[[1L]]=="list") {
  for (ii in seq_along(jsub)[-1L]) {
    this_jsub = jsub[[ii]]
    if (dotN(this_jsub)) next; # For #5760
```
this dotN thing isn't doing anything? Since this loop only affects is.call elements and dotN specifically checks is.name, is.call and && will accomplish the same.
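To make the observation concrete, here is a minimal standalone sketch (the `dotN` definition copied from the diff, run outside data.table) showing that `dotN()` can only be TRUE for a bare symbol, never for a call, so it is dead code on a path that only sees `is.call` elements:

```r
# dotN() matches only a bare symbol (is.name); a call like mean(x) can never match
dotN = function(x) is.name(x) && x == ".N"

dotN(quote(.N))       # TRUE: .N is a symbol
dotN(quote(mean(x)))  # FALSE: a call is not a name
is.call(quote(.N))    # FALSE: so a branch guarded by is.call never sees a dotN hit
```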
+1. And all the time was being spent in dotN too. The slowdown wasn't to do with the .optmean part, per se. Rprof output here: #1470 (comment)
Good catch! Maybe we should revert to the for loop approach then as well (though timings are pretty small in both cases)?
Yep, good thought. I tried to revert to the for() loop approach but it was still 15s. Down from 30s, but not 0.5s as it should be. So now I'm not sure what's going on. Let's keep the sapply way then and revisit in the future.
Actually, this is consistent with the Rprof result. If I read it correctly, 50% was in the dotN, not "all" as I wrote above.
```diff
   cat("lapply optimization is on, j unchanged as '",deparse(jsub,width.cutoff=200L, nlines=1L),"'\n",sep="")
 }
-dotN = function(x) is.name(x) && x == ".N" # For #5760
+dotN = function(x) is.name(x) && x==".N" # For #5760. TODO: Rprof() showed dotN() may be the culprit if iterated (#1470)?; avoid the == which converts each x to character?
```
x == quote(.N) works, is it any faster?
I guess not, somewhat surprisingly (?):

```r
> microbenchmark::microbenchmark(times = 1e5,
+   quote(N) == quote(N),
+   quote(N) == 'N')
Unit: nanoseconds
                 expr min  lq     mean median  uq     max neval
 quote(N) == quote(N) 190 226 309.0476    234 244 3892045 1e+05
      quote(N) == "N" 153 182 213.9202    192 198   39180 1e+05
```

Ditto if we store `qN = quote(N)` beforehand for the RHS; `identical` is much worse.
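For reference, a small sketch (run outside data.table) of the coercion being benchmarked: `==` on a symbol deparses it to character on the fly, which is exactly the per-iteration cost the TODO in `dotN()` speculates about:

```r
# Comparing a symbol with == coerces it to character before comparing,
# which is the repeated conversion the dotN() TODO worries about
x = quote(.N)
x == ".N"                # TRUE: symbol deparsed to ".N", then compared
x == quote(.N)           # TRUE: both symbols coerced to character
identical(x, quote(.N))  # TRUE as well, but benchmarked slower above
```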
R/data.table.R
```diff
 if (jsub[[1L]]=="list") {
   GForce = TRUE
-  for (ii in seq_along(jsub)[-1L]) if (!.ok(jsub[[ii]])) GForce = FALSE
+  for (ii in seq_along(jsub)[-1L]) if (!.ok(jsub[[ii]])) {GForce = FALSE; break}
```
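The effect of the added `break` can be sketched in isolation (the `.ok` below is a made-up stand-in predicate, not data.table's actual `.ok()`): once one element fails, the remaining elements are never tested.

```r
.ok = function(x) is.call(x)  # hypothetical stand-in predicate for illustration
jsub = quote(list(mean(a), b, sum(c)))

GForce = TRUE
checked = 0L
for (ii in seq_along(jsub)[-1L]) {
  checked = checked + 1L
  if (!.ok(jsub[[ii]])) { GForce = FALSE; break }  # early exit on first failure
}
checked  # 2, not 3: the bare symbol b fails, so sum(c) is never tested
```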
This change is the (good) culprit for the timing strangeness. I've been going back to master and making tweaks there to compare timings to this branch. But this .ok() also calls dotN(), hence the confusion. Getting there ...
50% of the 30s does seem to be in this loop:

```r
for (ii in seq_along(jsub)) {   # 0.5s (for the rest of [.data.table, not this loop)
  # this_jsub = jsub[[ii]]
}
for (ii in seq_along(jsub)) {   # 15s (an extra 14.5s for this loop)
  this_jsub = jsub[[ii]]
}
```

The Rprof() output for the 15s timing doesn't help much. There isn't even any subassign to the … Since the PR is working great by doing away with the for loop, I'll merge and move on.
Closes #1470
I guess the original intent of using a `for` loop was that if nothing in `jsub` has to be overwritten, there's no need to copy it. But #1470 is a case where actually nothing is changed, and still copying the whole thing is faster by a factor of 50x
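A hypothetical illustration of the two strategies (the function names and the `.N` rewrite rule below are made up for this sketch, not data.table's actual code): the `for` loop only copies `jsub` when it actually writes an element, while the `lapply()` version always rebuilds the whole call — yet per #1470 the full copy was the faster path.

```r
# strategy 1: modify in place, so jsub is copied only if an element is overwritten
rewrite_loop = function(jsub) {
  for (ii in seq_along(jsub)[-1L])
    if (is.name(jsub[[ii]]) && jsub[[ii]] == ".N") jsub[[ii]] = quote(length(x))
  jsub
}

# strategy 2: always rebuild the whole call in one pass, copying unconditionally
rewrite_copy = function(jsub) {
  as.call(lapply(jsub, function(x)
    if (is.name(x) && x == ".N") quote(length(x)) else x))
}

jsub = quote(list(mean(a), sum(b)))  # nothing to rewrite here
identical(rewrite_copy(jsub), jsub)  # TRUE: same result despite the full copy
```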