Peculiar relative performance drop in data.table for large N, which is quite unlike data.table's usual behaviour.
N = 1e5:
Unit: milliseconds
expr min lq mean median uq max neval
xts 20.62520 22.93372 25.14445 23.84235 27.25468 39.29402 50
data.table 21.23984 22.29121 27.28266 24.05491 26.25416 98.35812 50
quantmod 14.21228 16.71663 19.54709 17.19368 19.38106 102.56189 50
N = 1e6:
Unit: milliseconds
expr min lq mean median uq max neval
xts 296.8969 380.7494 408.7696 397.4292 431.1306 759.7227 50
data.table 1562.3613 1637.8787 1669.8513 1651.4729 1688.2312 1969.4942 50
quantmod 144.1901 244.2427 278.7676 268.4302 331.4777 418.7951 50
Is there a potential bottleneck in the data.table solution? I have tried various approaches, all of which yield solutions slower than the current one:
- passing .(year(V1), month(V1)) instead of .(as.yearmon(V1)) to by
- identifying EOM indices in mat through .I instead of last() (and self-joining)
- applying cumprod to daily returns (unsurprisingly, even slower)
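The first alternative above can be sketched as follows. This is a minimal, hedged reconstruction: the question does not show how the input is built, so the sample data below is hypothetical, with column names V1 (daily dates) and V2 (prices) taken from the posted solution.

```r
library(data.table)

set.seed(1)
# hypothetical sample input: V1 = daily dates, V2 = a random-walk price series
dt <- data.table(V1 = seq(as.Date("2000-01-01"), by = "day", length.out = 1000),
                 V2 = cumprod(1 + rnorm(1000, 0, 0.01)))

# alternative grouping: integer year/month instead of as.yearmon(V1);
# this avoids constructing yearmon values but was slower in my tests
eom <- dt[, .(EOM = last(V2)), by = .(Y = year(V1), M = month(V1))]
```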
Current data.table solution:
library(data.table)
library(zoo)  # for as.yearmon()

dtfun = function(dt) {
  # end-of-month price per month, then month-over-month return;
  # shift() is filled with the first price so the first month gets a return too
  dt[, .(EOM = last(V2)), .(Month = as.yearmon(V1))][
    , .(Month, Return = EOM/shift(EOM, fill = dt[, first(V2)]) - 1)]
}
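For completeness, a self-contained sketch of calling the solution on synthetic data. The input here is hypothetical (the question does not show the data generation); I assume V1 is a daily Date column and V2 a price series, as implied by the posted code.

```r
library(data.table)
library(zoo)  # for as.yearmon()

dtfun = function(dt) {
  dt[, .(EOM = last(V2)), .(Month = as.yearmon(V1))][
    , .(Month, Return = EOM/shift(EOM, fill = dt[, first(V2)]) - 1)]
}

set.seed(42)
N <- 1e5
# hypothetical input: V1 = daily dates, V2 = a random-walk price series
dt <- data.table(V1 = seq(as.Date("1970-01-01"), by = "day", length.out = N),
                 V2 = cumprod(1 + rnorm(N, 0, 0.01)))

res <- dtfun(dt)  # one row per month: Month, Return
```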