Scaling properties for data.table solution #1

@griipen

Description

data.table's relative performance drops sharply at large N, which is quite unlike data.table: it is on par with xts at N = 1e5 but roughly 4x slower at N = 1e6.

N = 1e5:

Unit: milliseconds
       expr      min       lq     mean   median       uq       max neval
        xts 20.62520 22.93372 25.14445 23.84235 27.25468  39.29402    50
 data.table 21.23984 22.29121 27.28266 24.05491 26.25416  98.35812    50
   quantmod 14.21228 16.71663 19.54709 17.19368 19.38106 102.56189    50

N = 1e6:

Unit: milliseconds
       expr       min        lq      mean    median        uq       max neval
        xts  296.8969  380.7494  408.7696  397.4292  431.1306  759.7227    50
 data.table 1562.3613 1637.8787 1669.8513 1651.4729 1688.2312 1969.4942    50
   quantmod  144.1901  244.2427  278.7676  268.4302  331.4777  418.7951    50

Is there a bottleneck in the data.table solution? I have tried several alternative approaches, all of which are slower than the current solution:

  • passing .(year(V1), month(V1)) instead of .(as.yearmon(V1)) to by
  • identifying EOM indices in mat through .I instead of last() (and self-joining)
  • applying cumprod to daily returns (unsurprisingly, even slower).
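
For reference, a minimal sketch of what the `.I`-based variant from the second bullet might look like (column names `V1` = date and `V2` = price are taken from the current solution; the function name `dtfun_idx` is made up for illustration). This is one plausible shape of that approach, which benchmarked slower:

```r
library(data.table)
library(zoo)  # for as.yearmon

# Hypothetical .I-based variant: find the row index of each month's last
# observation (assumes dt is sorted by date), then subset once instead of
# aggregating with last().
dtfun_idx <- function(dt) {
  eom <- dt[, .I[.N], by = .(year(V1), month(V1))]$V1  # EOM row indices
  dt[eom, .(Month  = as.yearmon(V1),
            Return = V2 / shift(V2, fill = dt$V2[1L]) - 1)]
}
```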

Current data.table solution:

dtfun = function(dt) {
  # Collapse to one row per month (end-of-month price), then compute
  # month-over-month returns from the EOM series
  dt[, .(EOM = last(V2)), .(Month = as.yearmon(V1))][
    , .(Month, Return = EOM / shift(EOM, fill = dt[, first(V2)]) - 1)]
}
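
For concreteness, a small self-contained usage sketch of the current solution on synthetic data (`V1` = date, `V2` = price, as implied by the function body; the synthetic price path is made up for illustration):

```r
library(data.table)
library(zoo)  # for as.yearmon

dtfun = function(dt) {
  dt[, .(EOM = last(V2)), .(Month = as.yearmon(V1))][
    , .(Month, Return = EOM / shift(EOM, fill = dt[, first(V2)]) - 1)]
}

# Synthetic daily price series spanning three months
dt <- data.table(
  V1 = seq(as.Date("2024-01-01"), as.Date("2024-03-31"), by = "day"),
  V2 = 100 + seq_len(91)
)
dtfun(dt)  # three rows: monthly returns for Jan, Feb, Mar 2024
```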
