Scaling properties for data.table solution #1

@griipen

Description

data.table's relative performance drops sharply at large N, which is quite unlike data.table: it is on par with xts at N = 1e5 but roughly 4x slower at N = 1e6.

N = 1e5:

Unit: milliseconds
       expr      min       lq     mean   median       uq       max neval
        xts 20.62520 22.93372 25.14445 23.84235 27.25468  39.29402    50
 data.table 21.23984 22.29121 27.28266 24.05491 26.25416  98.35812    50
   quantmod 14.21228 16.71663 19.54709 17.19368 19.38106 102.56189    50

N = 1e6:

Unit: milliseconds
       expr       min        lq      mean    median        uq       max neval
        xts  296.8969  380.7494  408.7696  397.4292  431.1306  759.7227    50
 data.table 1562.3613 1637.8787 1669.8513 1651.4729 1688.2312 1969.4942    50
   quantmod  144.1901  244.2427  278.7676  268.4302  331.4777  418.7951    50

Is there a bottleneck in the data.table solution? I have tried several alternative approaches, all of which are slower than the current solution:

  • passing .(year(V1), month(V1)) instead of .(as.yearmon(V1)) to by
  • identifying EOM indices in mat through .I instead of last() (and self-joining)
  • applying cumprod to daily returns (unsurprisingly, even slower).
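
For reference, a minimal sketch of what the `.I`-based variant from the second bullet might look like (column names `V1` = date and `V2` = price are taken from the current solution; the function name `dtfun_idx` is made up for illustration). This is one plausible shape of that approach, which benchmarked slower:

```r
library(data.table)
library(zoo)  # for as.yearmon

# Hypothetical .I-based variant: find the row index of each month's last
# observation (assumes dt is sorted by date), then subset once instead of
# aggregating with last().
dtfun_idx <- function(dt) {
  eom <- dt[, .I[.N], by = .(year(V1), month(V1))]$V1  # EOM row indices
  dt[eom, .(Month  = as.yearmon(V1),
            Return = V2 / shift(V2, fill = dt$V2[1L]) - 1)]
}
```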

Current data.table solution:

dtfun = function(dt) {
  # Collapse to one row per month (end-of-month price), then compute
  # month-over-month returns from the EOM series
  dt[, .(EOM = last(V2)), .(Month = as.yearmon(V1))][
    , .(Month, Return = EOM / shift(EOM, fill = dt[, first(V2)]) - 1)]
}
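
For concreteness, a small self-contained usage sketch of the current solution on synthetic data (`V1` = date, `V2` = price, as implied by the function body; the synthetic price path is made up for illustration):

```r
library(data.table)
library(zoo)  # for as.yearmon

dtfun = function(dt) {
  dt[, .(EOM = last(V2)), .(Month = as.yearmon(V1))][
    , .(Month, Return = EOM / shift(EOM, fill = dt[, first(V2)]) - 1)]
}

# Synthetic daily price series spanning three months
dt <- data.table(
  V1 = seq(as.Date("2024-01-01"), as.Date("2024-03-31"), by = "day"),
  V2 = 100 + seq_len(91)
)
dtfun(dt)  # three rows: monthly returns for Jan, Feb, Mar 2024
```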
