setkey on cross-referenced data.tables produces inconsistent results #2162
Closed
Description
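The following code produces inconsistent results: after setkey(d2, y), the y column is reordered in both d2 and d1, while x and x2 keep their original order.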
library(data.table)
d1 <- data.table(x = 1:6, y = 6:1)
d2 <- list(x = d1$x, y = d1$y, x2 = d1$x)
setDT(d2)
str(d1)
str(d2)
str(lapply(d1, address))
str(lapply(d2, address))
setkey(d2, y)
str(d1)
str(d2)
The output is as follows:
> library(data.table)
> d1 <- data.table(x = 1:6, y = 6:1)
> d2 <- list(x = d1$x, y = d1$y, x2 = d1$x)
> setDT(d2)
> str(d1)
Classes ‘data.table’ and 'data.frame': 6 obs. of 2 variables:
$ x: int 1 2 3 4 5 6
$ y: int 6 5 4 3 2 1
- attr(*, ".internal.selfref")=<externalptr>
> str(d2)
Classes ‘data.table’ and 'data.frame': 6 obs. of 3 variables:
$ x : int 1 2 3 4 5 6
$ y : int 6 5 4 3 2 1
$ x2: int 1 2 3 4 5 6
- attr(*, ".internal.selfref")=<externalptr>
> str(lapply(d1, address))
List of 2
$ x: chr "0x6eaa310"
$ y: chr "0x6eaa358"
> str(lapply(d2, address))
List of 3
$ x : chr "0x6eaa310"
$ y : chr "0x6eaa358"
$ x2: chr "0x6eaa310"
> setkey(d2, y)
> str(d1)
Classes ‘data.table’ and 'data.frame': 6 obs. of 2 variables:
$ x: int 1 2 3 4 5 6
$ y: int 1 2 3 4 5 6
- attr(*, ".internal.selfref")=<externalptr>
> str(d2)
Classes ‘data.table’ and 'data.frame': 6 obs. of 3 variables:
$ x : int 1 2 3 4 5 6
$ y : int 1 2 3 4 5 6
$ x2: int 1 2 3 4 5 6
- attr(*, ".internal.selfref")=<externalptr>
- attr(*, "sorted")= chr "y"
If d2 does not use the same d1$x twice, that is, if it is built as d2 <- list(x = d1$x, y = d1$y), the problem does not occur. Copying everything certainly avoids the problem, but that is sometimes too expensive.
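A cheaper alternative to a full copy might be to copy only the columns that alias an earlier column. The sketch below assumes the inconsistency comes from x and x2 sharing one vector (as the identical addresses above suggest), so that an in-place reorder touches that vector twice. copy_shared_cols is a hypothetical helper written for illustration, not part of data.table.

```r
library(data.table)

d1 <- data.table(x = 1:6, y = 6:1)
d2 <- list(x = d1$x, y = d1$y, x2 = d1$x)

# Hypothetical helper (not a data.table function): copy only those list
# elements whose memory address repeats the address of an earlier element.
copy_shared_cols <- function(lst) {
  addrs <- vapply(lst, address, character(1))
  dup <- duplicated(addrs)               # TRUE for the 2nd, 3rd, ... occurrence
  lst[dup] <- lapply(lst[dup], copy)     # deep-copy just the aliased columns
  lst
}

d2 <- copy_shared_cols(d2)
setDT(d2)
setkey(d2, y)
str(d2)
```

With this, only x2 is copied. x and y in d2 may still share memory with d1, so the sketch does not address whether d1 is modified by reference.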
My session info:
R version 3.4.0 (2017-04-21)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 16.04.2 LTS
Matrix products: default
BLAS: /usr/lib/openblas-base/libblas.so.3
LAPACK: /usr/lib/libopenblasp-r0.2.18.so
locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
[5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8 LC_PAPER=en_US.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] data.table_1.10.4
loaded via a namespace (and not attached):
[1] compiler_3.4.0 tools_3.4.0