Skip to content

Commit 91671e4

Browse files
authored
Merge branch 'master' into adding-a-new-test-for-regression-for-4558
2 parents 1e1f4e0 + 680b5e8 commit 91671e4

43 files changed

Lines changed: 2065 additions & 1250 deletions

Some content is hidden

Large Commits have some content hidden by default. Use the searchbox below for content that may be hidden.

.Rbuildignore

Lines changed: 1 addition & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -16,6 +16,7 @@
1616
^\.devcontainer$
1717
^\.graphics$
1818
^\.github$
19+
^\.zed$
1920

2021
^\.gitlab-ci\.yml$
2122

.ci/atime/tests.R

Lines changed: 20 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -137,6 +137,24 @@ test.list <- atime::atime_test_list(
137137
},
138138
Before = "7a9eaf62ede487625200981018d8692be8c6f134", # Parent of the first commit (https://github.com/Rdatatable/data.table/commit/515de90a6068911a148e54343a3503043b8bb87c) in the PR (https://github.com/Rdatatable/data.table/pull/4164/commits) that introduced the regression
139139
Regression = "c152ced0e5799acee1589910c69c1a2c6586b95d", # Parent of the first commit (https://github.com/Rdatatable/data.table/commit/15f0598b9828d3af2eb8ddc9b38e0356f42afe4f) in the PR (https://github.com/Rdatatable/data.table/pull/4558/commits) that fixes the regression
140-
Fixed = "f750448a2efcd258b3aba57136ee6a95ce56b302") # Second commit of the PR (https://github.com/Rdatatable/data.table/pull/4558/commits) that fixes the regression
141-
)
140+
Fixed = "f750448a2efcd258b3aba57136ee6a95ce56b302"), # Second commit of the PR (https://github.com/Rdatatable/data.table/pull/4558/commits) that fixes the regression
141+
142+
# Issue with sorting again when already sorted: https://github.com/Rdatatable/data.table/issues/4498
143+
# Fixed in: https://github.com/Rdatatable/data.table/pull/4501
144+
"DT[,.SD] improved in #4501" = atime::atime_test(
145+
N = 10^seq(1, 10, by=0.5),
146+
setup = {
147+
set.seed(1)
148+
L = as.data.table(as.character(rnorm(N, 1, 0.5)))
149+
setkey(L, V1)
150+
},
151+
## New DT can safely retain key.
152+
expr = {
153+
data.table:::`[.data.table`(L, , .SD)
154+
},
155+
Fast = "353dc7a6b66563b61e44b2fa0d7b73a0f97ca461", # Close-to-last merge commit in the PR (https://github.com/Rdatatable/data.table/pull/4501/commits) that fixes the issue
156+
Slow = "3ca83738d70d5597d9e168077f3768e32569c790", # Circa 2024 master parent of close-to-last merge commit (https://github.com/Rdatatable/data.table/commit/353dc7a6b66563b61e44b2fa0d7b73a0f97ca461) in the PR (https://github.com/Rdatatable/data.table/pull/4501/commits) that fixes the issue
157+
Slower = "cacdc92df71b777369a217b6c902c687cf35a70d"), # Circa 2020 parent of the first commit (https://github.com/Rdatatable/data.table/commit/74636333d7da965a11dad04c322c752a409db098) in the PR (https://github.com/Rdatatable/data.table/pull/4501/commits) that fixes the issue
158+
159+
NULL)
142160
# nolint end: undesirable_operator_linter.

.dev/README.md

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -17,7 +17,7 @@ source(".dev/cc.R")
1717
Developer helper script providing `cc` function. If one starts R session in `data.table` project root directory `.dev/cc.R` file should be automatically sourced (due to local `.Rprofile` file) making `cc()` (and `dd()`) function available straightaway.
1818

1919
```r
20-
cc(test=FALSE, clean=FALSE, debug=FALSE, omp=!debug, cc_dir, path=Sys.getenv("PROJ_PATH"), CC="gcc")
20+
cc(test=FALSE, clean=FALSE, debug=FALSE, omp=!debug, path=Sys.getenv("PROJ_PATH", unset=normalizePath(".")), CC="gcc", quiet=FALSE)
2121
```
2222

2323
Use `cc()` to re-compile all C sources and attach all `data.table` R functions (including non-exported ones).

.github/workflows/R-CMD-check-occasional.yaml

Lines changed: 6 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -1,6 +1,6 @@
11
on:
22
schedule:
3-
- cron: '17 13 17 * *' # 17th of month at 13:17 UTC
3+
- cron: '17 13 18 * *' # 18th of month at 13:17 UTC
44

55
# A more complete suite of checks to run monthly; each PR/merge need not pass all these, but they should pass before CRAN release
66
name: R-CMD-check-occasional
@@ -83,11 +83,15 @@ jobs:
8383
run: brew install gdal proj
8484

8585
- name: Install remotes
86+
env:
87+
R_LIBS_USER: /home/runner/work/r-lib
8688
run: install.packages("remotes")
8789
shell: Rscript {0}
8890

8991
- name: Install system dependencies
9092
if: runner.os == 'Linux'
93+
env:
94+
R_LIBS_USER: /home/runner/work/r-lib
9195
run: |
9296
while read -r cmd
9397
do
@@ -103,6 +107,7 @@ jobs:
103107
R_LIBS_USER: /home/runner/work/r-lib
104108
run: |
105109
options(crayon.enabled = TRUE)
110+
install.packages("remotes") # different R_LIBS_USER now...
106111
remotes::install_deps(dependencies=TRUE, force=TRUE)
107112
108113
# we define this in data.table namespace, but it appears to be exec

.gitignore

Lines changed: 3 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -23,6 +23,9 @@ config.log
2323
.Rproj.user
2424
data.table.Rproj
2525

26+
# zed editor
27+
.zed
28+
2629
# produced vignettes
2730
vignettes/*.html
2831
vignettes/*.pdf

.gitlab-ci.yml

Lines changed: 10 additions & 10 deletions
Original file line numberDiff line numberDiff line change
@@ -117,8 +117,8 @@ test-lin-rel:
117117
OPENBLAS_MAIN_FREE: "1"
118118
script:
119119
- *install-deps
120-
- echo 'CFLAGS=-g -O3 -flto=auto -fno-common -fopenmp -Wall -pedantic -fstack-protector-strong -D_FORTIFY_SOURCE=2' > ~/.R/Makevars
121-
- echo 'CXXFLAGS=-g -O3 -flto=auto -fno-common -fopenmp -Wall -pedantic -fstack-protector-strong -D_FORTIFY_SOURCE=2' >> ~/.R/Makevars
120+
- echo 'CFLAGS=-g -O3 -flto=auto -fno-common -fopenmp -Wall -Wvla -pedantic -fstack-protector-strong -D_FORTIFY_SOURCE=2' > ~/.R/Makevars
121+
- echo 'CXXFLAGS=-g -O3 -flto=auto -fno-common -fopenmp -Wall -Wvla -pedantic -fstack-protector-strong -D_FORTIFY_SOURCE=2' >> ~/.R/Makevars
122122
- R CMD check $(ls -1t data.table_*.tar.gz | head -n 1)
123123
- (! grep "warning:" data.table.Rcheck/00install.out)
124124

@@ -132,8 +132,8 @@ test-lin-rel-vanilla:
132132
<<: *test-lin
133133
image: registry.gitlab.com/jangorecki/dockerfiles/r-base-gcc
134134
script:
135-
- echo 'CFLAGS=-g -O0 -fno-openmp -Wall -pedantic -fstack-protector-strong -D_FORTIFY_SOURCE=2' > ~/.R/Makevars
136-
- echo 'CXXFLAGS=-g -O0 -fno-openmp -Wall -pedantic -fstack-protector-strong -D_FORTIFY_SOURCE=2' >> ~/.R/Makevars
135+
- echo 'CFLAGS=-g -O0 -fno-openmp -Wall -Wvla -pedantic -fstack-protector-strong -D_FORTIFY_SOURCE=2' > ~/.R/Makevars
136+
- echo 'CXXFLAGS=-g -O0 -fno-openmp -Wall -Wvla -pedantic -fstack-protector-strong -D_FORTIFY_SOURCE=2' >> ~/.R/Makevars
137137
- R CMD check --no-manual --ignore-vignettes $(ls -1t data.table_*.tar.gz | head -n 1)
138138

139139
## R-release on Linux
@@ -149,8 +149,8 @@ test-lin-rel-cran:
149149
_R_CHECK_PKG_SIZES_THRESHOLD_: "7" ## MB 'checking installed package size' NOTE
150150
script:
151151
- *install-deps
152-
- echo 'CFLAGS=-g -O2 -fopenmp -Wall -pedantic -fstack-protector-strong -D_FORTIFY_SOURCE=2' > ~/.R/Makevars
153-
- echo 'CXXFLAGS=-g -O2 -fopenmp -Wall -pedantic -fstack-protector-strong -D_FORTIFY_SOURCE=2' >> ~/.R/Makevars
152+
- echo 'CFLAGS=-g -O2 -fopenmp -Wall -Wvla -pedantic -fstack-protector-strong -D_FORTIFY_SOURCE=2' > ~/.R/Makevars
153+
- echo 'CXXFLAGS=-g -O2 -fopenmp -Wall -Wvla -pedantic -fstack-protector-strong -D_FORTIFY_SOURCE=2' >> ~/.R/Makevars
154154
- R CMD check --as-cran $(ls -1t data.table_*.tar.gz | head -n 1)
155155
- >-
156156
Rscript -e 'l=tail(readLines("data.table.Rcheck/00check.log"), 1L); if (!identical(l, "Status: OK")) stop("Last line of ", shQuote("00check.log"), " is not ", shQuote("Status: OK"), " but ", shQuote(l)) else q("no")'
@@ -168,8 +168,8 @@ test-lin-dev-gcc-strict-cran:
168168
_R_S3_METHOD_LOOKUP_BASEENV_AFTER_GLOBALENV_: "FALSE" ## detects S3 method lookup found on search path #4777
169169
_R_S3_METHOD_LOOKUP_REPORT_SEARCH_PATH_USES_: "TRUE"
170170
script:
171-
- echo 'CFLAGS=-g -O2 -flto=auto -fno-common -fopenmp -Wall -pedantic -fstack-protector-strong -D_FORTIFY_SOURCE=2' > ~/.R/Makevars
172-
- echo 'CXXFLAGS=-g -O2 -flto=auto -fno-common -fopenmp -Wall -pedantic -fstack-protector-strong -D_FORTIFY_SOURCE=2' >> ~/.R/Makevars
171+
- echo 'CFLAGS=-g -O2 -flto=auto -fno-common -fopenmp -Wall -Wvla -pedantic -fstack-protector-strong -D_FORTIFY_SOURCE=2' > ~/.R/Makevars
172+
- echo 'CXXFLAGS=-g -O2 -flto=auto -fno-common -fopenmp -Wall -Wvla -pedantic -fstack-protector-strong -D_FORTIFY_SOURCE=2' >> ~/.R/Makevars
173173
- *install-deps
174174
- R CMD check --as-cran $(ls -1t data.table_*.tar.gz | head -n 1)
175175
- (! grep "warning:" data.table.Rcheck/00install.out)
@@ -189,8 +189,8 @@ test-lin-dev-clang-cran:
189189
_R_S3_METHOD_LOOKUP_BASEENV_AFTER_GLOBALENV_: "FALSE"
190190
_R_S3_METHOD_LOOKUP_REPORT_SEARCH_PATH_USES_: "TRUE"
191191
script:
192-
- echo 'CFLAGS=-g -O2 -fno-common -Wall -pedantic -fstack-protector-strong -D_FORTIFY_SOURCE=2' > ~/.R/Makevars
193-
- echo 'CXXFLAGS=-g -O2 -fno-common -Wall -pedantic -fstack-protector-strong -D_FORTIFY_SOURCE=2' >> ~/.R/Makevars
192+
- echo 'CFLAGS=-g -O2 -fno-common -Wall -Wvla -pedantic -fstack-protector-strong -D_FORTIFY_SOURCE=2' > ~/.R/Makevars
193+
- echo 'CXXFLAGS=-g -O2 -fno-common -Wall -Wvla -pedantic -fstack-protector-strong -D_FORTIFY_SOURCE=2' >> ~/.R/Makevars
194194
- *install-deps
195195
- R CMD check --as-cran $(ls -1t data.table_*.tar.gz | head -n 1)
196196
- (! grep "warning:" data.table.Rcheck/00install.out)

DESCRIPTION

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -18,7 +18,7 @@ Authors@R: c(
1818
person("Jan","Gorecki", role="aut"),
1919
person("Michael","Chirico", role="aut", comment = c(ORCID="0000-0003-0787-087X")),
2020
person("Toby","Hocking", role="aut", comment = c(ORCID="0000-0002-3146-0865")),
21-
person("Benjamin","Schwendinger",role="aut"),
21+
person("Benjamin","Schwendinger",role="aut", comment = c(ORCID="0000-0003-3315-8114")),
2222
person("Pasha","Stetsenko", role="ctb"),
2323
person("Tom","Short", role="ctb"),
2424
person("Steve","Lianoglou", role="ctb"),

NAMESPACE

Lines changed: 0 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -62,7 +62,6 @@ export(substitute2)
6262
#export(DT) # mtcars |> DT(i,j,by) #4872 #5472
6363

6464
S3method("[", data.table)
65-
export("[.data.table") # so that functional DT() finds it; PR#5176
6665
S3method("[<-", data.table)
6766
# S3method("[[", data.table)
6867
# S3method("[[<-", data.table)

NEWS.md

Lines changed: 58 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -2,6 +2,10 @@
22

33
# data.table [v1.15.99](https://github.com/Rdatatable/data.table/milestone/30) (in development)
44

5+
## BREAKING CHANGE
6+
7+
1. `` `[.data.table` `` is un-exported again. This was exported to support an experimental feature (`DT()` functional form of `[`) that never made it to release, but we forgot to claw back this export in the NAMESPACE; sorry about that. We didn't find anyone calling the method directly (which is inadvisable to begin with).
8+
59
## NEW FEATURES
610

711
1. `print.data.table()` shows empty (`NULL`) list column entries as `[NULL]` for emphasis. Previously they would just print nothing (same as for empty string). Part of [#4198](https://github.com/Rdatatable/data.table/issues/4198). Thanks @sritchie73 for the proposal and fix.
@@ -40,6 +44,12 @@
4044

4145
14. `fread` loads `.bgz` files directly, [#5461](https://github.com/Rdatatable/data.table/issues/5461). Thanks to @TMRHarrison for the request with proposed fix, and Benjamin Schwendinger for the PR.
4246

47+
15. `rbindlist(l, use.names=TRUE)` and `rbind` now works correctly on columns with different class attributes for certain classes such as `Date`, `IDate`, `ITime`, `POSIXct` and `AsIs` with other columns of similar classes, e.g., `IDate` and `Date`. The conversion is done automatically and the class attribute of the final column is determined by the first encountered class attribute in the binding list, [#5309](https://github.com/Rdatatable/data.table/issues/5309), [#4934](https://github.com/Rdatatable/data.table/issues/4934), [#5391](https://github.com/Rdatatable/data.table/issues/5391).
48+
49+
`rbindlist(l, ignore.attr=TRUE)` and `rbind` also gains argument `ignore.attr` to manually deactivate the safety-net of binding columns with different column classes, [#3911](https://github.com/Rdatatable/data.table/issues/3911), [#5542](https://github.com/Rdatatable/data.table/issues/5542). Thanks to @dcaseykc, @fox34, @adrian-quintario, @berg-michael, @arunsrinivasan, @statquant, @pkress, @jrausch12, @therosko, @OfekShilon, @iMissile, @tdhock for the request and @ben-schwen for the PR.
50+
51+
16. `fcase()` supports scalars in conditions (e.g. supplying just `TRUE`), vectors in `default=` (so the default can vary by row), and `default=` is now lazily evaluated, [#5461](https://github.com/Rdatatable/data.table/issues/5461). Thanks @sindribaldur for the feature request, which has been highly requested, @shrektan for doing most of the implementation, and @MichaelChirico for sewing things up.
52+
4353
## BUG FIXES
4454

4555
1. `unique()` returns a copy the case when `nrows(x) <= 1` instead of a mutable alias, [#5932](https://github.com/Rdatatable/data.table/pull/5932). This is consistent with existing `unique()` behavior when the input has no duplicates but more than one row. Thanks to @brookslogan for the report and @dshemetov for the fix.
@@ -66,10 +76,14 @@
6676

6777
12. data.table's `all.equal()` method now dispatches to each column's own `all.equal()` method as appropriate, [#4543](https://github.com/Rdatatable/data.table/issues/4543). Thanks @MichaelChirico for the report and fix. Note that this had two noteworthy changes to data.table's own test suite that might affect you: (1) comparisons of POSIXct columns compare absolute, not relative differences, meaning that millisecond-scale differences might trigger a "not equal" report that was hidden before; and (2) comparisons of integer64 columns could be totally wrong since they were being compared on the basis of their representation as doubles, not long integers. The former might be a matter of preference requiring you to specify a different `tolerance=`, while the latter was clearly a bug.
6878

69-
13. `rbindlist` could lead to a protection stack overflow when applied to a list containing many nested lists exceeding the pointer protection stack size, [#4536](https://github.com/Rdatatable/data.table/issues/4536). Thanks to @ProfFancyPants for reporting, and Benjamin Schwendinger for the fix.
79+
13. `rbindlist` and `shift` could lead to a protection stack overflow when applied to a list containing many nested lists exceeding the pointer protection stack size, [#4536](https://github.com/Rdatatable/data.table/issues/4536). Thanks to @ProfFancyPants for reporting, and Benjamin Schwendinger (`rbindlist`) and @MichaelChirico (`shift`) for the fix.
7080

7181
14. `fread(x, colClasses="POSIXct")` now also works for columns containing only NA values, [#6208](https://github.com/Rdatatable/data.table/issues/6208). Thanks to @markus-schaffer for the report, and Benjamin Schwendinger for the fix.
7282

83+
15. `fread()` is more careful about detecting that a file is compressed in bzip2 format, [#6304](https://github.com/Rdatatable/data.table/issues/6304). In particular, we also check the 4th byte is a digit; in rare cases, a legitimate uncompressed CSV file could match 'BZh' as the first 3 bytes. We think an uncompressed CSV file matching 'BZh[1-9]' is all the more rare and unlikely to be encountered in "real" examples. Other formats (zip, gzip) are friendly enough to use non-printable characters in their magic numbers. Thanks @grainnemcguire for the report and @MichaelChirico for the fix.
84+
85+
16. Selecting keyed list columns will retain key without a performance penalty, closes [#4498](https://github.com/Rdatatable/data.table/issues/4498). Thanks to @user9439449 on StackOverflow for the report.
86+
7387
## NOTES
7488

7589
1. `transform` method for data.table sped up substantially when creating new columns on large tables. Thanks to @OfekShilon for the report and PR. The implemented solution was proposed by @ColeMiller1.
@@ -116,16 +130,58 @@
116130
117131
20. Removed a warning about the now totally-obsolete option `datatable.CJ.names`, as discussed in previous releases.
118132
119-
21. Refactored some non-API calls to R macros for S4 objects (#6180)[https://github.com/Rdatatable/data.table/issues/6180]. There should be no user-visible change. Thanks to various R users & R core for pushing to have a clearer definition of "API" for R, and thanks @MichaelChirico for implementing here.
133+
21. Refactored some non-API calls in the package C code, (#6180)[https://github.com/Rdatatable/data.table/issues/6180]. There should be no user-visible change. Thanks to various R users, R core, and especially Luke Tierney for pushing to have a clearer definition of "API" for R and for offering clear documentation and suggested workarounds. Thanks @MichaelChirico and @TysonStanley for implementing changes for this release; more will follow.
120134
121135
22. C code was unified more in how failures to allocate memory (`malloc()`/`calloc()`) are handled, (#1115)[https://github.com/Rdatatable/data.table/issues/1115]. No OOM issues were reported, as these regions of code typically request relatively small blocks of memory, but it is good to handle memory pressure consistently. Thanks @elfring for the report and @MichaelChirico for the clean-up effort and future-proofing linter.
122136
137+
22. Internal routine for finding sort order will now re-use any existing index. A similar optimization was already present in R code, but this has now been pushed to C and covers a wider range of use cases and collects more statistics about its input (e.g. whether any infinite entries were found), opening the possibility for more optimizations in other functions.
138+
139+
Functions `setindex` (and `setindexv`) will now compute groups' positions as well. `setindex()` also collects the extra statistics alluded to above.
140+
141+
Finding sort order in other routines (for example subset `d2[id==1L]`) does not include those extra statistics so as not to impose a slowdown.
142+
143+
```r
144+
d2 = data.table(id=2:1, v2=1:2)
145+
setindexv(d2, "id")
146+
str(attr(attr(d2, "index"), "__id"))
147+
# int [1:2] 2 1
148+
# - attr(*, "starts")= int [1:2] 1 2
149+
# - attr(*, "maxgrpn")= int 1
150+
# - attr(*, "anyna")= int 0
151+
# - attr(*, "anyinfnan")= int 0
152+
# - attr(*, "anynotascii")= int 0
153+
# - attr(*, "anynotutf8")= int 0
154+
155+
d2 = data.table(id=2:1, v2=1:2)
156+
invisible(d2[id==1L])
157+
str(attr(attr(d2, "index"), "__id"))
158+
# int [1:2] 2 1
159+
```
160+
161+
This feature also enables re-use of sort index during joins, in cases where one of the calls to find sort order is made from C code.
162+
163+
```r
164+
d1 = data.table(id=1:2, v1=1:2)
165+
d2 = data.table(id=2:1, v2=1:2)
166+
setindexv(d2, "id")
167+
d1[d2, on="id", verbose=TRUE]
168+
#...
169+
#Starting bmerge ...
170+
#forderReuseSorting: using existing index: __id
171+
#forderReuseSorting: opt=2, took 0.000s
172+
#...
173+
```
174+
175+
This feature resolves [#4387](https://github.com/Rdatatable/data.table/issues/4387), [#2947](https://github.com/Rdatatable/data.table/issues/2947), [#4380](https://github.com/Rdatatable/data.table/issues/4380), and [#1321](https://github.com/Rdatatable/data.table/issues/1321). Thanks to @jangorecki, @jan-glx, and @MichaelChirico for the reports and @jangorecki for implementing.
176+
123177
## TRANSLATIONS
124178

125179
1. Fix a typo in a Mandarin translation of an error message that was hiding the actual error message, [#6172](https://github.com/Rdatatable/data.table/issues/6172). Thanks @trafficfan for the report and @MichaelChirico for the fix.
126180

127181
2. data.table is now translated into Brazilian Portuguese (`pt_BR`) as well as Mandarin (`zh_CN`). Thanks to the [new translation team](https://github.com/orgs/Rdatatable/teams/brazil) consisting initially of @rffontenelle, @leofontenelle, and @italo-07. The team is open if you'd also like to join and support maintenance of these translations.
128182

183+
3. A more helpful error message for using `:=` inside the first argument (`i`) of `[.data.table` is now available in translation, [#6293](https://github.com/Rdatatable/data.table/issues/6293). Previously, the code to display this assumed an earlier message was printed in English. The solution is for calling `:=` directly (i.e., outside the second argument `j` of `[.data.table`) to throw an error of class `dt_invalid_let_error`. Thanks to Spanish translator @rikivillalba for spotting the issue and @MichaelChirico for the fix.
184+
129185
# data.table [v1.15.4](https://github.com/Rdatatable/data.table/milestone/33) (27 March 2024)
130186

131187
## BUG FIXES

0 commit comments

Comments
 (0)