Merged
4 changes: 3 additions & 1 deletion NEWS.md
@@ -25,7 +25,9 @@
* Now handles floating-point NaN values in a wide variety of formats, including `NaN`, `sNaN`, `1.#QNAN`, `NaN1234`, `#NUM!` and others, [#1800](https://github.com/Rdatatable/data.table/issues/1800). Thanks to Jori Liesenborgs for highlighting and the PR.
* Many thanks to @yaakovfeldman, Guillermo Ponce, Arun Srinivasan, Hugh Parsonage, Mark Klik, Pasha Stetsenko, Mahyar K, Tom Crockett, @cnoelke, @qinjs, @etienne-s, Mark Danese, Avraham Adler for testing before release to CRAN: [#2070](https://github.com/Rdatatable/data.table/issues/2070), [#2073](https://github.com/Rdatatable/data.table/issues/2073), [#2087](https://github.com/Rdatatable/data.table/issues/2087), [#2091](https://github.com/Rdatatable/data.table/issues/2091), [#2107](https://github.com/Rdatatable/data.table/issues/2107), [fst#50](https://github.com/fstpackage/fst/issues/50#issuecomment-294287846), [#2118](https://github.com/Rdatatable/data.table/issues/2118), [#2092](https://github.com/Rdatatable/data.table/issues/2092), [#1888](https://github.com/Rdatatable/data.table/issues/1888), [#2123](https://github.com/Rdatatable/data.table/issues/2123), [#2167](https://github.com/Rdatatable/data.table/issues/2167), [#2194](https://github.com/Rdatatable/data.table/issues/2194), [#2238](https://github.com/Rdatatable/data.table/issues/2238), [#2228](https://github.com/Rdatatable/data.table/issues/2228), [#1464](https://github.com/Rdatatable/data.table/issues/1464), [#2201](https://github.com/Rdatatable/data.table/issues/2201), [#2287](https://github.com/Rdatatable/data.table/issues/2287), [#2299](https://github.com/Rdatatable/data.table/issues/2299), [#2285](https://github.com/Rdatatable/data.table/issues/2285), [#2251](https://github.com/Rdatatable/data.table/issues/2251), [#2347](https://github.com/Rdatatable/data.table/issues/2347), [#2222](https://github.com/Rdatatable/data.table/issues/2222), [#2352](https://github.com/Rdatatable/data.table/issues/2352), [#2246](https://github.com/Rdatatable/data.table/issues/2246)

2. `fwrite` now always quotes empty strings (`,"",`) to distinguish them from `NA` which by default is still empty (`,,`) but can be changed using `na=` as before. If `na=` is provided and `quote=` is the default `'auto'` then `quote=` is set to `TRUE` so that if the `na=` value occurs in the data, it can be distinguished from `NA`. Thanks to Ethan Welty for the request [#2214](https://github.com/Rdatatable/data.table/issues/2214) and Pasha for the code change and tests, [#2215](https://github.com/Rdatatable/data.table/issues/2215).
2. `fwrite()`:
* empty strings are now always quoted (`,"",`) to distinguish them from `NA` which by default is still empty (`,,`) but can be changed using `na=` as before. If `na=` is provided and `quote=` is the default `'auto'` then `quote=` is set to `TRUE` so that if the `na=` value occurs in the data, it can be distinguished from `NA`. Thanks to Ethan Welty for the request [#2214](https://github.com/Rdatatable/data.table/issues/2214) and Pasha for the code change and tests, [#2215](https://github.com/Rdatatable/data.table/issues/2215).
* `logicalAsInt` has been renamed `logical01` and the default changed from `FALSE` to `TRUE`, both changes for consistency with `fread` (see item above). The old name `logicalAsInt` continues to work but is now deprecated. The previous default can easily be restored without any code changes by setting `options("datatable.logical01" = FALSE)`.

3. Added helpful message when subsetting by a logical column without wrapping it in parentheses, [#1844](https://github.com/Rdatatable/data.table/issues/1844). Thanks @dracodoc for the suggestion and @MichaelChirico for the PR.
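
A minimal sketch of the two `fwrite()` behaviours described in item 2, assuming a data.table build that includes this change is installed. `logical01` is passed explicitly rather than relying on the default, since the default may differ between releases. The column names `id`, `flag` and `s` are made up for illustration.

```r
library(data.table)

DT <- data.table(id = 1:3, flag = c(TRUE, FALSE, NA))

# logical01 = TRUE writes logicals as 1/0; NA stays empty under the default na = ""
out01 <- capture.output(fwrite(DT, logical01 = TRUE))

# logical01 = FALSE keeps the TRUE/FALSE spelling
outTF <- capture.output(fwrite(DT, logical01 = FALSE))

# Empty strings are quoted so they can be told apart from NA
DS <- data.table(id = 1:2, s = c("", NA))
outS <- capture.output(fwrite(DS))

print(out01)
print(outTF)
print(outS)
```

Code that depends on the `TRUE`/`FALSE` spelling can pass `logical01 = FALSE` per call, or set the `datatable.logical01` option once as the NEWS item describes.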

31 changes: 19 additions & 12 deletions R/fwrite.R
@@ -2,17 +2,26 @@ fwrite <- function(x, file="", append=FALSE, quote="auto",
sep=",", sep2=c("","|",""), eol=if (.Platform$OS.type=="windows") "\r\n" else "\n",
na="", dec=".", row.names=FALSE, col.names=TRUE,
qmethod=c("double","escape"),
logicalAsInt=FALSE, dateTimeAs = c("ISO","squash","epoch","write.csv"),
logical01=getOption("datatable.logical01", TRUE),
logicalAsInt=logical01,
dateTimeAs = c("ISO","squash","epoch","write.csv"),
buffMB=8, nThread=getDTthreads(),
showProgress=interactive(),
verbose=getOption("datatable.verbose")) {
verbose=getOption("datatable.verbose", FALSE)) {
isLOGICAL = function(x) isTRUE(x) || identical(FALSE, x) # it seems there is no isFALSE in R?
na = as.character(na[1L]) # fix for #1725
if (missing(qmethod)) qmethod = qmethod[1L]
if (missing(dateTimeAs)) dateTimeAs = dateTimeAs[1L]
else if (length(dateTimeAs)>1) stop("dateTimeAs must be a single string")
dateTimeAs = chmatch(dateTimeAs, c("ISO","squash","epoch","write.csv"))-1L
if (is.na(dateTimeAs)) stop("dateTimeAs must be 'ISO','squash','epoch' or 'write.csv'")
if (!missing(logical01) && !missing(logicalAsInt))
stop("logicalAsInt has been renamed logical01. Use logical01 only, not both.")
if (!missing(logicalAsInt)) {
# TODO: warning("logicalAsInt has been renamed logical01 for consistency with fread. It will work fine but please change to logical01 at your convenience so we can remove logicalAsInt in future.")
logical01 = logicalAsInt
logicalAsInt=NULL
}
buffMB = as.integer(buffMB)
nThread = as.integer(nThread)
# write.csv default is 'double' so fwrite follows suit. write.table's default is 'escape'
@@ -26,7 +35,7 @@ fwrite <- function(x, file="", append=FALSE, quote="auto",
is.character(eol) && length(eol)==1L,
length(qmethod) == 1L && qmethod %in% c("double", "escape"),
isLOGICAL(col.names), isLOGICAL(append), isLOGICAL(row.names),
isLOGICAL(verbose), isLOGICAL(showProgress), isLOGICAL(logicalAsInt),
isLOGICAL(verbose), isLOGICAL(showProgress), isLOGICAL(logical01),
length(na) == 1L, #1725, handles NULL or character(0) input
is.character(file) && length(file)==1 && !is.na(file),
length(buffMB)==1 && !is.na(buffMB) && 1<=buffMB && buffMB<=1024,
@@ -37,16 +46,14 @@ fwrite <- function(x, file="", append=FALSE, quote="auto",
col.names = FALSE # test 1658.16 checks this
if (identical(quote,"auto")) quote=NA # logical NA
if (file=="") {
# console output (Rprintf) isn't thread safe.
# Perhaps more so on Windows (as experienced) than Linux
nThread=1L
showProgress=FALSE
# console output which it seems isn't thread safe on Windows even when one-batch-at-a-time
nThread = 1L
showProgress = FALSE
eol = "\n" # Rprintf() is used at C level which knows inside it to output \r\n on Windows. Otherwise extra \r is output.
}
.Call(Cwritefile, x, file, sep, sep2, eol, na, dec, quote, qmethod=="escape", append,
row.names, col.names, logicalAsInt, dateTimeAs, buffMB, nThread,
showProgress, verbose)
.Call(CfwriteR, x, file, sep, sep2, eol, na, dec, quote, qmethod=="escape", append,
row.names, col.names, logical01, dateTimeAs, buffMB, nThread,
showProgress, verbose)
invisible()
}

genLookups = function() invisible(.Call(CgenLookups))
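
The argument rename in `fwrite()` above relies on `missing()` to tell which name the caller actually used. A base-R sketch of that pattern follows; `write_flags`, `new_name` and `old_name` are hypothetical names chosen for illustration and are not part of data.table.

```r
# Hypothetical function illustrating a backward-compatible argument rename.
write_flags <- function(x, new_name = TRUE, old_name = new_name) {
  # Supplying both names is ambiguous, so refuse it outright
  if (!missing(new_name) && !missing(old_name))
    stop("old_name has been renamed new_name. Use new_name only, not both.")
  if (!missing(old_name)) {
    # Caller used the deprecated name: honour it
    new_name <- old_name
  }
  if (new_name) as.integer(x) else as.character(x)
}

res_default <- write_flags(c(TRUE, FALSE))                    # new default applies
res_old     <- write_flags(c(TRUE, FALSE), old_name = FALSE)  # deprecated name still works
res_both    <- tryCatch(
  write_flags(TRUE, new_name = TRUE, old_name = FALSE),
  error = function(e) conditionMessage(e)
)
```

Because `old_name` defaults to `new_name`, callers who never mention the old name see only the new default, and the old name can later be removed without changing the function body's logic.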

3 changes: 2 additions & 1 deletion R/onLoad.R
@@ -45,7 +45,8 @@
"datatable.use.index"="TRUE", # global switch to address #1422
"datatable.fread.datatable"="TRUE",
"datatable.prettyprint.char" = NULL, # FR #1091
"datatable.old.unique.by.key" = "FALSE" # TODO: warn 1 year, remove after 2 years
"datatable.old.unique.by.key" = "FALSE", # TODO: warn 1 year, remove after 2 years
"datatable.logical01" = "TRUE" # fwrite/fread to revert to FALSE. TODO: warn in next release and remove after 1 year
)
for (i in setdiff(names(opts),names(options()))) {
eval(parse(text=paste("options(",i,"=",opts[i],")",sep="")))
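
The loop in this hunk applies a package default only when the user has not already set that option. A self-contained sketch of the same idea, using a made-up option name (`example.pkg.logical01`) and a hypothetical helper `set_defaults()` so that it does not touch any real data.table option:

```r
# Hypothetical option name, used only for this illustration
opts <- c("example.pkg.logical01" = "TRUE")

set_defaults <- function(opts) {
  # Only options the user has not already set receive the package default
  for (i in setdiff(names(opts), names(options()))) {
    eval(parse(text = paste("options(", i, "=", opts[i], ")", sep = "")))
  }
}

options(example.pkg.logical01 = NULL)        # make sure the option starts unset
set_defaults(opts)
first <- getOption("example.pkg.logical01")  # default applied

options(example.pkg.logical01 = FALSE)       # user overrides the default
set_defaults(opts)
second <- getOption("example.pkg.logical01") # user's choice is kept
```

This is why a user can restore the earlier behaviour by setting the option before or after loading the package: an existing value is not overwritten by the defaults.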
12 changes: 9 additions & 3 deletions inst/tests/tests.Rraw
@@ -8831,7 +8831,7 @@ test(1658.16, fwrite(data.table(
factor1=as.factor(c('foo', 'bar')),
factor2=as.factor(c(NA, "baz")),
bool=c(TRUE,NA),
ints=as.integer(c(NA, 5))), na='na', quote=TRUE),
ints=as.integer(c(NA, 5))), na='na', quote=TRUE, logical01=FALSE),
output='"factor1","factor2","bool","ints"\n"foo",na,TRUE,na\n"bar","baz",na,5\n')

# empty data table (headers but no rows)
@@ -8855,6 +8855,11 @@ unlink(f)
ok_dt <- data.table(foo="bar")
test(1658.22, fwrite(ok_dt, quote=TRUE), output='"foo"\n"bar"\n')

# integer NA
DT = data.table(A=c(2L,NA,3L), B=c(NA,4:5))
test(1658.23, fwrite(DT), output='A,B2,,43,5')
test(1658.24, fwrite(DT, na="NA", verbose=TRUE), output='Writing column names.*"A","B".*2,NANA,43,5')

options(oldverbose)

# wrong argument types
@@ -9699,13 +9704,14 @@ set.seed(1)
DT = data.table(A=1:4,
B=list(1:10,15:18,7,9:10),
C=list(letters[19:23],c(1.2,2.3,3.4,pi,-9),c("foo","bar"),c(TRUE,TRUE,FALSE)))
test(1736.1, capture.output(fwrite(DT)), c("A,B,C", "1,1|2|3|4|5|6|7|8|9|10,s|t|u|v|w",
test(1736.1, capture.output(fwrite(DT,logical01=FALSE)), c("A,B,C", "1,1|2|3|4|5|6|7|8|9|10,s|t|u|v|w",
"2,15|16|17|18,1.2|2.3|3.4|3.14159265358979|-9", "3,7,foo|bar", "4,9|10,TRUE|TRUE|FALSE"))
test(1736.2, fwrite(DT, sep2=","), error="length(sep2)")
test(1736.3, fwrite(DT, sep2=c("",",","")), error="sep.*,.*sep2.*,.*must all be different")
test(1736.4, fwrite(DT, sep2=c("","||","")), error="nchar.*sep2.*2")
test(1736.5, capture.output(fwrite(DT, sep='|', sep2=c("c(",",",")"))), c("A|B|C", "1|c(1,2,3,4,5,6,7,8,9,10)|c(s,t,u,v,w)",
test(1736.5, capture.output(fwrite(DT, sep='|', sep2=c("c(",",",")"), logical01=FALSE)), c("A|B|C", "1|c(1,2,3,4,5,6,7,8,9,10)|c(s,t,u,v,w)",
"2|c(15,16,17,18)|c(1.2,2.3,3.4,3.14159265358979,-9)", "3|c(7)|c(foo,bar)", "4|c(9,10)|c(TRUE,TRUE,FALSE)"))
# Aside: logicalAsInt tested in 1736.6 to continue to work without warning, currently. TODO: warning, deprecate and remove
test(1736.6, capture.output(fwrite(DT, sep='|', sep2=c("{",",","}"), logicalAsInt=TRUE)),
c("A|B|C", "1|{1,2,3,4,5,6,7,8,9,10}|{s,t,u,v,w}",
"2|{15,16,17,18}|{1.2,2.3,3.4,3.14159265358979,-9}", "3|{7}|{foo,bar}", "4|{9,10}|{1,1,0}"))
9 changes: 6 additions & 3 deletions man/fwrite.Rd
@@ -12,10 +12,12 @@ fwrite(x, file = "", append = FALSE, quote = "auto",
eol = if (.Platform$OS.type=="windows") "\r\n" else "\n",
na = "", dec = ".", row.names = FALSE, col.names = TRUE,
qmethod = c("double","escape"),
logicalAsInt = FALSE, dateTimeAs = c("ISO","squash","epoch","write.csv"),
logical01 = getOption("datatable.logical01", TRUE),
logicalAsInt = logical01, # deprecated
dateTimeAs = c("ISO","squash","epoch","write.csv"),
buffMB = 8L, nThread = getDTthreads(),
showProgress = interactive(),
verbose = getOption("datatable.verbose"))
verbose = getOption("datatable.verbose", FALSE))
}
\arguments{
\item{x}{Any \code{list} of same length vectors; e.g. \code{data.frame} and \code{data.table}.}
@@ -34,7 +36,8 @@ fwrite(x, file = "", append = FALSE, quote = "auto",
\item{"escape" - the quote character (as well as the backslash character) is escaped in C style by a backslash, or}
\item{"double" (default, same as \code{write.csv}), in which case the double quote is doubled with another one.}
}}
\item{logicalAsInt}{Should \code{logical} values be written as \code{1} and \code{0} rather than \code{"TRUE"} and \code{"FALSE"}?}
\item{logical01}{Should \code{logical} values be written as \code{1} and \code{0} rather than \code{"TRUE"} and \code{"FALSE"}?}
\item{logicalAsInt}{Deprecated. Old name for \code{logical01}. The name was changed for consistency with \code{fread}, for which \code{logicalAsInt} would not make sense.}
\item{dateTimeAs}{ How \code{Date}/\code{IDate}, \code{ITime} and \code{POSIXct} items are written.
\itemize{
\item{"ISO" (default) - \code{2016-09-12}, \code{18:12:16} and \code{2016-09-12T18:12:16.999999Z}. 0, 3 or 6 digits of fractional seconds are printed if and when present for convenience, regardless of any R options such as \code{digits.secs}. The idea being that if milli and microseconds are present then you most likely want to retain them. R's internal UTC representation is written faithfully to encourage ISO standards, stymie timezone ambiguity and for speed. An option to consider is to start R in the UTC timezone simply with \code{"$ TZ='UTC' R"} at the shell (NB: it must be one or more spaces between \code{TZ='UTC'} and \code{R}, anything else will be silently ignored; this TZ setting applies just to that R process) or \code{Sys.setenv(TZ='UTC')} at the R prompt and then continue as if UTC were local time.}