From 24fdccd4f53a412b9c5c6cb3ee286f5b9aba8a80 Mon Sep 17 00:00:00 2001
From: Anupam Yadav
Date: Sat, 18 Apr 2026 05:30:56 +0000
Subject: [PATCH] [SPARK-56429][DOCS] Clarify differences between nullValue and emptyValue CSV options

Update the CSV data source documentation to better explain how nullValue, emptyValue, and nanValue differ from each other and when each option applies.
---
 docs/sql-data-sources-csv.md | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/docs/sql-data-sources-csv.md b/docs/sql-data-sources-csv.md
index 8008bc562082c..a4e837334b2a4 100644
--- a/docs/sql-data-sources-csv.md
+++ b/docs/sql-data-sources-csv.md
@@ -141,13 +141,13 @@ Data source options of CSV can be set via:
 nullValue
-Sets the string representation of a null value. Since 2.0.1, this nullValue param applies to all supported types including the string type.
+Sets the string that, when encountered in the CSV input, is treated as a SQL NULL. For example, if set to "NA", any field containing NA will be read as null. This also applies when the value is enclosed by the quote character (e.g., "NA" with the default double-quote). Since 2.0.1, this applies to all supported types including the string type.
 read/write
 nanValue
 NaN
-Sets the string representation of a non-number value.
+Sets the string that, when encountered in the CSV input, is treated as NaN (Not a Number) for float and double type columns.
 read
@@ -237,7 +237,7 @@ Data source options of CSV can be set via:
 emptyValue
 (for reading), "" (for writing)
-Sets the string representation of an empty value.
+Sets the value to substitute when a quoted empty string ("") is encountered in the CSV input. Only applies to string type columns. Unlike nullValue (which matches input to produce null), this specifies the value to produce in the DataFrame.
 read/write
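
The read-side rules that the new wording describes can be illustrated with a small toy model. This is only a sketch of the documented behaviour, not Spark's parser: `interpret_field` is a hypothetical helper, the option values `"NA"`, `"<empty>"` and `"NaN"` are chosen for illustration (only `NaN` is a default shown in the table), and the order in which the checks run is an assumption. Unquoted empty fields are not covered by the patch text and are simply passed through here.

```python
import math
from typing import Union

Value = Union[None, str, float]


def interpret_field(
    raw: str,
    quoted: bool,
    column_type: str,
    null_value: str = "NA",        # illustrative value, not Spark's default
    empty_value: str = "<empty>",  # illustrative value, not Spark's default
    nan_value: str = "NaN",
) -> Value:
    """Toy model of how one CSV field is interpreted on read.

    `raw` is the field text with any enclosing quotes already removed,
    and `quoted` records whether the field was quoted in the input.
    """
    # nullValue matches the input and produces null, quoted or not,
    # for every column type.
    if raw == null_value:
        return None

    if column_type == "string":
        # emptyValue does not match input; it is the value produced
        # when a quoted empty string ("") is read into a string column.
        if quoted and raw == "":
            return empty_value
        # Anything else is passed through unchanged in this sketch.
        return raw

    if column_type in ("float", "double"):
        # nanValue matches the input and produces NaN, but only for
        # float and double columns.
        if raw == nan_value:
            return math.nan
        return float(raw)

    raise ValueError(f"column type not modelled in this sketch: {column_type}")


print(interpret_field("NA", quoted=False, column_type="string"))   # None
print(interpret_field("NA", quoted=True, column_type="string"))    # None
print(interpret_field("", quoted=True, column_type="string"))      # <empty>
print(interpret_field("NaN", quoted=False, column_type="double"))  # nan
```

The contrast the patch draws is visible in the two string branches: `null_value` is compared against the input, while `empty_value` only ever appears as an output.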