[SPARK-26389][SS] Add force delete temp checkpoint configuration#23732
[SPARK-26389][SS] Add force delete temp checkpoint configuration#23732gaborgsomogyi wants to merge 3 commits into
Conversation
|
Test build #102036 has finished for PR 23732 at commit
|
|
retest this please |
|
Test build #102039 has finished for PR 23732 at commit
|
|
retest this please |
|
Test build #102042 has finished for PR 23732 at commit
|
|
thanks for quick improvement |
|
Two minor comments: The flag description is a bit confusing; I would just describe the behavior when the flag is on, rather than saying it’s “normal” deletion. I think the new flag should be read in DataStreamWriter, so the shouldDelete value we pass into StreamExecution will always be correct. |
Clear, will do.
|
|
You're right, there's no good way to move the flag the way I wanted. Looks good modulo Dongjoon's comments. |
|
Test build #102078 has finished for PR 23732 at commit
|
|
Test build #102083 has finished for PR 23732 at commit
|
HeartSaVioR
left a comment
There was a problem hiding this comment.
Looks good overall. Left minor comment on documentation.
|
|
||
| val FORCE_DELETE_TEMP_CHECKPOINT_LOCATION = | ||
| buildConf("spark.sql.streaming.forceDeleteTempCheckpointLocation") | ||
| .doc("When true, enable temporary checkpoint locations force delete.") |
There was a problem hiding this comment.
nit: I'd like to have this to be a valid and more described sentence like other configurations, especially this feature is only documented here. One example for me, When true, Spark always deletes temporary checkpoint locations for success queries.
There was a problem hiding this comment.
Ur, @HeartSaVioR . That is the behavior before this PR.
The benefit of this configuration is to delete the locations when exception occurs.
There was a problem hiding this comment.
Ah yes my bad I meant failed queries, maybe better, regardless of query's result - success or failure. Anyway this option is only documented here so I'd just wanted to make sure there's clear description.
dongjoon-hyun
left a comment
There was a problem hiding this comment.
+1, LGTM. This configuration is useful.
Merged to master.
|
Thank you, @gaborgsomogyi , @camper42 , @jose-torres , @HeartSaVioR ! |
## What changes were proposed in this pull request? Not all users wants to keep temporary checkpoint directories. Additionally hard to restore from it. In this PR I've added a force delete flag which is default `false`. Additionally not clear for users when temporary checkpoint directory deleted so added log messages to explain this a bit more. ## How was this patch tested? Existing + additional unit tests. Closes apache#23732 from gaborgsomogyi/SPARK-26389. Authored-by: Gabor Somogyi <gabor.g.somogyi@gmail.com> Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
## What changes were proposed in this pull request? Not all users wants to keep temporary checkpoint directories. Additionally hard to restore from it. In this PR I've added a force delete flag which is default `false`. Additionally not clear for users when temporary checkpoint directory deleted so added log messages to explain this a bit more. ## How was this patch tested? Existing + additional unit tests. Closes apache#23732 from gaborgsomogyi/SPARK-26389. Authored-by: Gabor Somogyi <gabor.g.somogyi@gmail.com> Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
| .createOptional | ||
|
|
||
| val FORCE_DELETE_TEMP_CHECKPOINT_LOCATION = | ||
| buildConf("spark.sql.streaming.forceDeleteTempCheckpointLocation") |
There was a problem hiding this comment.
Let us follow the other boolean conf naming convention?
spark.sql.streaming.forceDeleteTempCheckpointLocation.enabled?
What changes were proposed in this pull request?
Not all users wants to keep temporary checkpoint directories. Additionally hard to restore from it.
In this PR I've added a force delete flag which is default
false. Additionally not clear for users when temporary checkpoint directory deleted so added log messages to explain this a bit more.How was this patch tested?
Existing + additional unit tests.