[SPARK-54173][K8S] Add support for Deployment API on K8s #52867

Closed
ForVic wants to merge 10 commits into apache:master from ForVic:dev/victors/deployment_allocator

Contributor

@ForVic ForVic commented Nov 4, 2025

What changes were proposed in this pull request?

Adds support for K8s Deployment API to allocate pods.

Why are the changes needed?

Allocating individual pods is not ideal when higher-level APIs are available. #33508 helped by adding an interface for arbitrary allocators along with a StatefulSet allocator. However, dynamic allocation only works if you have implemented a PodDisruptionBudget (PDB) associated with the decommission label. Since a Deployment uses a ReplicaSet, which supports the pod-deletion-cost annotation, we can avoid creating a separate PDB resource and still allow dynamic allocation (with shuffle tracking) by adding a low deletion cost to the executors we are scaling down. When we scale down the Deployment, it will choose to remove the pods with the low deletion cost.
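
As a rough illustration of the scale-down behaviour described above, the following Python sketch models only the deletion-cost part of how a ReplicaSet picks pods to remove. It is a simplification, not the controller's actual algorithm: the real controller applies other criteria (such as whether a pod is scheduled or ready) before it compares costs, and Kubernetes documents the annotation as best-effort. The `Pod` class, the `pick_victims` helper, the pod names, and the cost value `-100` are all hypothetical.

```python
from dataclasses import dataclass, field

# Annotation read by the ReplicaSet controller when choosing pods to remove.
DELETION_COST = "controller.kubernetes.io/pod-deletion-cost"


@dataclass
class Pod:
    """Hypothetical stand-in for an executor pod; not a Kubernetes client type."""
    name: str
    annotations: dict = field(default_factory=dict)

    @property
    def deletion_cost(self) -> int:
        # Pods without the annotation are treated as cost 0.
        return int(self.annotations.get(DELETION_COST, "0"))


def pick_victims(pods, replicas):
    """Return the pods removed when scaling down to `replicas`.

    Simplified model: pods are ordered by deletion cost only, lowest first.
    """
    surplus = max(len(pods) - replicas, 0)
    return sorted(pods, key=lambda p: p.deletion_cost)[:surplus]


# Three executors; assume Spark has decided that exec-2 is idle and marks it
# with a low cost (the value -100 is an arbitrary example).
pods = [Pod("exec-1"), Pod("exec-2"), Pod("exec-3")]
pods[1].annotations[DELETION_COST] = "-100"

victims = pick_victims(pods, replicas=2)
print([p.name for p in victims])
```

In this model, scaling from three replicas to two removes the one pod that carries the lower cost, which is the alignment between Spark's choice and the Deployment's choice that the description relies on.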

Does this PR introduce any user-facing change?

Yes, adds user-facing configs

spark.kubernetes.executor.podDeletionCost
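
A minimal sketch of how the new config might be combined with related settings when building a `spark-submit` command line. The allocator and dynamic-allocation keys are the ones named in the config documentation for this feature, and `spark.dynamicAllocation.shuffleTracking.enabled` is the existing Spark setting for shuffle tracking. The cost value `-100` and the `to_submit_args` helper are hypothetical and chosen only for illustration.

```python
# Settings for the deployment-based allocator with dynamic allocation.
# The deletion cost value below is an arbitrary example, not a recommended default.
conf = {
    "spark.kubernetes.allocation.pods.allocator": "deployment",
    "spark.dynamicAllocation.enabled": "true",
    "spark.dynamicAllocation.shuffleTracking.enabled": "true",
    "spark.kubernetes.executor.podDeletionCost": "-100",
}


def to_submit_args(conf):
    """Flatten a config dict into `--conf key=value` arguments (hypothetical helper)."""
    args = []
    for key, value in conf.items():
        args += ["--conf", f"{key}={value}"]
    return args


print(" ".join(to_submit_args(conf)))
```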

How was this patch tested?

New unit tests + passing existing unit tests + tested in a cluster with shuffle tracking and dynamic allocation enabled

Was this patch authored or co-authored using generative AI tooling?

No

@ForVic ForVic changed the title [todo] Add support for Deployment API on K8s [todo][K8S] Add support for Deployment API on K8s Nov 4, 2025
@ForVic ForVic changed the title [todo][K8S] Add support for Deployment API on K8s [SPARK-54173][K8S] Add support for Deployment API on K8s Nov 4, 2025
@ForVic ForVic marked this pull request as ready for review November 5, 2025 01:03
Contributor Author

ForVic commented Nov 5, 2025

cc @sunchao

also @holdenk @dongjoon-hyun

Comment thread docs/running-on-kubernetes.md Outdated
Member

sunchao commented Nov 5, 2025

Thanks @dongjoon-hyun for the review!

Member

@sunchao sunchao left a comment

Thanks @ForVic ! Looks mostly good to me (already reviewed internally). Just one question around the new conf.

Member

@sunchao sunchao left a comment

LGTM

@sunchao sunchao closed this in d4bc277 Nov 11, 2025
Member

sunchao commented Nov 11, 2025

Merged to master, thanks @ForVic for the contribution and @dongjoon-hyun for the review!

zifeif2 pushed a commit to zifeif2/spark that referenced this pull request Nov 22, 2025
Closes apache#52867 from ForVic/dev/victors/deployment_allocator.

Lead-authored-by: Victor Sunderland <victors@openai.com>
Co-authored-by: victors-oai <victors@openai.com>
Co-authored-by: Victor Sunderland <64456855+ForVic@users.noreply.github.com>
Signed-off-by: Chao Sun <chao@openai.com>
huangxiaopingRD pushed a commit to huangxiaopingRD/spark that referenced this pull request Nov 25, 2025
Member

@dongjoon-hyun dongjoon-hyun left a comment

Just FYI, K8s v1.35 was released a few days ago (2025-12-17), and I added K8s v1.35 test coverage for Apache Spark 4.2.0. Thank you again for contributing SPARK-54173 to the Apache Spark repository, @ForVic and @sunchao.

    .createWithDefault("direct")

  val KUBERNETES_EXECUTOR_POD_DELETION_COST =
    ConfigBuilder("spark.kubernetes.executor.podDeletionCost")
Member

Do we have a future plan to manage this dynamically from Apache Spark side, @ForVic ?

If this is static, can we reuse spark.kubernetes.executor.annotation.controller.kubernetes.io/pod-deletion-cost=XXX instead of spark.kubernetes.executor.podDeletionCost=XXX? Or, pod template?

dongjoon-hyun added a commit that referenced this pull request Feb 11, 2026
…onCost` config doc

### What changes were proposed in this pull request?

This is a follow-up to fix `spark.kubernetes.executor.podDeletionCost` config doc.
- #52867

### Why are the changes needed?

**Apache Spark 4.2.0-preview2**
```
Value to set for the controller.kubernetes.io/pod-deletion-cost annotation when Spark asks a deployment-based allocator to remove executor pods. This helps Kubernetes pick the same pods Spark selected when the deployment scales down.
This should only be enabled when both ConfigEntry(key=spark.kubernetes.allocation.pods.allocator, defaultValue=direct, doc=Allocator to use for pods. Possible values are direct (the default), statefulset, deployment, or a full class name of a class implementing AbstractPodsAllocator. Future version may add Job or replicaset. This is a developer API and may change or be removed at anytime., public=true, version=3.3.0) is set to deployment, and ConfigEntry(key=spark.dynamicAllocation.enabled, defaultValue=false, doc=, public=true, version=1.2.0) is enabled.
```

**THIS PR**
```
Value to set for the controller.kubernetes.io/pod-deletion-cost annotation when Spark asks a deployment-based allocator to remove executor pods. This helps Kubernetes pick the same pods Spark selected when the deployment scales down.
This should only be enabled when both spark.kubernetes.allocation.pods.allocator is set to deployment, and spark.dynamicAllocation.enabled is enabled.
```
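
The difference between the two doc strings above is consistent with a config entry object being interpolated into the text where only its key was intended. The following Python analogue illustrates that kind of mistake. It is a sketch, not the Spark source (which is Scala), and the `ConfigEntry` dataclass here is a hypothetical stand-in.

```python
from dataclasses import dataclass


@dataclass
class ConfigEntry:
    """Hypothetical stand-in for a config entry; not Spark's class."""
    key: str
    default_value: str


allocator = ConfigEntry("spark.kubernetes.allocation.pods.allocator", "direct")

# Interpolating the object embeds its whole repr in the doc text.
noisy_doc = f"only enabled when {allocator} is set to deployment"

# Interpolating the key gives the intended, readable doc text.
clean_doc = f"only enabled when {allocator.key} is set to deployment"

print(noisy_doc)
print(clean_doc)
```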

### Does this PR introduce _any_ user-facing change?

No. This is a new feature of Spark 4.2.0, which has not been officially released yet.

### How was this patch tested?

Pass the CIs.

### Was this patch authored or co-authored using generative AI tooling?

Generated-by: `Gemini 3 Pro (High)` on `Antigravity`

Closes #54271 from dongjoon-hyun/SPARK-54173.

Authored-by: Dongjoon Hyun <dongjoon@apache.org>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
rpnkv pushed a commit to rpnkv/spark that referenced this pull request Feb 18, 2026
dongjoon-hyun added a commit that referenced this pull request Feb 27, 2026
### What changes were proposed in this pull request?

This PR aims to make `AbstractPodsAllocator` documentation up-to-date:
- Use `Javadoc` style instead of `Scaladoc` according to the [Apache Spark Code Style Guide](https://spark.apache.org/contributing.html).
- Add `Since` annotation for the class and methods.
- Update the class documentation with new `DeploymentPodsAllocator`.
- Update the method documentation with `since` tag.

### Why are the changes needed?

`AbstractPodsAllocator` is one of the `DeveloperApi`s of Apache Spark.

https://github.com/apache/spark/blob/4da26e4bf9eab119ff2489c5fdf85efe60f6f469/resource-managers/kubernetes/core/src/main/scala/org/apache/spark/scheduler/cluster/k8s/AbstractPodsAllocator.scala#L36-L37

Recently, Apache Spark 4.2.0
- Added a new built-in implementation, `DeploymentPodsAllocator`.
  - #52867
- Added new API, `setExecutorPodsLifecycleManager`.
  - #53840

### Does this PR introduce _any_ user-facing change?

No, this changes documentation only.

### How was this patch tested?

Manual review.

### Was this patch authored or co-authored using generative AI tooling?

Generated-by: `Gemini 3.1 Pro (High)` on `Antigravity`

Closes #54526 from dongjoon-hyun/SPARK-55725.

Authored-by: Dongjoon Hyun <dongjoon@apache.org>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
dongjoon-hyun added a commit to apache/spark-kubernetes-operator that referenced this pull request May 7, 2026
### What changes were proposed in this pull request?

Add `examples/pi-preview-deployment.yaml`, a SparkPi example that uses `spark.kubernetes.allocation.pods.allocator=deployment`.

### Why are the changes needed?

To provide a ready-to-apply example demonstrating the `deployment`-based pod allocator alongside the upcoming `4.2.0-preview5` release.

- apache/spark#52867

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Manually tested.

```
$ kubectl apply -f examples/pi-preview-deployment.yaml
sparkapplication.spark.apache.org/pi-preview-deployment created

$ kubectl get sparkapp
NAME                    AGE   CURRENT STATE
pi-preview-deployment   3s    DriverRequested

$ kubectl get sparkapp
NAME                    AGE   CURRENT STATE
pi-preview-deployment   11s   RunningHealthy

$ kubectl get deploy
NAME                                READY   UP-TO-DATE   AVAILABLE   AGE
spark-d-pi-preview-deployment-0-0   3/3     3            3           16s
spark-kubernetes-operator           1/1     1            1           3d4h

$ kubectl get pod
NAME                                                 READY   STATUS    RESTARTS      AGE
pi-preview-deployment-0-driver                       1/1     Running   0             23s
spark-d-pi-preview-deployment-0-0-7cc95db6bb-8dzln   1/1     Running   0             20s
spark-d-pi-preview-deployment-0-0-7cc95db6bb-hhfqn   1/1     Running   0             20s
spark-d-pi-preview-deployment-0-0-7cc95db6bb-pglpb   1/1     Running   0             20s
spark-kubernetes-operator-584498648c-pxfpc           1/1     Running   4 (46h ago)   3d4h
```

### Was this patch authored or co-authored using generative AI tooling?

Generated-by: Claude Code (Opus 4.7)

Closes #668 from dongjoon-hyun/SPARK-56787.

Authored-by: Dongjoon Hyun <dongjoon@apache.org>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>