Prevent transient error in case when Pod start_time parameter is None#59097
Conversation
a4f703d to
951ae12
Compare
951ae12 to
5994234
Compare
|
Not sure why you are calling me - that's not my area of expertise. Generally what we prefer is when you are pinging to attract maintainer to review - ping "in general" "Hey can I get a review" - generally speaking when I answer to your ping now that "this is not my area of expertise" - this will significantly decrease your chances of getting help from some other maintainer, because they will see I was involved (and they will not even see that I responded "not my area of interest". But .. you made your choice. |
There was a problem hiding this comment.
Pull request overview
This PR addresses a transient error in the Kubernetes provider that occurs when duplicate pods are spawned but their pod.status.start_time parameter is not yet set. The fix adds a fallback mechanism to use creation_timestamp from pod metadata when start_time is unavailable.
Key changes:
- Modified
_get_most_recent_pod_indexmethod to fallback tocreation_timestampwhenstart_timeis None - Added two new test cases to verify the fallback behavior
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 5 comments.
| File | Description |
|---|---|
| providers/cncf/kubernetes/src/airflow/providers/cncf/kubernetes/operators/pod.py | Added fallback logic to use creation_timestamp when start_time is None in the _get_most_recent_pod_index method |
| providers/cncf/kubernetes/tests/unit/cncf/kubernetes/operators/test_pod.py | Added two test cases: one testing fallback to creation_timestamp when start_time is None, and another testing behavior when both are None |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
providers/cncf/kubernetes/src/airflow/providers/cncf/kubernetes/operators/pod.py
Outdated
Show resolved
Hide resolved
providers/cncf/kubernetes/src/airflow/providers/cncf/kubernetes/operators/pod.py
Show resolved
Hide resolved
providers/cncf/kubernetes/tests/unit/cncf/kubernetes/operators/test_pod.py
Show resolved
Hide resolved
providers/cncf/kubernetes/tests/unit/cncf/kubernetes/operators/test_pod.py
Show resolved
Hide resolved
providers/cncf/kubernetes/tests/unit/cncf/kubernetes/operators/test_pod.py
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Not my area of expertise either (yet) - I've assigned Copilot for an initial review, so please address its commentary as well.
However, my 2 cents, if I understand these changes correctly - the if block makes the comparison based on a creation_timestamp, which is clearly not the start_time. Therefore, assigning the creation_timestamp to pod_start_times in the if statement is rather misleading.
In that case, why not making the entire comparison based on creation_timestamp, replacing the start_time completely? Alternatively, if comparison based on start_time is useful for most cases, maybe you could introduce a flag for the user so they'll configure the comparison method?
I'll be happy for an additional review from k8s owners.
|
Thank you for the comments. I tagged you because I saw that Jarek previously participated in Kubernetes provider PRs approvals, and Shahar approved my previous PR (#49899). While I understand your point, please note that the auto-assigned reviewers haven't reviewed any PRs in the last 6 months. I tagged you because the standard process didn't seem to be working. Since I mostly contribute to the Google provider, I assumed tagging active maintainers was the right move to unblock the PR. I wasn't trying to be disruptive, just trying to get the code reviewed. So I apologize if I didn't follow the specific protocol for Kubernetes provider. I'll keep this in mind for next time. |
I am not sure where you get that statistics from, but maybe try to ping those auto-assigned reviewers. You have not tried it yet and seems that they are best suited. Maybe they don't realise you wait for their review? |
|
Hello @potiuk ! Yes, my mistake, I checked manually and missed some of the PR´s reviewed by the code owners. I apologize for the false statement. Thank you for pointing it out. I will ping them in a separate comment. |
|
Hello @hussein-awala @jedcunningham could you please help with the review of the PR? Thank you! |
Hello @shahar1 thank you for your comment! My thought was about extending current logic not changing it completely. Also, if it is possible @jscheffl could you please help with the review of the PR? Or advise someone who has expertise to review and approve this PR? |
5994234 to
6c3d442
Compare
Luckily I've started recently working on a CKA certification, so I feel more comfortable reviewing this PR. Please avoid tagging other commiters/PMC from now - let's try settling this one between us. If anyone else wants to review or comment, they are more than welcome, of course. For future PRs, please be aware of matter - as Jarek said, arbitrarily tagging maintainers might "taint" your current and future PRs from review (k8s pun intended). For better visibility in future PRs, I recommend sending a message in the Now, let's get to what needs to be done:
Please take care of the two comments above, and from my prespective it will be good to merge. |
6c3d442 to
397d339
Compare
providers/cncf/kubernetes/src/airflow/providers/cncf/kubernetes/operators/pod.py
Show resolved
Hide resolved
5de9379 to
2fae66e
Compare
|
LGTM, improved the phrasing of the log message and rebased. |
There is a chance of a transient error inside the Kubernetes provider when two identical pods are spawned but the pod.status.start_time parameter is not set yet. To prevent this situation, an additional check of creation_timestamp was added. It is a more convenient step to compare identical pods based on the time when etcd receives information about the pod rather than when the scheduler sends information to the kubelet.
^ Add meaningful description above
Read the Pull Request Guidelines for more information.
In case of fundamental code changes, an Airflow Improvement Proposal (AIP) is needed.
In case of a new dependency, check compliance with the ASF 3rd Party License Policy.
In case of backwards incompatible changes please leave a note in a newsfragment file, named
{pr_number}.significant.rstor{issue_number}.significant.rst, in airflow-core/newsfragments.