fix(glue): Fix GlueJobOperator verbose logs not showing in deferrable mode #63086
Merged
o-nikolas merged 8 commits into apache:main on Mar 27, 2026
1ece990 to ed4abf2
shivaam commented Mar 8, 2026
ed4abf2 to 3d66e21
eladkal approved these changes Mar 10, 2026
Static checks are failing for glue.py file. Please run prek locally in branch and commit.
@shivaam This PR has been converted to draft because it does not yet meet our Pull Request quality criteria. Issues found:
What to do next:
Converting a PR to draft is not a rejection; it is an invitation to bring the PR up to the project's standards so that maintainer review time is spent productively. If you have questions, feel free to ask on the Airflow Slack.
4b06866 to 03dce1c
eladkal reviewed Mar 11, 2026
providers/amazon/src/airflow/providers/amazon/aws/triggers/glue.py (Outdated)
o-nikolas reviewed Mar 12, 2026
…bOperator When using GlueJobOperator with deferrable=True and verbose=True, CloudWatch logs were silently ignored because the trigger inherited the base waiter's run() method which only polls job status. This adds a run() override and _forward_logs() helper to the GlueJobCompleteTrigger that streams logs from both output and error CloudWatch log groups, matching the format used by the synchronous path. closes: apache#56535
Removed docstring from fetch_logs method and added a comment.
4d9c451 to 211a3d5
Extract get_glue_log_group_names() and format_glue_logs() as module-level helpers in hooks/glue.py so that GlueJobHook.print_job_logs (sync) and GlueJobCompleteTrigger._forward_logs (async) share identical log formatting and log group name extraction logic.
o-nikolas approved these changes Mar 27, 2026
o-nikolas added a commit to aws-mwaa/upstream-to-airflow that referenced this pull request Mar 28, 2026
…ferrable mode (apache#63086)" This reverts commit b086a22.
nailo2c pushed a commit to nailo2c/airflow that referenced this pull request Mar 30, 2026
… mode (apache#63086) * fix(glue): Add verbose CloudWatch log streaming for deferrable GlueJobOperator When using GlueJobOperator with deferrable=True and verbose=True, CloudWatch logs were silently ignored because the trigger inherited the base waiter's run() method which only polls job status. This adds a run() override and _forward_logs() helper to the GlueJobCompleteTrigger that streams logs from both output and error CloudWatch log groups, matching the format used by the synchronous path Extract get_glue_log_group_names() and format_glue_logs() as module-level helpers in hooks/glue.py so that GlueJobHook.print_job_logs (sync) and GlueJobCompleteTrigger._forward_logs (async) share identical log formatting and log group name extraction logic. closes: apache#56535 --------- Co-authored-by: Elad Kalif <45845474+eladkal@users.noreply.github.com>
nailo2c pushed a commit to nailo2c/airflow that referenced this pull request Mar 30, 2026
…ferrable mode (apache#63086)" (apache#64340) This reverts commit b086a22.
Suraj-kumar00 pushed a commit to Suraj-kumar00/airflow that referenced this pull request Apr 7, 2026
… mode (apache#63086)
Suraj-kumar00 pushed a commit to Suraj-kumar00/airflow that referenced this pull request Apr 7, 2026
…ferrable mode (apache#63086)" (apache#64340) This reverts commit b086a22.
Why
When using `GlueJobOperator(deferrable=True, verbose=True)`, CloudWatch logs are silently ignored. The `verbose` flag is stored in the trigger but never read: the inherited `AwsBaseWaiterTrigger.run()` only polls job status via a boto3 waiter and has no log-fetching logic.

This means users who switch from `deferrable=False` to `deferrable=True` lose all verbose CloudWatch log output with no warning.

closes: #56535
What
Added a `run()` override and `_forward_logs()` helper to `GlueJobCompleteTrigger` in `triggers/glue.py`:
- `verbose=False`: delegates to `super().run()`, so there is zero behavior change.
- `verbose=True`: runs a custom async poll loop that checks job state and streams logs from both the `/output` and `/error` CloudWatch log groups using `get_log_events` with continuation tokens.
- Output format matches the sync path (`GlueJobHook.print_job_logs`): tab-indented lines prefixed with `Glue Job Run <log_group> Logs:`, and `No new log from the Glue Job in <log_group>` when idle.

How
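As a rough illustration of the `verbose=True` branch, the sketch below polls a job state and, on each cycle, drains new log events using continuation tokens. It is not the code from this PR. `FakeLogsClient`, `drain_new_events`, and `poll_with_logs` are hypothetical names, and the fake client stands in for an async CloudWatch Logs client so the example runs without AWS. Using the job run ID as the log stream name and the listed terminal states are also assumptions.

```python
import asyncio

# Assumed terminal states for this sketch; the real trigger may check others.
TERMINAL_STATES = {"SUCCEEDED", "FAILED", "STOPPED", "TIMEOUT"}


class FakeLogsClient:
    """Hypothetical in-memory stand-in for an async CloudWatch Logs client."""

    def __init__(self, pages_by_group):
        # Maps a log group name to a list of event batches ("pages").
        self._pages_by_group = pages_by_group

    async def get_log_events(self, *, logGroupName, logStreamName, startFromHead=True, nextToken=None):
        pages = self._pages_by_group.get(logGroupName, [])
        index = int(nextToken) if nextToken is not None else 0
        if index < len(pages):
            return {"events": pages[index], "nextForwardToken": str(index + 1)}
        # Like CloudWatch at the end of a stream, hand back an unchanged token.
        return {"events": [], "nextForwardToken": str(index)}


async def drain_new_events(client, log_group, log_stream, token):
    """Fetch every event available after `token`; return (messages, new_token)."""
    messages = []
    while True:
        kwargs = {"logGroupName": log_group, "logStreamName": log_stream, "startFromHead": True}
        if token is not None:
            kwargs["nextToken"] = token
        response = await client.get_log_events(**kwargs)
        messages.extend(event["message"] for event in response["events"])
        if response["nextForwardToken"] == token:
            return messages, token  # Token did not advance: nothing new.
        token = response["nextForwardToken"]


async def poll_with_logs(get_state, client, log_groups, run_id, interval=0.0):
    """Poll the job state and forward new log lines from each group per cycle."""
    tokens = {group: None for group in log_groups}
    forwarded = []
    while True:
        state = await get_state()
        for group in log_groups:
            messages, tokens[group] = await drain_new_events(client, group, run_id, tokens[group])
            forwarded.extend((group, message) for message in messages)
        if state in TERMINAL_STATES:
            return state, forwarded
        await asyncio.sleep(interval)
```

Logs are fetched once more after the state turns terminal and before returning, so lines written just before the job finished are not dropped. Keeping one token per log group lets each stream resume where it left off on the next cycle.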
Follows the same pattern as the ECS `TaskDoneTrigger._forward_logs()`, which already does async CloudWatch log tailing in this codebase. Uses `get_log_events` (async, token-based) instead of the sync path's `filter_log_events` (paginator-based), but produces identical user-facing output.

Testing
Sync task output:
Deferrable task output (after fix):
The deferrable path batches more steps per poll cycle (30s vs 6s polling interval) but the format is now consistent.
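To make the shared format concrete, here is a minimal sketch of a formatter that produces the two message shapes described in this PR. The name `format_logs_sketch` and the exact whitespace are assumptions. The helper actually extracted into `hooks/glue.py` is `format_glue_logs()`, and its signature is not shown here.

```python
def format_logs_sketch(log_group, messages):
    """Format one poll cycle's messages for a log group (hypothetical helper)."""
    if not messages:
        return f"No new log from the Glue Job in {log_group}"
    # Indent every line with a tab so multi-line messages stay aligned.
    lines = [f"\t{line}" for message in messages for line in message.splitlines()]
    return f"Glue Job Run {log_group} Logs:\n" + "\n".join(lines)
```

If both the sync and deferrable paths call one formatter like this, the only visible difference between them should be how many lines arrive per poll cycle.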
Was generative AI tooling used to co-author this PR?