AIP-72: Port Registering of Asset Changes to Task SDK on task completion#45924
Merged
amoghrajesh merged 15 commits intoapache:mainfrom Jan 24, 2025
Merged
AIP-72: Port Registering of Asset Changes to Task SDK on task completion#45924amoghrajesh merged 15 commits intoapache:mainfrom
amoghrajesh merged 15 commits intoapache:mainfrom
Conversation
Contributor
Author
|
Working on fixing the tests |
amoghrajesh
commented
Jan 22, 2025
ashb
reviewed
Jan 22, 2025
ashb
reviewed
Jan 22, 2025
ashb
reviewed
Jan 22, 2025
amoghrajesh
commented
Jan 23, 2025
Contributor
Author
|
With the new changes, tested both for legacy and task sdk DAGs (cc @ashb) Task SDK Results: |
Contributor
Author
|
Looks like the changes in this PR break some test cases for inlet_events. Looking into it |
Contributor
Author
|
Legacy without my changes: Legacy with my changes: Which is the same, so likely its a change needed in the test cases. |
ashb
reviewed
Jan 23, 2025
Contributor
Author
|
Ok I think I figured out the reason for failure, working on a fix |
ashb
reviewed
Jan 23, 2025
Member
ashb
left a comment
There was a problem hiding this comment.
Looking better. A few tidy ups to do but def on the right track.
ashb
approved these changes
Jan 23, 2025
Contributor
Author
|
I will just resolve the conversations with relevant replies, rebase & merge this |
gpathak128
pushed a commit
to gpathak128/airflow
that referenced
this pull request
Jan 29, 2025
got686-yandex
pushed a commit
to got686-yandex/airflow
that referenced
this pull request
Jan 30, 2025
niklasr22
pushed a commit
to niklasr22/airflow
that referenced
this pull request
Feb 8, 2025
ambika-garg
pushed a commit
to ambika-garg/airflow
that referenced
this pull request
Feb 17, 2025
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.


closes: #45752
Why?
When a task completed (moves into success) state, asset related events for that particular task should be stores into the metadata DB. Currently it is done like this: airflow/airflow/models/taskinstance.py.
This does a few things:
assetasset_aliasasset_alias_assetasset_alias_asset_eventasset_eventThis behaviour needs to be ported into the task sdk.
Approach
The idea is to implement this logic in the
patchti state endpoint in the execution API server. The reasoning is so that we needn't make an additional API call but when "finishing" a task from the task runner, we can send in the relevant details like the task outlets, the outlet events and take care of the rest inti_update_stateendpoint.Interface changes
TISuccessStatePayloadto mark a state as success from the task runner in the execution API -> reason being, we do not want to coagulate theTITerminalStatePayloadwith additional information slowing down the API request for no need.task_outlets: translates toti.task.outletsat execution time sent from the task runneroutlet_events: these are the events for aoutletobject. For example, forAssetsit translates tocontext["outlet_events"][Asset Object]asset_type: we send the type of object (class name) the exeuction API has to deal with. This is to avoid any additional mental gymnastics on the server side to find the kind of object we are dealing with as it will be serialised.Server Side
ti_update_stateadded an additional branch for success state where we callregister_asset_changeswhich is a similar function toairflow/airflow/models/taskinstance.py
Line 359 in 051e617
task_outlet, it registers the events for different types:tuple[asset uri, extra], and generates events for those by handling some cases. Docs: https://airflow.apache.org/docs/apache-airflow/stable/authoring-and-scheduling/datasets.html#how-to-use-datasetaliasExecution side
Task Runner
The task runner runs the task as usual and before finishing, checks for task outlets defined. If task outlets are defined, it populates the
task_outlets,outlet_eventsandasset_typefor the outlets present.For
Asset:task_outletsandoutlet_eventsasevents[obj]whereevents = context["outlet_events"]For
AssetNameRefandAssetUriRef:task_outletsas needed and populates all the events possible as we cannot access the DB to get the model being referencedFor
AssetAlias:We dont care about the
task_outlets, we only care about theasset_alias_events, so those are populated inoutlet_eventsOnce this is done, a
SucceedTaskis sent to supervisor.Supervisor
Supervisor starts treating the success state as
STATES_SENT_DIRECTLYfrom now. Once it receives aSuceedTaskmessage from the task runner, it callsHTTP client in task sdk
Introduced a new method called:
succeedwhich will call theti_patch_stateapi withTISuccessStatePayloadTesting
Using the DAG: https://github.com/apache/airflow/blob/main/airflow/example_dags/example_asset_alias.py
DAGS:

Non aliases assets
Event:
Consumer DAG also gets triggered:

Aliases assets
Event:
This triggers two DAGs now: asset_alias_example_alias_consumer and asset_s3_bucket_consumer since the aliased asset was updated
Fails due to

inlet_eventsstill not being ported:DB level checks
Asset created:

Asset Alias:

Asset Alias Asset mapping:

Asset event: (first one triggered by asset, second by alias)

Alias Event mapping:

^ Add meaningful description above
Read the Pull Request Guidelines for more information.
In case of fundamental code changes, an Airflow Improvement Proposal (AIP) is needed.
In case of a new dependency, check compliance with the ASF 3rd Party License Policy.
In case of backwards incompatible changes please leave a note in a newsfragment file, named
{pr_number}.significant.rstor{issue_number}.significant.rst, in newsfragments.