Skip to content

Decouple "job runner" from BaseJob ORM model#30255

Merged
potiuk merged 2 commits intoapache:mainfrom
potiuk:decouple_execute_from_job
Apr 10, 2023
Merged

Decouple "job runner" from BaseJob ORM model#30255
potiuk merged 2 commits intoapache:mainfrom
potiuk:decouple_execute_from_job

Conversation

@potiuk
Copy link
Copy Markdown
Member

@potiuk potiuk commented Mar 23, 2023

Originally BaseJob ORM model was extended and Polymorphism has been used to tie different execution logic to different job types. This has proven to be difficult to handle during AIP-44 implementation (internal API) because LocalTaskJob, DagProcessorJob and TriggererJob are all going to not use the ORM BaseJob model, but they should use BaseJobPydantic instead. In order to make it possible, we introduce a new type of object BaseJobRunner and make BaseJob use the runners instead.

This way, the BaseJobRunners are used for the logic of each of the job, where single, non-polimorphic BaseJob is used to keep the records in the database - as a follow up it will allow to completely decouple the job database operations and move it to internal_api component when db-lesss mode is enabled.

Closes: #30294


^ Add meaningful description above

Read the Pull Request Guidelines for more information.
In case of fundamental code changes, an Airflow Improvement Proposal (AIP) is needed.
In case of a new dependency, check compliance with the ASF 3rd Party License Policy.
In case of backwards incompatible changes please leave a note in a newsfragment file, named {pr_number}.significant.rst or {issue_number}.significant.rst, in newsfragments.

@boring-cyborg boring-cyborg bot added area:API Airflow's REST/HTTP API area:CLI area:Scheduler including HA (high availability) scheduler area:webserver Webserver related Issues labels Mar 23, 2023
@potiuk potiuk requested review from Taragolis and mhenc March 23, 2023 16:17
@potiuk
Copy link
Copy Markdown
Member Author

potiuk commented Mar 23, 2023

cc: @vincbeck

@potiuk potiuk added the AIP-44 Airflow Internal API label Mar 23, 2023
Copy link
Copy Markdown
Member Author

@potiuk potiuk Mar 23, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems to be old and unused/unusable test class.

@potiuk potiuk requested a review from uranusjr March 23, 2023 16:40
@potiuk
Copy link
Copy Markdown
Member Author

potiuk commented Mar 23, 2023

Hello Everyone, I've been working on it on-off for quite a while as I attempted to make things work for the AIP-44. I actually attempted to do in a few different ways and I did not like the previous ones. Ths one I think is at the sweet-spot of solving the (a little) intertwined BaseJob dependencies with the need of decoupling of the Task running from the ORM models..

A little context on that one.

LocalTaskJob as we have it curently implemented inherits from BaseJob (same as other jobs). It is in fact a polimorphic dependency - all of the jobs are stored in the same 'BaseJob' table. This is (and always has been) a little problematic - because the ''Job" objects inherit from the ORM object and there is an assumption that they are DB-related, on the other had they also had the "running" logic implemented in _execute method of the same ORM-derived objects. That made it rather dificult to split out the logic from database - for TriggererJob, LocalTaskJob and DagFileProcessorJob especially.

I attempted to do it in various ways but I had the goals:

a) the resulting architecture wil decouple from the ORM object from the logic (so that we could have serialized Pydantic objects introduced in #29776 used instead (so basically we should be able to pass and use BaseJob and BaseJobPydantic around)

b) it shoud touch as little logic change as possible (basically shuffling around objects and calling different objects was most of the changes I wanted to do) - so that it will be easy to review and reason about.

c) the resulting architecture will make sense

Result

I think I finally achieved all three goals. Summarizing of what has been done here:

  • I renamed the *Jobs (LocalTaskJob, SchedulerJob etc. to JobRunners which are NOT ORM objects - the whole logic is kept in there. I am just passing the job (BaseJob or BaseJobPydantic) as property of those JobRunners, and I changed all the calls to the internal job fields ot call that job property.

  • where needed (heartbeat mostly) I extracted the logic of updating the job so that it will save changed heartbeat to database even if JobBasePydantic is used

  • we do not have any longer polimorphic BaseJob - BaseJob is just regular ORM class that keeps records for all the runners - uses the same job "types" as before so it is backwards-compatible - but simpler. All the places where polimorphism was used have been updated.

  • I think the resulting architecture is quite a bit easier to reason about as it effectively allows to decouple DB operations to manage state and actual job done by a running task.

Current state

It looks like a huge change but if you look closly most of the changes are changes in tests to adapt to the new object hierarchy. So I hope the review will not be that difficult.

I still have a few (heartbeat) tests not passing and I am working on those (likely something missing in heartbeat processing) - but other than those, I think everything else is in place.

Future

Now - this is not YET AIP-44-compatible change. This refactor is just a basic decoupling.

We will need to implement several other follow up changes after this one is merged:

  • separate DB/non-DB operations and turn the BaseJob's run() into a standalone method that will be possible to run with/without DB
  • (possibly) move the JobRunners to a separate package (or rename modules) to not confuse them with BaseJob - they are quite a different beast
  • decorate the DB methods with @internal_api calls to make them AIP-44 ready

Follow ups

I am not sure of that but those changes also make it possible to do something else. Namely they allow us to limit for how long connections are opened from running tasks. Previously we kept them open all the time when the task was running and that was kinda strange as in most circumstances we only needed it to do some initial setup, heartbeat and save job state when complete. I think this change will enable something else (but that's something to see when the other changes are completed - we could optimize that away and (mimicking what Internal API will be doing) we could only get the session/connections established for short times by the running task. I hope we can get there.

Looking forward to comments and feedback.

BTW. It's I think impossible to split this PR into smaller ones :(

BTW2. DON'T be scared about the size of hte change. It's not as big as it seems - it just needed a lot of changes in tests.

The "code" changes:

+355 -219 (21 files)

The test changes:

+1142 -1032 (28 files)

So it is not that big.

@potiuk potiuk requested a review from o-nikolas March 23, 2023 16:51
Copy link
Copy Markdown
Contributor

@vincbeck vincbeck left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice decoupling! I really like that. That would make things easier for AIP-44. I only have 2 nits, the decoupling + the code make sense to me.

PS: +1 on the "it is easier to review than it seems"

@potiuk potiuk force-pushed the decouple_execute_from_job branch from 5ba89fb to 242c9ef Compare March 23, 2023 19:43
@potiuk potiuk force-pushed the decouple_execute_from_job branch from 1d54762 to 1c59f34 Compare April 6, 2023 06:50
@potiuk potiuk requested a review from bolkedebruin as a code owner April 6, 2023 06:50
@potiuk potiuk force-pushed the decouple_execute_from_job branch from 1c59f34 to 5935581 Compare April 6, 2023 12:44
@potiuk potiuk requested a review from eladkal as a code owner April 6, 2023 12:44
@potiuk potiuk force-pushed the decouple_execute_from_job branch from 5935581 to d7f8b62 Compare April 7, 2023 10:31
@potiuk
Copy link
Copy Markdown
Member Author

potiuk commented Apr 7, 2023

All resolved, pushed a new version rebasing on top of the typing change #30503

Copy link
Copy Markdown
Member

@jedcunningham jedcunningham left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall looks good!

Originally BaseJob ORM model was extended and Polymorphism has been
used to tie different execution logic to different job types. This
has proven to be difficult to handle during AIP-44 implementation
(internal API) because LocalTaskJob, DagProcessorJob and TriggererJob
are all going to not use the ORM BaseJob model, but they should use
BaseJobPydantic instead. In order to make it possible, we introduce
a new type of object BaseJobRunner and make BaseJob use the runners
instead.

This way, the BaseJobRunners are used for the logic of each of the
job, where single, non-polimorphic BaseJob is used to keep the
records in the database - as a follow up it will allow to completely
decouple the job database operations and move it to internal_api
component when db-lesss mode is enabled.

Closes: apache#30294
@potiuk
Copy link
Copy Markdown
Member Author

potiuk commented Apr 9, 2023

I think it shoudl be ready for the next pass.

@potiuk
Copy link
Copy Markdown
Member Author

potiuk commented Apr 9, 2023

Yep. Came out green.

Comment on lines +588 to +592
# we cannot (for now) define job in _job_runner nicely due to circular references of
# job and job runner, so we have to use getattr, but we might address it in the future
# change when decoupling these two even more
if getattr(self._job_runner, "job", None) is not None:
perform_heartbeat(self._job_runner.job, only_if_necessary=False)
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have a refactoring idea but that can wait until after this is merged and doesn’t need to block your other PRs.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Attempted in astronomer@68d5231

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we will not need it at the end -> when we complet the refactoring (final state is #30376), this line is gone, we have job defined individually in each *JobRunner and we always know which runner we access, so there is no need to add the typeguard I think

@potiuk
Copy link
Copy Markdown
Member Author

potiuk commented Apr 10, 2023

Wooohooooo ! now time for the next stages :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

AIP-44 Airflow Internal API area:API Airflow's REST/HTTP API area:CLI area:Scheduler including HA (high availability) scheduler area:webserver Webserver related Issues changelog:skip Changes that should be skipped from the changelog (CI, tests, etc..)

Projects

No open projects
Status: ✅ Done

Development

Successfully merging this pull request may close these issues.

AIP-44 decouple BaseJob ORM from Job logic

7 participants