Scheduler querying dag_run and task_instance tables filtering by span_status #53154
Open
Labels
area:Scheduler, area:core, area:performance, kind:bug
Description
Apache Airflow version
3.0.2
If "Other Airflow 2 version" selected, which one?
No response
What happened?
After upgrading to Airflow v3, we noticed that the Airflow Scheduler continuously queries the dag_run and task_instance tables with .... WHERE span_status = SHOULD_END. Since there is no index on span_status, every one of these queries scans all the rows in both tables. Both tables contain historical runs, so they are quite large, and the repeated full scans are causing performance issues.
I still have to dig deeper to confirm whether it is related, but the "Task Instances" tab takes minutes to load.
What you think should happen instead?
- We should create an index on the "span_status" column.
- Also, this column name is confusing because the column is used even if otel is disabled.
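To illustrate the first point, here is a minimal sketch using an in-memory SQLite database rather than a real Airflow metadata database. The table is a heavily simplified stand-in for task_instance, the stored values and the index name idx_ti_span_status are hypothetical, and the query only mirrors the shape of the filter quoted above. It shows the planner switching from a full table scan to an index lookup once the column is indexed; the actual behavior on PostgreSQL or MySQL should be checked with that database's own EXPLAIN.

```python
import sqlite3


def query_plan(conn, sql, params=()):
    """Return SQLite's query plan for `sql` as one string (column 3 is the detail text)."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return " | ".join(row[3] for row in rows)


def build_demo_db(n_rows=10_000):
    """Create a simplified stand-in for task_instance (NOT the real Airflow schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE task_instance ("
        " id INTEGER PRIMARY KEY,"
        " state TEXT,"
        " span_status TEXT NOT NULL)"
    )
    # Many historical rows plus a single row that matches the scheduler-style filter.
    # The literal values 'ENDED' and 'SHOULD_END' are placeholders for illustration.
    conn.executemany(
        "INSERT INTO task_instance (state, span_status) VALUES (?, ?)",
        (("success", "ENDED") for _ in range(n_rows)),
    )
    conn.execute(
        "INSERT INTO task_instance (state, span_status) VALUES ('running', 'SHOULD_END')"
    )
    conn.commit()
    return conn


# Same shape as the filter described in this issue.
QUERY = "SELECT id FROM task_instance WHERE span_status = ?"

conn = build_demo_db()
plan_before = query_plan(conn, QUERY, ("SHOULD_END",))

# Hypothetical index name; a real fix would go through an Airflow DB migration.
conn.execute("CREATE INDEX idx_ti_span_status ON task_instance (span_status)")
plan_after = query_plan(conn, QUERY, ("SHOULD_END",))

print("before index:", plan_before)
print("after index: ", plan_after)
```

Without the index the plan is a scan of the whole table; with it, the plan becomes a search that names idx_ti_span_status. Because very few rows are expected to match the filter at any given time, the index should stay highly selective even as historical rows accumulate.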
How to reproduce
The Airflow Scheduler issues these queries continuously as part of its loop, as seen from the code.
Operating System
Docker
Versions of Apache Airflow Providers
No response
Deployment
Other Docker-based deployment
Deployment details
Running as docker container
Anything else?
No response
Are you willing to submit PR?
- Yes I am willing to submit a PR!
Code of Conduct
- I agree to follow this project's Code of Conduct