Scheduler querying dag_run and task_instance tables filtering by span_status #53154

@amit-mittal

Apache Airflow version

3.0.2

If "Other Airflow 2 version" selected, which one?

No response

What happened?

After upgrading to Airflow v3, we noticed that the Airflow scheduler continuously queries the dag_run and task_instance tables with .... WHERE span_status = SHOULD_END. Because there is no index on span_status, every one of these queries scans all rows in both tables. Both tables contain historical runs, so they are quite large, and the repeated full scans are causing performance issues.

I still need to dig deeper to confirm whether this is related, but the "Task Instances" tab takes minutes to load.

What you think should happen instead?

  • We should create an index on the "span_status" column.
  • Also, the column name is confusing, because the column is used even when OTel is disabled.
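
To illustrate why the missing index matters, here is a minimal sketch using an in-memory SQLite database and Python's standard library. It is not Airflow's real schema or migration: the table is a simplified stand-in, the index name idx_task_instance_span_status is hypothetical, and the status strings are placeholder values. The exact query plan text also differs between database engines and versions, so the sketch only compares the plan before and after the index is created.

```python
import sqlite3


def query_plan(conn, sql):
    """Return the 'detail' column of EXPLAIN QUERY PLAN as a single string."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " | ".join(row[3] for row in rows)


conn = sqlite3.connect(":memory:")

# Simplified stand-in for the task_instance table (assumption, not the real schema).
conn.execute(
    "CREATE TABLE task_instance ("
    "id INTEGER PRIMARY KEY, state TEXT, span_status TEXT)"
)

# Many historical rows plus a handful that match the scheduler-style filter.
# The status strings are placeholders chosen for this example.
rows = [("success", "ended")] * 10000 + [("running", "should_end")] * 5
conn.executemany(
    "INSERT INTO task_instance (state, span_status) VALUES (?, ?)", rows
)

sql = "SELECT id FROM task_instance WHERE span_status = 'should_end'"

# Without an index, the planner has to scan the whole table.
plan_before = query_plan(conn, sql)

# Hypothetical index name; a real deployment would add this via a migration.
conn.execute(
    "CREATE INDEX idx_task_instance_span_status "
    "ON task_instance (span_status)"
)

# With the index, the planner can look up matching rows directly.
plan_after = query_plan(conn, sql)

matching = conn.execute(sql).fetchall()

print("before:", plan_before)
print("after: ", plan_after)
print("matching rows:", len(matching))
```

On a production PostgreSQL or MySQL metadata database, the same comparison can be made by running EXPLAIN on the scheduler's query before and after adding the index. Since most rows in these tables are historical, an index on span_status should let the database skip them rather than scan them on every scheduler loop.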

How to reproduce

The Airflow scheduler loops continuously and issues this query as part of that loop, as can be seen from the code.

Operating System

Docker

Versions of Apache Airflow Providers

No response

Deployment

Other Docker-based deployment

Deployment details

Running as docker container

Anything else?

No response

Are you willing to submit PR?

  • Yes I am willing to submit a PR!

Code of Conduct
