Align timers and timing metrics (ms) across all metrics loggers#39908
Align timers and timing metrics (ms) across all metrics loggers#39908potiuk merged 9 commits intoapache:mainfrom
Conversation
b567276 to
9a5b95e
Compare
potiuk
left a comment
There was a problem hiding this comment.
That's likely very wrong as it is heavily backwards compatible. It will suddenly change almost all dashboards and alerts people built using those metrics.
Metrics are definitely "public" API of Airflow and we cannot change them without breaking compatibility.
100% -- this will break all the operational dashboards for companies, users and service providers |
|
Yes. But making things consistent with docs is one thing, while breaking existing deployments that baase on "actual" behaviour is another. In this case changing the metrics is a problem and if we want to make documentattion and implementation consistent, we should only update the documentation (even if it means that some metrics will have different and inconsistent behaviour). In this case - preserving current behaviour trumps consistency. I think we could have a new parameter that will say "consistent" metrics (disabled by default) where this might get fixed and we should swap the default value with Airflow 3 (to be consistent by default). Simply metrics is one of those things that is undervalued by most people who develop Airlfow but have the potential of wreakiing huge havoc if we change their behaviour, it's heavily public API when you think of breaking effects it might have and probability it will cause problems if you change metric values effectively by 3 orders of magnitude. |
|
Yeah, I'm with @potiuk and @kaxil here. We already ran around in circles on this topic in the previous seconds/milliseconds metrics PR just a couple weeks ago. It's a mess, but breaking everyone's dashboards isn't a viable solution IMHO. To add to that, even if we do standardize them, why break two (Datadog and Otel) to match one (StatsD)? If we were going to do this, shouldn't we change the one that doesn't match the other two? This feels like an Airflow3 "big breaking thing" to me. |
9a5b95e to
6bed534
Compare
|
I think if we want to implement this consistency, it should be Airlfow 3 feature to have it on by default and possibly we should raise a future deprecation warning informing that it will change in Airflow 3 and that user can switch to consistent metrics by flipping that config. @ferruzzi - WDYT ? Also if we do that - I think that should be added to our Airflow 3 discussions as a topic to discuss - https://cwiki.apache.org/confluence/display/AIRFLOW/Airflow+3+Dev+call%3A+Meeting+Notes - > maybe you can add it it as command and make sure this is included in our Airflow 3 call at some point in time @dirrao ? |
|
WHY: This is generally a I think it would be good to have someone who will be "championing" that - i.e. making sure it's remembered, properly docuemented, good warnings raised and people are aware of this as incompatibility - each of the "workstreams" there will have a lead/champion so having someone who will be advocating and leading that through Airlfow 3 transition (with all the consequences/documentation/making sure that it's properly communicated to our users and that there is a transition plam, including some guidelines of what our users should do) is a good idea. |
Good idea. I have updated the warning. So, users can take a call on it when they switching to Airflow 3. warning: "Timer and timing metrics publish in seconds were deprecated. It is enabled by default from Airflow 3 onwards. Enable metrics consistency to publish all the timer and timing metrics in milliseconds." |
I don't have the access to wiki. I will remind you on this. |
If you want access, send me your email in Slack |
|
looks gpod (sorry for delay) |
|
@potiuk |
|
Hmm.. seing it - I am not sure if warning is a good idea. The thing with warnings is that you should have an "easy" way to get rid of the warning - and changing your dashboard/metrics is likely not an easy task - and I think one of the worst cases with warnings is to not allow people to silence them easily (by fixing them) . I am just thiking whether we should allow a third state of the switch ("legacy" ?) to allow the users to deliberately set it and silence the warning ? BTW. The way how to treat warnings in tests is described in the docs: https://github.com/apache/airflow/blob/main/contributing-docs/testing/unit_tests.rst#handling-warnings |
I think in 4.0 - but also I think we will get to much faster "MAJOR" release cycle. We have not decided it yet, but we might want to do just that and say release new Airflow MAJOR release every year ? |
| version_added: 2.10.0 | ||
| type: string | ||
| example: ~ | ||
| default: "True" |
There was a problem hiding this comment.
!!!! @potiuk I guess no one noticed, cos this made it out with a default of on.
There was a problem hiding this comment.
Oh, except this hasn't made it to a release yet, and it wasn't marked for backporting. (Lucky miss there)
There was a problem hiding this comment.
Didn't slip through, it was discussed above and intentional.
There was a problem hiding this comment.
We agreed to enable it directly in Airflow 3 without backporting. However, updating this was overlooked.
(cherry picked from commit 08daffe)
Update timer.duration labels and examples to milliseconds, and fix the OTel timing docstring to match runtime behavior. Historical context: apache#39908 introduced milliseconds alignment with a compatibility flag, and apache#43975 removed that flag and standardized milliseconds everywhere. Co-authored-by: Cursor <cursoragent@cursor.com>
Update timer.duration labels and examples to milliseconds, and fix the OTel timing docstring to match runtime behavior. Historical context: apache#39908 introduced milliseconds alignment with a compatibility flag, and apache#43975 removed that flag and standardized milliseconds everywhere. Co-authored-by: Cursor <cursoragent@cursor.com>
Update timer.duration labels and examples to milliseconds, and fix the OTel timing docstring to match runtime behavior. Historical context: #39908 introduced milliseconds alignment with a compatibility flag, and #43975 removed that flag and standardized milliseconds everywhere. Co-authored-by: Cursor <cursoragent@cursor.com>
Update timer.duration labels and examples to milliseconds, and fix the OTel timing docstring to match runtime behavior. Historical context: apache#39908 introduced milliseconds alignment with a compatibility flag, and apache#43975 removed that flag and standardized milliseconds everywhere. Co-authored-by: Cursor <cursoragent@cursor.com>
Update timer.duration labels and examples to milliseconds, and fix the OTel timing docstring to match runtime behavior. Historical context: apache#39908 introduced milliseconds alignment with a compatibility flag, and apache#43975 removed that flag and standardized milliseconds everywhere. Co-authored-by: Cursor <cursoragent@cursor.com>
Update timer.duration labels and examples to milliseconds, and fix the OTel timing docstring to match runtime behavior. Historical context: #39908 introduced milliseconds alignment with a compatibility flag, and #43975 removed that flag and standardized milliseconds everywhere. Co-authored-by: Cursor <cursoragent@cursor.com> (cherry picked from commit 10dcb3a)
Update timer.duration labels and examples to milliseconds, and fix the OTel timing docstring to match runtime behavior. Historical context: #39908 introduced milliseconds alignment with a compatibility flag, and #43975 removed that flag and standardized milliseconds everywhere. (cherry picked from commit 10dcb3a) Co-authored-by: Jed Cunningham <66968678+jedcunningham@users.noreply.github.com> Co-authored-by: Cursor <cursoragent@cursor.com>
Update timer.duration labels and examples to milliseconds, and fix the OTel timing docstring to match runtime behavior. Historical context: apache#39908 introduced milliseconds alignment with a compatibility flag, and apache#43975 removed that flag and standardized milliseconds everywhere. Co-authored-by: Cursor <cursoragent@cursor.com>
Update timer.duration labels and examples to milliseconds, and fix the OTel timing docstring to match runtime behavior. Historical context: #39908 introduced milliseconds alignment with a compatibility flag, and #43975 removed that flag and standardized milliseconds everywhere. (cherry picked from commit 10dcb3a) Co-authored-by: Jed Cunningham <66968678+jedcunningham@users.noreply.github.com> Co-authored-by: Cursor <cursoragent@cursor.com>
Update timer.duration labels and examples to milliseconds, and fix the OTel timing docstring to match runtime behavior. Historical context: apache#39908 introduced milliseconds alignment with a compatibility flag, and apache#43975 removed that flag and standardized milliseconds everywhere. Co-authored-by: Cursor <cursoragent@cursor.com>
Update timer.duration labels and examples to milliseconds, and fix the OTel timing docstring to match runtime behavior. Historical context: apache#39908 introduced milliseconds alignment with a compatibility flag, and apache#43975 removed that flag and standardized milliseconds everywhere. Co-authored-by: Cursor <cursoragent@cursor.com>
Update timer.duration labels and examples to milliseconds, and fix the OTel timing docstring to match runtime behavior. Historical context: apache#39908 introduced milliseconds alignment with a compatibility flag, and apache#43975 removed that flag and standardized milliseconds everywhere. Co-authored-by: Cursor <cursoragent@cursor.com>

Align timers and timing metrics (ms) across all metrics loggers
closed: #39818