Use the new Metrics module for core metrics#10175

Merged
zwoop merged 11 commits into apache:master from zwoop:TryNewMetrics
Sep 14, 2023

@zwoop
Contributor

@zwoop zwoop commented Aug 13, 2023

This is WIP, to see if it works, and if so, how well. Also looking for feedback on the format. I've only done the http_rsb (HTTP stats) so far, which is a bit less than half of all the core metrics.

@zwoop zwoop added the Metrics label Aug 13, 2023
@zwoop zwoop added this to the 10.0.0 milestone Aug 13, 2023
@zwoop zwoop self-assigned this Aug 13, 2023
@zwoop zwoop marked this pull request as draft August 13, 2023 06:11
@zwoop
Contributor Author

zwoop commented Aug 13, 2023

I ran some quick benchmarks on this. Even though it changes less than half of the metrics, running at around 1.4M RPS I'm seeing:

  • 9% less CPU used
  • 8% less TTFB (mean) and 16% less TTFB (max), latency is more consistent

Even taken with a grain of salt, I think it may be worth exploring this further to see what we get. In addition to slightly better performance, this eliminates a lot of complex code in the core as well.

@zwoop zwoop force-pushed the TryNewMetrics branch 13 times, most recently from 5c5cb5e to 0f93531 Compare August 21, 2023 16:16
@zwoop zwoop force-pushed the TryNewMetrics branch 3 times, most recently from ac0ba24 to e41577e Compare August 27, 2023 23:38
@zwoop zwoop force-pushed the TryNewMetrics branch 9 times, most recently from 670a69e to 9257048 Compare September 6, 2023 03:05
@zwoop zwoop force-pushed the TryNewMetrics branch 4 times, most recently from 6ef6ed4 to 3eaa5ab Compare September 12, 2023 21:19
@apache apache deleted a comment from ezelkow1 Sep 12, 2023
@zwoop zwoop marked this pull request as ready for review September 12, 2023 22:23
@zwoop
Contributor Author

zwoop commented Sep 12, 2023

[approve ci]

@bryancall
Contributor

Differences in stat names (+ have been added and - have been removed):

+proxy.process.cache.aio.KB_read
+proxy.process.cache.aio.KB_write
+proxy.process.cache.aio.read_count
+proxy.process.cache.aio.write_count
+proxy.process.cache.directory.sync.bytes
+proxy.process.cache.directory.sync.count
+proxy.process.cache.directory.sync.time
+proxy.process.cache.directory.wrap.around
-proxy.process.cache.gc_bytes_evacuated
-proxy.process.cache.gc_frags_evacuated
+proxy.process.cache.gc.bytes_evacuated
+proxy.process.cache.gc.frags_evacuated
-proxy.process.cache.sync.bytes
-proxy.process.cache.sync.count
-proxy.process.cache.sync.time
+proxy.process.cache.volume_10.directory.sync.bytes
+proxy.process.cache.volume_10.directory.sync.count
+proxy.process.cache.volume_10.directory.sync.time
+proxy.process.cache.volume_10.directory.wrap.around
-proxy.process.cache.volume_10.gc_bytes_evacuated
-proxy.process.cache.volume_10.gc_frags_evacuated
+proxy.process.cache.volume_10.gc.bytes_evacuated
+proxy.process.cache.volume_10.gc.frags_evacuated
-proxy.process.cache.volume_10.sync.bytes
-proxy.process.cache.volume_10.sync.count
-proxy.process.cache.volume_10.sync.time
-proxy.process.cache.volume_10.wrap_count
-proxy.process.cache.volume_10.write_bytes_stat
+proxy.process.cache.volume_10.write_bytes
+proxy.process.cache.volume_1.directory.sync.bytes
+proxy.process.cache.volume_1.directory.sync.count
+proxy.process.cache.volume_1.directory.sync.time
+proxy.process.cache.volume_1.directory.wrap.around
-proxy.process.cache.volume_1.gc_bytes_evacuated
-proxy.process.cache.volume_1.gc_frags_evacuated
+proxy.process.cache.volume_1.gc.bytes_evacuated
+proxy.process.cache.volume_1.gc.frags_evacuated
-proxy.process.cache.volume_1.sync.bytes
-proxy.process.cache.volume_1.sync.count
-proxy.process.cache.volume_1.sync.time
-proxy.process.cache.volume_1.wrap_count
-proxy.process.cache.volume_1.write_bytes_stat
+proxy.process.cache.volume_1.write_bytes
+proxy.process.cache.volume_2.directory.sync.bytes
+proxy.process.cache.volume_2.directory.sync.count
+proxy.process.cache.volume_2.directory.sync.time
+proxy.process.cache.volume_2.directory.wrap.around
-proxy.process.cache.volume_2.gc_bytes_evacuated
-proxy.process.cache.volume_2.gc_frags_evacuated
+proxy.process.cache.volume_2.gc.bytes_evacuated
+proxy.process.cache.volume_2.gc.frags_evacuated
-proxy.process.cache.volume_2.sync.bytes
-proxy.process.cache.volume_2.sync.count
-proxy.process.cache.volume_2.sync.time
-proxy.process.cache.volume_2.wrap_count
-proxy.process.cache.volume_2.write_bytes_stat
+proxy.process.cache.volume_2.write_bytes
+proxy.process.cache.volume_3.directory.sync.bytes
+proxy.process.cache.volume_3.directory.sync.count
+proxy.process.cache.volume_3.directory.sync.time
+proxy.process.cache.volume_3.directory.wrap.around
-proxy.process.cache.volume_3.gc_bytes_evacuated
-proxy.process.cache.volume_3.gc_frags_evacuated
+proxy.process.cache.volume_3.gc.bytes_evacuated
+proxy.process.cache.volume_3.gc.frags_evacuated
-proxy.process.cache.volume_3.sync.bytes
-proxy.process.cache.volume_3.sync.count
-proxy.process.cache.volume_3.sync.time
-proxy.process.cache.volume_3.wrap_count
-proxy.process.cache.volume_3.write_bytes_stat
+proxy.process.cache.volume_3.write_bytes
+proxy.process.cache.volume_4.directory.sync.bytes
+proxy.process.cache.volume_4.directory.sync.count
+proxy.process.cache.volume_4.directory.sync.time
+proxy.process.cache.volume_4.directory.wrap.around
-proxy.process.cache.volume_4.gc_bytes_evacuated
-proxy.process.cache.volume_4.gc_frags_evacuated
+proxy.process.cache.volume_4.gc.bytes_evacuated
+proxy.process.cache.volume_4.gc.frags_evacuated
-proxy.process.cache.volume_4.sync.bytes
-proxy.process.cache.volume_4.sync.count
-proxy.process.cache.volume_4.sync.time
-proxy.process.cache.volume_4.wrap_count
-proxy.process.cache.volume_4.write_bytes_stat
+proxy.process.cache.volume_4.write_bytes
+proxy.process.cache.volume_5.directory.sync.bytes
+proxy.process.cache.volume_5.directory.sync.count
+proxy.process.cache.volume_5.directory.sync.time
+proxy.process.cache.volume_5.directory.wrap.around
-proxy.process.cache.volume_5.gc_bytes_evacuated
-proxy.process.cache.volume_5.gc_frags_evacuated
+proxy.process.cache.volume_5.gc.bytes_evacuated
+proxy.process.cache.volume_5.gc.frags_evacuated
-proxy.process.cache.volume_5.sync.bytes
-proxy.process.cache.volume_5.sync.count
-proxy.process.cache.volume_5.sync.time
-proxy.process.cache.volume_5.wrap_count
-proxy.process.cache.volume_5.write_bytes_stat
+proxy.process.cache.volume_5.write_bytes
+proxy.process.cache.volume_6.directory.sync.bytes
+proxy.process.cache.volume_6.directory.sync.count
+proxy.process.cache.volume_6.directory.sync.time
+proxy.process.cache.volume_6.directory.wrap.around
-proxy.process.cache.volume_6.gc_bytes_evacuated
-proxy.process.cache.volume_6.gc_frags_evacuated
+proxy.process.cache.volume_6.gc.bytes_evacuated
+proxy.process.cache.volume_6.gc.frags_evacuated
-proxy.process.cache.volume_6.sync.bytes
-proxy.process.cache.volume_6.sync.count
-proxy.process.cache.volume_6.sync.time
-proxy.process.cache.volume_6.wrap_count
-proxy.process.cache.volume_6.write_bytes_stat
+proxy.process.cache.volume_6.write_bytes
+proxy.process.cache.volume_7.directory.sync.bytes
+proxy.process.cache.volume_7.directory.sync.count
+proxy.process.cache.volume_7.directory.sync.time
+proxy.process.cache.volume_7.directory.wrap.around
-proxy.process.cache.volume_7.gc_bytes_evacuated
-proxy.process.cache.volume_7.gc_frags_evacuated
+proxy.process.cache.volume_7.gc.bytes_evacuated
+proxy.process.cache.volume_7.gc.frags_evacuated
-proxy.process.cache.volume_7.sync.bytes
-proxy.process.cache.volume_7.sync.count
-proxy.process.cache.volume_7.sync.time
-proxy.process.cache.volume_7.wrap_count
-proxy.process.cache.volume_7.write_bytes_stat
+proxy.process.cache.volume_7.write_bytes
+proxy.process.cache.volume_8.directory.sync.bytes
+proxy.process.cache.volume_8.directory.sync.count
+proxy.process.cache.volume_8.directory.sync.time
+proxy.process.cache.volume_8.directory.wrap.around
-proxy.process.cache.volume_8.gc_bytes_evacuated
-proxy.process.cache.volume_8.gc_frags_evacuated
+proxy.process.cache.volume_8.gc.bytes_evacuated
+proxy.process.cache.volume_8.gc.frags_evacuated
-proxy.process.cache.volume_8.sync.bytes
-proxy.process.cache.volume_8.sync.count
-proxy.process.cache.volume_8.sync.time
-proxy.process.cache.volume_8.wrap_count
-proxy.process.cache.volume_8.write_bytes_stat
+proxy.process.cache.volume_8.write_bytes
+proxy.process.cache.volume_9.directory.sync.bytes
+proxy.process.cache.volume_9.directory.sync.count
+proxy.process.cache.volume_9.directory.sync.time
+proxy.process.cache.volume_9.directory.wrap.around
-proxy.process.cache.volume_9.gc_bytes_evacuated
-proxy.process.cache.volume_9.gc_frags_evacuated
+proxy.process.cache.volume_9.gc.bytes_evacuated
+proxy.process.cache.volume_9.gc.frags_evacuated
-proxy.process.cache.volume_9.sync.bytes
-proxy.process.cache.volume_9.sync.count
-proxy.process.cache.volume_9.sync.time
-proxy.process.cache.volume_9.wrap_count
-proxy.process.cache.volume_9.write_bytes_stat
+proxy.process.cache.volume_9.write_bytes
-proxy.process.cache.wrap_count
-proxy.process.cache.write_bytes_stat
+proxy.process.cache.write_bytes
+proxy.process.dns.fail_time
+proxy.process.dns.lookup_time
-proxy.process.dns.success_avg_time
+proxy.process.dns.success_time
+proxy.process.net.read_bytes_count
+proxy.process.net.write_bytes_count
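
A comparison like the one above can be made independent of the order in which the metrics are listed by treating the two name lists as sets. The following is a minimal sketch, not the method used in this PR: the function name `diff_metric_names` is hypothetical, and `proxy.process.example.unchanged` is a made-up placeholder for a metric present in both lists.

```python
def diff_metric_names(before, after):
    """Return sorted '+name' / '-name' lines for names added or removed."""
    before_set, after_set = set(before), set(after)
    # Names only in `after` were added; names only in `before` were removed.
    changes = [(name, "+") for name in after_set - before_set]
    changes += [(name, "-") for name in before_set - after_set]
    # Sort by name so related additions and removals land near each other.
    return [sign + name for name, sign in sorted(changes)]


# The two lists are deliberately in different orders.
before = [
    "proxy.process.example.unchanged",  # hypothetical placeholder name
    "proxy.process.dns.success_avg_time",
    "proxy.process.cache.write_bytes_stat",
]
after = [
    "proxy.process.dns.success_time",
    "proxy.process.cache.write_bytes",
    "proxy.process.example.unchanged",  # hypothetical placeholder name
]

for line in diff_metric_names(before, after):
    print(line)
```

Because the comparison is set-based, a name that merely moved to a different position never shows up as a change.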

@zwoop zwoop merged commit f23826d into apache:master Sep 14, 2023
@zwoop zwoop deleted the TryNewMetrics branch September 14, 2023 18:19
@zwoop
Contributor Author

zwoop commented Sep 14, 2023

Those metrics should not have changed names. Maybe you didn't sort them? There's no guarantee that the order will be retained.

@bryancall
Contributor

I sorted them

@zwoop
Contributor Author

zwoop commented Sep 14, 2023

Gah, ok, I can't read. I will address the ones that changed; that's not intentional.

@zwoop
Contributor Author

zwoop commented Sep 14, 2023

Ok, this is the updated list of changes. This is as expected, since we (per the mailing list) nuked all the calculated average metrics etc., and I added a few more to do the raw counts. There are a few metrics that have a suffix of _stat, which I'm going to fix as well in a separate PR.

root@frigg /opt/ats-10 # diff /tmp/before.txt /tmp/after.txt
45a46,49
> proxy.process.cache.aio.KB_read
> proxy.process.cache.aio.KB_write
> proxy.process.cache.aio.read_count
> proxy.process.cache.aio.write_count
61,62d64
< proxy.process.cache.KB_read_per_sec
< proxy.process.cache.KB_write_per_sec
77d78
< proxy.process.cache.read_per_sec
384d384
< proxy.process.cache.write_per_sec
386c386
< proxy.process.dns.fail_avg_time
---
> proxy.process.dns.fail_time
388d387
< proxy.process.dns.lookup_avg_time
390a390
> proxy.process.dns.lookup_time
393c393
< proxy.process.dns.success_avg_time
---
> proxy.process.dns.success_time
540,541d539
< proxy.process.http.avg_transactions_per_client_connection
< proxy.process.http.avg_transactions_per_server_connection
635,641d632
< proxy.process.http.origin_server_speed_bytes_per_sec_100
< proxy.process.http.origin_server_speed_bytes_per_sec_100K
< proxy.process.http.origin_server_speed_bytes_per_sec_100M
< proxy.process.http.origin_server_speed_bytes_per_sec_10K
< proxy.process.http.origin_server_speed_bytes_per_sec_10M
< proxy.process.http.origin_server_speed_bytes_per_sec_1K
< proxy.process.http.origin_server_speed_bytes_per_sec_1M
673,686d663
< proxy.process.http.request_document_size_100
< proxy.process.http.request_document_size_10K
< proxy.process.http.request_document_size_1K
< proxy.process.http.request_document_size_1M
< proxy.process.http.request_document_size_3K
< proxy.process.http.request_document_size_5K
< proxy.process.http.request_document_size_inf
< proxy.process.http.response_document_size_100
< proxy.process.http.response_document_size_10K
< proxy.process.http.response_document_size_1K
< proxy.process.http.response_document_size_1M
< proxy.process.http.response_document_size_3K
< proxy.process.http.response_document_size_5K
< proxy.process.http.response_document_size_inf
757,763d733
< proxy.process.http.user_agent_speed_bytes_per_sec_100
< proxy.process.http.user_agent_speed_bytes_per_sec_100K
< proxy.process.http.user_agent_speed_bytes_per_sec_100M
< proxy.process.http.user_agent_speed_bytes_per_sec_10K
< proxy.process.http.user_agent_speed_bytes_per_sec_10M
< proxy.process.http.user_agent_speed_bytes_per_sec_1K
< proxy.process.http.user_agent_speed_bytes_per_sec_1M
809a780
> proxy.process.net.read_bytes_count
810a782
> proxy.process.net.write_bytes_count
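
With calculated metrics such as proxy.process.cache.KB_read_per_sec removed, a consumer of the raw counts can derive a rate from two samples. The sketch below is illustrative only: it assumes that a metric like proxy.process.cache.aio.KB_read is a cumulative counter, the helper name `rate_per_sec` is hypothetical, and the sample values are invented rather than measured.

```python
def rate_per_sec(prev_value, prev_time, curr_value, curr_time):
    """Derive a per-second rate from two samples of a cumulative counter."""
    elapsed = curr_time - prev_time
    if elapsed <= 0:
        raise ValueError("samples must be taken at increasing times")
    delta = curr_value - prev_value
    if delta < 0:
        # Assumption: a smaller value means the counter was reset (for
        # example by a restart), so count only what accumulated since then.
        delta = curr_value
    return delta / elapsed


# Invented samples: the counter read 1000 at t=0s and 1600 at t=10s.
print(rate_per_sec(1000, 0.0, 1600, 10.0))
```

The same two-sample approach gives an average duration when a time total is divided by the matching count delta instead of by elapsed time.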

cmcfarlen pushed a commit to cmcfarlen/trafficserver that referenced this pull request Jun 3, 2024
* asf/master:
  This drops the _stat suffix from some metrics (apache#10441)
  Fix CID-1518256 (apache#10403)
  Restore original metrics names, these were typos (apache#10440)
  Use the new Metrics module for core metrics (apache#10175)
  Fix use-after-free issue (apache#10399)
  Fixed differences between cmake rc files and autotools (apache#10408)
  Fix hwloc build (apache#10406)