Emit block cache metrics #4518
|
This is a general observation, prompted by this PR but broader in scope. What is the purpose of the HTML javadoc table? From its description, it exists only to map old metric names to new ones, but it now contains entries that were not present in the old metrics system, so it seems to have also become a place to document new metrics. If the table is only for mapping changes, then this PR is correct not to add the new metrics to it. If the table has morphed into a way to document the available metrics so that they show up in the documentation for user reference, then the new metric names and descriptions should be added, and the description of the table's purpose should be changed to state that it reflects all metrics and includes the mapping from old to new where appropriate. I do not know how we wish to document the metrics. The table is one way, but it is tedious for developers to keep in sync. The docs generated from the table do seem useful; I just don't know if this is the best way, or if we could find something that is easier to maintain. It does seem that we should provide documentation for the metric names. We should also determine how "public" (as in public API) the metric names are. Similar to Properties, they are internal and so subject to change. But they are also used externally, and changes could cause issues with scripts or with metrics tracking and post-processing. Currently I have been changing the metric names to be more consistent and, hopefully, to follow a standard convention. Because the 2.1 metrics are new and have not received a lot of scrutiny, changing non-public names seems like it would have minor impact now, but as the metrics are adopted more widely, changes, additions, or deletions should at least be communicated somehow. |
|
Could move the cache type higher in the name. This brings the related data together when sorting names or using completion. |
|
In addition to the tablet server, these new metrics also need to be registered in the scan server. When registering in the scan server, they will need to pick up the resource group tag, which may happen automatically. |
I'm not sure I follow. Are you suggesting to add a |
|
PR #4461 adds the |
# Conflicts: # server/tserver/src/main/java/org/apache/accumulo/tserver/ScanServer.java
|
|
||
| @Override | ||
| public void registerMetrics(MeterRegistry registry) { | ||
| Gauge.builder("indexCacheHitCount", indexCache, cache -> cache.getStats().hitCount()) |
If these are monotonically increasing counters, then it would be better to use a FunctionCounter instead of a Gauge. I'm not sure whether they are; I need to investigate. The Micrometer code that instruments caches uses FunctionCounter internally.
I looked at the cache implementations in Accumulo; for those, these counts are monotonically increasing. We should probably add that to the SPI javadoc.
Would those also be candidates to be FunctionCounters? (maybe outside of this PR)
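To illustrate why the counter-versus-gauge distinction in this thread matters, here is a minimal plain-Java sketch (deliberately not Accumulo or Micrometer code). When counts only ever increase, a monitoring backend can derive a per-interval hit ratio from the difference between two samples. The names `CounterDeltaSketch`, `Sample`, and `intervalHitRatio` are hypothetical and exist only for this example.

```java
import java.util.OptionalDouble;

final class CounterDeltaSketch {
  private CounterDeltaSketch() {}

  // Hypothetical snapshot of a cache's cumulative counts at one point in time.
  static final class Sample {
    final long hitCount;
    final long requestCount;

    Sample(long hitCount, long requestCount) {
      this.hitCount = hitCount;
      this.requestCount = requestCount;
    }
  }

  // Hit ratio for the interval between two samples of cumulative counts.
  // Returns empty when no new requests arrived, or when the hit count went
  // backwards (for example after a restart), because the delta is then
  // not meaningful.
  static OptionalDouble intervalHitRatio(Sample previous, Sample current) {
    long hits = current.hitCount - previous.hitCount;
    long requests = current.requestCount - previous.requestCount;
    if (hits < 0 || requests <= 0) {
      return OptionalDouble.empty();
    }
    return OptionalDouble.of((double) hits / requests);
  }

  public static void main(String[] args) {
    Sample earlier = new Sample(100, 200);
    Sample later = new Sample(160, 280);
    // 60 new hits out of 80 new requests in this interval
    System.out.println(intervalHitRatio(earlier, later));
  }
}
```

This delta-based reading only works if the underlying values never decrease during normal operation, which is presumably why documenting that property in the SPI javadoc was suggested.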
|
After changing to the |
|
I confirmed that the MetricsIT passes with the current changes. |
|
Wondering if the cache type should be a tag as opposed to being part of the metric name, but do not have a good sense of the pros and cons of that ATM. |
|
@EdColeman I saw your question, I think there are some other metrics in Accumulo that are monotonically increasing that may be better suited as a FunctionCounter. Not sure if those should be changed in 2.1 though. For example |
@keith-turner I can't find the |
The HTML table was the simplest way that I could think of to document the metric name changes as we moved from Hadoop Metrics to Micrometer. I do agree that we need a good way of documenting them, especially since some of them may be used not just for monitoring, but resource control, in future versions. |
These were changed in #4461. From |
Fixes #4492
Adds cache hit and cache request count metrics for data, index and summary caches.
Here is an example when metrics are set up to be logged:
I am not sure if I named these that well so feedback on that would be helpful.