Skip to content

Add Metrics for CompactionJobPriority Queues#3551

Merged
keith-turner merged 18 commits into
apache:elasticityfrom
ddanielr:3474-add-metrics
Jul 20, 2023
Merged

Add Metrics for CompactionJobPriority Queues#3551
keith-turner merged 18 commits into
apache:elasticityfrom
ddanielr:3474-add-metrics

Conversation

@ddanielr
Copy link
Copy Markdown
Contributor

@ddanielr ddanielr commented Jun 28, 2023

Adds the following metrics for cjpq to solve #3474:

  • Number of Queues
  • Queue Size
  • Number of Queued Jobs
  • Number of Dequeued Jobs
  • Number of Rejected Jobs

This is using the Thread polling method of updating metric values.

Looking for feedback on metric naming & prefixes, and implementation.

Adds the following metrics for cjpq:

* Number of Queues
* Queue Size
* Number of Queued Jobs
* Number of Dequeued Jobs
* Number of Rejected Jobs
Comment thread core/src/main/java/org/apache/accumulo/core/metrics/MetricsUtil.java Outdated
Removed bad line of code that was left over from a poor merge conflict
resolution
@EdColeman
Copy link
Copy Markdown
Contributor

In general - why are the metric changes being testing in BulkNewIT? Would it be better if the test was a stand-alone test of the metrics? If a package does not exist, in may be desirable to have a metrics specific sub-package for metric IT tests so that they are easier to find?

ddanielr added 4 commits July 7, 2023 21:30
Applies feedback from PR review.

* Changes metric registration to register via queue name, rather than by
queue object.
* * This helps avoid metric issues when the queues are created & deleted.

* Removes unnessessary Atomic var for tracking queued Jobs.
* Fixes naming scheme issues.
Switch from size test to looking up all metrics via queue ID.
@ddanielr
Copy link
Copy Markdown
Contributor Author

ddanielr commented Jul 7, 2023

In general - why are the metric changes being testing in BulkNewIT? Would it be better if the test was a stand-alone test of the metrics? If a package does not exist, in may be desirable to have a metrics specific sub-package for metric IT tests so that they are easier to find?

I was testing this in BulkNewIT as I knew that the compaction operations would trigger and metrics would be generated.
This was done as a convenience to get these metrics generated quickly.

However, I need to add more deliberate tests for these metrics to ensure the gauges are being set to correct values.
Right now it's just collecting metrics with the specified prefix and putting them into the log.

I'm not sure about how to best organize the metric tests so I'd be happy to discuss the best options there.

ddanielr added 3 commits July 10, 2023 20:08
Added formatting steps to formatString and included test class with
string examples.
* Switched CompactionJobQueues to a Final
* Cleaned up anonymous functions in the update method.
Comment thread core/src/main/java/org/apache/accumulo/core/metrics/MetricsUtil.java Outdated
Comment thread test/src/main/java/org/apache/accumulo/test/functional/BulkNewIT.java Outdated
ddanielr added 3 commits July 18, 2023 08:01
* Fixes the queue count metrics
* Switches metric names to dot notation
* Removes test changes from BulkNewIT
* Adds CompactionPriorityQueueMetricsIT
* Adds User defined CompactionResourceGroup override to MAC
* Adds new Lowest Job Priority Metric
@ddanielr
Copy link
Copy Markdown
Contributor Author

Currently the new test added in (CompactionPriorityQueueMetricsIT.java) fails when run.
I believe the check condition is correct and that we may have some unintended behaviors happening when multiple compaction jobs are being submitted to the same queue.

Also should all references to queue be swapped for compaction group? The jobs still do function in a queue structure.

ddanielr added 2 commits July 19, 2023 05:47
* Refactors IT test to create additional tablets and properly generate
compaction jobs without having selected file collisions.

* Adds configurable property to change CompactionQueue size and test
rejectedJobs. See apache#3635 for more details.

* Adds comparision checks in the IT to check metric values against known datapoints.
@ddanielr ddanielr marked this pull request as ready for review July 19, 2023 06:23
@ddanielr
Copy link
Copy Markdown
Contributor Author

Currently the new test added in (CompactionPriorityQueueMetricsIT.java) fails when run. I believe the check condition is correct and that we may have some unintended behaviors happening when multiple compaction jobs are being submitted to the same queue.

Also should all references to queue be swapped for compaction group? The jobs still do function in a queue structure.

Discussed this with @keith-turner and discovered the issue was with my setup of the tablets and split points.
Test was corrected to have more tablets than queue slots and it is now passing.

Additional checks were added to validate specific metrics values.

#3635 was also created to eventually replace a property that was added for setting a PriorityQueue's maxSize.

Comment thread core/src/main/java/org/apache/accumulo/core/metrics/MetricsProducer.java Outdated
Comment thread core/src/main/java/org/apache/accumulo/core/conf/Property.java Outdated
* Removes unnecessary comments
* Switches metric names to use `queue` vs `cjpq`
* Adds final modifier to queueSize var

boolean sawQueues = false;
// An empty queue means that the last known value is the most recent.
while (!queueMetrics.isEmpty()) {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it make sense to also check the dequeued count in this part of the test?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried doing that, but the problem with dequeued is that once all the jobs are processed the queue is removed so dequeued returns 0.
Due to the 3 second polling rate, it's hard to identify that condition quickly enough in the test.

If we wanted to check dequeued jobs then that should be a separate test with a large queue size and a slow compaction (+1 second runtime per compaction)

}

@Test
public void testQueueMetrics() throws Exception {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would be nice to test metrics from multiple queues, just to ensure the data for queue is independent. Could be a follow in issue.

@keith-turner
Copy link
Copy Markdown
Contributor

@ddanielr are you ready to merge this?

@ddanielr
Copy link
Copy Markdown
Contributor Author

Yes. I believe it's in a good state.

@keith-turner
Copy link
Copy Markdown
Contributor

ok, I will merge if the build passes

@keith-turner keith-turner merged commit 6e745d1 into apache:elasticity Jul 20, 2023
@ddanielr ddanielr deleted the 3474-add-metrics branch July 25, 2023 17:29
@ctubbsii ctubbsii added this to the 4.0.0 milestone Jul 12, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

Status: ✅ Done

Development

Successfully merging this pull request may close these issues.

5 participants