[receiver/mysql] Support top query collection #41847
This PR was marked stale due to lack of activity. It will be closed in 14 days.
@antonblock @ishleenk17 please review as codeowners
antonblock
left a comment
Able to test this successfully, just had a question about the explainQuery method
crobert-1
left a comment
Lots of comments, sorry about that. My overarching thought is that as much as possible we should try to move shared top query functionality to an internal package that can be reused across receivers. There's a lot of duplicated code which really adds to the maintenance burden.
Co-authored-by: Curtis Robert <[email protected]>
@ishleenk17 @antonblock please review as codeowners.
Other than the failing go mod check, this looks good
#### Description

This PR introduces query-level data collection for the `mysql` receiver in the logs pipeline. This initial PR adds `Top Query` collection (`Top Queries` are the queries that used the most time within a time window).

##### Configuration

We introduced these six configuration options for the feature (see the receiver's README for details):

1. `max_sample_query_count`: the number of queries initially fetched from the database.
2. `top_query_count`: the number of queries reported to the next consumer.
3. `lookback_time`: the time window queried on each scrape.
4. `collection_interval`: the interval at which top queries are collected.
5. `query_plan_cache_size`: the size of the query plan cache.
6. `query_plan_cache_ttl`: the time-to-live (TTL) of entries in the query plan cache.

##### Workflow

The `mysql` receiver fetches M (= `max_sample_query_count`) queries from the database, sorts them by the difference in `sum_timer_wait` (time used), and then reports the first N (= `top_query_count`) queries.

##### New Log Attributes

- `db.system.name`
- `db.query.text`
- `mysql.query_plan`
- `mysql.events_statements_summary_by_digest.digest`
- `mysql.events_statements_summary_by_digest.count_star`
- `mysql.events_statements_summary_by_digest.sum_timer_wait`

##### Additional dependency

* `hashicorp/golang-lru/v2`
  * License: MPL-2.0
  * Link: https://pkg.go.dev/github.com/hashicorp/golang-lru/v2
  * Already used in the repo
* `DataDog/datadog-agent/pkg/obfuscate`
  * License: Apache 2.0
  * Link: https://pkg.go.dev/github.com/DataDog/datadog-agent/pkg/obfuscate
  * Already used in the repo

##### Example Output

```
resourceLogs:
  - resource: {}
    scopeLogs:
      - logRecords:
          - attributes:
              - key: db.system.name
                value:
                  stringValue: mysql
              - key: db.query.text
                value:
                  stringValue: SELECT COALESCE ( schema_name, ? ) COALESCE ( digest, ? ) COALESCE ( digest_text, ? ) count_star, sum_timer_wait, query_sample_text FROM performance_schema.events_statements_summary_by_digest WHERE last_seen >= NOW ( ) - INTERVAL ? second AND ( ( digest_text NOT LIKE ? AND digest_text NOT LIKE ? ) OR digest_text IS ? ) ORDER BY count_star DESC LIMIT ?
              - key: mysql.query_plan
                value:
                  stringValue: |
                    {"query_block":{"select_id":1,"cost_info":{"query_cost":"?"},"ordering_operation":{"using_filesort":true,"table":{"table_name":"events_statements_summary_by_digest","access_type":"ALL","rows_examined_per_scan":"?","rows_produced_per_join":"?","filtered":"?","cost_info":{"read_cost":"?","eval_cost":"?","prefix_cost":"?","data_read_per_join":"?"},"used_columns":["SCHEMA_NAME","DIGEST","DIGEST_TEXT","COUNT_STAR","SUM_TIMER_WAIT","LAST_SEEN","QUERY_SAMPLE_TEXT"],"attached_condition":"( ( performance_schema . events_statements_summary_by_digest . LAST_SEEN >= < cache > ( ( now ( ) - interval ? second ) ) ) and ( ( ( not ( ( performance_schema . events_statements_summary_by_digest . DIGEST_TEXT like ? ) ) ) and ( not ( ( performance_schema . events_statements_summary_by_digest . DIGEST_TEXT like ? ) ) ) ) or ( performance_schema . events_statements_summary_by_digest . DIGEST_TEXT is ? ) ) )"}}}}
              - key: mysql.events_statements_summary_by_digest.digest
                value:
                  stringValue: c16f24f908846019a741db580f6545a5933e9435a7cf1579c50794a6ca287739
              - key: mysql.events_statements_summary_by_digest.count_star
                value:
                  intValue: "5"
              - key: mysql.events_statements_summary_by_digest.sum_timer_wait
                value:
                  doubleValue: 0.001021918999
            body: {}
            eventName: db.server.top_query
            timeUnixNano: "1754297675177556000"
        scope:
          name: github.com/open-telemetry/opentelemetry-collector-contrib/receiver/mysqlreceiver
          version: latest
```

#### Link to tracking issue

n/a

#### Testing

Added

#### Documentation

Updated

---------

Co-authored-by: Nico Stewart <[email protected]>
Co-authored-by: Curtis Robert <[email protected]>
Co-authored-by: Antoine Toulme <[email protected]>
Co-authored-by: Antoine Toulme <[email protected]>