@sincejune
Contributor

Description

This PR introduces query-level data collection for the mysql receiver in the logs pipeline.

This initial PR adds Top Query collection. Top Queries are the queries that used the most time within a time window.

Configuration

This PR adds six configuration options for the feature (see the receiver's README for details):

  1. max_sample_query_count: the number of queries initially fetched from the database.
  2. top_query_count: the number of queries reported to the next consumer.
  3. lookback_time: the query window for each scrape.
  4. collection_interval: the interval between top query collections.
  5. query_plan_cache_size: the size of the query plan cache.
  6. query_plan_cache_ttl: the TTL of entries in the query plan cache.
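
To make the options above concrete, here is a minimal Go sketch of how they might be grouped and sanity-checked. Only the option names come from this PR description; the struct name `TopQueryConfig`, the field types, the example values, and the validation rules are assumptions for illustration and may differ from the receiver's actual code.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// TopQueryConfig is a hypothetical struct mirroring the six options.
// Only the mapstructure key names are taken from the PR description;
// the field types are assumed.
type TopQueryConfig struct {
	MaxSampleQueryCount int           `mapstructure:"max_sample_query_count"`
	TopQueryCount       int           `mapstructure:"top_query_count"`
	LookbackTime        time.Duration `mapstructure:"lookback_time"`
	CollectionInterval  time.Duration `mapstructure:"collection_interval"`
	QueryPlanCacheSize  int           `mapstructure:"query_plan_cache_size"`
	QueryPlanCacheTTL   time.Duration `mapstructure:"query_plan_cache_ttl"`
}

// Validate applies illustrative checks (assumed, not taken from the receiver).
func (c TopQueryConfig) Validate() error {
	if c.MaxSampleQueryCount <= 0 {
		return errors.New("max_sample_query_count must be positive")
	}
	if c.TopQueryCount <= 0 || c.TopQueryCount > c.MaxSampleQueryCount {
		return errors.New("top_query_count must be between 1 and max_sample_query_count")
	}
	if c.LookbackTime <= 0 || c.CollectionInterval <= 0 {
		return errors.New("lookback_time and collection_interval must be positive")
	}
	if c.QueryPlanCacheSize <= 0 || c.QueryPlanCacheTTL <= 0 {
		return errors.New("query plan cache size and ttl must be positive")
	}
	return nil
}

func main() {
	// Example values only; these are not the receiver's defaults.
	cfg := TopQueryConfig{
		MaxSampleQueryCount: 1000,
		TopQueryCount:       100,
		LookbackTime:        time.Minute,
		CollectionInterval:  time.Minute,
		QueryPlanCacheSize:  1000,
		QueryPlanCacheTTL:   time.Hour,
	}
	fmt.Println(cfg.Validate())
}
```

The check that `top_query_count` does not exceed `max_sample_query_count` reflects the workflow described in this PR: the reported queries are a subset of the fetched ones.
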
Workflow

The mysql receiver fetches M (= max_sample_query_count) queries from the database, sorts them by the difference in sum_timer_wait (time used), and reports the first N (= top_query_count) queries.
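
The selection step can be sketched roughly as follows. This is not the receiver's code: the type `querySample`, the function `topQueries`, and the reading of "difference" as the increase in sum_timer_wait since the previous scrape are assumptions made for illustration.

```go
package main

import (
	"fmt"
	"sort"
)

// querySample is a hypothetical row fetched from
// performance_schema.events_statements_summary_by_digest.
type querySample struct {
	Digest       string
	SumTimerWait float64 // cumulative time used, as reported by the database
}

// topQueries returns up to n samples with the largest increase in
// SumTimerWait relative to the previous scrape. prev maps a digest to the
// cumulative value seen last time; digests missing from prev count from zero.
func topQueries(samples []querySample, prev map[string]float64, n int) []querySample {
	type scored struct {
		sample querySample
		delta  float64
	}
	candidates := make([]scored, 0, len(samples))
	for _, s := range samples {
		d := s.SumTimerWait - prev[s.Digest]
		if d <= 0 {
			continue // no additional time used since the last scrape
		}
		candidates = append(candidates, scored{sample: s, delta: d})
	}
	// Largest delta first; stable so ties keep their fetch order.
	sort.SliceStable(candidates, func(i, j int) bool {
		return candidates[i].delta > candidates[j].delta
	})
	if n < 0 {
		n = 0
	}
	if n > len(candidates) {
		n = len(candidates)
	}
	out := make([]querySample, 0, n)
	for _, c := range candidates[:n] {
		out = append(out, c.sample)
	}
	return out
}

func main() {
	prev := map[string]float64{"a": 9, "b": 1, "d": 7}
	samples := []querySample{
		{Digest: "a", SumTimerWait: 10}, // delta 1
		{Digest: "b", SumTimerWait: 5},  // delta 4
		{Digest: "c", SumTimerWait: 3},  // new digest, delta 3
		{Digest: "d", SumTimerWait: 7},  // unchanged, skipped
	}
	for _, q := range topQueries(samples, prev, 2) {
		fmt.Println(q.Digest)
	}
}
```

Sorting by the delta rather than the cumulative total keeps long-running but currently idle queries from dominating every report, which is presumably the reason for using a difference here.
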

New Log Attributes
  • db.system.name
  • db.query.text
  • mysql.query_plan
  • mysql.events_statements_summary_by_digest.digest
  • mysql.events_statements_summary_by_digest.count_star
  • mysql.events_statements_summary_by_digest.sum_timer_wait
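Additional dependency
  • hashicorp/golang-lru/v2
      • License: MPL-2.0
      • Link: https://pkg.go.dev/github.com/hashicorp/golang-lru/v2
      • Already used in the repo
  • DataDog/datadog-agent/pkg/obfuscate
      • License: Apache 2.0
      • Link: https://pkg.go.dev/github.com/DataDog/datadog-agent/pkg/obfuscate
      • Already used in the repo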
Example Output
resourceLogs:
  - resource: {}
    scopeLogs:
      - logRecords:
          - attributes:
              - key: db.system.name
                value:
                  stringValue: mysql
              - key: db.query.text
                value:
                  stringValue: SELECT COALESCE ( schema_name, ? ) COALESCE ( digest, ? ) COALESCE ( digest_text, ? ) count_star, sum_timer_wait, query_sample_text FROM performance_schema.events_statements_summary_by_digest WHERE last_seen >= NOW ( ) - INTERVAL ? second AND ( ( digest_text NOT LIKE ? AND digest_text NOT LIKE ? ) OR digest_text IS ? ) ORDER BY count_star DESC LIMIT ?
              - key: mysql.query_plan
                value:
                  stringValue: |
                    {"query_block":{"select_id":1,"cost_info":{"query_cost":"?"},"ordering_operation":{"using_filesort":true,"table":{"table_name":"events_statements_summary_by_digest","access_type":"ALL","rows_examined_per_scan":"?","rows_produced_per_join":"?","filtered":"?","cost_info":{"read_cost":"?","eval_cost":"?","prefix_cost":"?","data_read_per_join":"?"},"used_columns":["SCHEMA_NAME","DIGEST","DIGEST_TEXT","COUNT_STAR","SUM_TIMER_WAIT","LAST_SEEN","QUERY_SAMPLE_TEXT"],"attached_condition":"( ( performance_schema . events_statements_summary_by_digest . LAST_SEEN >= < cache > ( ( now ( ) - interval ? second ) ) ) and ( ( ( not ( ( performance_schema . events_statements_summary_by_digest . DIGEST_TEXT like ? ) ) ) and ( not ( ( performance_schema . events_statements_summary_by_digest . DIGEST_TEXT like ? ) ) ) ) or ( performance_schema . events_statements_summary_by_digest . DIGEST_TEXT is ? ) ) )"}}}}
              - key: mysql.events_statements_summary_by_digest.digest
                value:
                  stringValue: c16f24f908846019a741db580f6545a5933e9435a7cf1579c50794a6ca287739
              - key: mysql.events_statements_summary_by_digest.count_star
                value:
                  intValue: "5"
              - key: mysql.events_statements_summary_by_digest.sum_timer_wait
                value:
                  doubleValue: 0.001021918999
            body: {}
            eventName: db.server.top_query
            timeUnixNano: "1754297675177556000"
        scope:
          name: github.com/open-telemetry/opentelemetry-collector-contrib/receiver/mysqlreceiver
          version: latest
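
A note on units in the example above: MySQL's Performance Schema reports timer columns such as SUM_TIMER_WAIT in picoseconds, while the sum_timer_wait attribute shown (0.001021918999) looks like a value in seconds. The PR description does not state the unit, so the following Go sketch only illustrates the conversion that would produce such a value; the function name `picosToSeconds` is hypothetical.

```go
package main

import "fmt"

// picosecondsPerSecond is the number of picoseconds in one second.
const picosecondsPerSecond = 1e12

// picosToSeconds converts a Performance Schema timer value (picoseconds)
// to seconds. Whether the receiver does exactly this is an assumption.
func picosToSeconds(ps uint64) float64 {
	return float64(ps) / picosecondsPerSecond
}

func main() {
	// 1021918999 ps is roughly the 0.001021918999 s shown in the example output.
	fmt.Println(picosToSeconds(1021918999))
}
```
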

Link to tracking issue

n/a

Testing

Added

Documentation

Updated

@sincejune sincejune force-pushed the mysql-top-query-collection branch from 598b6f4 to e660f95 Compare August 20, 2025 06:10
@sincejune sincejune force-pushed the mysql-top-query-collection branch from 0e62d2d to 2495878 Compare August 20, 2025 06:45
@github-actions
Contributor

github-actions bot commented Sep 4, 2025

This PR was marked stale due to lack of activity. It will be closed in 14 days.

@github-actions github-actions bot added the Stale label Sep 4, 2025
@atoulme atoulme removed the Stale label Sep 5, 2025
@atoulme
Contributor

atoulme commented Sep 5, 2025

@antonblock @ishleenk17 please review as codeowners

Contributor

@antonblock antonblock left a comment

Able to test this successfully, just had a question about the explainQuery method

Member

@crobert-1 crobert-1 left a comment

Lots of comments, sorry about that. My overarching thought is that as much as possible we should try to move shared top query functionality to an internal package that can be reused across receivers. There's a lot of duplicated code which really adds to the maintenance burden.

@sincejune sincejune requested a review from crobert-1 September 27, 2025 09:18
@atoulme
Contributor

atoulme commented Sep 30, 2025

@ishleenk17 @antonblock please review as codeowners.

@antonblock
Contributor

Other than the failing go mod check, this looks good

@atoulme atoulme merged commit 026bfd5 into open-telemetry:main Oct 10, 2025
186 checks passed
@github-actions github-actions bot added this to the next release milestone Oct 10, 2025
tommyers-elastic pushed a commit to tommyers-elastic/opentelemetry-collector-contrib that referenced this pull request Oct 10, 2025
#### Description
This PR introduces query-level data collection for the `mysql` receiver in
the logs pipeline.

This initial PR adds `Top Query` collection. `Top Queries` are the queries
that used the most time within a time window.

##### Configuration
This PR adds six configuration options for the feature (see the receiver's
README for details):
1. `max_sample_query_count`: the number of queries initially fetched from
the database.
2. `top_query_count`: the number of queries reported to the next consumer.
3. `lookback_time`: the query window for each scrape.
4. `collection_interval`: the interval between top query collections.
5. `query_plan_cache_size`: the size of the query plan cache.
6. `query_plan_cache_ttl`: the TTL of entries in the query plan cache.

##### Workflow
The `mysql` receiver fetches M (= `max_sample_query_count`) queries
from the database, sorts them by the difference in
`sum_timer_wait` (time used), and reports the first
N (= `top_query_count`) queries.

##### New Log Attributes
- `db.system.name`
- `db.query.text`
- `mysql.query_plan`
- `mysql.events_statements_summary_by_digest.digest`
- `mysql.events_statements_summary_by_digest.count_star`
- `mysql.events_statements_summary_by_digest.sum_timer_wait`

##### Additional dependency
* `hashicorp/golang-lru/v2`
  * License: MPL-2.0
  * Link: https://pkg.go.dev/github.com/hashicorp/golang-lru/v2
  * Already used in the repo
* `DataDog/datadog-agent/pkg/obfuscate`
  * License: Apache 2.0
  * Link: https://pkg.go.dev/github.com/DataDog/datadog-agent/pkg/obfuscate
  * Already used in the repo

##### Example Output
```
resourceLogs:
  - resource: {}
    scopeLogs:
      - logRecords:
          - attributes:
              - key: db.system.name
                value:
                  stringValue: mysql
              - key: db.query.text
                value:
                  stringValue: SELECT COALESCE ( schema_name, ? ) COALESCE ( digest, ? ) COALESCE ( digest_text, ? ) count_star, sum_timer_wait, query_sample_text FROM performance_schema.events_statements_summary_by_digest WHERE last_seen >= NOW ( ) - INTERVAL ? second AND ( ( digest_text NOT LIKE ? AND digest_text NOT LIKE ? ) OR digest_text IS ? ) ORDER BY count_star DESC LIMIT ?
              - key: mysql.query_plan
                value:
                  stringValue: |
                    {"query_block":{"select_id":1,"cost_info":{"query_cost":"?"},"ordering_operation":{"using_filesort":true,"table":{"table_name":"events_statements_summary_by_digest","access_type":"ALL","rows_examined_per_scan":"?","rows_produced_per_join":"?","filtered":"?","cost_info":{"read_cost":"?","eval_cost":"?","prefix_cost":"?","data_read_per_join":"?"},"used_columns":["SCHEMA_NAME","DIGEST","DIGEST_TEXT","COUNT_STAR","SUM_TIMER_WAIT","LAST_SEEN","QUERY_SAMPLE_TEXT"],"attached_condition":"( ( performance_schema . events_statements_summary_by_digest . LAST_SEEN >= < cache > ( ( now ( ) - interval ? second ) ) ) and ( ( ( not ( ( performance_schema . events_statements_summary_by_digest . DIGEST_TEXT like ? ) ) ) and ( not ( ( performance_schema . events_statements_summary_by_digest . DIGEST_TEXT like ? ) ) ) ) or ( performance_schema . events_statements_summary_by_digest . DIGEST_TEXT is ? ) ) )"}}}}
              - key: mysql.events_statements_summary_by_digest.digest
                value:
                  stringValue: c16f24f908846019a741db580f6545a5933e9435a7cf1579c50794a6ca287739
              - key: mysql.events_statements_summary_by_digest.count_star
                value:
                  intValue: "5"
              - key: mysql.events_statements_summary_by_digest.sum_timer_wait
                value:
                  doubleValue: 0.001021918999
            body: {}
            eventName: db.server.top_query
            timeUnixNano: "1754297675177556000"
        scope:
          name: github.com/open-telemetry/opentelemetry-collector-contrib/receiver/mysqlreceiver
          version: latest

```
#### Link to tracking issue
n/a

#### Testing
Added

#### Documentation
Updated


---------

Co-authored-by: Nico Stewart <[email protected]>
Co-authored-by: Curtis Robert <[email protected]>
Co-authored-by: Antoine Toulme <[email protected]>
Co-authored-by: Antoine Toulme <[email protected]>
ChrsMark pushed a commit to ChrsMark/opentelemetry-collector-contrib that referenced this pull request Oct 20, 2025