Skip to content

fix: handle logtail compaction during dirty scan#23664

Merged
mergify[bot] merged 4 commits intomatrixorigin:mainfrom
aptend:fix-dirty
Feb 4, 2026
Merged

fix: handle logtail compaction during dirty scan#23664
mergify[bot] merged 4 commits intomatrixorigin:mainfrom
aptend:fix-dirty

Conversation

@aptend
Copy link
Copy Markdown
Contributor

@aptend aptend commented Feb 3, 2026

User description

What type of PR is this?

  • API-change
  • BUG
  • Improvement
  • Documentation
  • Feature
  • Test and CI
  • Code Refactoring

Which issue(s) this PR fixes:

issue #23631

What this PR does / why we need it:

Avoid missing dirty tables when logtail blocks are compacted during
dirty scan. The reader now falls back to block summaries and waits
for compaction summary to be published before merging.


PR Type

Bug fix


Description

  • Handle logtail block compaction during dirty table scan

  • Add fallback to block summaries when memo becomes nil

  • Wait for compaction summary publication before merging

  • Simplify IsDirtyOnTable by reusing GetDirty logic


Diagram Walkthrough

flowchart LR
  A["GetDirty scan"] --> B{"Block memo nil?"}
  B -->|No| C["Merge memo dirty data"]
  B -->|Yes| D["Set compact flag"]
  D --> E["Wait for summary"]
  E --> F["Merge summary data"]
  F --> G["Return dirty tree"]
  C --> G
Loading

File Walkthrough

Relevant files
Bug fix
reader.go
Handle block compaction during dirty scan                               

pkg/vm/engine/tae/logtail/reader.go

  • Added runtime import for Gosched() call
  • Introduced mergeSummary() helper function to consolidate table merging
    logic
  • Enhanced GetDirty() to detect and handle block compaction during
    iteration
  • Added postBlkOp callback to wait for compaction summary and merge data
  • Simplified IsDirtyOnTable() to reuse GetDirty() instead of custom
    logic
  • Updated GetMaxLSN() call signature to include new postBlkOp parameter
+41/-27 
Enhancement
table.go
Add post-block operation callback                                               

pkg/vm/engine/tae/logtail/table.go

  • Added postBlkOp parameter to ForeachRowInBetween() function signature
  • Added callback invocation after processing each block to handle
    post-block operations
+5/-0     

@qodo-code-review
Copy link
Copy Markdown

qodo-code-review Bot commented Feb 3, 2026

PR Compliance Guide 🔍

Below is a summary of compliance checks for this PR:

Security Compliance
Unbounded busy-wait

Description: The new postBlkOp introduces an unbounded busy-wait loop (for summary == nil {
runtime.Gosched(); ... }) that can spin indefinitely if blk.summary is never published,
enabling a potential denial-of-service via goroutine starvation/CPU consumption and stuck
dirty scans.
reader.go [58-71]

Referred Code
postBlkOp := func(blk BlockT) {
	if !iterateOnCompact {
		return
	}
	summary := blk.summary.Load()
	// wait TryCompact to finish
	for summary == nil {
		runtime.Gosched()
		summary = blk.summary.Load()
	}
	mergeSummary(tree, summary)
	count += len(blk.rows)
	iterateOnCompact = false
}
Ticket Compliance
🎫 No ticket provided
  • Create ticket/issue
Codebase Duplication Compliance
Codebase context is not defined

Follow the guide to enable codebase context checks.

Custom Compliance
🟢
Generic: Comprehensive Audit Trails

Objective: To create a detailed and reliable record of critical system actions for security analysis
and compliance.

Status: Passed

Learn more about managing compliance generic rules or creating your own custom rules

Generic: Meaningful Naming and Self-Documenting Code

Objective: Ensure all identifiers clearly express their purpose and intent, making code
self-documenting

Status: Passed

Learn more about managing compliance generic rules or creating your own custom rules

Generic: Secure Error Handling

Objective: To prevent the leakage of sensitive system information through error messages while
providing sufficient detail for internal debugging.

Status: Passed

Learn more about managing compliance generic rules or creating your own custom rules

Generic: Secure Logging Practices

Objective: To ensure logs are useful for debugging and auditing without exposing sensitive
information like PII, PHI, or cardholder data.

Status: Passed

Learn more about managing compliance generic rules or creating your own custom rules

Generic: Security-First Input Validation and Data Handling

Objective: Ensure all data inputs are validated, sanitized, and handled securely to prevent
vulnerabilities

Status: Passed

Learn more about managing compliance generic rules or creating your own custom rules

🔴
Generic: Robust Error Handling and Edge Case Management

Objective: Ensure comprehensive error handling that provides meaningful context and graceful
degradation

Status:
Infinite spin wait: The new postBlkOp busy-waits in a for summary == nil loop with runtime.Gosched() and no
timeout/cancel path, which can hang indefinitely if the compaction summary is never
published.

Referred Code
postBlkOp := func(blk BlockT) {
	if !iterateOnCompact {
		return
	}
	summary := blk.summary.Load()
	// wait TryCompact to finish
	for summary == nil {
		runtime.Gosched()
		summary = blk.summary.Load()
	}
	mergeSummary(tree, summary)
	count += len(blk.rows)
	iterateOnCompact = false
}

Learn more about managing compliance generic rules or creating your own custom rules

  • Update
Compliance status legend 🟢 - Fully Compliant
🟡 - Partial Compliant
🔴 - Not Compliant
⚪ - Requires Further Human Verification
🏷️ - Compliance label

@mergify mergify Bot added the kind/bug Something isn't working label Feb 3, 2026
@qodo-code-review
Copy link
Copy Markdown

qodo-code-review Bot commented Feb 3, 2026

PR Code Suggestions ✨

Explore these optional code suggestions:

CategorySuggestion                                                                                                                                    Impact
Possible issue
Fix function call with incorrect arguments

Correct the function call in IsDirtyOnTable from r.GetDirty(false) to
r.GetDirty() to match the updated function signature and prevent a compilation
error.

pkg/vm/engine/tae/logtail/reader.go [86-90]

 func (r *Reader) IsDirtyOnTable(DbID, id uint64) bool {
 	// TODO: optimize if needed
-	tree, _ := r.GetDirty(false)
+	tree, _ := r.GetDirty()
 	return tree.HasTable(id)
 }
  • Apply / Chat
Suggestion importance[1-10]: 10

__

Why: The suggestion correctly identifies a call to r.GetDirty(false) with an argument, while the function signature was changed in this PR to take no arguments, which would cause a compilation error.

High
Avoid busy-waiting and fix race condition

In the postBlkOp function, remove the busy-wait loop that uses runtime.Gosched()
to avoid high CPU usage while waiting for compaction to finish.

pkg/vm/engine/tae/logtail/reader.go [58-71]

 	postBlkOp := func(blk BlockT) {
 		if !iterateOnCompact {
 			return
 		}
 		summary := blk.summary.Load()
-		// wait TryCompact to finish
-		for summary == nil {
-			runtime.Gosched()
-			summary = blk.summary.Load()
+		if summary == nil {
+			// TryCompact is not finished yet, wait for next block's postBlkOp to check again.
+			// runtime.Gosched() is an option here for proactive waiting, but it's better to avoid busy-waiting.
+			return
 		}
 		mergeSummary(tree, summary)
 		count += len(blk.rows)
 		iterateOnCompact = false
 	}
  • Apply / Chat
Suggestion importance[1-10]: 7

__

Why: The suggestion correctly identifies a busy-wait loop using runtime.Gosched(), which is a valid performance concern, and proposes removing it.

Medium
General
Fix inconsistent row counting logic
Suggestion Impact:The commit moved count++ out of the memo.HasAnyTableDataChanges() conditional so it runs for every row, matching the suggested consistent counting behavior. It also removed a separate count += len(blk.rows) increment in the compacted-block path, further adjusting row counting logic.

code diff:

@@ -50,8 +50,8 @@
 		} else if memo.HasAnyTableDataChanges() {
 			row.GetTxnState(true)
 			tree.Merge(memo.GetDirty())
-			count++
 		}
+		count++
 		return true
 	}
 
@@ -66,7 +66,6 @@
 			summary = blk.summary.Load()
 		}
 		mergeSummary(tree, summary)
-		count += len(blk.rows)
 		iterateOnCompact = false
 	}

In the op function, move the count++ increment outside the conditional block to
ensure it is executed for every row, maintaining consistent counting logic.

pkg/vm/engine/tae/logtail/reader.go [44-56]

 	op := func(row RowT) (moveOn bool) {
 		memo := row.GetMemo()
 		if memo == nil {
 			// current block is compacted during iterating
 			// retrieve the memo from blk summary, only once.
 			iterateOnCompact = true
 		} else if memo.HasAnyTableDataChanges() {
 			row.GetTxnState(true)
 			tree.Merge(memo.GetDirty())
-			count++
 		}
+		count++
 		return true
 	}

[Suggestion processed]

Suggestion importance[1-10]: 7

__

Why: The suggestion correctly identifies that the PR changed the logic for incrementing count, making it conditional, which contradicts the logic in other parts of the function and the original implementation.

Medium
  • Update

@mergify
Copy link
Copy Markdown
Contributor

mergify Bot commented Feb 4, 2026

Merge Queue Status

✅ The pull request has been merged at 1c65cdf

This pull request spent 54 minutes 22 seconds in the queue, including 54 minutes 10 seconds running CI.
The checks were run in-place.

Required conditions to merge
  • #approved-reviews-by >= 1 [🛡 GitHub branch protection]
  • #changes-requested-reviews-by = 0 [🛡 GitHub branch protection]
  • #review-threads-unresolved = 0 [🛡 GitHub branch protection]
  • branch-protection-review-decision = APPROVED [🛡 GitHub branch protection]
  • any of [🛡 GitHub branch protection]:
    • check-success = Matrixone Compose CI / multi cn e2e bvt test docker compose(PESSIMISTIC)
    • check-neutral = Matrixone Compose CI / multi cn e2e bvt test docker compose(PESSIMISTIC)
    • check-skipped = Matrixone Compose CI / multi cn e2e bvt test docker compose(PESSIMISTIC)
  • any of [🛡 GitHub branch protection]:
    • check-success = Matrixone Standlone CI / Multi-CN e2e BVT Test on Linux/x64(LAUNCH, PROXY)
    • check-neutral = Matrixone Standlone CI / Multi-CN e2e BVT Test on Linux/x64(LAUNCH, PROXY)
    • check-skipped = Matrixone Standlone CI / Multi-CN e2e BVT Test on Linux/x64(LAUNCH, PROXY)
  • any of [🛡 GitHub branch protection]:
    • check-success = Matrixone Standlone CI / e2e BVT Test on Linux/x64(LAUNCH, PESSIMISTIC)
    • check-neutral = Matrixone Standlone CI / e2e BVT Test on Linux/x64(LAUNCH, PESSIMISTIC)
    • check-skipped = Matrixone Standlone CI / e2e BVT Test on Linux/x64(LAUNCH, PESSIMISTIC)
  • any of [🛡 GitHub branch protection]:
    • check-success = Matrixone CI / SCA Test on Ubuntu/x86
    • check-neutral = Matrixone CI / SCA Test on Ubuntu/x86
    • check-skipped = Matrixone CI / SCA Test on Ubuntu/x86
  • any of [🛡 GitHub branch protection]:
    • check-success = Matrixone CI / UT Test on Ubuntu/x86
    • check-neutral = Matrixone CI / UT Test on Ubuntu/x86
    • check-skipped = Matrixone CI / UT Test on Ubuntu/x86
  • any of [🛡 GitHub branch protection]:
    • check-success = Matrixone Compose CI / multi cn e2e bvt test docker compose(Optimistic/PUSH)
    • check-neutral = Matrixone Compose CI / multi cn e2e bvt test docker compose(Optimistic/PUSH)
    • check-skipped = Matrixone Compose CI / multi cn e2e bvt test docker compose(Optimistic/PUSH)
  • any of [🛡 GitHub branch protection]:
    • check-success = Matrixone Standlone CI / e2e BVT Test on Linux/x64(LAUNCH,Optimistic)
    • check-neutral = Matrixone Standlone CI / e2e BVT Test on Linux/x64(LAUNCH,Optimistic)
    • check-skipped = Matrixone Standlone CI / e2e BVT Test on Linux/x64(LAUNCH,Optimistic)
  • any of [🛡 GitHub branch protection]:
    • check-success = Matrixone Upgrade CI / Compatibility Test With Target on Linux/x64(LAUNCH)
    • check-neutral = Matrixone Upgrade CI / Compatibility Test With Target on Linux/x64(LAUNCH)
    • check-skipped = Matrixone Upgrade CI / Compatibility Test With Target on Linux/x64(LAUNCH)
  • any of [🛡 GitHub branch protection]:
    • check-success = Matrixone Utils CI / Coverage
    • check-neutral = Matrixone Utils CI / Coverage
    • check-skipped = Matrixone Utils CI / Coverage

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

kind/bug Something isn't working Review effort 3/5 size/M Denotes a PR that changes [100,499] lines

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants