Topic Writer Backpressure #797

Open
vgvoleg wants to merge 1 commit into main from writer_buffer_limit

Conversation

@vgvoleg (Collaborator) commented Mar 31, 2026

Pull request type

Please check the type of change your PR introduces:

  • Bugfix
  • Feature
  • Code style update (formatting, renaming)
  • Refactoring (no functional changes, no api changes)
  • Build related changes
  • Documentation content changes
  • Other (please describe):

What is the current behavior?

Issue Number: N/A

What is the new behavior?

Other information

Copilot AI left a comment

Pull request overview

This PR introduces configurable backpressure for the Topic Writer by adding a byte-size limit for the internal in-flight buffer and making writes block (up to a timeout) until space is freed by server acks.

Changes:

  • Add max_buffer_size_bytes and buffer_wait_timeout_sec to writer settings and enforce them in WriterAsyncIOReconnector.write_with_ack_future().
  • Add a new public exception TopicWriterBufferFullError surfaced via ydb.topic.
  • Add sync/async tests and an example showing how to use the buffer limit.
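The blocking semantics described above can be modeled in a self-contained sketch. The class and names below are illustrative, not the PR's actual code: a writer reserves bytes before sending and blocks until acks release space or a deadline expires, at which point it raises (modeling `TopicWriterBufferFullError`).

```python
import asyncio
import time


class BufferFullError(Exception):
    """Models TopicWriterBufferFullError: no space freed before the deadline."""


class BoundedWriter:
    """Toy model of byte-based backpressure: write() blocks until acks free space."""

    def __init__(self, max_buffer_size_bytes: int, buffer_wait_timeout_sec: float):
        self._max = max_buffer_size_bytes
        self._timeout = buffer_wait_timeout_sec
        self._used = 0
        self._freed = asyncio.Event()

    async def write(self, payload: bytes) -> None:
        deadline = time.monotonic() + self._timeout
        while self._used + len(payload) > self._max:
            remaining = deadline - time.monotonic()
            if remaining <= 0:
                raise BufferFullError("buffer full")
            self._freed.clear()
            try:
                await asyncio.wait_for(self._freed.wait(), timeout=remaining)
            except asyncio.TimeoutError:
                pass
        self._used += len(payload)

    def ack(self, payload: bytes) -> None:
        """Server ack: release the reserved bytes and wake blocked writers."""
        self._used -= len(payload)
        self._freed.set()


async def demo() -> str:
    w = BoundedWriter(max_buffer_size_bytes=8, buffer_wait_timeout_sec=0.2)
    await w.write(b"12345678")  # fills the buffer exactly
    asyncio.get_running_loop().call_later(0.05, w.ack, b"12345678")
    await w.write(b"abcd")  # blocks until the scheduled ack frees space
    try:
        await w.write(b"toolongpayload")  # 14 bytes can never fit: times out
    except BufferFullError:
        return "timed out"
    return "unexpected"
```

Running `asyncio.run(demo())` exercises all three paths: an immediate write, a write unblocked by an ack, and a write that exhausts the timeout.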

Reviewed changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated 4 comments.

Summary per file:

  • ydb/topic.py: Exports the new TopicWriterBufferFullError as part of the public Topic API.
  • ydb/_topic_writer/topic_writer.py: Adds settings fields, new exception type, and a helper for message size accounting.
  • ydb/_topic_writer/topic_writer_asyncio.py: Implements buffer accounting + backpressure wait loop and updates ack handling/shutdown behavior.
  • ydb/_topic_writer/topic_writer_test.py: Adds sync writer tests around error propagation/blocking behavior.
  • ydb/_topic_writer/topic_writer_asyncio_test.py: Adds async tests for timeout, ack-unblocking, and concurrent waiting writers.
  • examples/topic/writer_example.py: Documents how to configure and handle backpressure in user code.


Comment on lines +313 to +317
data_len = len(msg.data)
meta_len = sum(len(k) + len(v) for k, v in msg.metadata_items.items()) if msg.metadata_items else 0
return data_len + meta_len + 64 # 64 bytes overhead per message (seq_no, timestamps, etc.)


Copilot AI Mar 31, 2026

internal_message_size_bytes() currently uses len(msg.data) and len(k) for metadata keys. This can drift from the actual bytes held in the internal queues because msg.data is later mutated during compression (_encode_data_inplace), and len(k) counts characters not UTF-8 bytes. Consider accounting against a stable byte length (e.g., msg.uncompressed_size plus len(k.encode('utf-8')) + len(v)), or store the accounted size on the message at enqueue time and subtract the same value on ack.

Suggested change
data_len = len(msg.data)
meta_len = sum(len(k) + len(v) for k, v in msg.metadata_items.items()) if msg.metadata_items else 0
return data_len + meta_len + 64  # 64 bytes overhead per message (seq_no, timestamps, etc.)
# Prefer a stable uncompressed size for accounting, fall back to current data length.
uncompressed_size = getattr(msg, "uncompressed_size", None)
data_len = uncompressed_size if isinstance(uncompressed_size, int) and uncompressed_size >= 0 else len(msg.data)
if msg.metadata_items:
    meta_len = 0
    for k, v in msg.metadata_items.items():
        key_bytes_len = len(k.encode("utf-8"))
        if isinstance(v, str):
            value_bytes_len = len(v.encode("utf-8"))
        else:
            value_bytes_len = len(v)
        meta_len += key_bytes_len + value_bytes_len
else:
    meta_len = 0
# 64 bytes overhead per message (seq_no, timestamps, etc.)
return data_len + meta_len + 64
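The character-count vs byte-count distinction this comment raises is easy to verify: for a non-ASCII metadata key, `len(k)` counts code points, while the buffer actually holds the UTF-8 encoding.

```python
key = "ключ"  # 4 characters, but each Cyrillic letter takes 2 bytes in UTF-8
print(len(key))                    # character count: 4
print(len(key.encode("utf-8")))    # actual bytes held in the buffer: 8
```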

Comment on lines +388 to +410
max_buf = self._settings.max_buffer_size_bytes
if max_buf is not None:
    new_bytes = sum(internal_message_size_bytes(m) for m in internal_messages)
    timeout_sec = self._settings.buffer_wait_timeout_sec
    deadline = self._loop.time() + timeout_sec
    while True:
        self._buffer_updated.clear()
        if self._buffer_bytes + new_bytes <= max_buf:
            break
        if self._loop.time() >= deadline:
            raise TopicWriterBufferFullError(
                "Topic writer buffer full: no free space within %.1f s (buffer=%d, need=%d, max=%d)"
                % (timeout_sec, self._buffer_bytes, new_bytes, max_buf)
            )
        try:
            await asyncio.wait_for(
                self._buffer_updated.wait(),
                timeout=min(0.5, max(0.01, deadline - self._loop.time())),
            )
        except asyncio.TimeoutError:
            pass
    self._buffer_bytes += new_bytes

Copilot AI Mar 31, 2026

Buffer reservation is computed before encoding (new_bytes = sum(internal_message_size_bytes(m) ...)), but buffer release on ack recomputes size from the (possibly mutated) InternalMessage. When codec selection/compression changes message.data, _buffer_bytes can drift (over- or under-count), breaking backpressure. A robust approach is to compute a single accounted size per message at enqueue time (or use a stable field like uncompressed_size) and subtract that same value on ack.
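The symmetric-accounting fix this comment suggests can be sketched as follows. This is illustrative, not the PR's code: `Message`, `Buffer`, and `_accounted_size` are invented names. The key idea is to freeze the accounted size at enqueue time and subtract exactly that value on ack, so later mutation of `data` by compression cannot cause drift.

```python
class Message:
    def __init__(self, data: bytes):
        self.data = data
        self._accounted_size = None  # set once at enqueue time


class Buffer:
    def __init__(self):
        self.bytes_used = 0

    def enqueue(self, msg: Message) -> None:
        # Freeze the accounted size before any compression mutates msg.data.
        msg._accounted_size = len(msg.data) + 64  # same 64-byte overhead as the PR
        self.bytes_used += msg._accounted_size

    def ack(self, msg: Message) -> None:
        # Subtract the same value that was reserved, even if msg.data changed.
        self.bytes_used -= msg._accounted_size


buf = Buffer()
m = Message(b"x" * 100)
buf.enqueue(m)           # reserves 164 bytes
m.data = b"compressed"   # compression mutates the payload (now 10 bytes)
buf.ack(m)               # releases the original 164 bytes
assert buf.bytes_used == 0
```

Recomputing the size on ack instead would release only 74 bytes here, leaking 90 bytes per message.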

Comment on lines +392 to +408
deadline = self._loop.time() + timeout_sec
while True:
    self._buffer_updated.clear()
    if self._buffer_bytes + new_bytes <= max_buf:
        break
    if self._loop.time() >= deadline:
        raise TopicWriterBufferFullError(
            "Topic writer buffer full: no free space within %.1f s (buffer=%d, need=%d, max=%d)"
            % (timeout_sec, self._buffer_bytes, new_bytes, max_buf)
        )
    try:
        await asyncio.wait_for(
            self._buffer_updated.wait(),
            timeout=min(0.5, max(0.01, deadline - self._loop.time())),
        )
    except asyncio.TimeoutError:
        pass
Copilot AI Mar 31, 2026

The backpressure wait loop doesn’t react to writer stop/close while waiting for _buffer_updated (it only calls _check_stop() once at the start). If the writer is stopped while a caller is blocked here, it will typically wait until buffer_wait_timeout_sec and then raise TopicWriterBufferFullError, masking the real stop reason. Consider checking _stop_reason inside the loop (or waiting on both _buffer_updated and _stop_reason) and/or setting _buffer_updated in _stop() to wake waiters immediately.
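The wake-on-stop suggestion can be sketched by waiting on both conditions and failing fast when stop wins. This is a minimal model: the PR's `_stop_reason` is not an `asyncio.Event`, but the same pattern applies to any awaitable.

```python
import asyncio


async def wait_for_space_or_stop(
    buffer_updated: asyncio.Event, stopped: asyncio.Event, timeout: float
) -> str:
    waiters = [
        asyncio.create_task(buffer_updated.wait()),
        asyncio.create_task(stopped.wait()),
    ]
    done, pending = await asyncio.wait(
        waiters, timeout=timeout, return_when=asyncio.FIRST_COMPLETED
    )
    for task in pending:
        task.cancel()
    if stopped.is_set():
        return "stopped"  # surface the real stop reason instead of a buffer-full timeout
    if done:
        return "space freed"
    return "timed out"


async def demo() -> str:
    buffer_updated, stopped = asyncio.Event(), asyncio.Event()
    # Simulate the writer being stopped while a caller is blocked on backpressure.
    asyncio.get_running_loop().call_later(0.01, stopped.set)
    return await wait_for_space_or_stop(buffer_updated, stopped, timeout=1.0)
```

With the single-event loop from the diff, the same scenario would block for the full `buffer_wait_timeout_sec` and then raise `TopicWriterBufferFullError`, hiding the stop reason.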

Comment on lines 728 to +729
f.set_exception(reason)
f.exception() # mark as retrieved so asyncio does not log "Future exception was never retrieved"
Copilot AI Mar 31, 2026

_stop() unconditionally calls f.set_exception(reason) and then f.exception() for every future. If a caller cancels a returned ack future (or it’s already done), set_exception() can raise InvalidStateError, and f.exception() can raise CancelledError, potentially breaking shutdown/error propagation. Guard with if not f.done(): ... (and handle cancelled futures) before setting/reading exceptions.

Suggested change
f.set_exception(reason)
f.exception()  # mark as retrieved so asyncio does not log "Future exception was never retrieved"
if not f.done():
    try:
        f.set_exception(reason)
    except asyncio.InvalidStateError:
        # Future might have been completed or cancelled concurrently; ignore.
        pass
try:
    # Mark exception as retrieved so asyncio does not log
    # "Future exception was never retrieved".
    f.exception()
except asyncio.CancelledError:
    # It is valid for callers to cancel ack futures; ignore.
    pass
