
Conversation

@BlackAsLight
Contributor

This pull request offers a new TransformStream into the mix. While we already have DelimiterStream, it isn't suitable if your delimiter is unlikely to appear for long stretches of bytes, meaning you could end up with huge chunks at a time. This pull request adds LimitDelimiterStream which, as the name implies, sets a maximum length that a chunk can be. Each output chunk is an object of the form { match: boolean, value: Uint8Array }, where match is true when the chunk ended at the delimiter and false when it was cut short by the limit.

Example

import { assertEquals } from "@std/assert";
import {
  LimitDelimiterStream,
} from "@std/streams/unstable-limit-delimiter-stream";

const encoder = new TextEncoder();

const readable = ReadableStream.from(["foo;beeps;;bar;;"])
  .pipeThrough(new TextEncoderStream())
  .pipeThrough(
    new LimitDelimiterStream({
      delimiter: encoder.encode(";"),
      limit: 4,
    }),
  );

assertEquals(
  await Array.fromAsync(readable),
  [
    { match: true, value: encoder.encode("foo") },
    { match: false, value: encoder.encode("beep") },
    { match: true, value: encoder.encode("s") },
    { match: true, value: encoder.encode("") },
    { match: true, value: encoder.encode("bar") },
    { match: true, value: encoder.encode("") },
  ],
);

@codecov

codecov bot commented Nov 29, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 94.14%. Comparing base (6ee055f) to head (24693c6).
⚠️ Report is 1 commit behind head on main.

Additional details and impacted files
@@           Coverage Diff           @@
##             main    #6890   +/-   ##
=======================================
  Coverage   94.14%   94.14%           
=======================================
  Files         582      583    +1     
  Lines       42750    42806   +56     
  Branches     6807     6815    +8     
=======================================
+ Hits        40245    40301   +56     
  Misses       2455     2455           
  Partials       50       50           


@timreichen
Contributor

Is there a particular use case for this stream?
I think it is kinda random to split with a delimiter as well as a max chunk size.
To me, some kind of SplitStream would make more sense and could possibly be made more flexible.
The name is also very confusing because one might think it limits the chunk count, as the Limited*Stream classes in @std/streams do, rather than the size of individual chunks.

@BlackAsLight
Contributor Author

Is there a particular use case for this stream?

Yes. I'd like to offer a streaming FormData encoder and decoder, and in it I found something like this useful. The normal DelimiterStream isn't suitable, as it would likely produce huge chunks, possibly too big, when processing the files, defeating the purpose of streaming.
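
A rough sketch of that situation (the boundary string, upload URL, and 64 KiB cap below are hypothetical; the import path and options match the example above):

import { LimitDelimiterStream } from "@std/streams/unstable-limit-delimiter-stream";

// Hypothetical multipart boundary; in a real decoder it would come from the
// Content-Type header.
const boundary = "----WebKitFormBoundaryabc123";

// Hypothetical source of a large multipart body.
const response = await fetch("https://example.com/big-upload");

const parts = response.body!.pipeThrough(
  new LimitDelimiterStream({
    delimiter: new TextEncoder().encode(`\r\n--${boundary}`),
    // A file part may span many megabytes between two boundaries, but each
    // emitted chunk is capped at 64 KiB, so memory use stays bounded.
    limit: 64 * 1024,
  }),
);

for await (const { match, value } of parts) {
  // `match` is true when the chunk ended at the boundary and false when it
  // was cut early because the cap was reached mid-part.
  console.log(match, value.length);
}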

I think it is kinda random to split with a delimiter as well as a max chunk size.

To me a some kind of SplitStream would make more sense and possibly could be made more flexible.

I'd need you to be more specific about what this SplitStream would do.

The name is also very confusing because one might think it would limit the chunk count instead of the individual chunk size as the Limited*Stream in @std/streams do.

I'm not attached to the name and open to suggestions if you think there is something better to communicate its intended behaviour.

@kt3k
Member

kt3k commented Dec 1, 2025

Yes. I'd like to offer a streaming FormData encoder and decoder, and in it I found something like this useful. The normal DelimiterStream isn't suitable, as it would likely produce huge chunks, possibly too big, when processing the files, defeating the purpose of streaming.

Sounds interesting to me. Can you add a note about that situation to the API doc? (An example illustrating it would also be very helpful.)

@BlackAsLight
Contributor Author

I added a note explaining more, but I can't really think of an example simple enough to demonstrate the point.

@kt3k kt3k changed the title feat(streams): new LimitDelimiterStream() feat(streams/unstable): new LimitDelimiterStream() Dec 8, 2025
@kt3k
Member

kt3k commented Dec 8, 2025

@BlackAsLight Thanks for adding the notes. Now I see the utility of this transform; however, as @timreichen pointed out, the name of the class sounds a bit confusing to me too. The meaning of limit is different from its usage in LimitedByteStream. Can you come up with some other name candidates? How about ChunkedDelimiterStream, for example?

@timreichen
Contributor

timreichen commented Dec 8, 2025

I think there are a few things to contemplate: if this stream is limited to a max chunk size and a delimiter, it might as well just be an option on DelimiterStream instead of a separate stream, e.g. something like

const delimStream = new DelimiterStream(new TextEncoder().encode("|"), { maxChunkByteSize: 5 })

I would argue this is fairly limited in usage, so maybe a better option would be some kind of preserveChunks option on DelimiterStream that does not combine chunks, plus the addition of a MaxChunkByteSizeStream in std/streams. This would keep the stream modularity.

stream
  .pipeThrough(new DelimiterStream(new Uint8Array(2), { preserveChunks: true }))
  .pipeThrough(new MaxChunkByteSizeStream(5))

@BlackAsLight
Contributor Author

What about:

  • PartitionedDelimiterStream
  • TruncatedDelimiterStream
  • BoundedDelimiterStream
  • CappedDelimiterStream

I think CappedDelimiterStream could be good.

@BlackAsLight
Contributor Author

BlackAsLight commented Dec 8, 2025

I would argue this is fairly limited in usage, so maybe a better option would be some kind of preserveChunks option on DelimiterStream that does not combine chunks, plus the addition of a MaxChunkByteSizeStream in std/streams. This would keep the stream modularity.

stream
  .pipeThrough(new DelimiterStream(new Uint8Array(2), { preserveChunks: true }))
  .pipeThrough(new MaxChunkByteSizeStream(5))

It would still queue them all up before serving them, which defeats the purpose of streams.

@timreichen
Contributor

I would argue this is fairly limited in usage, so maybe a better option would be some kind of preserveChunks option on DelimiterStream that does not combine chunks, plus the addition of a MaxChunkByteSizeStream in std/streams. This would keep the stream modularity.

stream
  .pipeThrough(new DelimiterStream(new Uint8Array(2), { preserveChunks: true }))
  .pipeThrough(new MaxChunkByteSizeStream(5))

It would still queue them all up before serving them, which defeats the purpose of streams.

Maybe I lack some understanding of stream functionality, but how would enqueueing all chunks defeat the purpose of streams? All chunks go through each transform stream serially without blocking, no?

@BlackAsLight
Contributor Author

I would argue this is fairly limited in usage, so maybe a better option would be some kind of preserveChunks option on DelimiterStream that does not combine chunks, plus the addition of a MaxChunkByteSizeStream in std/streams. This would keep the stream modularity.

stream
  .pipeThrough(new DelimiterStream(new Uint8Array(2), { preserveChunks: true }))
  .pipeThrough(new MaxChunkByteSizeStream(5))

It would still queue them all up before serving them, which defeats the purpose of streams.

Maybe I lack some understanding of stream functionality, but how would enqueueing all chunks defeat the purpose of streams? All chunks go through each transform stream serially without blocking, no?

I think there might have been some miscommunication. DelimiterStream at the moment is a Uint8Array stream; are you suggesting this preserveChunks option would change that to an object stream of { match: boolean, value: Uint8Array }, or to a Uint8Array[] stream? If you meant the former, then I don't understand what MaxChunkByteSizeStream would be doing, and if the latter, then DelimiterStream would be pulling in a lot of chunks before serving the next Uint8Array[], which defeats the purpose of streams (handling only a small portion of the data at once) and could risk running out of memory with a huge stream and a rare delimiter.
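
To make the memory concern concrete, here is a small sketch (with a hypothetical 100 MiB payload) of what the existing DelimiterStream does when the delimiter is rare:

import { DelimiterStream } from "@std/streams/delimiter-stream";

const encoder = new TextEncoder();

// Hypothetical input: 100 MiB of data with the delimiter appearing only once,
// near the end.
const big = new Uint8Array(100 * 1024 * 1024);

const readable = ReadableStream.from([big, encoder.encode(";end")])
  .pipeThrough(new DelimiterStream(encoder.encode(";")));

for await (const chunk of readable) {
  // The first chunk delivered here is the entire 100 MiB, because
  // DelimiterStream must buffer everything until it finally sees ";".
  // LimitDelimiterStream with limit: 64 * 1024 would instead have emitted
  // ~1600 bounded chunks as the bytes arrived.
  console.log(chunk.length); // 104857600, then 3 ("end")
}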

@timreichen
Contributor

I think there might have been some miscommunication. DelimiterStream at the moment is a Uint8Array stream; are you suggesting this preserveChunks option would change that to an object stream of { match: boolean, value: Uint8Array }, or to a Uint8Array[] stream? If you meant the former, then I don't understand what MaxChunkByteSizeStream would be doing, and if the latter, then DelimiterStream would be pulling in a lot of chunks before serving the next Uint8Array[], which defeats the purpose of streams (handling only a small portion of the data at once) and could risk running out of memory with a huge stream and a rare delimiter.

Oh, I see. I guess having a separate stream class makes sense then. However, maybe we could generalize that stream class so it works not only with delimiters but could also be customized to parse, say, a JSON stream?

@kt3k
Member

kt3k commented Dec 16, 2025

@BlackAsLight CappedDelimiterStream sounds good to me. Can you update the PR?

@timreichen

Oh, I see. I guess having a separate stream class makes sense then. However, maybe we could generalize that stream class so it works not only with delimiters but could also be customized to parse, say, a JSON stream?

That sounds a bit overly general to me, but please feel free to explore such an API if you feel strongly about it. I think CappedDelimiterStream is fine as is, since it has a relatively concrete example use case.

@BlackAsLight BlackAsLight changed the title feat(streams/unstable): new LimitDelimiterStream() feat(streams/unstable): new CappedDelimiterStream() Dec 18, 2025
Member

@kt3k kt3k left a comment


LGTM

@kt3k kt3k merged commit 7bb3552 into denoland:main Dec 22, 2025
19 checks passed
@BlackAsLight BlackAsLight deleted the limit_delimiter_stream branch December 22, 2025 00:47