Implement custom fold for ZipLongest by Owen-CH-Leung · Pull Request #764 · rust-itertools/itertools

Owen-CH-Leung · 2023-09-28T13:00:01Z

As per discussion in #755 , this PR implements custom fold logic for ZipLongest<T, U> and create benchmark test to measure the performance gain compared with the default fold.

Running the proposed benchmark using the default fold from iterator, the time was time: [592.01 ns 593.24 ns 594.70 ns]
Running the proposed benchmark using the proposed fold logic, the time was time: [486.75 ns 487.88 ns 489.29 ns]

But the performance gain is not so significant if the setup changes to (0..1024).zip_longest(0..768)

using the default fold logic: time: [509.25 ns 510.21 ns 511.35 ns]
using the proposed fold logic: time: [485.58 ns 486.24 ns 486.97 ns]

Philippe-Cholet · 2023-09-28T14:32:13Z

I'm not very familiar with benchmarks but I don't think we can expect it to be two times faster.
It seems to be respectively 22% and 5% faster. It seems reasonably good to me.
What was the "proposed benchmark" you replaced?

PS: As you can see here, you needed to run cargo fmt in order to pass CI tests.

EDIT (after response): Ah 1024 and 768 are just mirrored, ok. I did not expect such difference (22% vs 5%) for such change.
Side note: After a CI failure yesterday, I'm trying a custom .git/hooks/pre-commit (gist) to prevent most CI failures.

Owen-CH-Leung · 2023-09-28T14:42:35Z

I'm not very familiar with benchmarks but I don't think we can expect it to be two times faster. It seems to be respectively 22% and 5% faster. It seems reasonably good to me. What was the "proposed benchmark" you replaced?

PS: As you can see here, you needed to run cargo fmt in order to pass CI tests.

Sorry - by "proposed benchmark", I mean the ziplongest benchmark function that I added in this PR. Yes the performance is around 2x% amd 5% faster

jswrenn

Thank you so much for this! I've left a comment on how to further optimize fold. Could you also add a test for the specialization in tests/specializations.rs? Something like this should do the trick:

quickcheck! {
     fn intersperse(v: Vec<u8>) -> () {
         test_specializations(&v.into_iter().intersperse(0));
     }
+
+    fn zip_longest(a: Vec<u8>, b: Vec<u8>) -> () {
+        test_specializations(&a.into_iter().zip_longest(b))
+    }
 }

jswrenn · 2023-09-28T15:33:50Z

+    #[inline]
+    fn fold<B, F>(self, mut acc: B, mut f: F) -> B
+    where
+        Self: Sized,
+        F: FnMut(B, Self::Item) -> B,
+    {
+        let ZipLongest { mut a, mut b } = self;
+
+        loop {
+            match (a.next(), b.next()) {
+                (Some(x), Some(y)) => acc = f(acc, EitherOrBoth::Both(x, y)),
+                (Some(x), None) => {
+                    acc = f(acc, EitherOrBoth::Left(x));
+                    // b is exhausted, so we can drain a.
+                    return a.fold(acc, |acc, x| f(acc, EitherOrBoth::Left(x)));
+                }
+                (None, Some(y)) => {
+                    acc = f(acc, EitherOrBoth::Right(y));
+                    // a is exhausted, so we can drain b.
+                    return b.fold(acc, |acc, y| f(acc, EitherOrBoth::Right(y)));
+                }
+                (None, None) => return acc, // Both iterators are exhausted.
+            }
+        }
+    }


When specializing fold, try to avoid for loops and calls to next() on underlying iterators. If those underlying iterators also specialize fold, then for looping and calling next() bypasses their optimizations.

It's possible to implement this method in terms of fold:

Suggested change

#[inline]

fn fold<B, F>(self, mut acc: B, mut f: F) -> B

where

Self: Sized,

F: FnMut(B, Self::Item) -> B,

{

let ZipLongest { mut a, mut b } = self;

loop {

match (a.next(), b.next()) {

(Some(x), Some(y)) => acc = f(acc, EitherOrBoth::Both(x, y)),

(Some(x), None) => {

acc = f(acc, EitherOrBoth::Left(x));

// b is exhausted, so we can drain a.

return a.fold(acc, |acc, x| f(acc, EitherOrBoth::Left(x)));

}

(None, Some(y)) => {

acc = f(acc, EitherOrBoth::Right(y));

// a is exhausted, so we can drain b.

return b.fold(acc, |acc, y| f(acc, EitherOrBoth::Right(y)));

}

(None, None) => return acc, // Both iterators are exhausted.

}

}

}

#[inline]

fn fold<B, F>(self, mut init: B, mut f: F) -> B

where

Self: Sized,

F: FnMut(B, Self::Item) -> B,

{

let ZipLongest { a, mut b } = self;

init = a.fold(init, |init, a| match b.next() {

Some(b) => f(init, EitherOrBoth::Both(a, b)),

None => f(init, EitherOrBoth::Left(a)),

});

b.fold(init, |init, b| f(init, EitherOrBoth::Right(b)))

}

Thanks @jswrenn for your feedback. I tried to run the benchmark against your recommendation but the result is regressing

time: [594.12 ns 594.85 ns 595.65 ns] change: [+20.625% +21.035% +21.434%] (p = 0.00 < 0.05) Performance has regressed.

I was gonna suggest to try to rely a bit on zip from the std with init = a.by_ref().zip(b.ref()).fold(init, &mut f); ... but Zip do not have a fold specialization yet. It would allow to discard the loop (and the Some, Some case).

What about f(init, match b.next() {...}) (in the first fold)? (I guess it's the same?)

Yes unfortunately. I tried :

{ let ZipLongest { a, mut b } = self; init = a.fold(init, |init, a| f(init, match b.next() { Some(b) => EitherOrBoth::Both(a, b), None => EitherOrBoth::Left(a), })); b.fold(init, |init, b| f(init, EitherOrBoth::Right(b))) }

and the result is same: time: [593.41 ns 594.82 ns 596.39 ns]

In #765 I considered folding a range k+1..=n but did not because it was slower, which surprised me. It seems weird to even suggest but maybe it's because of Range*::fold?!
Since folds in the benchmark rely on the same thing here, maybe you can try with different iterators.
Like let v1 = black_box((0..***).collect_vec()); v2... (before bench_function) and then v1.iter().zip_longest(v2.iter()).

@jswrenn Thanks a lot! Yes after looking at the 2 files you shared, I can understand why for loop should always be avoided (and use the underlying fold api which has been optimized). That's really inspiring to me. I've incorporated your suggestions just now. Thanks a lot!

@Philippe-Cholet Yes the issue is on Range*::fold. I adopted your suggestions and rerun the benchmark. Using the default fold (which calls next) the time is time: [2.0522 µs 2.0558 µs 2.0599 µs], and using the new fold logic, I can see a huge improvement time: [414.34 ns 415.44 ns 416.75 ns]

Thanks both!

Diving... Apparently, Range (a..b) does not specialize fold (so maybe that's why there was bad perf), but RangeInclusive (a..=b) does relying on try_fold (implementation seems tricky to me). std::vec::IntoIter does not either, but std::slice::Iter which I redirected you to (without much insight) does.
So I guess we should continue with some_vec.iter().fold(...) for benchmarks.

Indeed. In future PR that optimizes fold, we should stick with some_vec.iter().fold(...) for benchmarking

@jswrenn Do you have any further feedback for my PR ? If no I think we can merge this.

Usual link for how things should prefer forwarding internal iteration to internal iteration where possible: https://medium.com/@veedrac/rust-is-slow-and-i-am-the-cure-32facc0fdcb.

chain is the usual example where internal iteration tends to be measurably faster than external. For things like Range<usize> or slice::Iter, there's typically no difference.

(So this is my +1 to forwarding fold to fold where possible, and when we finally make it possible to implement try_fold, having that forward to try_fold.)

jswrenn

This looks great! Thanks.

jswrenn · 2023-10-03T17:00:36Z

Sigh. While attempting to fix a spurious merge conflict error, I reset the head back slightly too far. Github autoclosed the PR, and now I don't have edit access.

@Owen-CH-Leung Can you just run git push -f on your end (without first pulling)? That ought to do the trick.

Owen-CH-Leung · 2023-10-04T02:54:04Z

Hi @jswrenn , I run git push -f without pulling and creates this new PR.

#774

jswrenn requested changes Sep 28, 2023

View reviewed changes

jswrenn added this to the next milestone Sep 28, 2023

jswrenn mentioned this pull request Sep 28, 2023

Powerset::fold specialization #765

Merged

Owen-CH-Leung requested a review from jswrenn October 1, 2023 13:22

jswrenn approved these changes Oct 3, 2023

View reviewed changes

jswrenn added this pull request to the merge queue Oct 3, 2023

github-merge-queue Bot removed this pull request from the merge queue due to a conflict with the base branch Oct 3, 2023

jswrenn added this pull request to the merge queue Oct 3, 2023

github-merge-queue Bot removed this pull request from the merge queue due to a conflict with the base branch Oct 3, 2023

jswrenn force-pushed the Implement-custom-fold-for-zip-longest branch from 3e5395d to 575819a Compare October 3, 2023 16:54

jswrenn closed this Oct 3, 2023

jswrenn force-pushed the Implement-custom-fold-for-zip-longest branch from 575819a to c4358ab Compare October 3, 2023 16:55

Owen-CH-Leung mentioned this pull request Oct 4, 2023

Implement custom fold for zip longest #774

Merged

Conversation

Owen-CH-Leung commented Sep 28, 2023 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Philippe-Cholet commented Sep 28, 2023 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Owen-CH-Leung commented Sep 28, 2023

Uh oh!

jswrenn left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

scottmcm Oct 3, 2023 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

jswrenn left a comment

Choose a reason for hiding this comment

Uh oh!

jswrenn commented Oct 3, 2023

Uh oh!

Owen-CH-Leung commented Oct 4, 2023

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

Owen-CH-Leung commented Sep 28, 2023 •

edited

Loading

Philippe-Cholet commented Sep 28, 2023 •

edited

Loading

scottmcm Oct 3, 2023 •

edited

Loading