Skip to content

Speed up range sampling.#115

Closed
bhickey wants to merge 1 commit into
rust-random:masterfrom
bhickey:sample
Closed

Speed up range sampling.#115
bhickey wants to merge 1 commit into
rust-random:masterfrom
bhickey:sample

Conversation

@bhickey

@bhickey bhickey commented Aug 26, 2016

Copy link
Copy Markdown

This change removes division from the rejection sampler in random_range().
Replace divisions in range sampling with bitshifting. Instead of finding a
well fitting range, we generate a number [0,2^n) and reject out of range
values.

In synthetic benchmarks, this approximately doubles the throughput.

This change removes division from the rejection sampler in random_range().
Replace divisions in range sampling with bitshifting. Instead of finding a
well fitting range, we generate a number [0,2^n) and reject out of range
values.

In synthetic benchmarks, this approximately doubles the throughput.
@dhardy

dhardy commented Aug 17, 2017

Copy link
Copy Markdown
Member

In synthetic benchmarks, this approximately doubles the throughput.

I can't duplicate this. In my tests:

  • ~3070ns per i64 with the old algorithm
  • ~3200ns per i64 with your algorithm
  • ~3340ns per i64 with your algorithm, but recalculating leading_zeros() each call instead of storing an lz field
  • ~10200ns per i64 with the old algorithm, but not storing the rejection zone field

So your algorithm most definitely is useful if we wish to cut out the extra field or where the range can't be pre-computed, but not with the benchmark you used(?).

Actually, my test was using a different range (Range::new(3i64, 134217671i64)) and type. Using Range::new(10, 10000) from your benchmark, I get ~2600ns with the old algorithm and ~6200ns with your algorithm: more than twice as slow (and slower for i32 than i64). I'm not sure why it's this bad.

@pitdicker

Copy link
Copy Markdown
Contributor

Using bitshifts instead of a modulus improves performance.
But the usable zone is a bad approximation: depending on the range only on average 50% of the generated values pass. So there is a high change the RNG has to run multiple times, and the branch prediction becomes terrible.

Benchmarking this method is a bit more involved, because the results very much depend on how close a range is to a power of two.

I think dhardy#2, dhardy#68 and dhardy#69 help more to improve the performance of Range. Techniques are: never working on values smaller than 32 bits, using a widening multiply instead of modulus, and a sample_single function using the bitshift technique you described.

@dhardy

dhardy commented Dec 12, 2017

Copy link
Copy Markdown
Member

Yes, sounds like we can close this now (still need to get that code merged of course)!

@dhardy dhardy closed this Dec 12, 2017
@pitdicker

Copy link
Copy Markdown
Contributor

Thanks for the effort on this though!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants