Add sample_single to Range by pitdicker · Pull Request #69 · dhardy/rand

pitdicker · 2017-12-11T19:18:41Z

As discussed in #68.

Benchmarks before (with i128_support):

test gen_range_i8             ... bench:       6,912 ns/iter (+/- 6) = 144 MB/s
test gen_range_i16            ... bench:       6,491 ns/iter (+/- 11) = 308 MB/s
test gen_range_i32            ... bench:       7,274 ns/iter (+/- 48) = 549 MB/s
test gen_range_i64            ... bench:      11,476 ns/iter (+/- 36) = 697 MB/s
test gen_range_i128           ... bench:     279,477 ns/iter (+/- 552) = 57 MB/s

After:

test gen_range_i8             ... bench:       5,350 ns/iter (+/- 5) = 186 MB/s
test gen_range_i16            ... bench:       5,347 ns/iter (+/- 3) = 374 MB/s
test gen_range_i32            ... bench:       3,922 ns/iter (+/- 22) = 1019 MB/s
test gen_range_i64            ... bench:       8,228 ns/iter (+/- 35) = 972 MB/s
test gen_range_i128           ... bench:     148,177 ns/iter (+/- 322) = 107 MB/s

And the normal code with a one-time set-up cost for comparison:

test distr_range_i8           ... bench:       2,685 ns/iter (+/- 27) = 372 MB/s
test distr_range_i16          ... bench:       2,866 ns/iter (+/- 29) = 697 MB/s
test distr_range_i32          ... bench:       3,361 ns/iter (+/- 53) = 1190 MB/s
test distr_range_i64          ... bench:       3,321 ns/iter (+/- 64) = 2408 MB/s
test distr_range_i128         ... bench:     140,754 ns/iter (+/- 1,327) = 113 MB/s

The performance of sample_single depends quite a bit on how close the range is to a power of two, but I have not investigated the details... Performance seems worth the extra method.

I am not sure the order of arguments is the best choice, with (low, high, rng). Would (rng, low, high) be better?
Also I tried to make sample_single have a default implementation, like in RangeFloat, but couldn't get it to work.
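
For readers following along, here is a minimal sketch of one way a single-use range sample can skip the set-up cost. It assumes a widening-multiply rejection scheme with a cheap, conservative acceptance zone; this is an illustration, not necessarily the code in this PR. SplitMix64 and sample_single_u32 are hypothetical names used only to keep the sketch self-contained.

```rust
/// Hypothetical stand-in generator (SplitMix64), only here so the sketch
/// compiles without external crates.
pub struct SplitMix64(pub u64);

impl SplitMix64 {
    pub fn next_u64(&mut self) -> u64 {
        self.0 = self.0.wrapping_add(0x9E37_79B9_7F4A_7C15);
        let mut z = self.0;
        z = (z ^ (z >> 30)).wrapping_mul(0xBF58_476D_1CE4_E5B9);
        z = (z ^ (z >> 27)).wrapping_mul(0x94D0_49BB_1331_11EB);
        z ^ (z >> 31)
    }

    pub fn next_u32(&mut self) -> u32 {
        (self.next_u64() >> 32) as u32
    }
}

/// Sample once from [low, high) without building a reusable range object.
/// Assumption: widening multiply plus rejection, as described in the lead-in.
pub fn sample_single_u32(low: u32, high: u32, rng: &mut SplitMix64) -> u32 {
    assert!(low < high);
    let range = high - low;
    // Cheap acceptance zone: shift `range` up until its top bit is set.
    // `zone + 1` is then a multiple of `range`, so accepted values are
    // unbiased, and no division or modulo is needed to compute it.
    let zone = (range << range.leading_zeros()).wrapping_sub(1);
    loop {
        let v = rng.next_u32();
        let m = (v as u64) * (range as u64);
        let (hi, lo) = ((m >> 32) as u32, m as u32);
        if lo <= zone {
            return low + hi;
        }
    }
}
```

With this particular zone the acceptance rate is close to 1 when the range sits just below a power of two and drops to about 1/2 when it is exactly a power of two, which is one plausible reason for the power-of-two sensitivity mentioned above.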

pitdicker · 2017-12-11T20:36:06Z

I am seeing some improvements in other benchmarks too:

Before:

test distr_uniform_ascii_char ... bench:      12,196 ns/iter (+/- 222) = 327 MB/s

test misc_sample_indices_100_of_1k   ... bench:       1,402 ns/iter (+/- 1)
test misc_sample_indices_10_of_1k    ... bench:         528 ns/iter (+/- 0)
test misc_sample_indices_50_of_1k    ... bench:         780 ns/iter (+/- 1)
test misc_sample_iter_10_of_100      ... bench:       1,266 ns/iter (+/- 3)
test misc_sample_slice_10_of_100     ... bench:         182 ns/iter (+/- 0)
test misc_sample_slice_ref_10_of_100 ... bench:         180 ns/iter (+/- 0)

After:

test distr_uniform_ascii_char ... bench:       3,713 ns/iter (+/- 4) = 1077 MB/s

test misc_sample_indices_100_of_1k   ... bench:         669 ns/iter (+/- 1)
test misc_sample_indices_10_of_1k    ... bench:         437 ns/iter (+/- 1)
test misc_sample_indices_50_of_1k    ... bench:         399 ns/iter (+/- 0)
test misc_sample_iter_10_of_100      ... bench:         946 ns/iter (+/- 7)
test misc_sample_slice_10_of_100     ... bench:         142 ns/iter (+/- 0)
test misc_sample_slice_ref_10_of_100 ... bench:         140 ns/iter (+/- 0)

pitdicker · 2017-12-12T06:24:31Z

Just added a little optimisation to ascii_word_char. Its range is close enough to a power of two that a bitmask is more efficient.

test distr_uniform_ascii_char ... bench:       2,754 ns/iter (+/- 11) = 1452 MB/s
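
The bitmask idea can be sketched roughly as follows, assuming the character set is the 62 ASCII letters and digits: take 6 random bits (64 possible values) and retry on the two values that fall outside the set. The function signature and the CHARSET constant are illustrative assumptions, not the code from the commit.

```rust
/// Assumed character set: 26 upper-case + 26 lower-case + 10 digits = 62.
pub const CHARSET: &[u8] =
    b"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789";

/// Pick one character uniformly. `next_u32` is a hypothetical source of
/// random 32-bit words, passed as a closure to keep the sketch standalone.
pub fn ascii_word_char(mut next_u32: impl FnMut() -> u32) -> char {
    loop {
        // Keep the top 6 bits: a value in 0..64.
        let idx = (next_u32() >> 26) as usize;
        // 62 of the 64 values are usable, so a retry is needed only
        // about 3% of the time.
        if idx < CHARSET.len() {
            return CHARSET[idx] as char;
        }
    }
}
```

Because 62 is so close to 64, the rejection loop almost never iterates twice, and no multiplication or modulo is involved.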

With choose I see an optimisation, but don't know if it is possible. It needs a usize in some range to use as a slice index. On x86_64 this is the same as a u64, which is twice as slow as u32. It would help if we could assume the slice length < 2^32. Would it be ok to document: "Choose will only pick from the first 2^32 values of a slice"?

dhardy · 2017-12-12T06:33:00Z

Hmm, not sure how to answer this. I'm not so happy with the use of usize for container sizes everywhere since for the vast majority of uses u32 is sufficient, but it's what we have, and it seems odd doing something different here.

pitdicker · 2017-12-12T08:50:23Z

With i128_support on x86_64 a range reduction on u64 should be about as fast as on u32.
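
As a rough illustration of why 128-bit integer support matters here, assume the range reduction is done with a widening multiply: the 64-bit case then needs a 64x64 -> 128-bit product, which is a single multiply instruction on x86_64 when u128 is available. The helper names below are hypothetical.

```rust
/// Widening multiply: returns (high, low) halves of the 128-bit product.
pub fn wmul_u64(a: u64, b: u64) -> (u64, u64) {
    let m = (a as u128) * (b as u128);
    ((m >> 64) as u64, m as u64)
}

/// Map a random 64-bit word onto 0..range by keeping the high half of the
/// product. On its own this is slightly biased; a rejection step on the
/// low half would be needed for exact uniformity.
pub fn reduce_u64(v: u64, range: u64) -> u64 {
    wmul_u64(v, range).0
}
```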

For cryptographic RNGs, generating u64s instead of u32s should be about 50~60 percent slower. It is not 100% because of (thanks to) the overhead of reading the buffered results. At least with an improvement I made for HC-128, which should also work for ISAAC and ChaCha.

Small fast RNGs mostly need to operate on 64-bit words for good statistical quality. There are some really fast ones where on x86_64 generating u64s has the same excellent performance as generating u32s.

So on second thought, using usize on x86_64 should eventually have similar performance to using a u32 as index. We should just leave choose as it is, without subtleties.

pitdicker · 2017-12-12T10:09:52Z

Now I wonder if the optimization of ascii_word_char is worth it. It is only about 10% faster than when we have better RNGs and i128_support.

I do like that it removes a complex dependency between Uniform, sequences, Sample and Range. Before, the chain looked like: Uniform::ascii_word_char → sequences::Choose → Sample::gen_range → Range::sample_single. I think it makes some sense to end up with distributions that only depend on rand_core (and std/core). Edit: and utils::FloatConversions.

dhardy · 2017-12-12T12:47:45Z

Yes, the optimisation/standalone impl for ascii_word_char is a good idea in my opinion.

Not sure what you mean here?

"Also I tried to make sample_single have a default implementation, like in RangeFloat, but couldn't get it to work."

pitdicker · 2017-12-12T13:09:34Z

It would be nice to have something like this as a default implementation:

fn sample_single<R: Rng+?Sized>(low: Self::X, high: Self::X, rng: &mut R) -> Self::X {
    let range: Self = RangeImpl::new(low, high);
    range.sample(rng)
}

Then we would not have to add this code when there is no faster / specialized method available for some type (like the one in the tests of this module). But I could not get it to work because Rust can't figure out the types (and neither could I...).

dhardy · 2017-12-12T14:59:42Z

Ah. It's actually quite simple:

error[E0277]: the trait bound `Self: core::marker::Sized` is not satisfied
   --> src/distributions/range.rs:107:13
    |
107 |         let range: Self = RangeImpl::new(low, high);
    |             ^^^^^ `Self` does not have a constant size known at compile-time

What the compiler is saying is that you can only construct sized types; here you are trying to construct range of type Self, but the compiler doesn't know that the RangeImpl trait can only be implemented for Sized types (e.g. you could try to impl RangeImpl for [u8]). You can fix this by specifying trait RangeImpl: Sized { ... }; this restricts implementations to sized types.
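
A minimal, self-contained sketch of the fix follows. The Rng trait, Counter and RangeU32 here are simplified hypothetical stand-ins, not the real rand types; the point is only that the Sized supertrait lets the default method construct a value of type Self.

```rust
/// Hypothetical stand-in for the real Rng trait.
pub trait Rng {
    fn next_u32(&mut self) -> u32;
}

/// The `Sized` supertrait is what makes the default method below compile.
pub trait RangeImpl: Sized {
    type X: PartialOrd;

    fn new(low: Self::X, high: Self::X) -> Self;
    fn sample<R: Rng + ?Sized>(&self, rng: &mut R) -> Self::X;

    /// Default: build a temporary range and sample it once. Implementors
    /// with a faster single-sample path can override this.
    fn sample_single<R: Rng + ?Sized>(low: Self::X, high: Self::X, rng: &mut R) -> Self::X {
        let range: Self = Self::new(low, high);
        range.sample(rng)
    }
}

/// Illustrative implementor that relies on the default `sample_single`.
pub struct RangeU32 {
    low: u32,
    range: u32,
}

impl RangeImpl for RangeU32 {
    type X = u32;

    fn new(low: u32, high: u32) -> Self {
        assert!(low < high);
        RangeU32 { low, range: high - low }
    }

    fn sample<R: Rng + ?Sized>(&self, rng: &mut R) -> u32 {
        // Plain modulo reduction: biased, used here only for brevity.
        self.low + rng.next_u32() % self.range
    }
}

/// Deterministic hypothetical generator so the behaviour is easy to follow.
pub struct Counter(pub u32);

impl Rng for Counter {
    fn next_u32(&mut self) -> u32 {
        self.0 = self.0.wrapping_add(1);
        self.0
    }
}
```

Calling RangeU32::sample_single(10, 20, &mut rng) then goes through the default method, so a type without a specialised fast path needs no extra code.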

pitdicker · 2017-12-12T15:32:27Z

I can't really imagine a RangeImpl for slices, so a Sized bound should not really be a restriction. Thank you, 20 more errors and I will start to get a feeling for traits 😄. Will add a commit tomorrow.

pitdicker · 2017-12-13T07:09:43Z

-    /// The type sampled by this implementation (output type).
+pub trait RangeImpl: Sized {
+    /// The type sampled by this implementation.
    type X: PartialOrd;


pitdicker: Why does this type have a PartialOrd bound?

dhardy: Um... it might be to allow the low < high check. You can try removing it if you like.

pitdicker: Now that I looked a bit better at it, SampleRange already has PartialOrd+Sized as trait bounds, which takes care of the low < high check. But the extra bound kind of makes sense: you can't speak about a range between two points if there is not some sort of order between the elements. I will leave it for now.

pitdicker · 2017-12-13T15:19:40Z

I think this is ready now

dhardy · 2017-12-13T20:55:03Z

Great, thanks!

I'm going to assume that the sampling is correct; I glanced over the code but there are so many routines just for ranges now; then there are all the other distributions. I'm thinking an external tool to plot 10'000+ samples for visual inspection might be useful, plus some "synthetic tests" (checking that a few inputs produce the expected values).

pitdicker · 2017-12-13T21:07:23Z

Now that you mention it, I did not add any tests... The tests for choose cover testing if some values give plausible results, and if the values are in range.

dhardy · 2017-12-14T10:51:10Z

No, properly testing distributions would appear to be much more difficult than writing the distribution code.

pitdicker · 2017-12-14T11:17:45Z

Yes, I was thinking about that after your comment. A simple test could be to generate a lot of numbers, fill something like 256 buckets, and compare if they roughly follow the distribution. But that would only catch the worst errors. Shall I open an issue?
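
The bucket idea could look roughly like the sketch below: draw many samples, count them into 256 buckets, and check that every count is within some tolerance of the expected value. The generator, function names and tolerance are all illustrative assumptions, and as noted above such a check only catches gross errors.

```rust
/// Hypothetical stand-in generator (SplitMix64) to keep the sketch standalone.
pub struct SplitMix64(pub u64);

impl SplitMix64 {
    pub fn next_u64(&mut self) -> u64 {
        self.0 = self.0.wrapping_add(0x9E37_79B9_7F4A_7C15);
        let mut z = self.0;
        z = (z ^ (z >> 30)).wrapping_mul(0xBF58_476D_1CE4_E5B9);
        z = (z ^ (z >> 27)).wrapping_mul(0x94D0_49BB_1331_11EB);
        z ^ (z >> 31)
    }
}

/// Draw `samples` values and count them into 256 buckets by their top byte.
pub fn fill_buckets(rng: &mut SplitMix64, samples: usize) -> [u64; 256] {
    let mut buckets = [0u64; 256];
    for _ in 0..samples {
        let idx = (rng.next_u64() >> 56) as usize;
        buckets[idx] += 1;
    }
    buckets
}

/// True if every bucket is within `tolerance` (a fraction, e.g. 0.2 for 20%)
/// of the mean count. A crude check, not a proper statistical test.
pub fn roughly_uniform(buckets: &[u64], tolerance: f64) -> bool {
    let total: u64 = buckets.iter().sum();
    let expected = total as f64 / buckets.len() as f64;
    buckets
        .iter()
        .all(|&c| ((c as f64) - expected).abs() <= tolerance * expected)
}
```

For a non-uniform distribution the same shape would apply, with the expected count per bucket computed from the target distribution instead of the mean.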

Add sample_single to Range

5a7dbb0

Use mask to reduce range in ascii_word_char

4d832a7

This was referenced Dec 12, 2017

cache the Range for gen_ascii_chars rust-random/rand#171

Closed

Speed up range sampling. rust-random/rand#115

Closed

pitdicker added 2 commits December 13, 2017 08:01

Add a default implementation for sample_single

1913e08

Merge remote-tracking branch 'upstream/master' into range_single_sample

6249973

dhardy merged commit 4ebd21f into dhardy:master Dec 13, 2017

pitdicker deleted the range_single_sample branch December 13, 2017 21:03

dhardy mentioned this pull request Dec 14, 2017

Testing of distributions #72

Closed

dhardy added a commit that referenced this pull request Dec 14, 2017

Merge changes from #69 (sample_single)

107c3af

dhardy added a commit that referenced this pull request Dec 14, 2017

Merge changes from #69 (sample_single)

e90e9eb
