Fast oblique #360
A list of command code to reset
adam2392
left a comment
Cool! What's the general speedup you're observing w/ this alternative sampling method?
indices_to_sample[i], indices_to_sample[j]
cdef void floyd_sample_indices(
Thanks, Yuxin!
After inlining the Floyd sampling method, the overhead in SPORF has been eliminated. Now, Floyd’s method is consistently faster than the original Fisher-Yates approach.
(Sorry for the inconsistency in training time below; the before-and-after tests were run on different physical machines.)
Below are the comparisons between the original treeple (using the Fisher-Yates shuffle, right) and the newly implemented treeple (left).
Yep! Without inlining there is overhead, since the Fisher-Yates function was also inlined. A few questions:
Can you put those plots on the same scale?
Also, how many reps did you run per cell? If you ran 5-10, then this is nice.
# sample 'n_non_zeros' in a mtry X n_features projection matrix
# which consists of +/- 1's chosen at a 1/2s rate
Can you explain why the for loop below is still needed then? Some additional explanation of the expected input/output of floyd_sample_indices would help.
The input would be:
out - preallocated buffer to hold the output of sampled indices (size ≥ k);
k - number of samples to be drawn;
n - population size;
random_state - the random state.
The output is out, filled in-place with k unique random integers selected uniformly without replacement from the interval [0, n).
In _oblique_splitter.pyx, it samples n_non_zeros unique integers from the range [0, grid_size) and stores them in indices_to_sample[:]. The for loop is still needed because each 1D index must be mapped back to 2D coordinates in the projection matrix and assigned a weight.
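To make the input/output contract above concrete, here is a minimal pure-Python sketch of Floyd's method followed by the 1D-to-2D mapping step. It is only an illustration, not the Cython code in this PR: `sample_projection_entries` is a hypothetical helper, the row-major layout of the grid is an assumption, and Python's `random.Random` stands in for the `rand_int`/`random_state` pair.

```python
import random


def floyd_sample_indices(k, n, rng):
    """Return k unique integers drawn uniformly without replacement from [0, n)."""
    if not 0 <= k <= n:
        raise ValueError("need 0 <= k <= n")
    seen = set()
    out = []
    for i in range(n - k, n):
        r = rng.randrange(0, i + 1)  # uniform integer in [0, i]
        # If r was already drawn, take i instead; i cannot be in `seen` yet
        # because every earlier pick is at most i - 1.
        pick = r if r not in seen else i
        seen.add(pick)
        out.append(pick)
    return out


def sample_projection_entries(n_rows, n_cols, n_non_zeros, rng):
    """Hypothetical helper: pick n_non_zeros cells and give each a +/-1 weight."""
    grid_size = n_rows * n_cols
    entries = []
    for flat in floyd_sample_indices(n_non_zeros, grid_size, rng):
        row, col = divmod(flat, n_cols)  # assumes a row-major flat index
        weight = 1 if rng.random() < 0.5 else -1
        entries.append((row, col, weight))
    return entries


rng = random.Random(0)
entries = sample_projection_entries(4, 6, 5, rng)
print(entries)
```

The sampling step only returns flat indices, which is why a separate loop is still required to turn each one into a (row, column) position and a weight.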
The speedup depends on the projection matrix size (max_features and the number of features in the dataset). When the number of features increases and feature_combinations stays the same, the new sampling method keeps a relatively constant training time. In contrast, with the original sampling method the training time increases exponentially as the number of features grows, with all other conditions held constant. The speedup can reach 10x when the projection matrix is as large as 4096×4096. However, for smaller projection matrices (e.g., 64×64), the new method may incur a slight overhead, resulting in a slowdown of less than 10%.
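One way to see why the cost profiles differ is to compare the working-set size of each approach. The sketch below is an assumption-laden illustration, not a benchmark: it assumes the original approach materializes all grid indices before shuffling (the exact form of the original shuffle is not shown in this thread), and it counts cells held in memory rather than measuring time. Both function names are made up for this example.

```python
import random


def fisher_yates_sample(k, n, rng):
    """Partial Fisher-Yates shuffle; returns (sample, working-set size)."""
    pool = list(range(n))  # O(n) setup, independent of k
    for i in range(k):
        j = rng.randrange(i, n)  # uniform integer in [i, n)
        pool[i], pool[j] = pool[j], pool[i]
    return pool[:k], len(pool)


def floyd_sample(k, n, rng):
    """Floyd's method; returns (sample, working-set size)."""
    seen = set()
    out = []
    for i in range(n - k, n):
        r = rng.randrange(0, i + 1)  # uniform integer in [0, i]
        pick = r if r not in seen else i
        seen.add(pick)
        out.append(pick)
    return out, len(seen)


k = 32  # number of non-zero entries to draw, held fixed
for side in (64, 256, 512):
    n = side * side  # grid size of a side x side projection matrix
    _, fy_cells = fisher_yates_sample(k, n, random.Random(0))
    _, floyd_cells = floyd_sample(k, n, random.Random(0))
    print(f"{side}x{side}: Fisher-Yates holds {fy_cells} cells, Floyd holds {floyd_cells}")
```

Under these assumptions, the Fisher-Yates working set grows with the grid while Floyd's stays at k, which is consistent with the roughly constant training time reported for the new method. The set lookups in Floyd's method are a plausible source of the small overhead on tiny grids.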
Co-authored-by: Adam Li <adam2392@gmail.com>
make floyd_sample_indices() inlined
cdef intp_t i, r, count = 0

for i in range(n - k, n):
    r = rand_int(0, i + 1, random_state)
    if seen.find(r) == seen.end():
        seen.insert(r)
        out[count] = r
    else:
        seen.insert(i)
        out[count] = i
    count += 1
cdef intp_t i, r = 0

for i in range(n - k, n):
    r = rand_int(0, i + 1, random_state)
    if seen.find(r) == seen.end():
        seen.insert(r)
        out[i - n + k] = r
    else:
        seen.insert(i)
        out[i - n + k] = i

A little simplification suggestion. @ClarkXu0625
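As a quick sanity check on this suggestion, the following pure-Python sketch (an illustration only, with made-up function names and `random.Random` standing in for `rand_int`) runs the counter-based and offset-based variants with the same seed and compares their outputs. Since `i` starts at `n - k` and increases by one per iteration, `i - n + k` takes the values 0, 1, ..., k - 1, exactly like `count`.

```python
import random


def floyd_with_counter(k, n, rng):
    """Variant that tracks the output position with an explicit counter."""
    out = [None] * k
    seen = set()
    count = 0
    for i in range(n - k, n):
        r = rng.randrange(0, i + 1)  # uniform integer in [0, i]
        if r not in seen:
            seen.add(r)
            out[count] = r
        else:
            seen.add(i)
            out[count] = i
        count += 1
    return out


def floyd_with_offset(k, n, rng):
    """Variant that derives the output position from the loop index."""
    out = [None] * k
    seen = set()
    for i in range(n - k, n):
        r = rng.randrange(0, i + 1)  # uniform integer in [0, i]
        if r not in seen:
            seen.add(r)
            out[i - n + k] = r
        else:
            seen.add(i)
            out[i - n + k] = i
    return out


same = floyd_with_counter(8, 100, random.Random(42)) == floyd_with_offset(8, 100, random.Random(42))
print(same)
```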
PSSF23
left a comment
@ClarkXu0625 Awesome. Just clean up the style according to the cython-lint errors, and I think we are good to go? @adam2392 @YuxinB
Kay, will merge! Thank you all!





Reference Issues/PRs
What does this implement/fix? Explain your changes.
Replace the Fisher-Yates shuffle with Floyd's method, a more efficient way to draw uniform samples without replacement from large ranges, in SPORF (i.e., ObliqueRandomForestClassifier). When the projection matrix is large (i.e., a large number of data features and/or a large max_features), this update significantly reduces training time without affecting predictions.
Any other comments?