Skip to content

Fast oblique#360

Merged
YuxinB merged 120 commits into
neurodata:fast-obliquefrom
ClarkXu0625:fast-oblique
Jun 14, 2025
Merged

Fast oblique#360
YuxinB merged 120 commits into
neurodata:fast-obliquefrom
ClarkXu0625:fast-oblique

Conversation

@ClarkXu0625
Copy link
Copy Markdown

Reference Issues/PRs

What does this implement/fix? Explain your changes.

Replace Fisher Yates Shuffle by Floyd's method, a more efficient approach to draw uniform distribution from large ranges, in SPORF (i.e. ObliqueRandomForestClassifier). When the projection matrix is huge (i.e. large number of data features and/or max_features), this update would reduce training time significantly without affecting prediction.

Any other comments?

@YuxinB YuxinB requested review from PSSF23 and adam2392 and removed request for adam2392 May 31, 2025 21:15
Copy link
Copy Markdown
Collaborator

@adam2392 adam2392 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cool! What's the general speedup you're observing w/ this alternative sampling method?

Comment thread treeple/tree/_utils.pyx Outdated
indices_to_sample[i], indices_to_sample[j]


cdef void floyd_sample_indices(
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can this be inlined?

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, Yuxin!

After inlining the Floyd sampling method, the overhead in SPORF has been eliminated. Now, Floyd’s method is consistently faster than the original Fisher-Yates approach.
(Sorry for the inconsistency in training time below — the before-and-after tests were run on different physical machines.)

Here below are the comparisons between original treeple (using fisher yates shuffle, right) and new implemented treeple (left).

Before changing to inline:
image

After changing floyd to inline:
image

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep! without inline there is overhead since fisher Yates function was also inlined. Few questions:

Can you put those plots on the same scale?

Also how many reps did you run per cell? If you ran 5-10 then this is nice.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The upper two heatmap ran with 5 reps, and the lower two (not inlined) ran with 3 reps.

Here below are the heatmaps in the same scale, 5 reps
image

Comment thread treeple/tree/_utils.pyx Outdated
Comment thread treeple/tree/_utils.pyx Outdated
Comment on lines 245 to 246
# sample 'n_non_zeros' in a mtry X n_features projection matrix
# which consists of +/- 1's chosen at a 1/2s rate
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you explain why the below for loop is still needed then? Perhaps what would help is some additional explanation of what the expected input/output is in the floyd_sample_indices.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The input would be:
out - preallocated buffer to hold the output of sampled indices (size ≥ k);
k - number of samples to be drawn;
n - Population size
and a random state

The output is out, filled in-place with k unique random integers selected uniformly without replacement from the interval [0, n).

In _oblique_splitter.pyx, it samples n_non_zeros unique integers from the range [0, grid_size) — and stores them in indices_to_sample[:]. The for loop is needed because we need to map 1D indices back to 2D coordinates in the projection matrix and assign weights to each index

@ClarkXu0625
Copy link
Copy Markdown
Author

Cool! What's the general speedup you're observing w/ this alternative sampling method?

The speedup scale depends on the projection matrix size (max_features and number of features in dataset). When the number of features in dataset increases and the feature_combinations kept the same, the new sampling methods would have relatively constant training time. In contrast, the original sampling method would have exponentially increased training time, while the dataset has increased number of feature and keeping the other conditions constant.

It could speed up 10 times when the projection matrix is as large as 4096x4096. However, for smaller projection matrices (e.g., 64×64), the new method may incur a slight overhead, resulting in a slowdown of less than 10%.

ClarkXu0625 and others added 3 commits June 4, 2025 03:16
Co-authored-by: Adam Li <adam2392@gmail.com>
Co-authored-by: Adam Li <adam2392@gmail.com>
make floyd_sample_indices() inlined
@YuxinB
Copy link
Copy Markdown
Member

YuxinB commented Jun 6, 2025

Test the pr by generate the oblique demo with the Flody's sampling method: feature_combination = 1.5 here
image

Comment thread treeple/tree/_utils.pyx Outdated
Comment on lines +67 to +77
cdef intp_t i, r, count = 0

for i in range(n - k, n):
r = rand_int(0, i + 1, random_state)
if seen.find(r) == seen.end():
seen.insert(r)
out[count] = r
else:
seen.insert(i)
out[count] = i
count += 1
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

    cdef intp_t i, r = 0

    for i in range(n - k, n):
        r = rand_int(0, i + 1, random_state)
        if seen.find(r) == seen.end():
            seen.insert(r)
            out[i - n + k] = r
        else:
            seen.insert(i)
            out[i - n + k] = i

A little simplification suggestion. @ClarkXu0625

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you, Hao! Just tested the running time for this cleaner version
(buttom right is the result training time from this improved version)

image

Copy link
Copy Markdown
Member

@PSSF23 PSSF23 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ClarkXu0625 Awesome. Just clean up the styles according to the cython-lint errors. And I think we are good to go? @adam2392 @YuxinB

@YuxinB
Copy link
Copy Markdown
Member

YuxinB commented Jun 12, 2025

Kay, will merge! Thank you all!

@YuxinB YuxinB merged commit 752d589 into neurodata:fast-oblique Jun 14, 2025
21 of 32 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants