We had a conversation on IRC with @tomsmeding about how it would be great to have different strategies available for executing the permute array scatters. Especially in cases where the "scatter" actually compresses the array (histogram-style use), and on GPUs, the locks may hurt quite a lot.
A canonical way to remove (or reduce) the locking is to split the problem into a few smaller permutes that scatter the data into a few "privatized" arrays, and then do a simple reduction over those afterwards.
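For concreteness, here is roughly how a privatized histogram could look. This is only my sketch, assuming accelerate >= 1.3 where the permute index mapping returns a Maybe index; `histogramPriv` and the `copies` parameter are just illustrative names, not existing API:

```haskell
import qualified Data.Array.Accelerate as A

-- Plain histogram: every element that lands in the same bin contends for
-- the same per-element lock inside the generated permute kernel.
histogram :: Int -> A.Acc (A.Vector Int) -> A.Acc (A.Vector Int)
histogram bins xs =
  A.permute (+)
    (A.fill (A.I1 (A.constant bins)) 0)
    (\ix -> A.Just_ (A.I1 (xs A.! ix)))
    (A.fill (A.shape xs) 1)

-- Privatized variant: element i scatters into sub-histogram (i `mod` copies),
-- so fewer writers share any one bin; the copies are summed afterwards.
histogramPriv :: Int -> Int -> A.Acc (A.Vector Int) -> A.Acc (A.Vector Int)
histogramPriv copies bins xs =
  let priv = A.permute (+)
               (A.fill (A.I2 (A.constant copies) (A.constant bins)) 0)
               (\(A.I1 i) -> A.Just_ (A.I2 (i `A.mod` A.constant copies) (xs A.! A.I1 i)))
               (A.fill (A.shape xs) 1)
  in  A.fold (+) 0 (A.transpose priv)   -- reduce away the privatization axis
```

The catch is exactly the tuning problem below: the right number of copies (and how the copies map onto threads/blocks) depends on the hardware, which is why I'd rather control it than have it picked automatically.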
Because the optimal configuration for that scheme is often very unclear (it depends on too many cache-related factors), I wouldn't really like permute to try to solve this automagically, but rather to have some of the internals exposed so that I can tell Accelerate to do the permute in a specific way. E.g., "allocate" the matrices myself, and have the computation separated into proper threads. (Chances are that this can be expressed in current Accelerate without new features, I just don't know how :D )
As the general target, I'd love to support what @krulis-martin did in this paper: https://dl.acm.org/doi/abs/10.1145/3404397.3404426 (if you don't have ACM access, the PDF scihubs well). He used a combination of privatization and locking; see Figs. 8 and 9 in the paper.
Optional challenge/question: Specifically for the k-means use case, the permute looks like it's not scattering single elements but whole blocks of data (vectors of sufficiently chunky dimensions to feed some SIMD); see the sketch below. Is it possible (or necessary?) to tell Accelerate that the locking can be done on a whole SIMD-block level, thus possibly saving some synchronization time?
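To make the question concrete, this is roughly what I mean by the k-means accumulation as a single permute today. It's only my sketch of the use case (not the code from the paper); `centroidSums` and the shapes are made up:

```haskell
import qualified Data.Array.Accelerate as A

-- Accumulate per-centroid sums for k-means with a single permute.
-- Each point is a row of `points`; `assign` gives its centroid index.
-- The index mapping runs per scalar element, so every point takes
-- d separate per-element locks rather than one lock per d-vector.
centroidSums
  :: Int                          -- number of centroids (k)
  -> A.Acc (A.Matrix Double)      -- points, shape Z :. n :. d
  -> A.Acc (A.Vector Int)         -- centroid assignment per point, length n
  -> A.Acc (A.Matrix Double)      -- per-centroid sums, shape Z :. k :. d
centroidSums k points assign =
  let d = A.indexHead (A.shape points)   -- innermost extent: the vector dimension
  in  A.permute (+)
        (A.fill (A.I2 (A.constant k) d) 0)
        (\(A.I2 i j) -> A.Just_ (A.I2 (assign A.! A.I1 i) j))
        points
```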
Thanks for any ideas!
-mk
PS. I've got Accelerate source code for the above, and will hopefully publish it in a few days and link it here so that everyone can see this in action.