-
Notifications
You must be signed in to change notification settings - Fork 106
argon2: make parallel implementation safe #154
Description
A parallel rayon-based implementation was added in #149, however it works by using raw pointers and creating mutable references from them. The references aren't aliased so in theory it should be sound, but a safe implementation would be better.
I left some comments on how to implement parallel processing safely in this comment:
The algorithm operates over a matrix of "blocks" consisting of:
- 𝒑 rows ("lanes"), where 𝒑 is the desired number of worker threads
- 𝒒 columns
Each lane is further subdivided into 𝑺 = 4 "slices" (in Argon2 terminology, and referred to as
SYNC_POINTSin the code), and the intersection of a slice and a lane forms a segment of length 𝒒 / 𝑺.Segments of the same slice are computed in parallel and therefore cannot reference each other. So from a memory model perspective, what we'd really like is to mutably borrow the values of a particular slice, partition them into segments, and give each worker thread access to a particular segment.
However, we also need to allow all of the worker threads to simultaneously borrow all of the other blocks which do not belong to the "slice" being operated on to reference as inputs.
This is the tricky part: the
fill_segmentoperation can reference blocks from the current lane, or other lanes, but will not reference blocks from the same "slice" being operated on.Having just written all of that down (thanks for rubber ducking if nothing else), I think I have a better idea of how to model this problem safely in Rust: "slices" (in the Argon2 sense) should be the core level of granularity in which the working "memory" is organized.
The main loop of the algorithm iterates over the slices. Provided I'm actually understanding this correctly, we can borrow one slice mutably at the time and the others immutably. The mutably borrowed slice can then be subdivided into a segment for each lane, given to the worker threads along with immutable references to all of the other slices.
I think a big part of what's making this so tricky right now is the memory consists of a contiguous
Vec<Block>. I think that might still be fine for the backing storage, but perhaps we could mediate access to segments through another type that splits the borrows by Argon2 "slice", allowing one "slice" to be acted on mutably and the others referenced as inputs.I think a next step which might be helpful in general is to extract a struct Memory which borrows from a backing [Block] slice, pass that to Instance::new instead of the raw &'a mut [Block], and provide a method on Memory for accessing blocks by their Position.
From there we can look at borrow splitting the backing buffer so Memory only holds 3 of the 4 "slices" at a time, and can be used to look up the "reference" blocks being used to fill a segment, but it would not have access to the slice being operated over (which would be borrowed mutably and split up into segments among the worker threads).
Note that after posting this comment, a Memory struct was extracted which is the first step.
The next steps would be using the Memory struct to access Argon2 "slices", then being able to safely split borrows of a mutable slice and immutable slices as outlined above.