Description
Is there an optimization available for multiplying a matrix by its transpose? I'm trying to optimize a program whose slowest part is m.t() * m (a fairly big matrix in the innermost loop). I read more about this: the result is always a symmetric matrix, so only n(n+1)/2 of the n^2 output entries need to be computed. The lapack function dsyrk is supposed to handle that. I don't know if it would actually help, but I'm curious to test it.
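To make the n(n+1)/2 idea concrete, here is a minimal sketch in plain Rust with no ndarray or BLAS involved. It assumes a row-major `rows x cols` slice, and the function name `gram_upper_mirrored` is made up for illustration. It computes only the upper triangle of the product of the transpose with the matrix and mirrors each value into the lower triangle:

```rust
/// Compute G = M^T * M for a row-major `rows x cols` matrix `m`.
/// Only the entries with j >= i are computed as dot products of columns;
/// each one is then copied to its mirrored position (j, i).
/// Hypothetical helper, not part of ndarray or matrixmultiply.
fn gram_upper_mirrored(m: &[f64], rows: usize, cols: usize) -> Vec<f64> {
    assert_eq!(m.len(), rows * cols, "slice length must equal rows * cols");
    let mut g = vec![0.0; cols * cols];
    for i in 0..cols {
        for j in i..cols {
            // Dot product of column i with column j.
            let mut acc = 0.0;
            for k in 0..rows {
                acc += m[k * cols + i] * m[k * cols + j];
            }
            g[i * cols + j] = acc;
            g[j * cols + i] = acc; // symmetry: G[j][i] == G[i][j]
        }
    }
    g
}

fn main() {
    // 3 x 2 matrix: [[1, 2], [3, 4], [5, 6]]
    let m = vec![1.0, 2.0, 3.0, 4.0, 5.0, 6.0];
    let g = gram_upper_mirrored(&m, 3, 2);
    println!("{:?}", g); // prints [35.0, 44.0, 44.0, 56.0]
}
```

This roughly halves the number of dot products, but it is a naive triple loop rather than a cache-blocked kernel, so it may well be slower than an optimized general matrix multiply; whether the symmetric variant wins in practice would need to be measured.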
Also, is there a way to know whether I'm using lapack? I didn't enable any special feature for ndarray in my Cargo.toml file. A perf run reports 29.26% in _ZN14matrixmultiply4gemm13masked_kernel, so I think I'm using it, because gemm is a lapack name. But is there a simpler way?
EDIT: Oh, sorry, I meant BLAS everywhere in my text. I wasn't aware of the difference :)