Add method for multiply-add aka axpy

Two reasons:
- BLAS offers this as `axpy`, so it can be optimized (note: `axpy` operates on vectors, not strided matrices)
- It's a common case.

This operation is already **_available efficiently_** using `.zip_mut_with`:

`a1.zip_mut_with(&a2, |x, &y| *x += k * y)`

The question is whether this is common enough, and `zip_mut_with` ugly enough, to warrant a dedicated method.
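
For reference, here is a minimal sketch of the semantics under discussion, written against plain slices rather than arrays so it needs nothing beyond the standard library. The free function `axpy` and its signature are hypothetical, chosen only for illustration; this is not a proposal for the final API.

```rust
/// Hypothetical helper, not part of any library: performs `y[i] += k * x[i]`
/// element-wise, which is what BLAS calls `axpy` on contiguous vectors.
fn axpy(k: f64, x: &[f64], y: &mut [f64]) {
    // Both operands must have the same length, as with BLAS vectors.
    assert_eq!(x.len(), y.len(), "axpy: length mismatch");
    for (yi, &xi) in y.iter_mut().zip(x) {
        *yi += k * xi;
    }
}
```

A dedicated method would presumably do the same update in place, dispatching to BLAS when the memory layout allows and falling back to a loop like this one otherwise.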