Thanks for adding this. Unfortunately, the PR is incomplete. Adding a bias requires changes to how layouts are initialized for QQ matmul in mlx/backend/cuda/quantized/cublas_qqmm.h. To support bias, the implementation needs to set the epilogue. This should be implemented similarly to how it is handled in regular GEMM and Matmul::eval_gpu().
Happy to iterate on this with you.
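For readers unfamiliar with the term, a bias epilogue means the bias add happens as the last step of the matmul itself rather than as a separate operation afterwards. The pure-Python sketch below only illustrates those reference semantics. The function name `matmul_bias_epilogue` is hypothetical, and broadcasting the bias along the output columns is an assumption for illustration. It is not MLX or cuBLAS code.

```python
def matmul_bias_epilogue(a, b, bias=None):
    """Reference semantics for a matmul with an optional fused bias.

    a is M x K, b is K x N (lists of rows); bias is a length-N list or None.
    Assumption: the bias is broadcast along the output columns.
    """
    k, n = len(b), len(b[0])
    out = []
    for row_a in a:
        row = []
        for j in range(n):
            acc = sum(row_a[p] * b[p][j] for p in range(k))
            if bias is not None:
                acc += bias[j]  # the "epilogue": applied before the result is stored
            row.append(acc)
        out.append(row)
    return out


a = [[1.0, 2.0], [3.0, 4.0]]
b = [[1.0, 0.0], [0.0, 1.0]]
bias = [10.0, 20.0]

# Fused: the bias is applied inside the matmul.
fused = matmul_bias_epilogue(a, b, bias)

# Unfused: plain matmul followed by a separate elementwise add.
unfused = [[v + bias[j] for j, v in enumerate(row)]
           for row in matmul_bias_epilogue(a, b)]
```

Both paths produce the same values. The fused form simply avoids a second pass over the output.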
Hey @nastya236, I noticed CublasQQMM already inherits from CublasMatmulBase, which has the same set_bias() method that regular GEMM uses in matmul.cpp to configure the epilogue. I updated qqmm_impl() to accept an optional bias parameter and call set_bias() when one is provided. I also left a comment about not needing an explicit type check. Does this align with what you're thinking?
I also noticed an opportunity for a small optimization in the QQLinear layer. Currently bias is added separately: We could add an optional Let me know if this is something worth pursuing! Happy to tackle it in a follow-up PR.
Thanks for the update. I think the implementation is incomplete as-is. The changes to Let's look at how it is implemented in regular matmul as an example of correct logic:
In the current implementation, bias is never passed. It will always be a null pointer. If bias is added to The question that should be raised is whether we want to have a similar operation (like
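The point about the bias never being passed can be illustrated with a small pure-Python sketch. All names here (`qqmm_impl_sketch`, `eval_without_forwarding`, `eval_with_forwarding`) are hypothetical stand-ins, and the convention that the bias arrives as an optional third input is an assumption for illustration. None of this is the actual MLX code.

```python
def qqmm_impl_sketch(x, w, bias=None):
    """Hypothetical inner routine: dot product with an optional bias."""
    out = sum(xi * wi for xi, wi in zip(x, w))
    if bias is not None:
        out += bias
    return out


def eval_without_forwarding(inputs):
    """Caller that accepts a bias input but never hands it on."""
    x, w = inputs[0], inputs[1]
    # The inner routine always sees bias=None, so its bias branch is dead code.
    return qqmm_impl_sketch(x, w)


def eval_with_forwarding(inputs):
    """Caller that forwards the bias when one is present (assumed third input)."""
    x, w = inputs[0], inputs[1]
    bias = inputs[2] if len(inputs) > 2 else None
    return qqmm_impl_sketch(x, w, bias)


inputs = [[1.0, 2.0], [3.0, 4.0], 0.5]
dropped = eval_without_forwarding(inputs)
applied = eval_with_forwarding(inputs)
```

Adding an optional parameter to the inner routine changes nothing unless every caller on the path also carries the bias through, which is the gap described above.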
Ah I see my error now. A new op makes sense for the reasons you've mentioned! I'm afk for the week but happy to jump back in when back. Thanks!
Proposed changes
Currently, QQLinear does not support bias terms, which prevents quantization of models that use biased linear layers. This limitation is noted in the existing docstring: "Note: This layer does not support a bias term yet." This PR adds support for an optional bias term to the QQLinear layer, bringing it in line with Linear and QuantizedLinear.
Checklist
Put an x in the boxes that apply.
pre-commit run --all-files to format my code / installed pre-commit prior to committing changes