Add bias support to QQLinear #3215

Open
mdepree wants to merge 2 commits into ml-explore:main from mdepree:qqlinear-bias

Conversation


@mdepree mdepree commented Mar 6, 2026

Proposed changes

Currently, QQLinear does not support bias terms, which prevents quantization of models that use biased linear layers. This limitation is noted in the existing docstring: "Note: This layer does not support a bias term yet." This PR adds support for an optional bias term to the QQLinear layer, bringing it in line with Linear and QuantizedLinear.

Checklist

Put an x in the boxes that apply.

  • I have read the CONTRIBUTING document
  • I have run pre-commit run --all-files to format my code / installed pre-commit prior to committing changes
  • I have added tests that prove my fix is effective or that my feature works
  • I have updated the necessary documentation (if needed)

@angeloskath angeloskath requested a review from nastya236 March 7, 2026 00:59
Collaborator

@nastya236 nastya236 left a comment


Thanks for adding this. Unfortunately, the PR is incomplete: adding a bias requires changes to how layouts are initialized for QQ matmul in mlx/backend/cuda/quantized/cublas_qqmm.h. To support bias, the implementation needs to set an epilogue, similar to how this is handled for regular GEMM in Matmul::eval_gpu().
Happy to iterate on this with you.
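For context on what the epilogue buys you, here is a minimal numpy sketch (numpy stands in for the CUDA kernels; qqmm and quantization are not involved). The fused path and the separate-add path are numerically equivalent; fusing just avoids a second kernel launch and an extra pass over the output:

```python
import numpy as np

def gemm_with_bias_epilogue(a, b, bias):
    # Models a GEMM whose bias add is fused into the same kernel
    # (what the cuBLAS bias epilogue does); numpy just models the math.
    return a @ b + bias

def gemm_then_add(a, b, bias):
    out = a @ b        # first kernel
    return out + bias  # second kernel: extra memory traffic

rng = np.random.default_rng(0)
a = rng.standard_normal((4, 8))
b = rng.standard_normal((8, 16))
bias = rng.standard_normal(16)

# Same result either way; only the execution shape differs.
assert np.allclose(gemm_with_bias_epilogue(a, b, bias),
                   gemm_then_add(a, b, bias))
```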

Author

mdepree commented Mar 7, 2026

Hey @nastya236, I noticed CublasQQMM already inherits from CublasMatmulBase, which has the same set_bias() method that regular GEMM uses in matmul.cpp to configure the epilogue.

I updated qqmm_impl() to accept an optional bias parameter and call set_bias() when one is provided. I also left a comment about not needing an explicit type check. Does this align with what you're thinking?

Author

mdepree commented Mar 7, 2026

I also noticed an opportunity for a small optimization in the QQLinear layer.

Currently bias is added separately:

x = mx.qqmm(x, weight, scales, ...)
x = x + self["bias"]

We could add an optional bias parameter to qqmm(), then update QQMatmul::eval_gpu() to extract the bias from its inputs and pass it to qqmm_impl():

x = mx.qqmm(x, weight, scales, ..., bias=self.get("bias"))

Let me know if this is something worth pursuing! Happy to tackle it in a follow-up PR

Collaborator

nastya236 commented Mar 8, 2026

Thanks for the update. I think the implementation is incomplete as-is. The changes to qqmm_impl to support set_bias look reasonable, but there's no way for a bias to actually reach that code path.

Let's look at how it is implemented in regular matmul as an example of correct logic:

  1. nn.Linear calls mx.addmm(self["bias"], x, self["weight"].T), a dedicated op that accepts bias as an input.
  2. The AddMM primitive passes all three arrays (bias, a, b) to eval_gpu.
  3. eval_gpu calls gemm_and_bias, which does gemm.set_bias(encoder, *bias).
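As a reminder of the semantics being mirrored here: as I understand the mx.addmm signature, addmm(c, a, b, alpha, beta) computes alpha * (a @ b) + beta * c, with alpha and beta defaulting to 1.0. A numpy reference of that contract (a sketch, not MLX code):

```python
import numpy as np

def addmm_ref(c, a, b, alpha=1.0, beta=1.0):
    """Reference semantics of mx.addmm: alpha * (a @ b) + beta * c.

    With c = bias (broadcast over rows) and the default alpha = beta = 1,
    this is exactly the fused linear layer: a @ b + bias.
    """
    return alpha * (a @ b) + beta * np.asarray(c)

rng = np.random.default_rng(1)
x = rng.standard_normal((3, 5))
w_t = rng.standard_normal((5, 7))   # weight already transposed
bias = rng.standard_normal(7)

y = addmm_ref(bias, x, w_t)
assert np.allclose(y, x @ w_t + bias)
```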

In the current implementation, the bias is never passed to qqmm_impl, so the bias pointer is always null. If the bias is then added to x after the matmul (as the layer does now), a separate elementwise addition runs and the epilogue is never used, which defeats the purpose: the bias should be fused into the matmul, not applied afterwards as x = x + bias.

The question to settle is whether we want a dedicated op for bias handling and accumulation (something like mx.qqaddmm, though I'm not sure about the naming), or whether we should pass the bias to qqmm directly (as you noted above). A new op is probably the better choice because (1) the qqmm API is already heavy, and (2) it would be consistent with gemm.
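To make the two options concrete, here is a plain-Python sketch. Both names and signatures below are hypothetical stand-ins for ops that do not exist yet (qqmm_ref elides all quantization detail); the point is that the options compute the same result and differ only in API shape and where the backend configures the epilogue:

```python
import numpy as np

def qqmm_ref(x, w):
    # Stand-in for the dequantize-and-multiply that mx.qqmm performs;
    # quantization details are elided for the sketch.
    return x @ w

# Option A: bias as an extra keyword on qqmm (makes a heavy API heavier).
def qqmm_with_bias(x, w, bias=None):
    out = qqmm_ref(x, w)
    return out if bias is None else out + bias

# Option B: a dedicated op mirroring mx.addmm (name hypothetical).
def qqaddmm_ref(bias, x, w):
    return qqmm_ref(x, w) + bias

rng = np.random.default_rng(2)
x = rng.standard_normal((2, 4))
w = rng.standard_normal((4, 6))
bias = rng.standard_normal(6)

# Identical results; the choice is purely about API surface.
assert np.allclose(qqmm_with_bias(x, w, bias), qqaddmm_ref(bias, x, w))
```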

Author

mdepree commented Mar 9, 2026

Ah I see my error now. A new op makes sense for the reasons you've mentioned! I'm afk for the week but happy to jump back in when back. Thanks!
