Add bias support to QQLinear by mdepree · Pull Request #3215 · ml-explore/mlx

mdepree · 2026-03-06T21:51:12Z

Proposed changes

Currently, QQLinear does not support bias terms, which prevents quantization of models that use biased linear layers. This limitation is noted in the existing docstring: "Note: This layer does not support a bias term yet." This PR adds support for an optional bias term to the QQLinear layer, bringing it in line with Linear and QuantizedLinear.

Checklist

Put an x in the boxes that apply.

I have read the CONTRIBUTING document
I have run pre-commit run --all-files to format my code / installed pre-commit prior to committing changes
I have added tests that prove my fix is effective or that my feature works
I have updated the necessary documentation (if needed)

nastya236

Thanks for adding this. Unfortunately, the PR is incomplete. Adding a bias requires changes to how layouts are initialized for QQ matmul in mlx/backend/cuda/quantized/cublas_qqmm.h. To support bias, the implementation needs to set epilogue. This should be implemented similarly to how it is handled in regular GEMM and Matmul::eval_gpu().
Happy to iterate on this with you.

mdepree · 2026-03-07T19:32:04Z

Hey @nastya236, I noticed CublasQQMM already inherits from CublasMatmulBase, which has the same set_bias() method that regular GEMM uses in matmul.cpp to configure the epilogue.

I updated qqmm_impl() to accept an optional bias parameter and call set_bias() when one is provided. I also left a comment about not needing an explicit type check. Does this align with what you're thinking?

mdepree · 2026-03-07T19:39:57Z

I also noticed an opportunity for a small optimization in the QQLinear layer.

Currently bias is added separately:

x = mx.qqmm(x, weight, scales, ...)
x = x + self["bias"]

We could add an optional bias parameter to qqmm(), then update QQMatmul::eval_gpu() to extract bias from inputs and pass it to qqmm_impl()

x = mx.qqmm(x, weight, scales, ..., bias=self.get("bias"))

Let me know if this is something worth pursuing! Happy to tackle it in a follow-up PR

nastya236 · 2026-03-08T00:12:24Z

Thanks for the update. I think the implementation is incomplete as-is. The changes to qqmm_impl to support set_bias look reasonable, but there's no way for a bias to actually reach that code path.

Let's look at how it is implemented in regular matmul as an example of correct logic:

nn.Linear calls mx.addmm(self["bias"], x, self["weight"].T) -- a dedicated op that accepts bias as an input.
The AddMM primitive passes all three arrays (bias, a, b) to eval_gpu.
eval_gpu calls gemm_and_bias, which does gemm.set_bias(encoder, *bias).

In the current implementation, bias is never passed. It will always be a null pointer. If bias is added to x after the matmul (as it is done currently), addition will be called and epilogue won't be used, which is wrong: bias should not be added to the output as x = x + bias.

The question that should be raised is whether we want to have a similar operation (like mx.addmm) for bias handling and accumulation (something like mx.qqaddmm but I am not sure regarding naming), or whether we should pass bias to qqmm directly (as you noted above). Probably, new op is a better choice because 1) the qqmm API already seems heavy, 2) it would be consistent with gemm.

mdepree · 2026-03-09T04:09:05Z

Ah I see my error now. A new op makes sense for the reasons you've mentioned! I'm afk for the week but happy to jump back in when back. Thanks!

Add bias support to QQLinear

023a919

angeloskath requested a review from nastya236 March 7, 2026 00:59

nastya236 requested changes Mar 7, 2026

View reviewed changes

Enable bias epilogue for QQ matmul

29bf31c

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add bias support to QQLinear#3215

Add bias support to QQLinear#3215
mdepree wants to merge 2 commits intoml-explore:mainfrom
mdepree:qqlinear-bias

mdepree commented Mar 6, 2026

Uh oh!

nastya236 left a comment •

edited

Loading

Uh oh!

mdepree commented Mar 7, 2026

Uh oh!

mdepree commented Mar 7, 2026

Uh oh!

nastya236 commented Mar 8, 2026 •

edited

Loading

Uh oh!

mdepree commented Mar 9, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

mdepree commented Mar 6, 2026

Proposed changes

Checklist

Uh oh!

nastya236 left a comment • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

mdepree commented Mar 7, 2026

Uh oh!

mdepree commented Mar 7, 2026

Uh oh!

nastya236 commented Mar 8, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

mdepree commented Mar 9, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

nastya236 left a comment •

edited

Loading

nastya236 commented Mar 8, 2026 •

edited

Loading