
Enable XPU for blockwise quantization and FSDP tests#1921

Open
jiqing-feng wants to merge 7 commits into bitsandbytes-foundation:main from jiqing-feng:clean_tests

Conversation

@jiqing-feng
Contributor

Summary

Replace CUDA-only guards with device-agnostic torch.accelerator APIs so that XPU (and other accelerators) can run the blockwise quantization and FSDP state_dict tests.

Changes

tests/test_functional.py

  • Allow XPU to run the full test_dynamic_blockwise_quantization parameter matrix (nested, non-256 blocksizes, fp16/bf16) instead of skipping them as "non-CUDA".

tests/test_linear4bit.py

  • Replace torch.cuda.is_available() and NCCL-specific skip guards on test_fsdp_state_dict_save_4bit with a single torch.accelerator.is_available() check.
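A minimal sketch of such a consolidated guard (the actual marker in tests/test_linear4bit.py may be written differently); the `hasattr` check keeps it importable on torch versions that predate `torch.accelerator`:

```python
import pytest
import torch

# One device-agnostic check instead of torch.cuda.is_available()
# plus an NCCL-specific guard. torch.accelerator landed in PyTorch 2.6.
HAS_ACCELERATOR = (
    torch.accelerator.is_available()
    if hasattr(torch, "accelerator")
    else torch.cuda.is_available()
)

requires_accelerator = pytest.mark.skipif(
    not HAS_ACCELERATOR,
    reason="test requires an accelerator (CUDA, XPU, ...)",
)
```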

tests/fsdp_state_dict_save.py

  • Auto-detect the accelerator type via torch.accelerator.current_accelerator() and map it to the correct distributed backend (CUDA → nccl, XPU → xccl).
  • Replace torch.cuda.set_device() with torch.accelerator.set_device_index() and use the detected device type for model.to().
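The detection-and-mapping step can be sketched like this (helper names and the single-process setup are illustrative, not the PR's actual code):

```python
import torch
import torch.distributed as dist

# Map the detected accelerator type to the matching distributed backend.
BACKEND_MAP = {"cuda": "nccl", "xpu": "xccl"}

def setup_distributed(rank: int) -> str:
    # torch.accelerator.current_accelerator() requires torch >= 2.6 and
    # returns None when no accelerator is present.
    acc = torch.accelerator.current_accelerator()
    device_type = acc.type if acc is not None else "cpu"
    backend = BACKEND_MAP.get(device_type, "gloo")
    dist.init_process_group(backend=backend, rank=rank, world_size=1)
    torch.accelerator.set_device_index(rank)  # replaces torch.cuda.set_device
    return device_type  # used later for model.to(device_type)
```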

Hi @matthewdouglas. Would you please review this PR? Thanks!

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
@jiqing-feng jiqing-feng marked this pull request as ready for review April 13, 2026 06:21
@matthewdouglas
Member

Hi @jiqing-feng,

Using torch.accelerator is likely going to break this on older torch versions. I think it was introduced in PyTorch 2.6. This is probably a minimum requirement for XPU (if not even higher), but right now we support PyTorch 2.3+ for CUDA hardware. I will probably soon bump to 2.4, so we would want some backwards compatibility for that.

Otherwise, looks good!

@github-actions

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@jiqing-feng
Contributor Author

Hi @matthewdouglas. I have added backwards compatibility for torch versions without torch.accelerator by falling back to torch.cuda/torch.xpu checks and device setup, while keeping the torch.accelerator path for newer versions.
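Such a compatibility shim might look like the following sketch (helper names are illustrative, not the PR's actual code):

```python
import torch

def current_device_type() -> str:
    # Preferred path: torch.accelerator (PyTorch >= 2.6).
    if hasattr(torch, "accelerator"):
        acc = torch.accelerator.current_accelerator()
        return acc.type if acc is not None else "cpu"
    # Fallback for older torch: probe backends directly.
    if torch.cuda.is_available():
        return "cuda"
    if hasattr(torch, "xpu") and torch.xpu.is_available():
        return "xpu"
    return "cpu"

def set_device(index: int) -> None:
    if hasattr(torch, "accelerator"):
        torch.accelerator.set_device_index(index)
    elif torch.cuda.is_available():
        torch.cuda.set_device(index)
```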

@jiqing-feng
Contributor Author

The failed tests seem like an environment issue. Could this be caused by the change from torch.cuda to torch.accelerator? @matthewdouglas, do you have any suggestions?

@matthewdouglas
Member

@jiqing-feng I think we just need to skip that test on Windows. Otherwise, to have a chance at working, we'd probably need to switch nccl to gloo and set USE_LIBUV=0. Even then I don't know that it would work, and it's not a use case that's particularly important. We have other tests being skipped when platform.system() == "Windows" too, so it's fine to just do that.
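The suggested skip follows the existing platform.system() pattern; a minimal sketch (the marker name is illustrative):

```python
import platform

import pytest

# Skip on Windows for all devices: nccl/xccl are unavailable there, and the
# gloo + USE_LIBUV=0 workaround is not worth the complexity for this test.
skip_on_windows = pytest.mark.skipif(
    platform.system() == "Windows",
    reason="FSDP state_dict save is not supported on Windows",
)
```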

@jiqing-feng
Contributor Author

Hi @matthewdouglas. I have skipped the FSDP test on Windows for all devices; please take a look. Thanks!

