@ada-ggf25 commented Nov 29, 2025

Fix: Respect inference_mode when setting adapters with modules_to_save

Fixes #2928

Description

This PR fixes issue #2928, where modules_to_save parameters kept requires_grad=True even when inference_mode=True was passed to set_adapter(). This caused failures with quantized models (e.g., bitsandbytes) in inference mode, because quantized layers require their parameters to have requires_grad=False.

Problem

When calling model.set_adapter(adapter_name, inference_mode=True) on a model with modules_to_save configured (e.g., a classification head), the modules_to_save parameters still had requires_grad=True despite inference mode being requested. This happened because:

  1. _set_adapter() was calling module.enable_adapters(True) unconditionally
  2. ModulesToSaveWrapper.enable_adapters(True) sets requires_grad_(True) for all active adapters
  3. This happened regardless of the inference_mode value passed to set_adapter(), so the gradient state conflicted with the requested inference mode (see the sketch below)
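
A minimal sketch of this pre-fix flow (heavily simplified; not the actual PEFT source, and the wrapper check is an assumption):

def _set_adapter(model, adapter_name, inference_mode=False):
    for module in model.modules():
        if isinstance(module, ModulesToSaveWrapper):
            # Pre-fix: this call ran unconditionally, ignoring inference_mode, so the
            # modules_to_save parameters ended up with requires_grad=True regardless.
            module.enable_adapters(True)
            module.set_adapter(adapter_name)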

Solution

The fix ensures that enable_adapters() is only called when not in inference mode:

  1. Modified _set_adapter() in src/peft/utils/other.py:

    • Added a conditional check: enable_adapters(True) is only called when inference_mode=False (see the sketch after this list)
    • This prevents requires_grad from being flipped back to True when inference mode should keep it False
  2. Updated PeftModel.set_adapter() in src/peft/peft_model.py:

    • Added inference_mode parameter to match the API of PeftMixedModel.set_adapter()
    • Passes inference_mode to both _set_adapter() and base_model.set_adapter()
  3. Added comprehensive tests in tests/test_other.py:

    • test_modules_to_save_inference_mode_requires_grad_false: Verifies requires_grad=False in inference mode
    • test_modules_to_save_training_mode_requires_grad_true: Verifies requires_grad=True in training mode
    • test_modules_to_save_inference_mode_with_torch_inference_mode: Verifies compatibility with torch.inference_mode()
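
A hedged sketch of the fixed conditional from item 1, condensed from this description (not a verbatim copy of the code in src/peft/utils/other.py):

for module in model.modules():
    if isinstance(module, ModulesToSaveWrapper):
        if adapter_name in module._adapters:
            if not inference_mode:
                # Only re-enable gradients when not in inference mode.
                module.enable_adapters(True)
            module.set_adapter(adapter_name)
        elif not inference_mode:
            module.enable_adapters(False)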

Changes Made

Code Changes

src/peft/utils/other.py:

  • Modified _set_adapter() to conditionally call enable_adapters() based on inference_mode parameter

src/peft/peft_model.py:

  • Updated the set_adapter() method signature to accept an inference_mode parameter (sketched below)
  • Passes inference_mode through to the underlying adapter-setting functions
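
A rough sketch of the updated signature (simplified; the real method performs additional validation and bookkeeping):

def set_adapter(self, adapter_name: str, inference_mode: bool = False) -> None:
    # Forward inference_mode so tuner layers and modules_to_save wrappers agree
    # on whether parameters should remain trainable.
    self.base_model.set_adapter(adapter_name, inference_mode=inference_mode)
    _set_adapter(self, adapter_name, inference_mode=inference_mode)
    self.active_adapter = adapter_name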

tests/test_other.py:

  • Added TestModulesToSaveInferenceMode test class with 3 comprehensive tests

Testing

Test Results

All new tests pass:

  • test_modules_to_save_inference_mode_requires_grad_false - PASSED
  • test_modules_to_save_training_mode_requires_grad_true - PASSED
  • test_modules_to_save_inference_mode_with_torch_inference_mode - PASSED

  • All existing modules_to_save tests pass (11/11)
  • Related tests pass (71/74 - the 3 failures are unrelated BOFT dependency issues)
  • Code quality checks pass (make quality)

Test Coverage

The new tests verify the following (see the sketch after this list):

  • modules_to_save parameters correctly have requires_grad=False when inference_mode=True
  • modules_to_save parameters correctly have requires_grad=True when inference_mode=False (training mode)
  • Compatibility with torch.inference_mode() context manager
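
A condensed illustration of the core check (the adapter name and parameter filter are assumptions, not the exact test code):

model.set_adapter("default", inference_mode=True)
for name, param in model.named_parameters():
    # The saved copies appear under "modules_to_save" in parameter names and
    # must stay frozen in inference mode.
    if "modules_to_save" in name:
        assert not param.requires_grad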

Example Usage

Before this fix, the following code would fail with quantized models:

import torch
from peft import PeftModel

model = PeftModel.from_pretrained(base_model, adapter_path)
model = convert_to_int8(model)  # quantization step (illustrative helper)
model.eval()

with torch.inference_mode():
    model.set_adapter("my_adapter", inference_mode=True)  # modules_to_save still had requires_grad=True
    _ = model(batch)

After this fix:

model = PeftModel.from_pretrained(base_model, adapter_path)
model = convert_to_int8(model)  # Quantization
model.eval()

with torch.inference_mode():
    model.set_adapter("my_adapter", inference_mode=True)  #  modules_to_save correctly have requires_grad=False
    _ = model(batch)

Add optional inference_mode parameter to PeftModel.set_adapter() method
to allow setting adapters in frozen state (requires_grad=False) directly
without manual parameter manipulation.

Changes:
- Add inference_mode parameter with default value False to maintain
  backwards compatibility
- Update method docstring to document the new parameter and clarify
  that adapters are set to trainable unless inference_mode is True
- Remove manual example code snippet showing how to set requires_grad=False
- Pass inference_mode parameter to base_model.set_adapter() and
  _set_adapter() helper function calls

This enhancement simplifies the workflow for users who want to set
adapters in inference mode, addressing the need to manually manipulate
requires_grad flags after setting an adapter.

…n _set_adapter

Fix bug where enable_adapters() was called unconditionally in _set_adapter()
function, causing adapters to be incorrectly enabled even when inference_mode=True.

Changes:
- Conditionally call enable_adapters(True) only when inference_mode is False
  when adapter is found in module._adapters
- Conditionally call enable_adapters(False) only when inference_mode is False
  when adapter is not found in module._adapters
- Ensure that when inference_mode=True, adapters remain in their current
  enabled/disabled state and are not forcibly toggled

This fix ensures that the inference_mode parameter is properly respected
throughout the adapter setting process, preventing unintended adapter
activation during inference operations.

Add comprehensive test suite to validate that modules_to_save correctly
respect the inference_mode parameter when set_adapter is called.

This test class addresses issue huggingface#2928 where modules_to_save had
requires_grad=True even when inference_mode=True was passed to set_adapter.

Test coverage:
- test_modules_to_save_inference_mode_requires_grad_false: Verifies that
  modules_to_save parameters have requires_grad=False when inference_mode=True
  is passed to set_adapter, ensuring parameters are frozen during inference
- test_modules_to_save_training_mode_requires_grad_true: Verifies that
  modules_to_save parameters have requires_grad=True when inference_mode=False
  is passed to set_adapter, ensuring parameters are trainable during training
- test_modules_to_save_inference_mode_with_torch_inference_mode: Validates
  that modules_to_save work correctly when used with torch.inference_mode()
  context manager and that forward passes still function correctly

All tests use AutoModelForSequenceClassification with LoRA configuration
targeting query and value modules, with classifier as modules_to_save to
provide realistic test scenarios.
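
A hedged sketch of that setup (the model checkpoint and exact keyword arguments are assumptions, not copied from the test file):

from transformers import AutoModelForSequenceClassification
from peft import LoraConfig, get_peft_model

base = AutoModelForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=2)
config = LoraConfig(target_modules=["query", "value"], modules_to_save=["classifier"])
model = get_peft_model(base, config)

model.set_adapter("default", inference_mode=True)
assert all(not p.requires_grad for n, p in model.named_parameters() if "modules_to_save" in n)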

Reformat the docstring comment in TestModulesToSaveInferenceMode class
to fit within line length limits by combining two lines into a single line.

This is a minor formatting change to improve code readability and
compliance with project style guidelines.

… in ModulesToSaveWrapper

Remove the lines that set requires_grad on original_module in
ModulesToSaveWrapper.enable_adapters() method. This change addresses
the maintainer's feedback that there is no reason to touch the
requires_grad of the original_module here, and it conflicts with
bitsandbytes quantization which requires gradients to be False at all
times.

The original_module's requires_grad is no longer manipulated by
enable_adapters(), only modules_to_save gradients are managed.

Updated test_requires_grad_modules_to_save_disabling to reflect this
change by removing expectations about original_module having gradients
when adapters are disabled.

Related to issue huggingface#2928 and PR huggingface#2931.
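
An illustrative consequence of this change, for a ModulesToSaveWrapper instance named wrapper (a sketch based on the description above, not a test from the PR):

wrapper.enable_adapters(False)
# The modules_to_save copy for the adapter is frozen...
assert not any(p.requires_grad for p in wrapper.modules_to_save["default"].parameters())
# ...while original_module is no longer touched by enable_adapters(), so a
# bitsandbytes-quantized original keeps requires_grad=False.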