Releases: huggingface/peft
v0.19.1
v0.19.0
Highlights
This PEFT release contains no fewer than nine new PEFT methods, described below. It also contains numerous enhancements that should make PEFT more useful to many users.
New Methods
GraLoRA
@yeonjoon-jung01 added "GraLoRA: Granular Low-Rank Adaptation for Parameter-Efficient Fine-Tuning" to PEFT (#2851). This method subdivides the base weight into smaller blocks and applies LoRA to those. This more granular adaptation promises to increase expressiveness and improve performance, especially at higher ranks (64+), closing the gap to full fine-tuning.
BD-LoRA
@Conzel contributed BD-LoRA: "Block-Diagonal LoRA for Eliminating Communication Overhead in Tensor Parallel LoRA Serving" (#2895). With BD-LoRA, the LoRA weights are implemented in a block-diagonal way. This reduces communication overhead when using tensor parallelism (TP) and thus enables faster serving.
There is an experimental branch for BD-LoRA support in vLLM: vllm-project/vllm#28136.
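To see why a block-diagonal structure can avoid communication, the following minimal sketch (plain Python, not the PEFT or vLLM implementation) checks that multiplying by a block-diagonal matrix gives the same result as multiplying each block by its own slice of the input independently. The helper names and the tiny 2x2 blocks are illustrative assumptions; which LoRA factor is block-diagonal and how blocks map to devices is defined in the paper and the PR.

```python
def matvec(matrix, vector):
    """Plain matrix-vector product on nested lists."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def block_diag(blocks):
    """Assemble square blocks into one dense block-diagonal matrix."""
    size = sum(len(b) for b in blocks)
    dense = [[0.0] * size for _ in range(size)]
    offset = 0
    for block in blocks:
        for i, row in enumerate(block):
            for j, value in enumerate(row):
                dense[offset + i][offset + j] = value
        offset += len(block)
    return dense


# Two hypothetical 2x2 blocks, e.g. one per tensor-parallel shard.
blocks = [[[1.0, 2.0], [3.0, 4.0]], [[5.0, 6.0], [7.0, 8.0]]]
x = [1.0, 0.5, -1.0, 2.0]

# Dense path: one big multiplication that touches the whole input.
dense_out = matvec(block_diag(blocks), x)

# Sharded path: each block only needs its own slice of the input.
sharded_out = []
offset = 0
for block in blocks:
    n = len(block)
    sharded_out.extend(matvec(block, x[offset:offset + n]))
    offset += n

print(dense_out == sharded_out)
```

Because each block only reads its own input slice, a shard holding one block would not need values from the other shards for this product.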
Cartridges
Thanks to @kashif, PEFT now also supports Cartridges (#2953). The main purpose of this method is to train a prefix to compress a long context to a short size and thus save on tokens. On a low level, this is similar to prefix tuning. The PR also added an example recipe to quickly get started.
PVeRA
"PVeRA: Probabilistic Vector-Based Random Matrix Adaptation" was added to PEFT by @leofillioux in #2952. It is an extension of VeRA, a PEFT method that uses weight sharing between layers to be especially parameter efficient. PVeRA builds on top of that by adding a probabilistic element, sampling from the shared parameters and promising better performance overall.
PSOFT
@fei407 added PSOFT, "Efficient Orthogonal Fine-Tuning with Principal Subspace Adaptation", to PEFT in #3037. Orthogonal fine-tuning techniques like OFT and BOFT are good at preserving the structure and thus capabilities of the underlying base model. PSOFT improves the efficiency of this technique by constraining the adaptation to a low-rank principal subspace.
Lily
@yibozhong added Lily: "Low-Rank Interconnected Adaptation across Layers" to PEFT in #2563. Lily is on the surface similar to LoRA but has a sophisticated parameter sharing scheme. The A parameters are shared blockwise (e.g. 4 consecutive q_proj layers share the same A). There is a pool of B parameters that is shared globally; the actual B's are chosen in a data-dependent way through a router. This allows Lily to use higher ranks than LoRA while maintaining a low trainable parameter count.
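The blockwise sharing of A can be pictured as a mapping from layer index to shared-parameter index. The sketch below only illustrates that mapping, with a hypothetical block size of 4 and 8 layers; the function name is made up, and it does not reproduce Lily's router or PEFT's implementation.

```python
def shared_a_index(layer_idx, block_size=4):
    """Return which shared A matrix a given layer would use."""
    return layer_idx // block_size


num_layers = 8  # hypothetical model depth
assignment = [shared_a_index(i) for i in range(num_layers)]
num_a_matrices = len(set(assignment))

print(assignment)      # [0, 0, 0, 0, 1, 1, 1, 1]
print(num_a_matrices)  # 2
```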
PEANuT
In #3084, "PEANuT: Parameter-Efficient Adaptation with Weight-aware Neural Tweakers" was added to PEFT, again by @yibozhong. PEANuT adds a small neural net (a so-called weight-aware neural tweaker) to the base model. Compared to LoRA, this increases expressivity for the same trainable parameter count, or allows the parameter count to be lowered greatly without sacrificing expressivity. This comes at the expense of a higher memory requirement for the same parameter count and decreased speed.
TinyLoRA
We have another serial contributor in @kashif, who also contributed TinyLoRA: "Learning to Reason in 13 Parameters" in #3024. This PEFT method makes it possible to train an extremely small number of parameters, much lower than what could be achieved even with LoRA rank 1. The paper shows that, in particular with reinforcement learning, it can often be enough to train just a few parameters to achieve good results.
AdaMSS
@LonglongaaaGo added "AdaMSS: Adaptive Multi-Subspace Approach for Parameter-Efficient Fine-Tuning" to PEFT. This method segments the base weights of the model into smaller subspaces that are targeted for fine-tuning. Moreover, it's possible to dynamically assign a lower parameter budget to less important subspaces during training, similar to what AdaLoRA does. This promises to provide higher expressiveness and better generalization than similar PEFT methods.
Enhancements
Convert non-LoRA adapters to LoRA
In #2939, we added functions to PEFT that convert checkpoints of many non-LoRA methods into LoRA checkpoints. This can be useful because many other packages, e.g. Diffusers and vLLM, support only LoRA but not other PEFT methods. With the new conversion tools, more PEFT methods than just LoRA can thus be used with those packages. Conversion is lossy, but empirical testing showed that with a sufficiently high LoRA rank, the error can be quite low.
LoRA-GA
@sambhavnoobcoder added a new way to initialize LoRA weights with "LoRA-GA: Low-Rank Adaptation with Gradient Approximation" (#2926). This allows you to initialize the LoRA weights in a way that aligns the gradients with full fine-tuning and should lead to faster training convergence.
Reducing intruder dimensions
In "LoRA vs Full Fine-tuning: An Illusion of Equivalence", the authors showed that LoRA fine-tuning can introduce so-called "intruder dimensions" which contribute to forgetting. PEFT now has a utility function to remove intruder dimensions, reduce_intruder_dimension. When calling this on a fine-tuned LoRA model, forgetting should be reduced while the fine-tuned task performance should remain almost the same.
Transformer Engine
In #3048, @balvisio added support for Transformer Engine, a quantization method by NVIDIA, to PEFT.
Tensor Parallel Support
In a series of PRs (#3079, #3091, #3096), @michaelbenayoun added support for Tensor Parallelism to LoRA.
Weight tying improvements
In many LLMs, the embedding and the LM head have tied weights to save on parameter count. This can, however, lead to tricky situations when trying to fine-tune those layers. Through a series of PRs (#2803, #2922, #2870, #2879, #3126), we improved the user experience when doing so. Most notably, users can now pass ensure_weight_tying=True to their PEFT config to force weight tying to be upheld. Please check the PEFT weight tying docs for how weight tying is now being handled. Thanks to @romitjain, @sambhavnoobcoder, and @Cursx for their contributions.
Low precision floating type support
#3055 makes LoRA work with base models that use very low precision floats like torch.float8_e4m3fn. An example of that would be MiniMax-M2.5.
Zero init for PrefixTuning
#3128 introduces zero init to Prefix Tuning which, according to our benchmarks, reduced the result variance significantly and yielded good task accuracy without the need for prompt engineering.
LoftQ + int8 quantization
With #3088, the LoftQ implementation now supports correcting errors for int8 quantization (without activation thresholding), in addition to the already existing nf4 quantization.
Changes
Removal of Bone
The Bone PEFT method was removed in #3115. Users are directed to use MiSS instead, which is the improved replacement for Bone. Use the Bone-to-MiSS conversion script (scripts/convert-bone-to-miss.py) if you want to port old Bone checkpoints.
AutoGPTQ and AutoAWQ
These two quantization methods now use GPTQModel as their backend (#2932) thanks to @ZX-ModelCloud.
Handling of requires_grad in modules_to_save
Previously, PEFT would enable requires_grad on the original module if the corresponding modules_to_save was disabled. This is almost never desirable and was thus fixed. Although this change is technically backwards-incompatible, it's an extreme niche case, so we don't expect any user to be negatively affected by it.
All Changes
- FIX SFT example (8bit quant, trl) by @BenjaminBossan in #2857
- TST Add GPU training tests for p-tuning & prefix tuning by @BenjaminBossan in #2844
- CHORE: Bump Python version in pyproject.toml by @BenjaminBossan in #2865
- MNT: Clean up unused method set_auxiliary_adapters by @BenjaminBossan in #2876
- ENH: Improve MetaMath training script runtime by @BenjaminBossan in #2894
- CI: Install fbgemm package needed by torchao, update test by @benjaminb...
0.18.1
Small patch release containing the following changes:
- #2934: Small fixes required for some special cases to work with the upcoming transformers v5 release
- #2963: Fix to enable PEFT to run with AMD ROCm thanks to @vladmandic
- #2976: Fix a regression that inadvertently required transformers >= 4.52
0.18.0: RoAd, ALoRA, Arrow, WaveFT, DeLoRA, OSF, and more
Highlights
New Methods
RoAd
@ppetrushkov added RoAd: 2D Rotary Adaptation to PEFT in #2678. RoAd learns 2D rotation matrices that are applied using only element-wise multiplication, thus promising very fast inference with adapters in unmerged state.
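As a rough illustration of how a 2D rotation can be applied with element-wise multiplication only, the sketch below rotates consecutive pairs of features using `out = x * cos + swapped * sin`. Pairing adjacent features and using one angle per pair are assumptions made for this example; RoAd's actual parameterization in PEFT may differ, and `rotate_pairs` is a hypothetical helper.

```python
import math


def rotate_pairs(x, thetas):
    """Rotate consecutive pairs (x[0], x[1]), (x[2], x[3]), ... by thetas[i].

    Uses only element-wise products: out = x * cos + swapped * sin,
    where swapped turns each pair (a, b) into (-b, a).
    """
    cos = [math.cos(t) for t in thetas for _ in range(2)]
    sin = [math.sin(t) for t in thetas for _ in range(2)]
    swapped = []
    for i in range(0, len(x), 2):
        swapped.extend([-x[i + 1], x[i]])
    return [xi * c + si * s for xi, c, si, s in zip(x, cos, swapped, sin)]


x = [1.0, 0.0, 0.0, 2.0]
# Rotate the first pair by 90 degrees and leave the second pair untouched.
y = rotate_pairs(x, [math.pi / 2, 0.0])
print([round(v, 6) for v in y])
```

No matrix multiplication is involved, which is why this kind of update stays cheap even when the adapter is not merged into the base weights.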
Remarkably, besides LoRA, RoAd is the only PEFT method that supports mixed adapter batches. This means that when you have loaded a model with multiple RoAd adapters, you can use all of them for different samples in the same batch, which is much more efficient than switching adapters between batches:
from peft import PeftModel
model = PeftModel.from_pretrained(base_model, <path-to-road-adapter-A>, adapter_name="adapter-A")
# load the second adapter from its checkpoint under its own name
model.load_adapter(<path-to-road-adapter-B>, adapter_name="adapter-B")
inputs = ... # input with 3 samples
# apply adapter A to sample 0, adapter B to sample 1, and use the base model for sample 2:
adapter_names = ["adapter-A", "adapter-B", "__base__"]
output_mixed = model(**inputs, adapter_names=adapter_names)
gen_mixed = model.generate(**inputs, adapter_names=adapter_names)
ALoRA
Activated LoRA is a technique added by @kgreenewald in #2609 for causal language models. It selectively enables LoRA adapters depending on a specific token invocation sequence in the input. This has the major benefit of being able to re-use most of the KV cache during inference when the adapter is only used to generate part of the response, after which the base model takes over again.
Arrow & GenKnowSub
@TheTahaaa contributed not only support for Arrow, a dynamic routing algorithm between multiple loaded LoRAs in #2644, but also GenKnowSub, a technique built upon Arrow where the 'library' of LoRAs available to Arrow is first modified by subtracting general knowledge adapters (e.g., trained on subsets of Wikipedia) to enhance task-specific performance.
WaveFT
Thanks to @Bilican, Wavelet Fine-Tuning (WaveFT) was added to PEFT in #2560. This method trains sparse updates in the wavelet domain of residual matrices, which is especially parameter efficient. It is very interesting for image generation, as it promises to generate diverse outputs while preserving subject fidelity.
DeLoRA
Decoupled Low-rank Adaptation (DeLoRA) was added by @mwbini in #2780. This new PEFT method is similar to DoRA insofar as it decouples the angle and magnitude of the learned adapter weights. However, DeLoRA implements this in a way that promises to better prevent divergence. Moreover, it constrains the deviation of the learned weight by imposing an upper limit on the norm, which can be adjusted via the delora_lambda parameter.
OSF
Orthogonal Subspace Fine-Tuning (OSF) was added by @NikhilNayak-debug in #2685. By freezing the high-rank subspace of the targeted weight matrices and projecting gradient updates onto a low-rank subspace, OSF achieves good performance on continual learning tasks. While it is a bit memory intensive for standard fine-tuning processes, it is definitely worth checking out on tasks where performance degradation of previously learned tasks is a concern.
Enhancements
Text generation benchmark
In #2525, @ved1beta added the text generation benchmark to PEFT. This is a framework to determine and compare metrics with regard to text generation of different PEFT methods, e.g. runtime and memory usage. Right now, this benchmark is still lacking experimental settings and a visualization, analogous to what we have in the MetaMathQA benchmark. If this is something that interests you, we encourage you to let us know or, even better, contribute to this benchmark.
Reliable interface for integrations
PEFT has integrations with other libraries like Transformers and Diffusers. To facilitate this integration, PEFT now provides a stable interface of functions that should be used if applicable. For example, the set_adapter function can be used to switch between PEFT adapters on the model, even if the model is not a PeftModel instance. We commit to keeping these functions backwards compatible, so it's safe for other libraries to build on top of those.
Handling of weight tying
Some Transformers models can have tied weights. This is especially prevalent when it comes to the embedding and the LM head. Currently, the way that this is handled in PEFT is not obvious. We thus drafted an issue to illustrate the intended behavior in #2864. This shows what our goal is, although not everything is implemented yet.
In #2803, @romitjain added the ensure_weight_tying argument to LoraConfig. This argument, if set to True, enforces weight tying of the modules targeted with modules_to_save. Thus, if embedding and LM head are tied, they will share weights, which is important to allow, for instance, weight merging. Therefore, we recommend that most users enable this setting if they want to fully fine-tune the embedding and LM head. For backwards compatibility, the setting is off by default though.
Note that in accordance with #2864, the functionality of ensure_weight_tying=True will be expanded to also include trainable tokens (#2870) and LoRA (tbd.) in the future.
Support Conv1d and 1x1 Conv2d layers in LoHa and LoKr
@grewalsk extended LoHa and LoKr to support nn.Conv1d layers, as well as nn.Conv2d with 1x1 kernels, in #2515.
New prompt tuning initialization
Thanks to @macmacmacmac, we now have a new initialization option for prompt tuning, random discrete initialization (#2815). This option should generally work better than random initialization, as corroborated on our PEFT method comparison suite. Give it a try if you use prompt tuning.
Combining LoRA adapters with negative weights
If you use multiple LoRA adapters, you can merge them into a single adapter using model.add_weighted_adapter. However, so far, this only worked with positive weights per adapter. Thanks to @sambhavnoobcoder and @valteu, it is now possible to pass negative weights too.
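To illustrate what a negative weight means conceptually, the sketch below forms a weighted sum of the weight deltas contributed by two adapters, so that a weight of -1.0 subtracts an adapter's contribution. This is only the underlying arithmetic on made-up 2x2 deltas; `combine_deltas` is a hypothetical helper, and how model.add_weighted_adapter actually combines the LoRA factors depends on the chosen combination type.

```python
def combine_deltas(deltas, weights):
    """Weighted element-wise sum of equally shaped weight deltas."""
    rows, cols = len(deltas[0]), len(deltas[0][0])
    combined = [[0.0] * cols for _ in range(rows)]
    for delta, weight in zip(deltas, weights):
        for i in range(rows):
            for j in range(cols):
                combined[i][j] += weight * delta[i][j]
    return combined


# Hypothetical 2x2 weight deltas contributed by two adapters.
delta_task = [[1.0, 2.0], [0.0, 1.0]]
delta_general = [[0.5, 0.0], [0.0, 0.5]]

# Weight 1.0 keeps the first adapter, weight -1.0 subtracts the second.
combined = combine_deltas([delta_task, delta_general], [1.0, -1.0])
print(combined)  # [[0.5, 2.0], [0.0, 0.5]]
```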
Changes
Transformers compatibility
At the time of writing, the Transformers v5 release is imminent. This Transformers version will be incompatible with PEFT < 0.18.0. If you plan to use Transformers v5 with PEFT, please upgrade PEFT to 0.18.0+.
Python version
This PEFT version no longer supports Python 3.9, which has reached its end of life. Please use Python 3.10+.
Updates to OFT
In #2805, the OFT method was updated to make it slightly faster and to stabilize the numerics. This means, however, that existing checkpoints may give slightly different results after upgrading to PEFT 0.18.0. Therefore, if you use OFT, we recommend retraining the adapter.
All Changes
- add xpu support for boft/controlnet example by @kaixuanliu in #2674
- enabe boft_dreambooth on XPU by @yao-matrix in #2679
- Add XPU support for dna_language_model example by @kaixuanliu in #2689
- validated lora dreambooth on xpu, pass by @yao-matrix in #2696
- validated lorafa on xpu, passed by @yao-matrix in #2697
- enable corda finetuning on xpu by @yao-matrix in #2687
- validated cpt, ephemeral_gpu_offloading and eva finetuning on XPU by @yao-matrix in #2694
- validated PISSA on xpu, pass by @yao-matrix in #2703
- validated MISS on xpu, pass by @yao-matrix in #2704
- fix bug for feature_extraction example by @kaixuanliu in #2706
- Use hub_online_once in trainable token tests by @githubnemo in #2701
- Bump version to 0.17.1.dev0 after release by @BenjaminBossan in #2707
- validated multi_adapter on xpu, pass by @yao-matrix in #2711
- verified mlp on xpu, pass by @yao-matrix in #2712
- use CPU instead of XPU for face_alignment by @kaixuanliu in #2713
- Add conditional_generation example xpu support by @kaixuanliu in #2684
- validated POLY on XPU, pass by @yao-matrix in #2702
- add XPU support for hra_dreambooth example by @kaixuanliu in #2717
- enable xpu device for causal_language_modeling example by @kaixuanliu in #2680
- add xpu support for fp4_finetuing example by @kaixuanliu in #2714
...
0.17.1
This patch release contains a few fixes (via #2710) for the newly introduced target_parameters feature, which allows LoRA to target nn.Parameters directly (useful for mixture of expert layers). Most notably:
- PEFT no longer removes possibly existing parametrizations from the parameter.
- Adding multiple adapters (via model.add_adapter or model.load_adapter) did not work correctly. Since a solution is not trivial, PEFT now raises an error to prevent this situation.
0.17.0: SHiRA, MiSS, LoRA for MoE, and more
Highlights
New Methods
SHiRA
@kkb-code contributed Sparse High Rank Adapters (SHiRA), which promise a potential gain in performance over LoRA; in particular, concept loss when using multiple adapters is reduced. Since the adapters only train 1-2% of the weights and are inherently sparse, switching between adapters may be cheaper than with LoRA. (#2584)
MiSS
@JL-er added a new PEFT method, MiSS (Matrix Shard Sharing) in #2604. This method is an evolution of Bone, which, according to our PEFT method comparison benchmark, gives excellent results when it comes to performance and memory efficiency. If you haven't tried it, you should do so now.
At the same time, Bone will be deprecated in favor of MiSS and will be removed in PEFT v0.19.0. If you already have a Bone checkpoint, you can use scripts/convert-bone-to-miss.py to convert it into a MiSS checkpoint and proceed with training using MiSS.
Enhancements
LoRA for nn.Parameter
LoRA is now able to target nn.Parameter directly (#2638, #2665)! Ever had this complicated nn.Module with promising parameters inside but it was too custom to be supported by your favorite fine-tuning library? No worries, now you can target nn.Parameters directly using the target_parameters config attribute which works similarly to target_modules.
This option can be especially useful for models with Mixture of Expert (MoE) layers, as those often use nn.Parameters directly and cannot be targeted with target_modules. For example, for the Llama4 family of models, use the following config to target the MoE weights:
from peft import LoraConfig
config = LoraConfig(
    ...,
    target_modules=[], # <= prevent targeting any modules
    target_parameters=["feed_forward.experts.down_proj", "feed_forward.experts.gate_up_proj"],
)
Note that this feature is still experimental as it comes with a few caveats and therefore might change in the future. Also, MoE weights with many experts can be quite huge, so expect higher memory usage than when targeting normal nn.Linear layers.
Injecting adapters based on a state_dict
Sometimes, it is possible that there is a PEFT adapter checkpoint but the corresponding PEFT config is not known for whatever reason. To inject the PEFT layers for this checkpoint, you would usually have to reverse-engineer the corresponding PEFT config, most notably the target_modules argument, based on the state_dict from the checkpoint. This can be cumbersome and error prone. To avoid this, it is also possible to call inject_adapter_in_model and pass the loaded state_dict as an argument:
from safetensors.torch import load_file
from peft import LoraConfig, inject_adapter_in_model
model = ...
state_dict = load_file(<path-to-safetensors-file>)
lora_config = LoraConfig() # <= no need to specify further
model = inject_adapter_in_model(lora_config, model, state_dict=state_dict)
Find more on state_dict based injection in the docs.
Changes
Compatibility
A bug in prompt learning methods caused modules_to_save to be ignored. Classification tasks are especially affected, since they usually add the classification/score layer to modules_to_save. As a consequence, these layers were neither trained nor stored after training. This has been corrected now. (#2646)
All Changes
- Bump version to 0.16.1.dev0 after release by @BenjaminBossan in #2632
- FEAT: Add GH action to deploy method comparison app by @BenjaminBossan in #2625
- enable FSDP example for model `hugging-quants/Meta-Llama-3.1-8B-Instr… by @kaixuanliu in #2626
- FIX: Create mask function signature change in transformers 4.53.1 by @BenjaminBossan in #2633
- FIX: Correctly skip AWQ test based on torch version by @BenjaminBossan in #2631
- FIX: Faulty OFT parameter device test by @BenjaminBossan in #2630
- Fix #2634: Allow peft_type to be a string by @githubnemo in #2635
- SHiRA Adapters by @kkb-code in #2584
- FIX: Prompt learning methods modules_to_save issue by @BenjaminBossan in #2646
- FIX: Error in workflow file to deploy method comparison app by @BenjaminBossan in #2645
- FEAT Allow LoRA to target nn.Parameter by @BenjaminBossan in #2638
- Update BibTeX entry by @cx-alberto-simoes in #2659
- FIX Prefix tuning after transformers PR 38635 by @BenjaminBossan in #2662
- make method comparison device agnostic, so it can expand to more accelerators like XPU by @yao-matrix in #2610
- Update tokenizer parameter in sfttrainer across multiple examples by @gapsong in #2664
- Update lora.md by @qgallouedec in #2666
- GPT2 compatible version of LLama-Adapters by @efraimdahl in #2643
- Method Comparison: Improve formatting/layout of table by @githubnemo in #2670
- ENH: Targeting multiple parameters on the same module by @BenjaminBossan in #2665
- Update extending vocab docs by @githubnemo in #2669
- FIX Failing target_parameters param usage count by @BenjaminBossan in #2676
- Fix trainable tokens with fsdp by @BenjaminBossan in #2681
- FIX: Small fixes to target_parameters by @BenjaminBossan in #2677
- TST: Add more HF Hub model caching by @BenjaminBossan in #2682
- FIX: Missing device map for facebook/opt-125m by @BenjaminBossan in #2675
- Fix not detecting regex-targeted embedding layer by @githubnemo in #2649
- Add MiSS as a replacement for Bone. by @JL-er in #2604
- [WIP] ENH: Adapter injection based on state_dict by @BenjaminBossan in #2637
- Release 0.17.0 by @BenjaminBossan in #2691
New Contributors
- @kaixuanliu made their first contribution in #2626
- @kkb-code made their first contribution in #2584
- @cx-alberto-simoes made their first contribution in #2659
- @efraimdahl made their first contribution in #2643
Full Changelog: v0.16.0...v0.17.0
0.16.0: LoRA-FA, RandLoRA, C³A, and much more
Highlights
New Methods
LoRA-FA
In #2468, @AaronZLT added the LoRA-FA optimizer to PEFT. This optimizer is based on AdamW and it increases memory efficiency of LoRA training. This means that you can train LoRA with less memory, or, with the same memory budget, use higher LoRA ranks, potentially getting better results.
RandLoRA
Thanks to @PaulAlbert31, a new PEFT method called RandLoRA was added to PEFT (#2464). Similarly to VeRA, it uses non-learnable random low rank matrices that are combined through learnable matrices. This way, RandLoRA can approximate full rank updates of the weights. Training models quantized with bitsandbytes is supported.
C³A
@Phoveran added Circular Convolution Adaptation, C³A, in #2577. This new PEFT method can overcome the limit of low-rank adaptations as seen e.g. in LoRA while still promising to be fast and memory efficient.
Enhancements
Thanks to @gslama12 and @SP1029, LoRA now supports Conv2d layers with groups != 1. This requires the rank r being divisible by groups. See #2403 and #2567 for context.
@dsocek added support for Intel Neural Compressor (INC) quantization to LoRA in #2499.
DoRA now supports Conv1d layers thanks to @EskildAndersen (#2531).
Passing init_lora_weights="orthogonal" now enables orthogonal weight initialization for LoRA (#2498).
@gapsong brought us Quantization-Aware LoRA training in #2571. This can make QLoRA training more efficient, please check the included example. Right now, only GPTQ is supported.
There has been a big refactor of Orthogonal Finetuning, OFT, thanks to @zqiu24 (#2575). This makes the PEFT method run more quickly and require less memory. It is, however, incompatible with old OFT checkpoints. If you have old OFT checkpoints, either pin the PEFT version to <0.16.0 or retrain it with the new PEFT version.
Thanks to @keepdying, LoRA hotswapping with compiled models no longer leads to CUDA graph re-records (#2611).
Changes
Compatibility
- #2481: The value of requires_grad of modules_to_save is now set to True when used directly with inject_adapter. This is relevant for PEFT integrations, e.g. Transformers or Diffusers.
- Due to a big refactor of vision language models (VLMs) in Transformers, the model architecture has been slightly adjusted. One consequence of this is that if you use a PEFT prompt learning method that is applied to vlm.language_model, it will no longer work; please apply it to vlm directly (see #2554 for context). Moreover, the refactor results in different checkpoints. We managed to ensure backwards compatibility in PEFT, i.e. old checkpoints can be loaded successfully. There is, however, no forward compatibility, i.e. loading checkpoints trained after the refactor is not possible with package versions from before the refactor. In this case, you need to upgrade PEFT and transformers. More context in #2574.
- #2579: There have been bigger refactors in Transformers concerning attention masks. This required some changes on the PEFT side which can affect prompt learning methods. For prefix tuning specifically, this can result in numerical differences but overall performance should be the same. For other prompt learning methods, numerical values should be the same, except if the base model uses 4d attention masks, like Gemma. If you load old prompt learning checkpoints, please double-check that they still perform as expected, especially if they're trained on Gemma or similar models. If not, please re-train them or pin PEFT and transformers to previous versions (<0.16.0 and <4.52.0, respectively).
All Changes
- Bump version and minor instruction fix by @githubnemo in #2439
- FIX for ConvNd layers using the groups argument. by @gslama12 in #2403
- DOC: Tip on how to merge with DeepSpeed by @BenjaminBossan in #2446
- Fix incorrect link in docs by @kenning in #2444
- Fix typos by @omahs in #2447
- Refactor to better support LoRA variants by @BenjaminBossan in #2443
- enable 5 test cases on XPU by @yao-matrix in #2442
- FIX: Faulty test that results in nan weights by @BenjaminBossan in #2448
- Fix sft example script trl and env var by @BenjaminBossan in #2454
- LoRA variant init now also receives kwargs by @BenjaminBossan in #2455
- Fix #2450: Revamp adapter_state_dict_* methods by @githubnemo in #2456
- Method comparison evaluation suite by @githubnemo in #2395
- Bump version to reflect patch release by @githubnemo in #2461
- The paper on the Bone structure has been updated by @JL-er in #2312
- CI: More caching in tests by @BenjaminBossan in #2472
- fix gpu tests by @jiqing-feng in #2471
- Fix compare results by @jiqing-feng in #2473
- fix error_factor for xpu by @jiqing-feng in #2475
- Fix: Multiple PEFT methods have issues with models loaded in float16 or bfloat16 by @BenjaminBossan in #2433
- TST Refactor tests to make them simpler by @BenjaminBossan in #2462
- Use Python 3.9 as RUFF target version and apply fixes by @cyyever in #2483
- FIX Deleting adapters on auxiliary modules by @BenjaminBossan in #2466
- fix args by @real-zhangzhe in #2474
- ENH Add default target_modules for Llama4 by @BenjaminBossan in #2480
- [Feature Request] Add LoRA-FA to PEFT by @AaronZLT in #2468
- TST Refactor (continued) of encoder tests by @BenjaminBossan in #2478
- FIX: Error when merging LoRA bias with scale != 1 by @BenjaminBossan in #2489
- FIX: X-LoRA error when targeting different modules by @BenjaminBossan in #2488
- Fix: the evaluation_strategy is deprecated by @yuanwu2017 in #2487
- Testing common uses situational HF_HUB_OFFLINE by @githubnemo in #2490
- MNT: Update HF Hub download kwargs by @BenjaminBossan in #2492
- FIX Multi GPU tests: explicit device map by @BenjaminBossan in #2484
- Fix #2477: Regression accessing modules_to_save by @githubnemo in #2481
- make test_lora_use_dora_linear pass on XPU by @yao-matrix in #2493
- TST: AQLM test no longer x-fails by @BenjaminBossan in #2506
- TST make 3 flaky test cases always pass on XPU by @yao-matrix in #2503
- FIX: CPT should not be tested with sequence classification by @BenjaminBossan in #2507
- Update Docker image builds for torch 2.7+cu126 by @matthewdouglas in #2514
- Feature: RandLora integration into peft by @PaulAlbert31 in #2464
- LORA/MODEL: Use max rank of pattern for add_weighted_adapter by @Beinsezii in #2512
- fix typo for skipping test by @jiqing-feng in #2519
- docs typo: fix links by @imba-tjd in #2517
- Add INC dispatcher by @dsocek in #2499
- ENH: Add default Qwen3 target modules by @BenjaminBossan in #2522
- MNT: Pin GitHub action hashes for security by @BenjaminBossan in #2521
- TST: Refactor remaining common tests to use pytest by @BenjaminBossan in #2491
- ENH: Add tests, docs, types for scaling methods by @BenjaminBossan in #2526
- TST Mark AutoAWQ as xfail for now by @BenjaminBossan in #2529
- FIX Prompt learning issue with 4d attention mask by @BenjaminBossan in #2458
- FIX: Use correct argument name in MultiheadAttention forward by @BenjaminBossan in #2510
- Method comparison: Support more options for the optimizer by @BenjaminBossan in #2479
- Randlora documentation and some example usage by @PaulAlbert31 in #2524
- added support for Conv1d for DoRA by @EskildAndersen in #2531
- Fix #2535: Prev...
v0.15.2
v0.15.1
This patch includes a fix for #2450. In this bug, modules_to_save was not handled correctly when used in conjunction with DeepSpeed ZeRO stage 3, which resulted in those modules being saved as placeholder values in the checkpoints.
Full Changelog: v0.15.0...v0.15.1
v0.15.0
Highlights
New Methods
CorDA: Context-Oriented Decomposition Adaptation
@iboing and @5eqn contributed CorDA: Context-Oriented Decomposition Adaptation of Large Language Models for Task-Aware Parameter-Efficient Fine-tuning. This task-driven initialization method has two modes, knowledge-preservation and instruction-preservation, both using external data to select ranks intelligently. The former can be used to select those ranks that correspond to weights not affiliated with knowledge from, say, a QA dataset. The latter can be used to select those ranks that correspond most to the task at hand (e.g., a classification task). (#2231)
Trainable Tokens: Selective token update
The new Trainable Tokens tuner allows for selective training of tokens without re-training the full embedding matrix, e.g. when adding support for reasoning / thinking tokens. This is a lot more memory efficient and the saved checkpoint is much smaller. It can be used standalone or in conjunction with LoRA adapters by passing trainable_token_indices to LoraConfig. (#2376)
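A back-of-the-envelope count shows where the savings come from. The dimensions and token indices below are hypothetical values chosen for illustration, not taken from a specific model, and the helper is not part of PEFT.

```python
def embedding_params(num_rows, hidden_size):
    """Number of values in an embedding matrix with the given shape."""
    return num_rows * hidden_size


# Hypothetical model dimensions, chosen only for illustration.
vocab_size = 32_000
hidden_size = 4_096
new_token_indices = [31_997, 31_998, 31_999]  # e.g. added "thinking" tokens

# Re-training the full embedding matrix vs. only the selected rows.
full = embedding_params(vocab_size, hidden_size)
selective = embedding_params(len(new_token_indices), hidden_size)

print(full)       # 131072000
print(selective)  # 12288
```

Under these assumptions, only the three selected rows need to be trained and stored instead of the whole matrix.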
Enhancements
LoRA now supports targeting multihead attention modules (but for now only those with _qkv_same_embed_dim=True). These modules were tricky as they may expose linear submodules but won't use their forward methods, therefore needing explicit support. (#1324)
Hotswapping now allows different alpha scalings and ranks without recompilation of the model when the model is prepared using a call to prepare_model_for_compiled_hotswap() before compiling the model. (#2177)
GPTQModel support was added in #2247 as a replacement for AutoGPTQ which is not maintained anymore.
Changes
- It's now possible to use all-linear as target_modules for custom (non-transformers) models (#2267). With this change comes a bugfix where it was possible that non-linear layers were selected when they shared the same name with a linear layer (e.g., bar.foo and baz.foo).
- The internal tuner API was refactored to make method registration easier. Registering a new method previously required changes to numerous files; this is now reduced to a single register_peft_method() call. (#2282)
- PEFT_TYPE_TO_MODEL_MAPPING is now deprecated and should not be relied upon. Use PEFT_TYPE_TO_TUNER_MAPPING instead. (#2282)
- Mixed adapter batches can now be used in conjunction with beam search. (#2287)
- It was possible that modules_to_save keys wrongly matched parts of the state dict if the key was a substring of another key (e.g., classifier and classifier2). (#2334)
- Auto-casting of the input dtype to the LoRA adapter dtype can now be disabled via disable_input_dtype_casting=True. (#2353)
- The config parameters rank_pattern and alpha_pattern used by many adapters now support matching full paths as well by specifying the pattern with a caret in front, for example: ^foo to target model.foo but not model.bar.foo. (#2419)
- AutoPeftModels do not reduce the embedding size anymore if the tokenizer size differs from the embedding size. Only if there are more tokens in the tokenizer than in the embedding matrix, the matrix will be resized. This is to prevent resizing of embedding matrices in models that have 'spare' tokens built-in. (#2427)
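Two of the rules in this list are easy to misread, so here is a minimal sketch of each: the caret-prefixed pattern matching for rank_pattern and alpha_pattern, and the embedding resize condition. Both functions are hypothetical helpers written for illustration. The regular expression is an assumption chosen to reproduce the described behavior and may differ from what PEFT uses internally.

```python
import re


def pattern_matches(pattern, module_path):
    """Check a rank_pattern/alpha_pattern style key against a dotted module path.

    Assumed rule: a key matches if it equals the whole path or the trailing
    part after a dot. A leading caret anchors the key to the start of the
    path, so it can only match the full path.
    """
    return re.match(rf"(.*\.)?({pattern})$", module_path) is not None


def needs_resize(tokenizer_size, embedding_size):
    """Resize only when the tokenizer has more tokens than the embedding matrix.

    A larger embedding matrix (spare tokens built in) is left untouched.
    """
    return tokenizer_size > embedding_size


# Module paths are relative to the model, so "model.foo" is the path "foo".
for key in ("foo", "^foo"):
    for path in ("foo", "bar.foo"):
        print(f"{key!r} vs {path!r}: {pattern_matches(key, path)}")

print(needs_resize(tokenizer_size=32005, embedding_size=32000))
print(needs_resize(tokenizer_size=32000, embedding_size=32064))
```

Without the caret, foo applies to both foo and bar.foo. With the caret, only the top-level foo is matched, which is useful when nested modules share a name but should get different ranks or alphas.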
What's Changed
- FIX: Ensure Device Compatibility for BOFT Forward/Merging by @d-kleine in #2242
- MNT: Bump version to 0.14.1.dev0 by @BenjaminBossan in #2263
- ENH: fix library interface by @bluenote10 in #2265
- FIX: Add warning for adapter_name conflict with tuner by @pzdkn in #2254
- ENH: FIX: Allow "all-linear" to target custom models by @BenjaminBossan in #2267
- MNT: apply sorting of exported symbols in __all__ by @bluenote10 in #2280
- MNT: apply sorting of imports by @bluenote10 in #2279
- FIX: Adoption prompt: New way to obtain position embeddings by @BenjaminBossan in #2276
- FIX: Int8 check for torchao v0.7.0 by @BenjaminBossan in #2284
- FEAT: Adding CorDA as an optional initialization method of LoRA by @iboing in #2231
- FIX: typo in lora config.py by @innerlee in #2297
- DOC: Added information regarding freezing the base model in prepare_model_for_kbit_training docstring by @NilBiescas in #2305
- DOC: add resize_token_embeddings to docs by @bingwork in #2290
- FIX: Make CorDA example work by @5eqn in #2300
- FIX: #2295: Warn when user reloads modified model by @githubnemo in #2306
- ENH: Extend usage for OLoRA finetune script by @jiqing-feng in #2308
- CI: Add zizmor for CI (security) linting by @githubnemo in #2288
- FEAT: Add LoRA multihead attention module by @BenjaminBossan in #1324
- DOC: Updated documentation for get_peft_model() for in-place base model modification by @d-kleine in #2313
- FIX: Prefix tuning test w/ rotary embedding on multi GPU by @BenjaminBossan in #2311
- FIX: Adaption prompt errors after changes from transformers #35235 by @BenjaminBossan in #2314
- FIX: Package checks for torchao, EETQ by @BenjaminBossan in #2320
- Refactor: PEFT method registration function by @BenjaminBossan in #2282
- FIX: low_cpu_mem_usage=True with 8bit bitsandbytes by @BenjaminBossan in #2325
- FIX: Reinstate PEFT_TYPE_TO_MODEL_MAPPING variable with deprecation by @BenjaminBossan in #2328
- FIX: reduce CorDA memory consumption + docs by @5eqn in #2324
- MNT: React on new zizmor version findings by @githubnemo in #2331
- TST: make cuda-only tests device-agnostic by @faaany in #2323
- FIX: Generating with mixed adapter batches and with beam search enabled by @BenjaminBossan in #2287
- FIX: Bug with modules_to_save loading if substring by @BenjaminBossan in #2334
- FIX: Add missing attributes to MultiheadAttention by @BenjaminBossan in #2335
- FIX: for zizmor permission warnings by @githubnemo in #2338
- CI: Attempt at adding a cache for models by @githubnemo in #2327
- FIX: Avoid needless copy from modules_to_save by @BenjaminBossan in #2220
- DOC: Add entry to solve unknown config argument by @BenjaminBossan in #2340
- FEAT: add gptqmodel support by @jiqing-feng in #2247
- MNT: Update ruff to v0.9.2 by @BenjaminBossan in #2343
- TST: Update torch.compile tests and docs by @BenjaminBossan in #2332
- FIX: Documentation & error checking for AdaLoRA timing by @githubnemo in #2341
- DOC: Better document init_lora_weights=False option by @BenjaminBossan in #2347
- ENH: Adding Lora implementation for nn.Conv1d by @CCLDArjun in #2333
- FIX: Failing AdaLoRA GPU test by @BenjaminBossan in #2349
- ENH: Improve invalid peft config error message by @thedebugger in #2346
- TST: Use different diffusion model for testing by @BenjaminBossan in #2345
- CI: Use locked install for zizmor by @githubnemo in #2350
- DOC: fix links to PEFT guides by @makelinux in #2357
- DOC: rename link to PEFT Quicktour by @makelinux in #2358
- ENH: Allow disabling input dtype casting for LoRA by @BenjaminBossan in #2353
- ENH: Hotswap allow different alpha scalings and ranks by @BenjaminBossan in #2177
- DOC: Fix links to boft by @makelinux in #2365
- DOC: Explain uninitialized weights warning by @BenjaminBossan in #2369
- ENH: Optimization for ConvNd if dropout=0. by @gslama12 in #2371
- FIX: Small fixes to hotswapping by @BenjaminBossan in #2366
- ENH: prepare_model_for_compiled_hotswap raises when no adapter was found by @BenjaminBossan in https://github.com/hugging...

