Draft

[WIP] FEAT Add function to convert non-LoRA PEFT adapters to LoRA #2939

BenjaminBossan wants to merge 18 commits into huggingface:main from BenjaminBossan:feat-lora-conversion

+1,143 −0
Conversation
This adds the possibility to convert a non-LoRA adapter into a LoRA adapter. Not all non-LoRA adapters will support this, but many will. The conversion is not exact; there will be some loss of performance. The higher the rank, the lower the loss, but also the less efficient the adapter. Also, for now, this only supports linear layers. Still, this has some advantages:

- In PEFT, LoRA supports more features than most other methods, e.g. mixed adapter batches. The converted adapter can thus be used with those features.
- Some downstream packages support LoRA adapters but not other PEFT methods, e.g. Diffusers. The conversion makes it possible to use a non-LoRA adapter with those packages.

TODOs:

- Deal with the case where there is no PEFT config
- Right now, only LoKr is supported; add more methods
- Documentation

For the future:

- Convert PEFT methods that are non-trivial, like IA³
- Convert layers other than linear layers
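To illustrate the core idea (this is a rough sketch of my own, not the PR's code): the adapter's delta weight dW = W' - W_0 can be approximated by a rank-r pair of LoRA matrices, e.g. via a truncated SVD.

```python
# Minimal sketch of the idea (not the PR's actual code): approximate a
# non-LoRA delta weight dW = W' - W_0 with a rank-r LoRA pair lora_B @ lora_A.
import torch

def lora_factors_from_delta(delta_w: torch.Tensor, rank: int):
    # Truncated SVD of the delta weight.
    u, s, vh = torch.linalg.svd(delta_w, full_matrices=False)
    u, s, vh = u[:, :rank], s[:rank], vh[:rank, :]
    # Split the singular values evenly between the two factors.
    sqrt_s = torch.sqrt(s)
    lora_b = u * sqrt_s            # shape (out_features, rank)
    lora_a = sqrt_s[:, None] * vh  # shape (rank, in_features)
    return lora_a, lora_b

delta_w = torch.randn(128, 64)   # stand-in for the adapter's delta weight
lora_a, lora_b = lora_factors_from_delta(delta_w, rank=16)
approx = lora_b @ lora_a         # best rank-16 approximation of delta_w
print(torch.linalg.matrix_norm(delta_w - approx))  # residual error of the conversion
```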
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
No tests yet
Users can pass a fixed rank for the LoRA adapter, or a float, in which case the rank is chosen dynamically based on a threshold for the contribution of the singular values. I ran an experiment converting a LoKr model trained on MetaMath to LoRA. The LoKr model's test accuracy was 37.5%. The test accuracies of the converted LoRAs are:
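For the dynamic case, a rough sketch of how such a threshold could translate into a rank; the exact contribution criterion used in the PR is not shown here, so the cumulative-share definition below is an assumption:

```python
# Sketch of dynamic rank selection from a singular value contribution threshold;
# the precise criterion used in the PR is an assumption here.
import torch

def choose_rank(delta_w: torch.Tensor, threshold: float) -> int:
    # Singular values of the delta weight, sorted in descending order.
    s = torch.linalg.svdvals(delta_w)
    # Cumulative share of the total singular value mass covered by the top-k values.
    contribution = torch.cumsum(s, dim=0) / s.sum()
    # Smallest rank whose cumulative contribution reaches the threshold.
    rank = int(torch.searchsorted(contribution, torch.tensor(threshold)).item()) + 1
    return min(rank, len(s))

delta_w = torch.randn(256, 256)  # stand-in for W' - W_0
print(choose_rank(delta_w, threshold=0.9))
```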
For the future
For PEFT methods that do not work according to W' = W_0 + dW, we have to find some workarounds. Say a method uses W' = W_0 * dW instead. We can reformulate this as W' = W_0 * dW = W_0 + dW2. As we're actually interested in dW2, we can isolate it as dW2 = W_0 * dW - W_0. I would suggest that we implement a new method on the tuner layer for this (see the sketch below). Then, in the LoRA conversion function, we use that method where `get_delta_weight` does not apply:
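A rough sketch of what that could look like; the method name `get_delta_weight_for_conversion`, the class, and the simplified signatures are placeholders for illustration, not the actual code in this PR or PEFT's tuner layer API:

```python
# Placeholder sketch of the proposal above; names and signatures are
# illustrative stand-ins, not PEFT's actual API.
import torch
from torch import nn


class MultiplicativeTunerLayerSketch:
    """Stand-in for a tuner layer whose update is W' = W_0 @ dW."""

    def __init__(self, base_layer: nn.Linear, delta_w: torch.Tensor):
        self.base_layer = base_layer
        self.delta_w = delta_w  # the multiplicative update dW

    def get_delta_weight_for_conversion(self) -> torch.Tensor:
        # dW2 = W_0 @ dW - W_0, so that W' = W_0 + dW2 and the additive
        # LoRA conversion path can be reused.
        w0 = self.base_layer.weight
        return w0 @ self.delta_w - w0


def delta_weight_for_lora_conversion(layer) -> torch.Tensor:
    # In the LoRA conversion function: prefer the dedicated conversion method
    # if the tuner layer defines one, otherwise fall back to get_delta_weight
    # (signature simplified here).
    if hasattr(layer, "get_delta_weight_for_conversion"):
        return layer.get_delta_weight_for_conversion()
    return layer.get_delta_weight()
```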
This way, we can continue using `get_delta_weight` where applicable while leaving the door open for PEFT methods to implement a custom method specifically for LoRA conversion. LMK if you have a better idea for how to implement that. Of course, there can be PEFT methods that cannot be expressed in terms of dW at all, which will be impossible to convert (most prominently the prompt learning methods).

TODO:

- `modules_to_save`
- `bias` argument

Unrelated changes:

- A `__repr__` was missing, so it was added.