I’ve been playing around with methods such as prompt tuning and LoRA, which are parameter-efficient in that they fine-tune only a very small fraction (<1%) of all parameters.

But for both methods, you still have to cache the intermediate activations during backprop, meaning you don’t save much GPU memory during training (at most a small amount, from not having to store optimizer states for the frozen layers). For instance, LoRA reduced the GPU memory footprint of my custom model only from 8.5GB to 8.1GB, which is very minimal. Fine-tuning time reduction also isn’t a major advantage: fine-tuning the same model got faster by just 20ms per batch, from 210ms to 190ms.
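For intuition, here’s a back-of-the-envelope accounting sketch (all numbers hypothetical, assuming fp16 weights/gradients and fp32 Adam states). It shows why freezing weights eliminates gradient and optimizer-state memory but leaves activation memory untouched, so the savings are small when activations dominate and large when optimizer states dominate:

```python
# Rough training-memory model: fp16 weights and gradients, fp32 Adam
# moments (m and v) kept only for trainable params. Illustrative, not
# measured numbers.
def training_memory_gb(n_params, n_trainable, activation_gb):
    weights = n_params * 2        # fp16 weights, frozen or not
    grads = n_trainable * 2       # gradients only for trainable params
    optimizer = n_trainable * 8   # Adam: two fp32 states per param
    return (weights + grads + optimizer) / 1e9 + activation_gb

# Hypothetical 7B model with ~20 GB of cached activations per step:
full = training_memory_gb(7e9, 7e9, activation_gb=20)         # ~104 GB
lora = training_memory_gb(7e9, 0.01 * 7e9, activation_gb=20)  # ~35 GB
```

Under these assumptions the activation term is identical in both cases, which would match seeing only marginal savings on a small model where activations dominate.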

This raises the question: what is the practical reason for the popularity of parameter-efficient fine-tuning (e.g. prompt tuning, with 1.6k+ citations) if it doesn’t really save on GPU memory or training time?

I can see two possible reasons (but I’m not convinced they really explain the ‘hype’ around parameter-efficient fine-tuning):

  1. The fine-tuned model checkpoint for the downstream task is very significantly smaller. For example, in prompt tuning, we only need to save the tiny trained soft prompt (a few megabytes), rather than the entire set of changed model weights (many GB) on our hard disk/SSD.
    1. But from a practical point of view, I feel that most people suffer more from a lack of compute (e.g. GPU memory) than from a lack of disk space. In other words, training time and GPU memory consumption seem like more relevant concerns than saving on checkpoint storage.
  2. The second is robustness to domain shifts (since we preserve the majority of the original model’s weights rather than destructively re-learning them), which was mentioned in the prompt tuning paper but not so much in the LoRA paper.
    1. I could see this as a possible reason, but the out-of-distribution gains reported in the prompt tuning paper are marginal at best, and the LoRA paper doesn’t discuss domain shifts.
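On point 1, the storage gap is at least easy to quantify. A rough sketch with hypothetical shapes (a 7B model in fp16, rank-8 LoRA on 4 projection matrices in each of 32 layers; none of these shapes come from a specific model):

```python
# Checkpoint-size comparison, fp16 throughout. Shapes are hypothetical.
full_ckpt_gb = 7e9 * 2 / 1e9                  # full weights: ~14 GB

rank, d = 8, 4096                             # LoRA rank and hidden size
lora_params = 32 * 4 * (d * rank + rank * d)  # A and B per projection
lora_ckpt_mb = lora_params * 2 / 1e6          # adapters only: ~17 MB
```

So each downstream task costs megabytes instead of a full copy of the model, roughly a 1000x difference, which matters if you serve many tasks from one base model even if it doesn’t help your training budget.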

(EDIT - I’m also wondering whether there is something else I’m missing that would decrease GPU memory and runtime. I’ve heard of QLoRA, which adds 4-bit quantization of the base model on top of LoRA, so perhaps that’s a way to tackle memory efficiency for LoRA. But I don’t know if there’s anything comparable to reduce the memory footprint of prompt tuning?)
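For rough numbers on the QLoRA idea: quantizing the frozen base weights to 4-bit cuts their footprint by about 4x versus fp16, while the small trainable adapters stay in higher precision. A sketch with a hypothetical 7B base (ignoring quantization constants and everything besides the base weights):

```python
# Base-weight footprint only; activations, adapters, and optimizer
# states excluded. 4-bit NF4 stores ~0.5 bytes per parameter.
n = 7e9                   # hypothetical parameter count
fp16_gb = n * 2 / 1e9     # ~14 GB in fp16
nf4_gb = n * 0.5 / 1e9    # ~3.5 GB in 4-bit
```

Since the base model is frozen in both LoRA and prompt tuning, the same quantization trick should in principle apply to prompt tuning too, with only the tiny soft prompt kept trainable, though I’d treat that as an assumption rather than something the prompt tuning paper covers.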

  • koolaidman123@alien.topB
    10 months ago

    You’re doing something wrong. I’ve managed to reduce VRAM usage by >4x with LoRA on 7B LLaMA models, from 160GB to 40GB.

    Performance is a separate issue, but that’s the trade-off for memory savings.