hadal1337@alien.topB to

LocalLLaMA@poweruser.forumEnglish · 1 year ago

Should I use LoRA, RLHF or DPO?

I’m thinking of using Llama 2 to detect spam messages:

The model will first be fine-tuned using LoRA/PEFT on a public dataset.
Then, when given a block of text, it will decide whether it’s spam and give the user its reasons.
However, there can be false positives and so on, so I figured one way to combat this would be to let the user tell the model whether the response is right or wrong (thumbs up/down).

Based on my requirements, is it better to use RLHF or DPO? Am I overcomplicating this? Would simply fine-tuning on the user feedback work too?
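
To make the question concrete, here is a rough sketch of what thumbs up/down feedback can be turned into. DPO is trained on preference pairs (a prompt, a chosen response and a rejected response), so a thumbs-down on its own is not enough: it only becomes a pair once some other response to the same message got a thumbs-up. Plain supervised fine-tuning can use the thumbs-up responses directly. The `Feedback` record, its field names and both helper functions are hypothetical, made up for illustration; they are not part of any library.

```python
from collections import defaultdict
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class Feedback:
    """One rated model response. All field names are hypothetical."""
    prompt: str      # the message the model was asked to classify
    response: str    # the model's verdict plus its stated reasons
    thumbs_up: bool  # True if the user marked the response as correct


def build_preference_pairs(feedback):
    """Pair every liked response with every disliked response to the same prompt."""
    liked = defaultdict(list)
    disliked = defaultdict(list)
    for item in feedback:
        bucket = liked if item.thumbs_up else disliked
        bucket[item.prompt].append(item.response)

    pairs = []
    for prompt, good_responses in liked.items():
        # Prompts that only ever received one kind of rating yield no pairs.
        for chosen, rejected in product(good_responses, disliked.get(prompt, [])):
            pairs.append({"prompt": prompt, "chosen": chosen, "rejected": rejected})
    return pairs


def build_sft_examples(feedback):
    """Keep only thumbs-up responses as plain supervised fine-tuning examples."""
    return [
        {"prompt": item.prompt, "completion": item.response}
        for item in feedback
        if item.thumbs_up
    ]


if __name__ == "__main__":
    log = [
        Feedback("WIN A FREE PHONE!!!", "Spam: unsolicited prize offer.", True),
        Feedback("Lunch at 12?", "Spam: mentions a time.", False),
        Feedback("Lunch at 12?", "Not spam: ordinary personal message.", True),
    ]
    print(build_preference_pairs(log))  # one pair, for "Lunch at 12?"
    print(build_sft_examples(log))      # the two thumbs-up responses
```

In this sketch the first message never produces a preference pair because nobody rated a bad response to it, which shows how much of the feedback a pairs-only approach would leave unused unless a second response is sampled and rated for each message.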

oKatanaa@alien.topB
1 year ago
You’re better off using something like BERT rather than shooting a pigeon with a ballistic missile. It’s easier, cheaper, faster and much more reliable.