New Microsoft codediffusion paper suggests GPT-3.5 Turbo is only 20B, good news for open source models?

obvithrowaway34434@alien.top · 2 years ago

New Microsoft codediffusion paper suggests GPT-3.5 Turbo is only 20B, good news for open source models?

LiquidGunay@alien.top · 2 years ago

What has your experience with mistral been? Because going from llama 13B finetunes to mistral 7B, I found that it was remarkably better at following instructions (Prompt engineering finally felt like it was not just guessing and checking). Considering it is just a 7B, a 20B might be that good (It could also just be a MoE of 20B models)

sebo3d@alien.top · 2 years ago

I only really use Mistral Claude and Collective cognition but from perspective of a role player who uses LLMs mostly for just that my overall experience with Mistral(finetunes) has been mostly positive. 7B’s speed is undeniable, so this is a very major benefit it has over 13Bs and for a 7B it’s prose is excellent as well. What i also noticed about mistral models is that unlike 13Bs such as mythomax or remm-slerp they tend to pay closer attention to character cards as well as your own user description and will more commonly mention things stated in the said description.(For example my user description in SillyTavern had a note saying that my persona is commonly stalked by ghosts and model actually made a little joke about it saying “how are your ghostly friends are doing these days” which is something that NO 13B i used has done before) Still though 7B IS just 7B so model tends to hallucinate quite a bit, constantly tries to change the formatting of the roleplay and tends to roleplay as you unless you REALLY finetune the settings to borderline perfection so i have to swipe and/or edit responses quite a bit.