@YearZero

YearZero@alien.top · 2 years ago

I think 13b Q8 is just cutting it really close with your 6GB vram and 16GB ram. You’d be much better of using the Q6 quant, and definitely anything below that would be ok.

Look at the model card, TheBloke lists RAM requirements for each quant (without context). Since this model uses 4096 tokens for context, you would add another 1-2 gigs to the requirements.

You might have some luck if you allocate the right amount in the parameters (as right now you’re allocating 0 to the GPU), but definitely play with lower quants, you wouldn’t even notice the quality loss until you get into maybe Q3.

YearZero@alien.top · 2 years ago

Unfortunately I don’t have enough ram/gpu, and too broke right now to afford paying for extra! But in the future I hope I will

YearZero@alien.top · 2 years ago

We may not have ChatGPT for much longer the way this is unfolding. Makes me really glad everyone on this sub has good local models now, as a substitute.

YearZero@alien.top · 2 years ago

Nope they’re both really good and very close to each other in my tests: https://docs.google.com/spreadsheets/d/1NgHDxbVWJFolq8bLvLkuPWKC7i_R6I6W/edit?usp=sharing&ouid=102314596465921370523&rtpof=true&sd=true

YearZero@alien.top · 2 years ago

I miss the days when high-end gpu’s were like $400-500! I’m not made of moneys, and I also use a laptop, so the most I could buy right now would be 16gb vram anyway. I’ll probably save up and wait for next gen and see if they make any headway there.

YearZero@alien.top · 2 years ago

unfortunately I haven’t used ooba in a few months so I can’t tell you, but in koboldcpp it just tells you what values disable the samplers.

YearZero@alien.top · 2 years ago

I personally use one called Continue. It’s amazing!

YearZero@alien.top · 2 years ago

Yeah I basically turn the temperature to 0.1, disable every sampler, and turn the temperature as low as the GUI will allow (I have mine at 1). I’m using Koboldcpp 1.50 and deepseek-coder-instruct 33b is working very well for me. If it’s not on par with GPT-4, it’s incredibly close. I tested the 7b model and it’s pretty good, but it does mess up more frequently requiring you to fix its mistakes. 33b gives me workable code more often than not.

I’ve been testing it on a bunch of different problems on this site: https://www.w3resource.com/index.php

…and it seems to ace everything I throw at it. Granted those aren’t particularly challenging problems, but still, it’s very consistent, which means I can use it for work reliably (I work mainly with SQL). The 16k context doesn’t hurt either!

Now I just wish I had more than 8gb vram, cuz I’m getting like 1.3 Tokens per second, so I have to be super patient.

Also I just added it to my VSCode using the Continue extension (while the model runs using llamacpp). It works beautifully there too (once you configure the prompt correctly to what the model expects). If you use VSCode at all, you can now have a really good copilot for free.

YearZero@alien.top · 2 years ago

Testing it now, but it’s worse than 7b models on logic questions for me. Huge disappointment compared to Dolphin and Nous-Capybara, both Yi finetunes and are the best models I’ve tested so far. It just goes to show you how much difference finetuning a base model can make.

YearZero@alien.top · 2 years ago

I’d love to see a comparison with DeepseekCoder instruct 34b as well, as I think this is the open source SOTA.

YearZero@alien.top · 2 years ago

s I found today and it seems close enough to dolphin 70b at half the size.

I’m getting broken replies in koboldcpp, although it runs perfectly in llamacpp for me. Not sure why, koboldcpp is my go to.