Having a hard time setting deepseek coder instruct to work

iChrist@alien.top · 2 years ago


Feeling-Ingenuity474@alien.top · 2 years ago

alternatively you can use ollama and run deepseek from there. you have good specs, it will run smoothly

FullOf_Bad_Ideas@alien.top · 2 years ago

I have it (33b) running pretty well, gptq in oobabooga, rtx 3090 ti, 64GB of RAM, exllama v2 hf loader, standard alpaca template without modified system prompt. I also have the same ‘’‘’‘’‘’’ with awq version. Please share the version of gptq that you have (group size, act order). I will post exact settings I use in an hour. I don’t know how the version I have locally compares to the hosted version, but it’s pretty good. There is a simple possibility that gptq quant is destroying model’s capability and I am not noticing it but you do.

I know it’s a stupid thing, but make sure you actually choose the instruct mode in the chat window itself. I didn’t notice those options at first and got weird results with some models, because the default prompt was being applied instead of alpaca.

kpodkanowicz@alien.top · 2 years ago

lol, I will stop wasting my time now - I spent roughly 3 hours today trying to get it to work :D Mostly around GGUF

tamereen@alien.top · 2 years ago

With oobabooga you have to modify your requirements.txt to get the latest llama_cpp

Do a git pull, then inside requirements.txt replace

llama_cpp_python-0.2.11 with llama_cpp_python-0.2.18

Then still in your env

pip install -r requirements.txt --upgrade

PS: even 0.2.14 gave me bad answers (it starts to answer, then fills the result with 3333333…).

0.2.18 fixes the issue.
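
The version bump described above can also be scripted. This is only a minimal sketch: it assumes the pin appears in requirements.txt as the literal text `llama_cpp_python-0.2.11`, and `bump_pin` / `bump_file` are hypothetical helper names, not part of oobabooga.

```python
from pathlib import Path


def bump_pin(requirements_text: str, old: str = "0.2.11", new: str = "0.2.18") -> str:
    """Replace every llama_cpp_python-<old> occurrence with llama_cpp_python-<new>."""
    return requirements_text.replace(
        f"llama_cpp_python-{old}", f"llama_cpp_python-{new}"
    )


def bump_file(path: str = "requirements.txt") -> None:
    """Rewrite the requirements file in place with the bumped pin."""
    file = Path(path)
    file.write_text(bump_pin(file.read_text()))


# Made-up example line; the real file's wheel URLs will look different.
sample = "https://example.invalid/llama_cpp_python-0.2.11-cp310-win_amd64.whl\n"
updated = bump_pin(sample)
```

After calling `bump_file()` inside the webui folder, run `pip install -r requirements.txt --upgrade` in the same env as before.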

TobyWonKenobi@alien.top · 2 years ago

If you are using it on LM Studio, I think you need to upgrade to the latest Beta, which includes a fix.

I ran into the same issues with Deepseek Gguf

nullnuller@alien.top · 2 years ago

I tried 0.28 and it only shows newlines in the server logs and nothing at all in the UI

iChrist@alien.top · 2 years ago

I found the fix for this issue (Tested by me only, thanks to u/FullOf_Bad_Ideas for the suggestion)

reduce the Repetition penalty to 1, the code will be much better, and closely resemble what is generated on the website. (tested multiple times with pong and snake)

YearZero@alien.top · 2 years ago

Yeah I basically turn the temperature to 0.1, disable every sampler, and turn the repetition penalty as low as the GUI will allow (I have mine at 1). I’m using Koboldcpp 1.50 and deepseek-coder-instruct 33b is working very well for me. If it’s not on par with GPT-4, it’s incredibly close. I tested the 7b model and it’s pretty good, but it messes up more frequently, so you have to fix its mistakes. 33b gives me workable code more often than not.

I’ve been testing it on a bunch of different problems on this site: https://www.w3resource.com/index.php

…and it seems to ace everything I throw at it. Granted those aren’t particularly challenging problems, but still, it’s very consistent, which means I can use it for work reliably (I work mainly with SQL). The 16k context doesn’t hurt either!

Now I just wish I had more than 8gb vram, cuz I’m getting like 1.3 Tokens per second, so I have to be super patient.

Also I just added it to my VSCode using the Continue extension (while the model runs using llamacpp). It works beautifully there too (once you configure the prompt correctly to what the model expects). If you use VSCode at all, you can now have a really good copilot for free.
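
For anyone driving the model through an API instead of a GUI, the settings discussed above (temperature 0.1, repetition penalty 1, other samplers off) could look roughly like this. It is a sketch, not a tested config: the field names follow my understanding of the llama.cpp server's `/completion` request body, other backends name them differently, and `neutral_sampling_payload` is a hypothetical helper.

```python
import json


def neutral_sampling_payload(prompt: str, max_tokens: int = 2048) -> dict:
    """Build a request body with a low temperature and the other samplers neutralised."""
    return {
        "prompt": prompt,
        "n_predict": max_tokens,   # maximum number of tokens to generate
        "temperature": 0.1,        # near-greedy decoding
        "repeat_penalty": 1.0,     # 1.0 means no repetition penalty is applied
        "top_k": 0,                # assumption: 0 turns top-k filtering off
        "top_p": 1.0,              # 1.0 keeps the full distribution
    }


payload = neutral_sampling_payload(
    "### Instruction:\nwrite Snake game in python\n### Response:\n"
)
body = json.dumps(payload)  # JSON string to send as the POST body
```

Nothing is sent here; `body` is just what you would POST to whatever server you are running.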

iChrist@alien.top · 2 years ago

Thank you for the response!

I’ll try to adjust the temp too. How can I disable samplers in oobabooga? What is the setting?

Is there a way to set rep penalty lower than 1?

YearZero@alien.top · 2 years ago

unfortunately I haven’t used ooba in a few months so I can’t tell you, but in koboldcpp it just tells you what values disable the samplers.

AfterAte@alien.top · 2 years ago

Oh nice! I’ll have to try those settings and compare with the StarChat preset in Oobabooga. I hear ya, I get 1t/s too… it’s unbearable.

YearZero@alien.top · 2 years ago

I miss the days when high-end GPUs were like $400-500! I’m not made of moneys, and I also use a laptop, so the most I could buy right now would be 16gb vram anyway. I’ll probably save up and wait for next gen and see if they make any headway there.

vasileer@alien.top · 2 years ago

works for me with the latest llama.cpp on Windows (CPU only, AVX)

command

`main -m …/models/deepseek-coder-6.7b-instruct.Q4_K_M.gguf -p "### Instruction\n:write Snake game in python\n### Response:" -n 2048 -e`

result

https://preview.redd.it/k0poo4o1171c1.png?width=978&format=png&auto=webp&s=3bf1fc497ed66da28742af4d53972c5e15928390
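
The command above passes literal `\n` sequences in the prompt and relies on `-e` to turn them into real newlines. If you build the prompt in a script instead, it could look like the sketch below. Note the assumption: it puts the colon right after `Instruction`, before the newline, as in Alpaca-style templates, which is slightly different from the command above. `build_prompt` is a hypothetical helper name.

```python
def build_prompt(instruction: str) -> str:
    """Wrap an instruction in an Alpaca-style Instruction/Response layout (assumed)."""
    return f"### Instruction:\n{instruction}\n### Response:\n"


prompt = build_prompt("write Snake game in python")

# Same prompt with newlines written as backslash-n, i.e. the form you would
# type on the command line when the escapes are processed by -e.
escaped = prompt.replace("\n", "\\n")
```

`prompt` contains real newline characters, while `escaped` is a single line that can be pasted after `-p` in quotes.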

Serious-Commercial10@alien.top · 2 years ago

If you want to run it reliably, it’s best to clone the PR in the link and compile it yourself. Quantizing gguf yourself is actually quite fast

https://github.com/ggerganov/llama.cpp/pull/4070

I’ve never successfully run the AutoAWQ model on a 3090, and I won’t be trying it again!

mantafloppy@alien.top · 2 years ago

I haven’t been able to run the .gguf in either LM Studio, Ollama or oobabooga/text-generation-webui.

I had to run it directly with llama.cpp in the command line to get it working.

Something about using a special end token and not having a standard transformer or something…

https://huggingface.co/TheBloke/deepseek-coder-33B-instruct-GGUF/discussions/2

nullnuller@alien.top · 2 years ago

Do you just copy and paste the terminal output containing \n and whitespace from the .\main output into VSCode or a similar IDE, and it works?

mantafloppy@alien.top · 2 years ago

Now that you point it out, they are there because I copy/pasted this from a code block somewhere.

But that’s what I write in my command line and it doesn’t seem to cause any issue.