Using and losing lots of money on GPT-4 ATM. It works great, but for the amount of code I'm generating I'd rather have a self-hosted model. What should I look into?
I am using Meta's Code Llama via DeepInfra's API.
CodeBooga.
Phind-CodeLlama 34B is the best model for general programming, and some techy work as well. It's a bad joker, though: it only does serious work. Try quantized models if you don't have access to an A100 80GB or multiple GPUs; a 4-bit quantization can fit in a 24GB card.
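For anyone new to this, here's a minimal sketch of loading a 4-bit quantized model with Hugging Face transformers and bitsandbytes on a single 24GB card. The model ID, prompt, and generation settings are just assumptions; check the model card for the exact prompt format the model expects.

```python
# Minimal sketch: load a 4-bit quantized code model on a single 24GB GPU.
# Model ID and prompt are assumptions; adjust for whatever you actually run.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "Phind/Phind-CodeLlama-34B-v2"  # assumed Hugging Face repo name

quant_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.float16,
)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=quant_config,
    device_map="auto",  # spread layers across available GPUs/CPU
)

prompt = "Write a Python function that parses an ISO 8601 date string."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```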
I tried the V7, which is supposedly better than GPT-4. It couldn't do the things I asked it to do, unlike GPT-4 (through Bing Chat). DeepSeek also handled a couple of things, but its solutions were sometimes not ideal. It's underwhelming.
The web search engine is interesting, though.
You might want to check the EvalPlus leaderboard:
 #  Model                    Size   pass@1
 1  GPT-4 (May 2023)         N/A    76.8
 2  DeepSeek-Coder-instruct  33B    72.6
 3  DeepSeek-Coder-instruct  6.7B   70.1
That board is in serious need of an update; check out the Yi-34B model, it's very impressive. Dolphin 2.2 Yi 34B is a variant I can't wait to try.
Great board. I wish they had Phind 7 or whatever they use on their live website.
As far as self-hosted models go, deepseek-coder-33B-instruct is the best model I have found for coding. Anecdotally, it seems more coherent and gives better results than Phind-CodeLlama-34B-v2.
Think this would be good enough / suitable for AutoGPT/BabyAGI-type situations? That's my main use case, for bulk inspiration if not productivity. The APIs can get expensive if left on full-automatic overnight.
I want to do something similar; please let me know what conclusion you reach.
What environment do you use to interact with self-hosted code models when coding? I’ve been using and enjoying Cursor for the way it’s integrated into the IDE, but I’ve been exploring options for going self-hosted just to feel freer from whatever record I’m putting on someone else’s server.
My code editor of choice (Helix) doesn’t support integrations or plugins so I haven’t tried Cursor or Copilot. I’m building my own UI right now that focuses on first-class support for models served by llama.cpp.
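For anyone building something similar, a minimal sketch of querying a local llama.cpp server from your own UI or backend. The GGUF file name, port, and prompt are assumptions; the request shape follows llama.cpp's /completion endpoint.

```python
# Minimal sketch: query a local llama.cpp server from a custom UI.
# Assumes the server was started with something like:
#   ./server -m deepseek-coder-6.7b-instruct.Q4_K_M.gguf --port 8080
import json
import urllib.request

def complete(prompt: str, n_predict: int = 256) -> str:
    """Send a prompt to llama.cpp's /completion endpoint and return the text."""
    payload = json.dumps({"prompt": prompt, "n_predict": n_predict}).encode()
    req = urllib.request.Request(
        "http://127.0.0.1:8080/completion",
        data=payload,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["content"]

print(complete("Write a function that reverses a string."))
```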
I would grab a server like vLLM or text-generator.io (open source too).
Then get a model others have suggested, like DeepSeek, to put in the server (both those servers are OpenAI-compatible, which makes switching easy; quick sketch below).
I've not heard of text-generator.io. Is it as performant as vLLM on multi-batch, or is it a wrapper around it?
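The sketch mentioned above: serving DeepSeek Coder with vLLM's OpenAI-compatible server and pointing the standard OpenAI client at it. The model ID, port, and prompt here are assumptions.

```python
# Minimal sketch: talk to a local vLLM OpenAI-compatible server.
# Assumes the server was started with something like:
#   python -m vllm.entrypoints.openai.api_server \
#       --model deepseek-ai/deepseek-coder-33b-instruct --port 8000
from openai import OpenAI

# api_key is required by the client but ignored by a local server
client = OpenAI(base_url="http://localhost:8000/v1", api_key="not-needed")

response = client.chat.completions.create(
    model="deepseek-ai/deepseek-coder-33b-instruct",
    messages=[{"role": "user", "content": "Write a function that merges two sorted lists."}],
    max_tokens=256,
)
print(response.choices[0].message.content)
```

Because both servers speak the same API, switching between them is just a matter of changing the base URL and model name.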
If you let models work together on the code base, criticizing each other and suggesting improvements, the result will be better. That's if you need the best possible code, but it turns out to be expensive. So the best setup is a team of models working together, not just one.
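A minimal sketch of that generate-then-critique loop, assuming two OpenAI-compatible endpoints (vLLM, llama.cpp server, etc.). The ports, model names, and prompts are assumptions.

```python
# Minimal sketch: one model drafts code, a second critiques it, the first revises.
# Endpoints and model names are assumptions; any two OpenAI-compatible servers work.
from openai import OpenAI

coder = OpenAI(base_url="http://localhost:8000/v1", api_key="x")
critic = OpenAI(base_url="http://localhost:8001/v1", api_key="x")

def ask(client: OpenAI, model: str, prompt: str) -> str:
    """Send a single-turn chat prompt and return the model's reply text."""
    resp = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
        max_tokens=512,
    )
    return resp.choices[0].message.content

task = "Write a Python function that deduplicates a list while preserving order."

draft = ask(coder, "deepseek-coder-33b-instruct", task)
review = ask(critic, "phind-codellama-34b-v2",
             f"Review this code for bugs and suggest improvements:\n\n{draft}")
final = ask(coder, "deepseek-coder-33b-instruct",
            f"Task: {task}\n\nYour draft:\n{draft}\n\n"
            f"Reviewer feedback:\n{review}\n\nRewrite the code, addressing the feedback.")
print(final)
```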
How would hosting any model, which at the moment would be inferior to GPT-4, cost less than 20 dollars per month?
Anyway, GitHub Copilot costs $10 and has plugins for any IDE; in VS Code it also has a chat. I don't remember which model it's based on, but it works pretty well. You might try that.
DeepSeek Coder 6.7B is able to write the game Snake. Not many models are able to do this!
The phind.com model seems decent
I don’t have an answer for you, but I am curious, how much code do you have it generate on an average work/programming day?
I use Phind and free GPT-3.5 together and forward code between them to optimize and fix issues; it works smoothly for me.
I think Copilot is another option; it has a chat extension now.
For coding specifically, have you looked at Amazon's CodeWhisperer? I think it's free for personal use.