hugganao@alien.top to LocalLLaMA@poweruser.forum • Why are you running local models? What are you doing with them?
11 months ago

My main desktop is an RTX 4090 Windows box, so I run phind-codellama on it most of the time. If I need to extend the context window, I swap the M2 Ultra over to Phind so I can do a 100,000-token context… but otherwise it's so darn fast on the 4090 running q4 that I use that mostly.
Are you running exllama on Phind for the 4090? Was there a reason you'd need to run it on the M2 Ultra when switching to 100k context?
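One common reason a 100k-token context pushes people off a 24 GB card and onto an M2 Ultra's unified memory is KV-cache size. Here's a back-of-envelope sketch; the model dimensions below assume a CodeLlama-34B-style architecture with grouped-query attention (48 layers, 8 KV heads, head dim 128) and are my assumptions, not something stated in the thread:

```python
# Rough KV-cache memory estimate for a long context window.
# Dimensions assume a CodeLlama-34B-style model with GQA:
# 48 layers, 8 KV heads, head dim 128 (assumptions, not thread facts).
n_layers, n_kv_heads, head_dim = 48, 8, 128
seq_len = 100_000          # target context window in tokens
bytes_per_elem = 2         # fp16 K/V entries

# K and V each store n_layers * n_kv_heads * head_dim values per token.
kv_bytes = 2 * n_layers * n_kv_heads * head_dim * seq_len * bytes_per_elem
kv_gb = kv_bytes / 1e9
print(f"KV cache at {seq_len:,} tokens: ~{kv_gb:.1f} GB")
```

Under these assumptions the cache alone lands around 19–20 GB on top of the q4 weights, which won't fit in a 4090's 24 GB, while an M2 Ultra's 64–192 GB of unified memory has room to spare.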
Also, I didn't know Mistral could do coding tasks. How is it?
How does merging work? How do you choose which layers to take from which models in the merging process?