hugganao@alien.top to LocalLLaMA@poweruser.forum • Why are you running local models? What are you doing with them?
11 months ago

My main desktop is an RTX 4090 Windows box, so I run phind-codellama on it most of the time. If I need to extend the context window, I swap the M2 Ultra over to Phind so I can do a 100,000-token context… but otherwise it's so darn fast on the 4090 running q4 that I use that mostly.
Are you running exllama on Phind for the 4090? Was there a reason you'd need to run it on the M2 Ultra when switching to 100k context?
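One common reason a 100k-token context pushes people off a 24 GB card and onto an M2 Ultra's unified memory is KV-cache size. Here's a back-of-envelope sketch; the model dimensions below assume a CodeLlama-34B-style architecture with grouped-query attention (48 layers, 8 KV heads, head dim 128) and are my assumptions, not something stated in the thread:

```python
# Rough KV-cache memory estimate for a long context window.
# Dimensions assume a CodeLlama-34B-style model with GQA:
# 48 layers, 8 KV heads, head dim 128 (assumptions, not thread facts).
n_layers, n_kv_heads, head_dim = 48, 8, 128
seq_len = 100_000          # target context window in tokens
bytes_per_elem = 2         # fp16 K/V entries

# K and V each store n_layers * n_kv_heads * head_dim values per token.
kv_bytes = 2 * n_layers * n_kv_heads * head_dim * seq_len * bytes_per_elem
kv_gb = kv_bytes / 1e9
print(f"KV cache at {seq_len:,} tokens: ~{kv_gb:.1f} GB")
```

Under these assumptions the cache alone lands around 19–20 GB on top of the q4 weights, which won't fit in a 4090's 24 GB, while an M2 Ultra's 64–192 GB of unified memory has room to spare.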
Also, I didn't know Mistral could do coding tasks. How is it?
How does merging work? How do you choose which layers to take from which models in the merging process?