For anyone who's interested: here's some code that will do this. As long as you have some knowledge of Python and conda, you should be able to get it to work. Just follow the instructions. Maybe.
Sounds like you might be using the standard Transformers loader. Try exllama or exllamav2.
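If you want a feel for what switching loaders looks like, here's a rough sketch of loading a quantized model with exllamav2, modeled on its example scripts. The model path is a placeholder and exact class names can shift between versions, so treat it as a starting point, not a drop-in script.

```python
# Rough exllamav2 inference sketch (follows the project's example scripts;
# the model directory is a placeholder for your own EXL2/GPTQ quant).
from exllamav2 import ExLlamaV2, ExLlamaV2Config, ExLlamaV2Cache, ExLlamaV2Tokenizer
from exllamav2.generator import ExLlamaV2BaseGenerator, ExLlamaV2Sampler

config = ExLlamaV2Config()
config.model_dir = "/path/to/your-exl2-quant"  # placeholder
config.prepare()

model = ExLlamaV2(config)
cache = ExLlamaV2Cache(model, lazy=True)   # KV cache allocated as the model loads
model.load_autosplit(cache)                # spreads layers across available GPUs

tokenizer = ExLlamaV2Tokenizer(config)
generator = ExLlamaV2BaseGenerator(model, cache, tokenizer)

settings = ExLlamaV2Sampler.Settings()
settings.temperature = 0.8
settings.top_p = 0.9

print(generator.generate_simple("Tell me about llamas.", settings, 200))
```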
Well that’s because he’s not. Sam is actually my dad.
Lol, cause Musk is totally reliable with what he says.
I didn’t say it wasn’t. But getting into LLMs really just shows you how much better your PC could be, and you will never be as cutting edge as you think or want.
Everything on that page is hype for something that doesn’t exist.
Uuummmm no. It’s for sure real. And the best one out there. No questions asked. It’s better than ChatGPT-4, and OpenAI has been trying to hack this new company to get the 600b model because they are scared that it will end OpenAI for good.
Obligatory /s
It’s the best out there… but no, you can’t try it, because it’s too dangerous.
So you are soon gunna realize that, unfortunately, your PC is not as cutting edge as you think. Your main need is VRAM, and the 4070 Ti only has 12 GB of it, so you will be limited to 7B and 13B models. You can offload into system RAM, but your speeds plummet. Mistral 7B is a good option to start with.
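If you do end up spilling into system RAM, the usual trick is partial offload: keep as many layers on the GPU as the 12 GB allows and let the CPU handle the rest. A minimal sketch with llama-cpp-python and a GGUF quant of Mistral 7B (the file name and layer count are placeholders you'd tune yourself):

```python
# Minimal partial-offload sketch using llama-cpp-python with a GGUF quant.
# The model file name is a placeholder; n_gpu_layers is a guess you'd raise
# until the model just barely fits in 12 GB of VRAM.
from llama_cpp import Llama

llm = Llama(
    model_path="./mistral-7b-instruct.Q4_K_M.gguf",  # placeholder file name
    n_gpu_layers=35,   # layers kept on the GPU; the rest run (slowly) from RAM
    n_ctx=4096,        # context window
)

out = llm("Explain VRAM vs system RAM in one sentence.", max_tokens=64)
print(out["choices"][0]["text"])
```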
RAM bandwidth is still gunna screw you over.
I’ll fuck around with it when I get home.
Ok. So I’m basically an idiot. What does this mean, and which one should I use?
TheBloke has entered the chat.
How many tokens/s do you get with the P40? I’ve been contemplating getting one and using it alongside my 3060 12 gig.
Is…… that a thing? I need that in my life.
I didn’t know I needed this information till now
For a 34B model you should be fine. I run 34B models on my dual 3060s and it’s very nice, usually around 20 tokens a second. If you want to run a 7B model you get basically instant results; with Mistral 7B I’m getting almost 60 tokens a second. It’s crazy. But it really depends on what you are using it for and how much accuracy you need.
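If you want to measure your own numbers instead of taking mine, a quick-and-dirty tokens/s check with llama-cpp-python looks something like this (the GGUF file name is a placeholder):

```python
# Quick-and-dirty tokens/s benchmark with llama-cpp-python.
# The model file name is a placeholder; n_gpu_layers=-1 tries to put every layer on the GPU.
import time
from llama_cpp import Llama

llm = Llama(model_path="./your-model.Q4_K_M.gguf", n_gpu_layers=-1, verbose=False)

start = time.time()
out = llm("Write a short story about a robot.", max_tokens=256)
elapsed = time.time() - start

n_gen = out["usage"]["completion_tokens"]
print(f"{n_gen} tokens in {elapsed:.1f}s -> {n_gen / elapsed:.1f} tokens/s")
```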