  • I use runpod for everything I can’t do locally and I’ve been very happy with it. I initially chose it just because it was one of the cheapest, way cheaper than the big three cloud providers, but I’ve had a good experience with it too.

    The main downside of runpod that I know of is that you can only run a container image; you can’t have a full VM. For most use cases, though, this is really no big deal. If you want a generic sandbox for interactive experimentation, rather than to run an actual containerized app, you can just use the runpod PyTorch image to get a starting point with CUDA, PyTorch, and some other common tooling installed, then SSH into it and do whatever you need. In other words, you don’t have to bother with a more “normal” containerized deployment, writing a Dockerfile for something that runs unattended or exposes an API, unless you actually want that (a quick sanity check like the one sketched below is enough to get going).
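
    For example, here’s a minimal first-minute sanity check after SSHing into a fresh pod. Nothing here is runpod-specific, it’s just plain PyTorch:

    ```python
    # Sanity check on a fresh runpod PyTorch pod: confirm the preinstalled
    # stack actually sees the GPU before you start experimenting.
    import torch

    print("PyTorch version:", torch.__version__)
    print("CUDA available: ", torch.cuda.is_available())
    if torch.cuda.is_available():
        device = torch.cuda.current_device()
        print("GPU:", torch.cuda.get_device_name(device))
        total_gib = torch.cuda.get_device_properties(device).total_memory / 2**30
        print(f"VRAM: {total_gib:.1f} GiB")
    ```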

    Full disclosure: my recent experiments have all been testing different setups for inference with continuous batching; I’m not personally doing training or fine-tuning. But as far as I can tell, runpod would be equally applicable to training and fine-tuning tasks. The sketch below shows the kind of inference setup I mean.
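
    A rough sketch using vLLM, one engine that does continuous batching out of the box. The model name is just an illustrative placeholder, not a recommendation:

    ```python
    # Rough sketch of continuous-batching inference with vLLM. The model
    # id below is a hypothetical placeholder; substitute whatever you test.
    from vllm import LLM, SamplingParams

    llm = LLM(model="your-org/your-model")  # placeholder model id
    params = SamplingParams(temperature=0.7, max_tokens=128)

    # vLLM schedules these requests with continuous batching internally,
    # rather than padding everything into one static batch.
    prompts = [
        "Explain KV caching in one paragraph.",
        "What is continuous batching?",
    ]
    for output in llm.generate(prompts, params):
        print(output.outputs[0].text)
    ```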






  • > Anyone has any solutions for these?

    Use a high quality model.

    That means not 7B or 13B.

    I know a lot of other people have already said this in the thread, but this keeps coming up in this sub so I’m just gonna say it too.

    Bleeding-edge 7B and 13B models look good in benchmarks. Actually try using them, and the first thing you’ll notice is how poorly benchmark results reflect real-world performance. These models are dumb.

    You can get started on runpod by depositing as little as $10, less than some fast food meals, so just take the plunge and find out for yourself. An RTX A6000 48GB there costs only $0.79 per hour, which buys quite a few hours of experimenting to feel the difference firsthand. With 48GB of VRAM you can run Q4_K_M quants of 70B models with full GPU offloading, or try Q5_K_M, Q6, or even Q8 if you tweak the number of layers you offload to fit within 48GB (and still get fast enough generation for interactive chat). The back-of-envelope VRAM math is sketched below.
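
    Here’s that back-of-envelope math. The bits-per-weight figures are approximate (the k-quants mix block types), and real usage adds KV cache and runtime overhead on top:

    ```python
    # Approximate weight sizes for a 70B model at common GGUF quant levels.
    PARAMS_B = 70  # billions of parameters

    approx_bpw = {  # approximate effective bits per weight
        "Q4_K_M": 4.85,
        "Q5_K_M": 5.7,
        "Q6_K":   6.6,
        "Q8_0":   8.5,
    }

    for quant, bpw in approx_bpw.items():
        gib = PARAMS_B * 1e9 * bpw / 8 / 2**30
        # Leave headroom under 48 GB for KV cache and overhead.
        fits = "full offload on 48 GB" if gib < 44 else "needs partial offload"
        print(f"{quant}: ~{gib:.0f} GiB of weights -> {fits}")
    ```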

    The difference is just absolutely night and day. Not only do 70Bs rarely make the basic mistakes you’re describing, but they sometimes even surprise me in a way that feels “clever.”




  • If you take a really sober look at the numbers, how does running your own system make sense over renting hardware at runpod or a similar service?

    To me it doesn’t. I use runpod; I’m only on this sub because it’s the best place I know of to keep up with the latest news in open-source / self-hosted LLM stuff. I’m not literally running anything “locally.” The rough break-even math is sketched below.
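
    Here’s the kind of sober look I mean. The rental rate is the runpod A6000 price from my other comment; the hardware and electricity numbers are purely hypothetical round figures, so plug in your own:

    ```python
    # Illustrative break-even math: renting an A6000 vs. buying your own.
    RENT_PER_HOUR = 0.79   # USD/hr, RTX A6000 48GB on runpod
    HW_COST = 4000.0       # USD; assumed card + supporting box (hypothetical)
    POWER_KW = 0.45        # assumed draw under load, kW (hypothetical)
    POWER_PRICE = 0.15     # assumed USD per kWh (hypothetical)

    own_per_hour = POWER_KW * POWER_PRICE  # marginal cost once purchased
    breakeven_hours = HW_COST / (RENT_PER_HOUR - own_per_hour)
    print(f"Owning costs ~${own_per_hour:.2f}/hr after purchase")
    print(f"Break-even vs. renting: ~{breakeven_hours:,.0f} GPU-hours")

    # The utilization assumption is what decides it:
    for hours_per_day in (2, 8, 24):
        years = breakeven_hours / (hours_per_day * 365)
        print(f"At {hours_per_day:>2} hr/day of use: break-even in ~{years:.1f} years")
    ```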

    As far as I can tell there are lots of others like me on this sub. Of course, many people here do run on their own hardware, but the user base seems pretty split to me. I wonder what a poll would find.