Fitting 70B models on a 4 GB GPU: the whole model, no quantization, distillation, or anything!

vatsadev@alien.top · 10 months ago

Tiny_Arugula_5648@alien.top · 10 months ago

This is one of those cases where proving something can be done doesn't make it useful. It has to be one of the least efficient ways to run inference. It's like the people who got Doom running on an HP printer: great, you did it, but it's the worst possible version.