Low memory bandwidth utilization on 3090?

Aaaaaaaaaeeeee@alien.top · 2 years ago

Low memory bandwidth utilization on 3090?

brobruh211@alien.top · 2 years ago

Hi! What are your settings for Ooba to get this to work? On Windows 11 on a single 3090, I keep getting CUDA out of memory error trying to load a 2.4bpw 70B model with just 4k context. It’s annoying because this used to work but after a recent update it just won’t load anymore.

Aaaaaaaaaeeeee@alien.top · 2 years ago

8k with 2.4bpw and 20 t/s, the vram usage says 23.85/24.00 gb.

16k with 2.4bpw 20 t/s with fp8 cache

I have 0.5-0.6gb used for driving the monitor graphics on ubuntu.

Did you disable the nvidia system memory fallback that they pushed on Windows users? That’s probably what you need.

brobruh211@alien.top · 2 years ago

Thanks for the detailed answer! Ubuntu does seem to be much more memory-efficient compared to Windows. However, the problem just fixed itself seemingly overnight. Now I’m not running into out of memory errors. 8-bit cache is a godsend for vram efficiency.

mcmoose1900@alien.top · 2 years ago

Try exui instead of ooba.

Aaaaaaaaaeeeee@alien.top · 2 years ago

same story here.

Sat0r1r1@alien.top · 2 years ago

My results are the same as yours.

I use TabbyAPI, 70B 2.4bpw I get 20/T.