[P] Model training bottlenecked by CPU.

AdSignificant9235@alien.top · 11 months ago

arg_max@alien.top · 11 months ago

CPU bottlenecks are easy to spot by monitoring CPU usage during training. If all of your cores sit at 100% the whole time, your CPU might be too slow to keep up with the GPU. If both the CPU and the GPU go idle from time to time, your storage could be the bottleneck.
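
If you want a number to go with what the CPU monitor shows, one complementary check is to time how long the training loop waits for the next batch versus how long the step itself takes. This is only a rough sketch: `measure_data_wait`, `slow_loader` and `fast_step` are made-up names for illustration, and the loader and step here are stand-ins that just sleep. In a real run you would pass your own data loader and training step.

```python
import time


def measure_data_wait(batches, train_step):
    """Return (seconds spent waiting for batches, seconds spent in train_step).

    Note: if train_step launches asynchronous GPU work (as PyTorch does),
    it should block until that work is finished, e.g. by calling
    torch.cuda.synchronize() at the end, or the split will be misleading.
    """
    wait, compute = 0.0, 0.0
    it = iter(batches)
    while True:
        t0 = time.perf_counter()
        try:
            batch = next(it)  # time spent here is data loading
        except StopIteration:
            break
        t1 = time.perf_counter()
        train_step(batch)  # time spent here is the model step
        t2 = time.perf_counter()
        wait += t1 - t0
        compute += t2 - t1
    return wait, compute


# Hypothetical stand-ins: a loader that is slow and a step that is fast.
def slow_loader(n=5, delay=0.02):
    for i in range(n):
        time.sleep(delay)  # pretend to decode/augment on the CPU
        yield i


def fast_step(batch, delay=0.001):
    time.sleep(delay)  # pretend to run forward/backward


if __name__ == "__main__":
    wait, compute = measure_data_wait(slow_loader(), fast_step)
    total = wait + compute
    print(f"waiting for data: {wait / total:.0%} of loop time")
```

If most of the loop time goes to waiting for data, the input pipeline (CPU or storage) is the likely culprit rather than the model itself.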

To speed up data loading, you could try NVIDIA's DALI or FFCV, which are both libraries optimized for that purpose. They replace some of the inefficient Python code with highly optimised code. FFCV is quite nice, but it requires you to convert your dataset into a specific format first.