Recently, I’ve been working on some projects for fun, trying out some things I hadn’t worked with before, such as profiling.

But after profiling my code, I found that my average GPU utilization is only around 50%. Apparently, the training loop frequently stalls for a few hundred milliseconds waiting on the dataloader. I’ve tried a few things in the dataloader: increasing and decreasing the number of workers, and setting pin_memory to true or false, but none of it seems to make much difference. I have an NVMe drive, so the disk is not the problem either. I’ve concluded that the bottleneck must be the CPU.
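One way to double-check that conclusion is to measure how much of the loop's wall-clock time is spent blocked waiting for the next batch. Below is a minimal sketch using only the standard library. `WaitStats`, `timed_batches`, and `fake_loader` are hypothetical names made up for this example, and the `time.sleep` calls stand in for a real dataloader and a real training step.

```python
import time
from dataclasses import dataclass


@dataclass
class WaitStats:
    wait_s: float = 0.0   # time spent blocked waiting for the next batch
    total_s: float = 0.0  # wall-clock time since iteration started
    batches: int = 0

    @property
    def wait_fraction(self) -> float:
        return self.wait_s / self.total_s if self.total_s > 0 else 0.0


def timed_batches(loader, stats):
    """Yield batches from `loader`, recording the time spent inside next()."""
    start = time.perf_counter()
    it = iter(loader)
    while True:
        t0 = time.perf_counter()
        try:
            batch = next(it)
        except StopIteration:
            stats.total_s = time.perf_counter() - start
            return
        stats.wait_s += time.perf_counter() - t0
        stats.batches += 1
        yield batch  # the caller's training step runs here
        stats.total_s = time.perf_counter() - start


def fake_loader(n, delay):
    """Stand-in for a slow dataloader: each batch takes `delay` seconds."""
    for i in range(n):
        time.sleep(delay)
        yield i


stats = WaitStats()
for batch in timed_batches(fake_loader(5, 0.02), stats):
    time.sleep(0.01)  # stand-in for the forward/backward pass

print(f"batches={stats.batches} wait_fraction={stats.wait_fraction:.2f}")
```

In a real loop you would wrap the actual dataloader instead of `fake_loader`. Because CUDA kernels are launched asynchronously, the non-waiting share is not a precise measure of GPU time, but a large `wait_fraction` does suggest the loop is starved for data.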

Now, I’ve read that pre-processing the data might help, so that the dataloader doesn’t have to decode the images, for example, but I don’t really know how to go about this. I have around 2TB of NVMe storage, and I’ve got a couple of datasets on the disk (ImageNet and iNaturalist are the two biggest ones), so I don’t suppose I’ll be able to store them on the disk uncompressed.
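Whether uncompressed storage fits depends mostly on the resolution the images are stored at. Here is a rough back-of-the-envelope sketch, assuming every image is pre-resized to a fixed square size and stored as raw 8-bit RGB. `raw_size_gb` is a hypothetical helper, and the image count is the ImageNet-1k (ILSVRC-2012) training split; for iNaturalist, plug in the count for whichever release you have.

```python
def raw_size_gb(num_images, side, channels=3):
    """Size in GB (10**9 bytes) of images stored as raw uint8 side x side arrays."""
    return num_images * side * side * channels / 1e9


IMAGENET_TRAIN_IMAGES = 1_281_167  # ILSVRC-2012 training split

for side in (128, 160, 224, 256):
    size = raw_size_gb(IMAGENET_TRAIN_IMAGES, side)
    print(f"{side}x{side}: {size:.0f} GB")
```

At 224x224 this comes out a bit under 200 GB, which fits comfortably on a 2TB drive. The trade-off is that any random crops and resizes then start from the stored resolution, not the original one.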

Is there anything I can do to lighten the load on the CPU during training so that I can take advantage of the 50% of the GPU that I’m not using at the moment?

  • RetroPenguin_@alien.topB
    11 months ago

    Does your GPU util peak at 100%? If not, increase the batch size until it does (roughly). A couple of ideas: do any transforms on the GPU or before starting your training job, so that the CPU is solely responsible for loading images from disk. Increase the number of workers to the number of CPU cores you have.
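The worker-count suggestion can be sketched as a small helper that builds keyword arguments for `torch.utils.data.DataLoader`. `usable_cpu_count` and `loader_kwargs` are hypothetical names. Reserving one core for the main process and the `prefetch_factor` value are assumptions to experiment with, not recommendations from the thread. The sketch itself needs only the standard library.

```python
import os


def usable_cpu_count():
    """Number of CPUs this process may run on (respects affinity on Linux)."""
    if hasattr(os, "sched_getaffinity"):
        return len(os.sched_getaffinity(0))
    return os.cpu_count() or 1


def loader_kwargs(reserve=1):
    """Keyword arguments to try on DataLoader; `reserve` cores are left free."""
    workers = max(1, usable_cpu_count() - reserve)
    return {
        "num_workers": workers,
        "pin_memory": True,          # speeds up host-to-GPU copies
        "persistent_workers": True,  # keep workers alive between epochs
        "prefetch_factor": 4,        # batches loaded ahead per worker (assumed value)
    }


print(loader_kwargs())
```

With PyTorch installed, the result could be passed along as `DataLoader(dataset, batch_size=256, **loader_kwargs())`, where `dataset` and the batch size are placeholders. Passing `reserve=0` matches the suggestion above of one worker per CPU.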