seanthenry@alien.top to LocalLLaMA@poweruser.forum • Running full Falcon-180B under budget constraint
1 year ago
Good work.
I’m not sure if it would be possible, but for loading and processing the layers, could the following be done?
On GPU 1 load layers 1, 3, 5, 7, on GPU 2 load layers 2, 4, 6, 8, and run the layers in parallel.
Once a layer is complete, start unloading it and loading the next one instead of waiting for all the loaded layers to finish. That might only be useful for those with slower cards, though, and the loading could end up slowing the processing and making things worse.
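To make the idea concrete, here is a rough sketch of what I mean, not a real implementation. `load_layer` and `run_layer` are made-up stand-ins for reading a layer's weights and running its forward pass, and no actual GPU is involved. Since each layer needs the previous layer's output, the sketch still runs the layers in order and only overlaps loading the next layer with computing the current one.

```python
from concurrent.futures import ThreadPoolExecutor


def assign_layers(num_layers, num_gpus):
    """Round-robin split: layer i (1-based) goes to GPU ((i - 1) % num_gpus) + 1."""
    plan = {gpu: [] for gpu in range(1, num_gpus + 1)}
    for layer in range(1, num_layers + 1):
        plan[(layer - 1) % num_gpus + 1].append(layer)
    return plan


def load_layer(layer):
    # Hypothetical stand-in for reading one layer's weights from disk.
    return {"layer": layer, "scale": layer}


def run_layer(weights, x):
    # Hypothetical stand-in for the layer's forward pass.
    return x + weights["scale"]


def forward_with_prefetch(num_layers, x):
    """Run layers in order, loading layer i+1 in the background while layer i computes."""
    with ThreadPoolExecutor(max_workers=1) as pool:
        pending = pool.submit(load_layer, 1)
        for layer in range(1, num_layers + 1):
            weights = pending.result()  # wait until this layer's weights are ready
            if layer < num_layers:
                # Kick off the next load before computing the current layer.
                pending = pool.submit(load_layer, layer + 1)
            x = run_layer(weights, x)
            del weights  # "unload" the layer as soon as it is done
    return x


if __name__ == "__main__":
    print(assign_layers(8, 2))
    print(forward_with_prefetch(8, 0))
```

Whether this helps would depend on how load time compares to compute time: if loading a layer takes much longer than running it, the compute just ends up waiting on `pending.result()` anyway.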
A cloud storage user.