  • The claimed 117.83x speedup might be somewhat misleading.

    If you compare the best implementation of FFF on CUDA to the best implementation of FF on CUDA, the speedup they got is 3.15x:

    See page 5, “Further comparisons”: “On GPU, the PyTorch BMM implementation of FFF delivers a 3.15x speedup over the fastest (Native fused) implementation of FF”

    The 40x that u/lexected mentioned seems to apply only when comparing to an apparently much slower FF version.

    It’s a pretty cool paper regardless, as far as I can tell from skimming it. But it could benefit from stating more clearly what has been achieved.