It’s an innovative approach, but the practical real-world use case where it is beneficial are very very narrow:
https://twitter.com/joao_gante/status/1727985956404465959
TL;DR: you have to have massive spare compute to get a modest gain in speed. In most cases, you get slower inference. They are also comparing speeds to relatively slow native transformers inference. Exllamav2, GPTQ, and llama.cpp compared to base transformers performance is much more impressive.
I’m not sure how this would be applicable in those other scenarios you’ve mentioned; anything is possible. There may be other uses for this novel decoding method. But being touted as being X percent faster than transformers in a useful way isn’t one of them.