Diffusion and iterative mask/predict feel pretty conceptually similar to me. My hunch is that diffusion might have a higher ceiling because it can precisely traverse a continuous space, but operating on discrete tokens could plausibly converge to something semantically valid in fewer iterations.
Also, BERT is trained with MLM, which technically is predicting the original text from a "noisy" version, but the noise is only introduced via masking, and it's limited to a single forward pass rather than iterative refinement!
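To make the contrast concrete, here's a toy sketch of iterative mask/predict decoding (Mask-Predict-style confidence-based re-masking, per Ghazvininejad et al. 2019), abusing an off-the-shelf bert-base-uncased checkpoint via Hugging Face transformers. This is purely illustrative and my own assumption of how a minimal loop would look: BERT was only trained for single-pass MLM, so don't expect coherent output, and the linear unmasking schedule is just one common choice.

```python
# Toy sketch: iterative mask/predict with a vanilla BERT MLM.
# Illustrative only -- BERT was trained for single-pass MLM, not this loop.
import torch
from transformers import AutoTokenizer, AutoModelForMaskedLM

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForMaskedLM.from_pretrained("bert-base-uncased")
model.eval()

def mask_predict(text: str, num_iterations: int = 4) -> str:
    # The input text here only fixes the sequence length; every non-special
    # token is masked out before the first iteration.
    ids = tokenizer(text, return_tensors="pt")["input_ids"].clone()
    special = torch.tensor(
        tokenizer.get_special_tokens_mask(
            ids[0].tolist(), already_has_special_tokens=True
        )
    ).bool()
    ids[0, ~special] = tokenizer.mask_token_id  # start fully masked
    n = int((~special).sum())

    for t in range(num_iterations):
        with torch.no_grad():
            logits = model(input_ids=ids).logits[0]   # [seq_len, vocab]
        conf, pred = logits.softmax(-1).max(-1)       # per-token confidence
        ids[0, ~special] = pred[~special]             # commit all predictions

        # Linear schedule: re-mask the least confident tokens, shrinking the
        # masked fraction each iteration until everything is committed.
        num_to_mask = int(n * (1 - (t + 1) / num_iterations))
        if num_to_mask > 0:
            conf = conf.masked_fill(special, float("inf"))  # keep [CLS]/[SEP]
            ids[0, conf.argsort()[:num_to_mask]] = tokenizer.mask_token_id

    return tokenizer.decode(ids[0], skip_special_tokens=True)

print(mask_predict("the cat sat on the mat"))
```

The interesting knob is the re-masking schedule: each pass commits the model's most confident predictions and re-noises the rest, which is the discrete analogue of a diffusion denoising step.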
This has been explored a little for NLP and even audio tasks (using acoustic tokens)!
https://aclanthology.org/2022.findings-acl.25/ and https://arxiv.org/abs/2307.04686 both come to mind