Hi. Speculative decoding runs a small draft model and a large target model together, with a sampler in between. In this setup the sampler's job is to NOT skew the probability distribution: the tokens it accepts should be distributed exactly as if they had been sampled from the large model alone. There's a fairly simple Python implementation of this idea here. Is there a way we can adjust the probability distributions of either the small model or the large model for the task of generation?
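To make the question concrete, here is a minimal sketch of the accept/reject rule for a single token, as I understand it from the paper. The names `speculative_step` and `apply_temperature` are my own (hypothetical, not taken from the linked implementation), and the distributions are plain lists of probabilities rather than real model outputs. The sketch assumes that any adjustment, such as a temperature, is applied to the distributions before they reach the sampler, so that the sampler then reproduces the adjusted large-model distribution.

```python
import random


def apply_temperature(probs, temperature):
    """Hypothetical helper: rescale a distribution as p_i ** (1 / T), then renormalize."""
    scaled = [p ** (1.0 / temperature) for p in probs]
    total = sum(scaled)
    return [s / total for s in scaled]


def speculative_step(p, q, rng):
    """One accept/reject step of speculative sampling for a single token.

    p: target (large model) distribution over the vocabulary.
    q: draft (small model) distribution over the same vocabulary.
    Returns a token index that is distributed according to p.
    """
    vocab = range(len(p))
    # The draft model proposes a token x ~ q (so q[x] > 0).
    x = rng.choices(vocab, weights=q)[0]
    # Accept the proposal with probability min(1, p[x] / q[x]).
    if rng.random() < min(1.0, p[x] / q[x]):
        return x
    # On rejection, resample from the residual max(0, p - q);
    # rng.choices normalizes the weights for us.
    residual = [max(0.0, pi - qi) for pi, qi in zip(p, q)]
    return rng.choices(vocab, weights=residual)[0]


if __name__ == "__main__":
    rng = random.Random(0)
    p = apply_temperature([0.6, 0.3, 0.1], temperature=0.7)  # adjusted target
    q = [0.2, 0.5, 0.3]                                      # draft
    n = 20000
    counts = [0] * len(p)
    for _ in range(n):
        counts[speculative_step(p, q, rng)] += 1
    print("adjusted target:", [round(v, 3) for v in p])
    print("empirical      :", [round(c / n, 3) for c in counts])
```

If this reading is right, changing `q` only affects how often proposals are accepted (and therefore the speed), while changing `p` changes what actually gets generated.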
No relevant code picked up just yet for “Fast Inference from Transformers via Speculative Decoding”.
Request code from the authors or ask a question.
If you have code to share with the community, please add it here 😊🙏