Hi. Speculative decoding runs a small model and a large model at the same time with a sampler in between, and in this setup the sampler's job is to NOT skew the probability distributions while doing so. There's a fairly simple Python implementation of this idea here. Is there a way we can adjust the probability distributions of either the small model or the large model for the task of generation?
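
For context, here is a rough sketch of the accept/reject rule the sampler uses, as I understand it from the paper. It is not the implementation linked above. The names `sample`, `speculative_step`, `p_target` and `q_draft` are my own placeholders, and the two distributions are plain lists of probabilities over a toy vocabulary rather than real model outputs.

```python
import random

def sample(weights, rng):
    """Draw one index in proportion to the given non-negative weights."""
    return rng.choices(range(len(weights)), weights=weights, k=1)[0]

def speculative_step(p_target, q_draft, rng):
    """Emit one token using a draft distribution q and a target distribution p.

    Returns (token, accepted), where accepted says whether the draft
    model's proposal was kept.
    """
    # The small (draft) model proposes a token.
    x = sample(q_draft, rng)
    # Keep the proposal with probability min(1, p(x) / q(x)).
    if rng.random() < min(1.0, p_target[x] / q_draft[x]):
        return x, True
    # Otherwise resample from the residual max(0, p - q);
    # rng.choices normalizes the weights for us.
    residual = [max(p - q, 0.0) for p, q in zip(p_target, q_draft)]
    return sample(residual, rng), False

rng = random.Random(0)
p_target = [0.5, 0.3, 0.2]  # toy stand-in for the large model
q_draft = [0.2, 0.2, 0.6]   # toy stand-in for the small model
token, accepted = speculative_step(p_target, q_draft, rng)
```

In this sketch the emitted token follows `p_target` whatever `q_draft` is, so reshaping `q_draft` only changes how often the draft gets accepted, while reshaping `p_target` changes what is actually generated.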

  • CatalyzeX_code_bot@alien.topB
    1 year ago

    No relevant code picked up just yet for “Fast Inference from Transformers via Speculative Decoding”.

    Request code from the authors or ask a question.

    If you have code to share with the community, please add it here 😊🙏

    To opt out from receiving code links, DM me.