Is there any way to use a second model (let's call it Model B) to manipulate the logits of Model A? That way, information from Model B could be incorporated when computing Model A's final outputs. The DExperts paper does this, but has anyone done it in a more straightforward way for a LLaMA-based model?
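
To make the question concrete, here is a minimal sketch of one simple way to do it: add Model B's log-probabilities, scaled by a weight, to Model A's log-probabilities before picking the next token. This is a plain log-linear combination, not the DExperts method itself (DExperts uses an expert and an anti-expert, `z_base + alpha * (z_expert - z_antiexpert)`). The names `combine_logits`, `greedy_pick`, and `alpha` are made up for illustration, and the sketch assumes both models share the same tokenizer and vocabulary so the logit vectors line up index by index.

```python
import math
from typing import List


def log_softmax(logits: List[float]) -> List[float]:
    """Numerically stable log-softmax over one vocabulary-sized vector."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]


def combine_logits(logits_a: List[float], logits_b: List[float],
                   alpha: float = 1.0) -> List[float]:
    """Return log p_A + alpha * log p_B for each token (hypothetical helper).

    Assumes both vectors are indexed by the same vocabulary.
    alpha = 0 ignores Model B; larger alpha gives Model B more influence.
    """
    if len(logits_a) != len(logits_b):
        raise ValueError("Model A and Model B must share a vocabulary")
    lp_a = log_softmax(logits_a)
    lp_b = log_softmax(logits_b)
    return [a + alpha * b for a, b in zip(lp_a, lp_b)]


def greedy_pick(scores: List[float]) -> int:
    """Index of the highest-scoring token."""
    return max(range(len(scores)), key=scores.__getitem__)


# Toy next-token logits over a 4-token vocabulary (made-up numbers).
logits_a = [2.0, 1.0, 0.5, -1.0]   # Model A prefers token 0
logits_b = [0.0, 3.0, 0.0, 0.0]    # Model B strongly prefers token 1

choice_a_only = greedy_pick(combine_logits(logits_a, logits_b, alpha=0.0))
choice_mixed = greedy_pick(combine_logits(logits_a, logits_b, alpha=1.0))
```

With a real LLaMA-based model in Hugging Face `transformers`, the same arithmetic could probably live inside a custom `LogitsProcessor` passed to `generate`, with Model B run on the same `input_ids` at each step. That wiring is not shown here and would need checking against the library version in use.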

  • brainx98@alien.topB
    1 year ago

    Interesting. It looks like knowledge distillation, but it can be adapted to what you want.