Is there a way to use a second model (call it Model B) to adjust the logits of Model A, so that information from Model B is incorporated when computing Model A's final outputs? The DExperts paper describes one approach, but has anyone done this in a simpler, more straightforward way for a LLaMA-based model?
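To make the question concrete, here is a rough sketch of the simplest version I can think of: add a scaled copy of Model B's next-token logits to Model A's before sampling. This is not the DExperts formulation, just a plain weighted sum. The names `combine_logits` and `alpha` are made up for illustration, the logit values are toy numbers, and it assumes both models share the same vocabulary so the logits line up index by index.

```python
import math

def softmax(logits):
    # Subtract the max for numerical stability before exponentiating.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def combine_logits(logits_a, logits_b, alpha=0.5):
    # Hypothetical helper: shift Model A's logits by alpha * Model B's logits.
    # Assumes both lists are indexed by the same vocabulary.
    if len(logits_a) != len(logits_b):
        raise ValueError("both models must share the same vocabulary")
    return [a + alpha * b for a, b in zip(logits_a, logits_b)]

# Toy next-token logits over a 3-token vocabulary (made-up values).
logits_a = [2.0, 1.0, 0.0]   # Model A prefers token 0
logits_b = [0.0, 0.0, 4.0]   # Model B strongly prefers token 2

probs = softmax(combine_logits(logits_a, logits_b, alpha=1.0))
next_token = max(range(len(probs)), key=probs.__getitem__)  # greedy pick
```

With Hugging Face `transformers`, I assume something like this could be wrapped in a custom `LogitsProcessor` and passed to `generate` through its `logits_processor` argument, with Model B run on the same `input_ids` at each step, but I have not tried it.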
Interesting. It looks like distillation, but it could be adapted to what you want.