  • This might be pedantic, but this field has so much random vocabulary that it’s worth keeping folks from getting confused.

    MoE is slightly different: an MoE is a single LLM with gated layers that “select” which experts to route each embedding/token to. It’s pretty difficult to scale and serve in practice.
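
    Roughly what a gated layer looks like, as a toy PyTorch sketch (the layer sizes, top-2 routing, and the missing load-balancing loss here are just my assumptions, not any specific model):

```python
# Toy gated MoE layer: the gate scores every expert per token and only the
# top-k experts actually run, which is what makes scaling/serving awkward.
import torch
import torch.nn as nn
import torch.nn.functional as F

class MoELayer(nn.Module):
    def __init__(self, d_model=512, d_ff=2048, n_experts=8, top_k=2):
        super().__init__()
        self.top_k = top_k
        self.gate = nn.Linear(d_model, n_experts)  # learned router
        self.experts = nn.ModuleList([
            nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))
            for _ in range(n_experts)
        ])

    def forward(self, x):                              # x: (batch, seq, d_model)
        weights, idx = self.gate(x).topk(self.top_k, dim=-1)
        weights = F.softmax(weights, dim=-1)           # mixing weights for chosen experts
        out = torch.zeros_like(x)
        for k in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = idx[..., k] == e                # tokens routed to expert e
                if mask.any():
                    out[mask] += weights[..., k][mask].unsqueeze(-1) * expert(x[mask])
        return out
```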

  • I think what you’re referring to is more like a model router. You can use a general LLM to “classify” a prompt and then route the entire prompt to a downstream LLM. It’s unclear whether this would be faster than a single 70B LLM, since you’d repeat the encoding phase and add some extra generation, but the answers could certainly be better.
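
    Very roughly, the router idea looks like this (the labels, model names, and the pick_model helper are made-up placeholders, not any real API):

```python
# Toy prompt-level router: a cheap classification step picks which downstream
# model gets the *entire* prompt, which then has to re-encode it from scratch.
def classify_prompt(prompt: str) -> str:
    # Stand-in for a call to a small classifier or general LLM.
    if "```" in prompt or "def " in prompt:
        return "code"
    if any(ch.isdigit() for ch in prompt) and "=" in prompt:
        return "math"
    return "chat"

DOWNSTREAM = {                 # hypothetical specialist models
    "code": "codellama-13b",
    "math": "wizardmath-13b",
    "chat": "llama-2-13b-chat",
}

def pick_model(prompt: str) -> str:
    label = classify_prompt(prompt)
    return DOWNSTREAM.get(label, "llama-2-70b")   # fall back to the big generalist

print(pick_model("def fib(n): return n if n < 2 else fib(n-1) + fib(n-2)"))
```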