For example, with the following structure:

  • System = GPT-4 Turbo + Llama 2 + a 3rd LLM (!) + Google or Bing API for web search + LangChain + any vector DB + document upload + long-term memory + …

The idea behind it is to get a more accurate, up-to-date (web search) and specialized system, or even to let the LLMs discuss your prompt before completion! The question is also how the interaction of multiple LLMs in a system should be organized (algorithm, Python library, …). And what kind of interaction can or should this be: a master-slave or a multi-master system?
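To make the two interaction styles concrete, here is a minimal sketch in plain Python. It is only an illustration under loud assumptions: the "models" are hypothetical stub functions (`make_stub`) rather than real GPT-4 or Llama 2 calls, and `master_worker` and `peer_discussion` are made-up names, not functions from LangChain or any other library. In the first pattern one model collects drafts from the others and writes the final answer; in the second, all models are peers that revise their answers after seeing each other's.

```python
from typing import Callable, Dict

# A "model" is any function from prompt text to answer text.
# Assumption: in a real system each would wrap an API or local model call.
Model = Callable[[str], str]


def make_stub(name: str) -> Model:
    """Hypothetical stand-in for an LLM: echoes the question and counts extra context lines."""
    def model(prompt: str) -> str:
        lines = prompt.splitlines()
        return f"[{name}] {lines[0]} (+{len(lines) - 1} context lines)"
    return model


def master_worker(master: Model, workers: Dict[str, Model], prompt: str) -> str:
    """Master-worker: workers draft independently, then the master writes the final answer."""
    drafts = [f"{name}: {worker(prompt)}" for name, worker in workers.items()]
    return master(prompt + "\nDrafts:\n" + "\n".join(drafts))


def peer_discussion(peers: Dict[str, Model], prompt: str, rounds: int = 2) -> Dict[str, str]:
    """Multi-master: each peer answers, then revises after reading the other peers' answers."""
    answers = {name: peer(prompt) for name, peer in peers.items()}
    for _ in range(rounds - 1):
        revised = {}
        for name, peer in peers.items():
            others = "\n".join(a for n, a in answers.items() if n != name)
            revised[name] = peer(prompt + "\nOther answers:\n" + others)
        answers = revised
    return answers


models = {name: make_stub(name) for name in ("gpt", "llama", "third")}
workers = {name: m for name, m in models.items() if name != "gpt"}

final = master_worker(models["gpt"], workers, "What is RAG?")
discussion = peer_discussion(models, "What is RAG?", rounds=2)

print(final)
for name, answer in discussion.items():
    print(name, "->", answer)
```

Note that every extra round or extra worker adds more model calls, so the total response time grows with the number of sequential steps unless the independent calls are run in parallel.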

  • crazymonezyy@alien.topB
    10 months ago

    Practically, the latency on this is going to be so bad that the user will go to sleep before your thing responds.