Yeah man, just use LangChain with a Pydantic class, or Microsoft's guidance library, with Mistral Instruct or Zephyr & you're golden
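To illustrate the idea without pulling in LangChain itself, here's a minimal stdlib sketch of what the Pydantic layer buys you: the model is prompted to emit JSON, and the output is parsed and type-checked against a schema before anything downstream touches it. The field names and the raw completion below are made up for the example; Pydantic does the same check declaratively.

```python
import json

# Hypothetical raw completion from an instruct model (assumes the prompt
# asked for JSON with exactly these fields).
raw = '{"name": "Ada Lovelace", "year": 1815}'

# Expected schema: field name -> required type.
REQUIRED = {"name": str, "year": int}

def validate(payload: str) -> dict:
    """Parse and type-check model output; a Pydantic model does this declaratively."""
    data = json.loads(payload)
    for field, typ in REQUIRED.items():
        if not isinstance(data.get(field), typ):
            raise ValueError(f"bad or missing field: {field}")
    return data

print(validate(raw))
```

The point is that anything that fails the schema raises immediately, so you can retry the generation instead of letting malformed output leak into your pipeline.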
All you need is a 32K-context LLM. Everything beyond that goes through a tool invocation that pulls from the archived text. You'll have to make your orchestrator smart enough to know there's content beyond the window that needs to be fetched
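A toy sketch of that routing decision, assuming a 32K token budget and a crude chars/4 token estimate (all names and the heuristic are illustrative, not any particular framework's API):

```python
CONTEXT_LIMIT = 32_000  # token budget of the model's window

def rough_token_count(text: str) -> int:
    # Crude heuristic (~4 chars per token); a real orchestrator
    # would use the model's actual tokenizer.
    return len(text) // 4

def route(live_context: str) -> str:
    """Decide whether the query fits in-context or needs the archive tool."""
    if rough_token_count(live_context) <= CONTEXT_LIMIT:
        return "answer_in_context"
    # Content has aged out of the window: pull it back via retrieval.
    return "invoke_archive_tool"

print(route("short conversation"))            # answer_in_context
print(route("x" * 200_000))                   # invoke_archive_tool
```

The hard part the comment alludes to is the second branch: the orchestrator has to know that relevant content *exists* outside the window, which usually means keeping an index or summary of what was archived.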
Just run it on TGI or vLLM to get FlashAttention & continuous batching for parallel requests
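For reference, a typical way to stand up vLLM's OpenAI-compatible server (the model id and port are just examples, and flags can differ between vLLM versions):

```shell
# Serve an instruct model behind an OpenAI-compatible API with vLLM.
# Continuous batching is on by default; no extra flag needed.
python -m vllm.entrypoints.openai.api_server \
    --model mistralai/Mistral-7B-Instruct-v0.2 \
    --port 8000
```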
It’s extremely overpriced. With INT4 quantization, llama.cpp puts up even crazier numbers. A system with 4090s can be built for $2500 in India, & cheaper elsewhere for sure.
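For anyone trying the INT4 route, this is roughly what quantizing a GGUF model with llama.cpp looks like (the binary name varies by version: older builds ship `quantize`, newer ones `llama-quantize`; filenames here are placeholders):

```shell
# Convert an FP16 GGUF to 4-bit (Q4_K_M is a common quality/size tradeoff).
./llama-quantize model-f16.gguf model-Q4_K_M.gguf Q4_K_M
```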
I feel like verifiable math & physics simulation should be something every LLM just invokes as a tool, instead of slowly grinding through it internally
vLLM, TGI, TensorRT-LLM
Fuyu-8B
By that logic every LLM out there will engage in talk about Xi