I’m fascinated by the whole ecosystem popping up around llama and local LLMs. I’m also curious what everyone here is up to with the models they are running.
Why are you interested in running local models? What are you doing with them?
Secondarily, how are you running your models? Are you actually running them on local hardware, or on a cloud service?
My biggest complaint about GPT-3.5/4 is that every response is a single-serving reply that wraps itself up. This kind of endlessly closed-ended dialogue doesn’t reflect real human communication or thought. Prompt engineering can work around the problem, but you also have to fight the censorship and the strong bias toward being indefinite. By indefinite, I mean that it often refuses to state plainly what is true and what is false. For example, ask GPT how many penguins died in car accidents last year. The correct answer is “zero.” But you’ll get something like “it’s highly unlikely that…” Try to get it to output “zero” and you’ll find it isn’t so easy.
Add all this up, and you start to realize there are certain things GPT isn’t suited for. In my case, that’s creative writing. A model that is both relentlessly conclusive and stubbornly indefinite is fairly useless for writing text that needs to be inconclusive and definite.