@LuluViBritannia

LuluViBritannia@alien.top · 10 months ago

The entire market will eventually use local LLMs, it’s as simple as that.

Online services are never an ideal solution for any business. It’s not just about privacy.

- The owners can do whatever they want, so if they change settings or even simply shut down, you’re screwed.

- Online services carry shared public traffic, so they bottleneck under heavy load. Just like on a highway, expect major slowdowns in your work if the online service you use happens to be saturated. And slowdowns mean financial loss.

- In case of internet issues, you’re screwed.

- You have to pay for the service, which can get freaking expensive depending on how much you use it.

Local LLMs have none of these issues. And more than that:

- While general intelligence like ChatGPT or Claude is incredible, it will never be enough for every use case. There will always be cases where you need a more specialized alternative, even if less intelligent.

- The gap between the big ones and local LLMs is frankly not that wide. I’m not going to say “they’re as intelligent as ChatGPT!”, but as a matter of fact, everything I was able to make with ChatGPT, I managed to make with a local LLM just as well or even better. Analysing code and rewriting or completing it? Managed with a 7B. Writing creative short stories? Easy even with a 7B.

- An online service has its own abilities, and the devs can update it but you have no guarantee they will. In the case of LLMs, context length matters so much! OpenAI did raise GPT’s context length regularly, but what if they don’t?

- Intelligence isn’t the only point about an AI! Every LLM has its own default language style, and even the big ones are hard to steer away from it. ChatGPT’s answers, for example, are consistently very lengthy. A local LLM is easier to steer. You can even force it to adopt a certain format.
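To illustrate the last point about forcing a format: one common idea is to filter, at every decoding step, the tokens that would break the format, and only then pick the best remaining one. The snippet below is a toy sketch of that idea, not any particular runtime's API. `toy_scores`, the tiny vocabulary, and the "answer must be exactly yes or no" rule are all made up for illustration; a real setup would take the scores from the model itself.

```python
from typing import Callable, Dict, List

ALLOWED_ANSWERS = ("yes", "no")  # hypothetical target format


def is_valid_prefix(text: str) -> bool:
    """True if `text` can still grow into one of the allowed answers."""
    return any(answer.startswith(text) for answer in ALLOWED_ANSWERS)


def is_complete(text: str) -> bool:
    """True once the output is exactly one of the allowed answers."""
    return text in ALLOWED_ANSWERS


def toy_scores(prefix: str) -> Dict[str, float]:
    """Stand-in for a model: it prefers a chatty opening over a short answer."""
    return {"Well, ": 3.0, "yes": 1.0, "no": 0.5}


def constrained_greedy_decode(
    score_fn: Callable[[str], Dict[str, float]],
    vocab: List[str],
    max_steps: int = 20,
) -> str:
    """Greedy decoding that only considers tokens keeping the output in format."""
    out = ""
    for _ in range(max_steps):
        if is_complete(out):
            break
        scores = score_fn(out)
        # Drop every token that would make the output leave the format.
        allowed = [tok for tok in vocab if is_valid_prefix(out + tok)]
        if not allowed:
            break
        out += max(allowed, key=lambda tok: scores.get(tok, float("-inf")))
    return out


vocab = ["Well, ", "yes", "no"]
unconstrained_pick = max(vocab, key=lambda tok: toy_scores("")[tok])
constrained_answer = constrained_greedy_decode(toy_scores, vocab)
```

Without the filter the toy model opens with its chatty token; with the filter it can only produce one of the two allowed answers. This kind of per-step control needs access to the token scores, which you have when the model runs on your own machine.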

LuluViBritannia@alien.top · 10 months ago

Silero TTS is extremely fast, and combined with RVC you can clone any voice from any person/character. It’s a bit monotonous, but it’s the best available for free imo.

And if you want the best quality: use the 10000 free words per month of your 11Labs account. Once you run out, switch to Silero TTS. In both cases, plug the audio output into the input of a real-time RVC app.
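The "use the free quota first, then fall back" logic can be sketched as a small router. This is only an illustration: `TTSRouter`, the engine labels, and the word-based counting are assumptions made for the example, and the actual provider may meter usage in a different unit, so the counting function is left pluggable. No real TTS API is called here.

```python
from typing import Callable


def count_words(text: str) -> int:
    """Default usage meter: whitespace-separated words."""
    return len(text.split())


class TTSRouter:
    """Picks the paid-quality engine while free quota remains, else the local one."""

    def __init__(
        self,
        monthly_quota: int = 10_000,
        meter: Callable[[str], int] = count_words,
    ) -> None:
        self.monthly_quota = monthly_quota
        self.meter = meter
        self.used = 0

    def pick_engine(self, text: str) -> str:
        cost = self.meter(text)
        if self.used + cost <= self.monthly_quota:
            # Still within the free allowance: record the usage.
            self.used += cost
            return "elevenlabs"
        # Quota would be exceeded: fall back to the free local engine.
        return "silero"

    def reset_month(self) -> None:
        """Call at the start of each billing month."""
        self.used = 0


router = TTSRouter(monthly_quota=5)
first = router.pick_engine("one two three")    # 3 of 5 units used
second = router.pick_engine("four five six")   # would need 6, so fall back
```

Whichever engine is picked, its audio output would then go into the real-time RVC step described above.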

LuluViBritannia@alien.top · 10 months ago

Well, first of all, this is something you do while running the model. Sure, it’s the same model, but it’s still two different processes to run in parallel.

Then, from what I gather, it’s closer to model finetuning than it is to inference. And if you look up the figures, finetuning requires a lot more power and VRAM. As I said, it’s rewriting the neural network, which is the definition of finetuning.

So in order to get a more specific answer, we should look up why finetuning requires more than inference.
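As a rough starting point for that question, here is a back-of-the-envelope estimate. It assumes fp16 weights for inference, and full finetuning with the Adam optimizer in mixed precision, where each parameter also needs a gradient, an fp32 master copy, and two optimizer states. Activations, context cache, and framework overhead are ignored, and memory-saving methods such as LoRA or quantization would change the numbers a lot, so treat this as an order-of-magnitude sketch only.

```python
BYTES_FP16 = 2
BYTES_FP32 = 4


def inference_bytes_per_param() -> int:
    """Inference only needs the weights (fp16 assumed)."""
    return BYTES_FP16


def full_finetune_bytes_per_param() -> int:
    """Mixed-precision full finetuning with Adam (assumed setup)."""
    weights = BYTES_FP16         # fp16 working copy of the weights
    gradients = BYTES_FP16       # one gradient per weight
    master_weights = BYTES_FP32  # fp32 copy kept by the optimizer
    adam_momentum = BYTES_FP32   # first-moment estimate
    adam_variance = BYTES_FP32   # second-moment estimate
    return weights + gradients + master_weights + adam_momentum + adam_variance


def to_gb(num_params: float, bytes_per_param: int) -> float:
    """Memory in decimal gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bytes_per_param / 1e9


params_7b = 7e9
inference_gb = to_gb(params_7b, inference_bytes_per_param())
finetune_gb = to_gb(params_7b, full_finetune_bytes_per_param())
ratio = finetune_gb / inference_gb

print(f"inference: ~{inference_gb:.0f} GB, full finetune: ~{finetune_gb:.0f} GB, ratio: {ratio:.0f}x")
```

Under these assumptions a 7B model goes from about 14 GB for inference to about 112 GB for full finetuning, before counting activations. That is the gap the comment above is pointing at.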

LuluViBritannia@alien.top · 10 months ago

I have been generating art with AI. There is an extension meant for exactly that: you literally tell the AI “good” or “bad” for each result, and it affects the weights of the model.

Sadly, it’s nearly impossible to run. Reinforcement learning isn’t just about “picking random weights and changing them”. It’s rewriting the entire model to take your feedback into account. And it does that while running the model, which in itself already takes most of your compute resources.

You need a shitton of VRAM and a very powerful GPU to run Reinforcement Learning for images. It’s even worse for LLMs, which are much more power-hungry.

Who knows, maybe there will be optimizations in the next few years, but as of right now, reinforcement learning is just too demanding.

LuluViBritannia@alien.top · 11 months ago

Is there any way we can read those datasets? I’m a noob when it comes to “what’s under the hood”. On HuggingFace they show they tried to upload the dataset but it failed due to, likely, the sheer size of the thing…