Openhermes seems pretty capable of “other use”, no?
Regarding that “prediction” setting, what exactly is it? I remember n_predict from using llama.cpp directly, but I think I always set it to -1 for, like, max. And I don’t think I even have such a setting in llama-cpp-python?
Certainly interesting! But in the end, there’s something wrong with the model if anything like that is needed. Like obviously it isn’t really fully capable of writing proper answers if it somehow thinks that writing in circles would be the best thing to do.
It is not a transformer?
I think that’s not the whole story. The smaller increments can lead to “course changes” that would not have happened otherwise, letting things slip into other local minima and all that. It’s not just several small steps instead of one big one: the straight line that is the big step becomes a curve, capable of bringing you to an entirely different place. The whole dataset can have its impact before some giant leap jumps in a single direction. As a layman, maybe I’ve got this wrong, but I really don’t see how you can categorically dismiss the possibility of creating a much more robust and effective architecture, instead of essentially jumping to conclusions and then somewhat fixing it up.
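For what it’s worth, the “curve vs. straight line” point is easy to show with a toy gradient descent. The quadratic valley function, step sizes, and start point below are made-up illustration values, not anything from a real training setup:

```python
# Toy illustration: one big gradient step vs. many small ones on
# f(x, y) = x**2 + 10*y**2 (a narrow valley). Same total step budget,
# very different endpoints: recomputing the gradient after each small
# step bends the path, while the single big step is a straight jump.

def grad(x, y):
    return 2 * x, 20 * y

def descend(x, y, lr, steps):
    for _ in range(steps):
        gx, gy = grad(x, y)
        x, y = x - lr * gx, y - lr * gy
    return x, y

big = descend(1.0, 1.0, lr=0.09, steps=1)      # one straight-line jump
small = descend(1.0, 1.0, lr=0.009, steps=10)  # same budget, curved path

f = lambda x, y: x**2 + 10 * y**2
print("one big step:   ", big, f(*big))        # overshoots across the valley
print("ten small steps:", small, f(*small))    # hugs the valley floor, lower loss
```

The big step lands on the far side of the valley; the small steps curve along it and end up at a much lower loss, even though both followed the gradient.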
If anyone has managed to consistently stop OpenHermes 2.5 from ranting three paragraphs of fluff instead of answering in a single sentence with the same content, let me know.
Wouldn’t there be some sort of employee stock sale coming up? So every single one of those is willing to run OpenAI into the ground and personally lose millions, instead of losing their fucking CEO? Wtf?
Well it’s the same for investing in MS to indirectly invest in OpenAI. Lots of people are doing it. It’s really the same with all available actors, they all have lots of other business too. Like, try to invest in Amazon Web Services without investing in their stupid delivery business.
Anyway, I think the really stupid thing here would be to even short OpenAI. So that’s why it wasn’t really a suggestion.
I find it somewhat interesting that Sutskever literally seems to have quite the big brain, judging by his head. Is that weird?
Yeah I mean he must know the weights so he can just write them down.
You can put your money where your mouth is and indirectly short them via MS stocks.
I mean, you can be furious about lower profits, but really this wasn’t that much of a risky move for MS. Most of the money they gave them is literally to pay MS for compute. And then they apparently take most of OpenAI’s earnings until it’s paid back, or something. That’s pretty different from actually giving someone 10B, where your money is gone if they go down the drain before getting out of the red.
“Mensch” is what I would call myself if I were an AI from the future. Just saying.
I assume the progress is based on well structured, high quality training data, combined with an incremental “learning schedule”. At least that’s where some reports of massive progress seem to be coming from and it’s also very intuitive that this would help a lot.
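As a sketch of what such an incremental “learning schedule” could look like, here is a minimal curriculum-style loop; the length-based difficulty score is purely my assumption for illustration, not what any particular lab actually uses:

```python
# Minimal sketch of curriculum learning: sort examples by an assumed
# difficulty score and reveal them in stages, so training sees the
# well-structured easy data first.

def curriculum_batches(examples, score, stages=3):
    ordered = sorted(examples, key=score)
    # In stage k, train on the easiest k/stages fraction of the data.
    for k in range(1, stages + 1):
        cutoff = len(ordered) * k // stages
        yield ordered[:cutoff]

data = ["a cat", "a cat sat", "a cat sat on the mat quietly today"]
for stage, batch in enumerate(curriculum_batches(data, score=len), 1):
    print("stage", stage, batch)
```

Each stage re-includes the earlier, easier examples while adding harder ones, which is one common way such schedules are set up.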
No, the problems described are not representative of Mistral 7B quality at all. That’s almost certainly just incorrect prompting, format-wise.
They were talking about the prompt format. Because obviously their library will be translating that OpenAI API-style request into the actual proper prompt format internally, which is not documented at all.
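For reference, the Mistral 7B Instruct releases use an [INST]-style template, which is what such a library would be producing under the hood. A minimal single-turn formatter might look like this; exact whitespace and system-prompt handling can differ between model versions, so treat it as a sketch:

```python
# Mistral-7B-Instruct expects its own chat template rather than raw
# OpenAI-style messages. Sketch of a single-turn formatter; the
# system-prompt placement is an assumption (commonly prepended inside
# the same [INST] block), not an official spec.

def mistral_prompt(user_message: str, system: str = "") -> str:
    content = f"{system}\n\n{user_message}" if system else user_message
    return f"<s>[INST] {content} [/INST]"

print(mistral_prompt("Summarize this in one sentence."))
```

If a wrapper sends plain OpenAI-style messages without this translation, the model’s output quality drops noticeably, which would explain the reported problems.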
I think I want to write my own paper. You know, get some of that basic stuff down. Do you think, if I am able to create a proper paper, that it will be scientifically legit if I publish it on my own website? Without any academic credentials? Or can I upload it to that website with the idiotic name? What was it, xcifdfs?
Say, these finetunes are all merged LoRA stuff, aren’t they? Is nobody doing stuff where you just continue regular training with your own dataset?
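Part of why merged LoRA finetunes dominate over continued full training is just the parameter count: a low-rank update is orders of magnitude cheaper to train. A rough comparison for a single 4096×4096 weight matrix (the rank of 16 is an assumed, typical-ish value, not from any specific finetune):

```python
# LoRA trains only a low-rank update A @ B instead of the full matrix,
# which is why it is so much cheaper than continuing regular training.
d_in, d_out, rank = 4096, 4096, 16   # assumed layer size and LoRA rank

full_params = d_in * d_out           # continued full training updates all of these
lora_params = rank * (d_in + d_out)  # LoRA trains A (d_in x r) and B (r x d_out)

print(full_params, lora_params, full_params / lora_params)
```

For these numbers that is a ~128× reduction in trainable parameters per layer, which also slashes optimizer-state memory; full continued pretraining on your own dataset works too, it just needs far more VRAM and compute.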
I don’t agree with the assumption that there is a pressure for companies like MS to reduce costs via local models. Compute on that gamer’s PC is probably the biggest problem right now. Especially since in a game, pretty much all of the hardware is already used to the limit. And then you throw a 10GB LLM on top, maybe even loading different finetunes for different jobs? Then the TTS model? This does not result in reasonable response times any time soon, at least not with somewhat generalistic models.
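Back-of-envelope numbers for that scenario make the point; all figures below are assumptions for illustration, not measurements:

```python
# Rough VRAM budget for running an LLM alongside a game (all numbers
# are illustrative assumptions): a game already using most of a 12 GB
# card, plus the "10GB LLM" from the comment and a small TTS model.
gpu_vram_gb = 12.0
game_usage_gb = 8.0   # assumed: modern game at high settings
llm_gb = 10.0         # the 10 GB LLM from the comment
tts_gb = 1.0          # assumed small TTS model

headroom = gpu_vram_gb - (game_usage_gb + llm_gb + tts_gb)
print(headroom)  # negative: it simply doesn't fit next to the game
```

Even before considering latency, the memory alone doesn’t add up on typical gamer hardware.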
On the other hand, that’s something MS must like a whole lot. What you see as “optimizing costs” is optimizing their profit away. They can sell that compute to you. That’s great for them, not something to be optimized away. And it’s the best DRM ever too.
Oh, it wasn’t about your choice of words, that seems fine.