Let's speak theory. Exploring the Potential of Collaborative Training?

paryska99@alien.top · 1 year ago

Would it be possible to create a system where every model’s training uses a fixed random seed and records its exact state, and then share that information along with the dataset it was trained on so the training can be reproduced? This could help manage the randomness in training.

Using a fixed seed means the model’s starting weights and the random choices made during training (such as the order of training examples) are the same every time. Essentially, if we restart training from a certain point with this seed, the model should learn the same way it did before. Also, by saving and sharing details like the model’s architecture, its training stage, and the current training step along with the seed, we’re essentially taking a “snapshot” of where the model is at that moment.
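A minimal sketch of the seed-plus-snapshot idea, using only the Python standard library. The `Snapshot` record, `init`, `train`, and the toy linear-regression data are hypothetical illustrations, not an existing tool. All randomness (initialization and sample order) comes from one seeded `random.Random`, and the snapshot stores that generator’s exact state via `getstate()`, so a run stopped at step 200 and resumed gives the same weights as an uninterrupted run. This pure-Python toy is deterministic by construction; real GPU training can also depend on non-deterministic kernels, hardware, and library versions, which a seed alone does not pin down.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Snapshot:
    """Hypothetical snapshot record: the seed plus everything needed to resume."""
    seed: int
    step: int
    weights: tuple
    rng_state: tuple  # value returned by random.Random.getstate()


def init(seed: int) -> Snapshot:
    """Seeded initialization: the same seed always gives the same start."""
    rng = random.Random(seed)
    weights = (rng.uniform(-1.0, 1.0), rng.uniform(-1.0, 1.0))
    return Snapshot(seed, 0, weights, rng.getstate())


def train(snap: Snapshot, data, until: int, lr: float = 0.1) -> Snapshot:
    """Toy SGD on y = w0*x + w1, resumed from `snap` and run up to step `until`."""
    rng = random.Random()
    rng.setstate(snap.rng_state)  # restore the exact RNG position
    w0, w1 = snap.weights
    for _ in range(snap.step, until):
        x, y = data[rng.randrange(len(data))]  # seeded random sample order
        err = w0 * x + w1 - y
        w0 -= lr * err * x
        w1 -= lr * err
    return Snapshot(snap.seed, until, (w0, w1), rng.getstate())


# Fixed toy dataset that would be shared alongside the snapshot: y = 3x + 1.
DATA = [(i / 32, 3 * (i / 32) + 1) for i in range(32)]

# One uninterrupted run of 500 steps.
full = train(init(seed=42), DATA, until=500)

# A run stopped at step 200, then picked up again from its snapshot.
halfway = train(init(seed=42), DATA, until=200)
resumed = train(halfway, DATA, until=500)

print(full.weights == resumed.weights)  # True: identical, not just close
```

Because every random draw goes through the restored generator, the resumed run consumes exactly the same sample sequence as the uninterrupted one.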

Others could use this snapshot to pick up the training right where it was left off, under the same conditions. For merging different models, this technique could help line up how they learn, making it easier and more predictable to combine their training.
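For the merging point, a sketch under a loud assumption: that “merging” means plain parameter averaging, which is only one of several merge methods. `seeded_init`, `finetune`, `merge`, and the two data shards are hypothetical. Two collaborators start from the same seeded initialization, train on different shards with their own recorded seeds, and the results are averaged. Weight averaging is generally reported to behave better when models share a starting point, which is the kind of alignment a shared seed or snapshot provides; whether it makes combining arbitrary training runs predictable is not something this toy demonstrates.

```python
import random


def seeded_init(seed: int) -> list:
    """Hypothetical shared starting point: same seed -> same initial weights."""
    rng = random.Random(seed)
    return [rng.uniform(-1.0, 1.0) for _ in range(2)]


def finetune(weights, data, seed, steps=300, lr=0.1):
    """Toy SGD on y = w0*x + w1 with a seeded (reproducible) sample order."""
    rng = random.Random(seed)
    w0, w1 = weights
    for _ in range(steps):
        x, y = data[rng.randrange(len(data))]
        err = w0 * x + w1 - y
        w0 -= lr * err * x
        w1 -= lr * err
    return [w0, w1]


def merge(models, coeffs=None):
    """Weighted average of parameters (a simple linear merge; equal weights by default)."""
    coeffs = coeffs or [1 / len(models)] * len(models)
    return [sum(c * m[i] for c, m in zip(coeffs, models)) for i in range(len(models[0]))]


# Two collaborators start from the same published seed but train on different shards.
shard_a = [(i / 16, 3 * (i / 16) + 1) for i in range(16)]          # x in [0, 1)
shard_b = [(1 + i / 16, 3 * (1 + i / 16) + 1) for i in range(16)]  # x in [1, 2)

start = seeded_init(seed=42)
model_a = finetune(start, shard_a, seed=1)
model_b = finetune(start, shard_b, seed=2)
merged = merge([model_a, model_b])
```

Since each collaborator’s seed is recorded, anyone can rerun `finetune` and get the same `model_a` and `model_b` before merging.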

Am I thinking right about this or am I missing something? This is just theoretical thinking and I am not an expert on the subject.

paryska99@alien.top · 1 year ago

[Discussion] Let's speak theory. Exploring the Potential of Collaborative Training?

paryska99@alien.top · 1 year ago

Doesn’t the llama.cpp server host a web GUI for multimodal models? You could visit it, open your browser’s developer tools, and observe the HTTP requests being sent.

paryska99@alien.top · 1 year ago

Thanks for the input.

What inference engine did you use? It’s possibly a bug, as these things tend to happen with new models.
I, for one, can’t wait for lookahead decoding in llama.cpp and other engines. Combine that with some smaller models and, from what I reckon, we’ll have blazing-fast speeds on pennies’ worth of hardware.

paryska99@alien.top · 1 year ago

There is the new Rocket 3B that might be worth a try. It scores suspiciously high on benchmarks, so I suspect dataset contamination, but I’ve seen people report good experiences with it.

paryska99@alien.top · 1 year ago

The future is going to be interesting. With this kind of CPU speedup, we could run blazing-fast LLMs on a toaster, as long as it has enough RAM.

paryska99@alien.top · 1 year ago

Oh wow, this seems almost too good to be true.

paryska99@alien.top · 1 year ago

Oh wow, I know the results are probably cherry-picked, but this still looks like a big step up.

paryska99@alien.top · 1 year ago

Yes! I’ve been waiting for progress in video for a while! Imagine DIY automated classification for making compilations and edits. This is going to be sick! Can’t wait to see an implementation in llama.cpp.

paryska99@alien.top · 1 year ago

OpenChat finetunes?

paryska99@alien.top · 1 year ago

I can’t wait to see some finetunes of OpenChat-3.5. This thing is way too smart for a 7B. Frankly, I am amazed at how fast we went from “7B can’t keep it together” to “this 7B is pretty much on par with ChatGPT-3.5” (for a lot of use cases, at least).

paryska99@alien.top · 1 year ago

I hope we get quantized GGUF files soon from the legendary TheBloke.

paryska99@alien.top · 1 year ago

Also, I’d give the new OpenChat 3.5 a try. If the benchmarks are correct, it’s the best 7B model so far (although there are so many of them that I might be wrong; it is at least better than base Mistral 7B).

paryska99@alien.top · 1 year ago

I know benchmarks are a tough topic, but on paper this looks really impressive. It claims to be better than Mistral, and I loved the progress Mistral brought. If someone tries this model out, could you give feedback under this post? Much appreciated.