🚀 Launching SauerkrautLM-7b-HerO: A New Era in German Language Modeling!

AffectionateCan2342@alien.top · 3 years ago

🚀 Launching SauerkrautLM-7b-HerO: A New Era in German Language Modeling!

yahma@alien.top · 3 years ago

Very exciting for multi-lingual models. I really hope this one performs as well as the benchmarks suggest.

AffectionateCan2342@alien.top · 3 years ago

Yes, we hope so too ;-) At least our first tests in real-world operation have shown quite good results. However, it should be noted that even if the benchmark results sound very promising, it is still a 7b model that has been pre-trained in English.

Although the model can respond very well in German thanks to our fine-tuning with German data, there can still be slight grammatical errors here and there, especially if the parameters for the inference were set too high. This is currently difficult to avoid, especially when it comes to smaller models. But we are already working on a solution.

There is always a fine line between: Keep the intelligence of the original English-language model and teach the model just enough so that it can “speak” German well.

No-Link-2778@alien.top · 3 years ago

Do you think there is any scientific basis for the merge? This is medieval alchemy again. And I hope you can make some data public that you recognize as a native speaker, which would be good for public research, rather than merging without theoretical basis in order to improve “score performance”.

AffectionateCan2342@alien.top · 3 years ago

You could at least justify that the scientific basis for merging is given by the published papers on this topic area. Here are a few examples: https://arxiv.org/abs/2306.01708 https://arxiv.org/abs/2203.05482 https://arxiv.org/abs/2204.03044

Nevertheless, it must be admitted that some merges that should achieve good results on paper only produce gibberish in practice or vice versa. So you probably need a bit of luck ;-)

For the German-speaking world, however, I can definitely say that we are not primarily interested in getting better numbers, but in making the English-language models accessible to the German language, at least to some extent, without completely eliminating their cleverness. So the more intelligent the original English model is before it is fine-tuned with German data, the less stupid the model will be in German, and that is our goal as long as there are no German pretrained models.

EnnioEvo@alien.top · 3 years ago

It would be awesome if you could release some info to reproduce it with other languages

yahma@alien.top · 3 years ago

Has anyone tested this yet? We have a use case for our European partners from German speaking countries. Would like to know what other people’s experiences are.

Traditional-Plate642@alien.top · 3 years ago

I think everyone is waiting for TheBloke :D

Ion_GPT@alien.top · 3 years ago

The quantization will greatly reduce multilingual capabilities

GlitteringCheetah707@alien.top · 3 years ago

Hey Folks, we will reply to your comments in the next days. Sorry for being a little inactive. Sauerkraut Team has lots of stuff to do at the moment.

Ion_GPT@alien.top · 3 years ago

Do you have a prompt for translating?

AffectionateCan2342@alien.top · 3 years ago

Try a few different prompts and let us know what worked for you. For shorter translations, however, it should definitely be sufficient if you keep the system prompt and include the instruction for translating in the user prompt.