Yi-34B vs Yi-34B-200K on sequences <32K and <4K

DreamGenX@alien.top · 2 years ago

Yi-34B vs Yi-34B-200K on sequences <32K and <4K

No-Link-2778@alien.top · 2 years ago

I trained the 200K models, both 34B and 6B, on Books3 for a day on a number of GPUs, and the result is total garbage.
It is not a BASE model at ALL. It even identifies itself as GPT sometimes. It is an SFT model trained on the format of benchmarks.
Try it before you do anything silly. You won’t notice the problem immediately during SFT, but sooner or later you will.

dogesator@alien.top · 2 years ago

The model referring to itself as GPT could just come from pre-training data, if it was trained on internet data from 2023.

BlueMetaMind@alien.top · 2 years ago

It sounds more like it was trained on ChatGPT output and they didn’t curate it well enough to delete statements of the “As a large language model trained by OpenAI…” kind.

It’s kinda like Shutterstock watermarks showing up in image generation.
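
The curation step mentioned above could be sketched as a simple phrase filter. This is only a minimal illustration: the patterns, the function name `drop_self_identifying`, and the sample strings are all assumptions made up for the example, not anything known about how Yi’s training data was actually prepared.

```python
import re

# Hypothetical patterns for assistant self-identification boilerplate.
# A real curation pass would likely need a much longer, tuned list.
SELF_ID_PATTERNS = [
    r"\bas an? (?:large )?(?:ai )?language model\b",
    r"\btrained by openai\b",
    r"\bi am chatgpt\b",
]
SELF_ID_RE = re.compile("|".join(SELF_ID_PATTERNS), re.IGNORECASE)


def drop_self_identifying(samples):
    """Return only the samples that match none of the patterns."""
    return [s for s in samples if not SELF_ID_RE.search(s)]


# Made-up sample records, for illustration only.
samples = [
    "As a large language model trained by OpenAI, I cannot do that.",
    "The capital of France is Paris.",
    "As an AI language model, I don't have opinions.",
]
cleaned = drop_self_identifying(samples)
print(cleaned)
```

A filter like this only catches literal phrasing, so paraphrased self-references would still slip through.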

dogesator@alien.top · 2 years ago

Yeah, I’m saying that ChatGPT outputs are contained in internet posts from 2023, so simply training on 2023 internet data would mean training on ChatGPT data as a side effect.

BlueMetaMind@alien.top · 2 years ago

Yes, I understood you. My claim differs in that I think they DIRECTLY used a lot of GPT-4 output through the API, which is very probable because a lot of LLM training is done that way. You ask GPT-4 to generate examples of conversations with the properties you want your LLM to learn and then train on that.

For the model to self-identify as GPT, I don’t think randomly crawled chat examples from the internet would be enough.

I am not trying to make a strong claim on that, it’s just a thought. Maybe both.
