All the cases it is better than GPT-4 are benchmarks involving Chinese language. OpenAI is going to have a hard time getting access to extensive Chinese language datasets so it’s not surprising a 72B model can beat GPT-4, though it’s still impressive in it’s own right.
https://preview.redd.it/sdofti9odg3c1.jpeg?width=1792&format=pjpg&auto=webp&s=d6f56d56c3596924ea61e1e5429018c0222907d2
Amazing capabilities on some benchmarks if true.
big if true
Bit disappointed by the coding performance but it is a general use case model. It’s insane how good gpt 3.5 is for how fast it is.
Apparently the chat version has about 64 for humaneval.
What do these tests mean for LLM? There are many values, and I see that in most cases qwen is better than gpt4. In others it is worse or much worse
All the cases it is better than GPT-4 are benchmarks involving Chinese language. OpenAI is going to have a hard time getting access to extensive Chinese language datasets so it’s not surprising a 72B model can beat GPT-4, though it’s still impressive in it’s own right.