Two sets of base models from China (Yuan 2.0-2B, 51B, 102B and XVERSE-7B, 13B, 65B)

Illustrious_Sand6784@alien.top · 2 years ago


fallingdowndizzyvr@alien.top · 2 years ago

I’m really interested in having a 51B model. I would love something between 34B and 65/70B.

mrjackspade@alien.top · 2 years ago

I don’t know much about architecture, but I’m assuming that if we want to run something like this in Llama, we’re going to need to submit a request? If it’s built from the ground up, then pretty much everything is going to need to be implemented, right?

Aaaaaaaaaeeeee@alien.top · 2 years ago

DeepSeek 67B still beats XVERSE-65B on benchmark scores.
The benchmarks indicate strong math and coding performance for these two model series.
Yuan has a unique optional attention mechanism that enhances output quality.