So RWKV 7B v5 is 60% trained now. I saw that its multilingual performance is already better than Mistral's, and its English capabilities are close to Mistral's, except on HellaSwag and ARC, where it's a little behind. All the benchmarks are on the RWKV Discord, and you can google the pros and cons of RWKV, though most of what you'll find is about v4.

Thoughts?

  • Maykey@alien.topB
    1 year ago

    I don’t think a linear transformer has a serious chance to beat a standard transformer with the same number of parameters.

    I do. Transformers are not good on Long Range Arena. They perform well only when they are backed by better architectures, as in the case of MEGA.