RWKV v5 7b, Fully Open-Source, 60% trained, approaching Mistral 7b in abilities or surpassing it.

vatsadev@alien.top · 2 years ago

Hmm, I'll have to check this with the people on the RWKV Discord server.

V5 is stable in how it uses context, and V6 is trying to make better use of the context, so we might see improvement on this.

MichalO19@alien.top · 2 years ago

If I understood the original RWKV explanation on GitHub correctly, BlinkDL agrees that softmax attention is very capable in theory, but he thinks Transformers are not using it to its full potential, so theoretically less capable architectures can beat them.

This might be true, but I kind of doubt it. I played a bit with the 3B RWKV with a prompt like

User: What is the word directly after "bread" in the following string "[like 20 random words]" 
Assistant: The word directly after "bread" is "

(note the question-before-data ordering that RWKV prefers, though I tested the other way around too), and unless the query word is very early in the string, it gives me a random word. Even 1.3B transformer models seem to answer this correctly much more often (though not always).
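If anyone wants to reproduce this kind of probe, here is a rough Python sketch of how the prompt could be generated and scored. It is not the exact script behind the test above: the word pool, the function names `build_probe` and `is_correct`, and the scoring rule are all made up for illustration, and it deliberately leaves out the model call, since that depends on which runtime you use.

```python
import random

# Hypothetical word pool for illustration; any list of common words would do.
WORDS = [
    "apple", "river", "stone", "cloud", "window", "tiger", "candle", "forest",
    "piano", "silver", "garden", "rocket", "button", "pillow", "mirror",
    "ladder", "basket", "hammer", "valley", "puzzle", "anchor", "blanket",
    "lantern", "marble", "ribbon", "saddle",
]


def build_probe(n_words=20, target="bread", position=None, seed=0,
                question_first=True):
    """Return (prompt, expected_word) for a 'which word follows X' probe."""
    rng = random.Random(seed)
    # Draw distinct filler words so the target appears exactly once.
    words = rng.sample(WORDS, n_words - 1)
    if position is None:
        # The target must not be last, or there is no following word.
        position = rng.randrange(n_words - 1)
    words.insert(position, target)
    expected = words[position + 1]
    string = " ".join(words)
    question = f'What is the word directly after "{target}" in the following string'
    if question_first:
        user = f'{question} "{string}"'
    else:
        user = f'Here is a string: "{string}" {question}?'
    # The assistant turn is left open so the model only has to emit the word.
    prompt = f'User: {user}\nAssistant: The word directly after "{target}" is "'
    return prompt, expected


def is_correct(completion, expected):
    """Compare the text before the first closing quote with the expected word."""
    first = completion.strip().split('"')[0].strip().lower()
    return first == expected


if __name__ == "__main__":
    prompt, expected = build_probe(position=3)
    print(prompt)
    print("expected:", expected)
```

Sweeping `position` from 0 to `n_words - 2` and averaging `is_correct` over several seeds would show whether accuracy really drops once the query word sits later in the string, and flipping `question_first` covers the other ordering.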