It’s a source. But rarely synthetic benchmarks give you the whole picture. Plus those test sets are in the public, so there is some incentive for some people to game the system (and even without that those data sets most likely are already in the training data).
I’ve had the same experience. Are you using GGUF? I do, and I’ve heard that Yi may suffer from GGUF. So EXL2 might be better… I need to try it and see.
It’s a source. But rarely synthetic benchmarks give you the whole picture. Plus those test sets are in the public, so there is some incentive for some people to game the system (and even without that those data sets most likely are already in the training data).
I’ve had the same experience. Are you using GGUF? I do, and I’ve heard that Yi may suffer from GGUF. So EXL2 might be better… I need to try it and see.