Thanks, this is interesting. That said, it still looks like parameter count (B) is a much more important factor than quantisation, at least down to Q3, meaning a 20B Q3 is going to write better than a 13B fp16. That matches my own impression, but I haven’t done any rigorous testing.
Try using it for coding; it’ll be about as good as offshoring.
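For a rough sense of what the 20B Q3 vs 13B fp16 comparison means in memory terms, here is a back-of-envelope sketch. The ~3.5 bits per weight figure for a Q3-style quant is my assumption (the real number depends on the exact quant format), and `weight_gb` is just a hypothetical helper. It counts weights only, so no KV cache or runtime overhead.

```python
def weight_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate weight storage in GB (10^9 bytes); ignores KV cache and overhead."""
    total_bits = params_billion * 1e9 * bits_per_weight
    return total_bits / 8 / 1e9


# Assumption: a Q3-style quant averages roughly 3.5 bits per weight.
q3_20b = weight_gb(20, 3.5)
# fp16 stores each weight in 16 bits.
fp16_13b = weight_gb(13, 16)

print(f"20B at ~3.5 bits/weight: {q3_20b:.2f} GB")
print(f"13B at fp16:             {fp16_13b:.2f} GB")
```

Under those assumptions the bigger model ends up with the smaller footprint, which is why the quality comparison matters in the first place. It says nothing about output quality itself.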