I believe these are TheBloke’s GGUF quants if anyone’s interested: https://huggingface.co/TheBloke/Nous-Capybara-34B-GGUF
Also note this important issue that affects this and all other Yi-based models:
So we can just skip BOS token on all these models?
I ran `gguf-py/scripts/gguf-set-metadata.py some-yi-model.gguf tokenizer.ggml.bos_token_id 144`
and it's changed the outputs a lot from yesterday.
Can’t wait to see the benchmarks on these things.
Dang, after that 34b drought it's like suddenly stumbling onto the Great Lakes right now.
200K context!!
Precisely 47K of context fits in 24GB at 4bpw.
I haven't tried 3.5bpw, but I think it could fit much more.
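Back-of-envelope numbers for that claim (a sketch; the Yi-34B config values here — 60 layers, 8 KV heads via GQA, head_dim 128 — and the 8-bit-cache assumption are mine, not from the thread):

```python
# Rough VRAM math for a 34B model at 4bpw with long context
# (assumed Yi-34B config: 60 layers, 8 KV heads (GQA), head_dim 128)

def kv_cache_bytes(tokens, layers=60, kv_heads=8, head_dim=128, bytes_per_elt=2):
    # K and V each hold layers * kv_heads * head_dim elements per token
    return 2 * layers * kv_heads * head_dim * bytes_per_elt * tokens

weights_gb = 34e9 * 4 / 8 / 1e9                             # ~17 GB of weights at 4 bits/param
kv_fp16_gb = kv_cache_bytes(47_000) / 1e9                   # ~11.6 GB: fp16 cache overshoots 24GB
kv_int8_gb = kv_cache_bytes(47_000, bytes_per_elt=1) / 1e9  # ~5.8 GB: 8-bit cache leaves headroom

print(f"weights {weights_gb:.1f} GB, fp16 cache {kv_fp16_gb:.1f} GB, int8 cache {kv_int8_gb:.1f} GB")
```

If those config assumptions hold, 47K in 24GB only works with a quantized KV cache, not fp16.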
Based on the 200K Context Yi 34B.
If it's based on Yi, shouldn't it have the Yi license instead of MIT?
Yes.
But it's ML land! Everyone violates licenses anyway :P