I’m blown away. See for yourself.
https://migel.substack.com/p/a-conversation-with-tess
Tess, welcome to the world!
The model is open source, with a 200K context length.
Available at: https://huggingface.co/migtissera/Tess-M-v1.0
Fuck Yi and its licensing model.
What’s the VRAM usage? A context that big can use an enormous amount…
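For a rough sense of scale, here’s the back-of-the-envelope fp16 KV-cache math. The Yi-34B numbers (60 layers, 8 GQA KV heads, head dim 128) are my assumptions from the config, so verify against the model’s config.json:

```python
# Rough fp16 KV-cache estimate for long-context inference (weights not included).
# Assumed Yi-34B config values -- verify against the model's config.json.
n_layers = 60     # assumed transformer layers
n_kv_heads = 8    # assumed GQA key/value heads
head_dim = 128    # assumed per-head dimension
bytes_per_el = 2  # fp16

def kv_cache_gib(context_tokens: int) -> float:
    # 2x for keys and values, cached per layer and per KV head
    per_token = 2 * n_layers * n_kv_heads * head_dim * bytes_per_el
    return context_tokens * per_token / 1024**3

print(f"{kv_cache_gib(8_192):.1f} GiB at 8K ctx")     # ~3.8 GiB
print(f"{kv_cache_gib(200_000):.1f} GiB at 200K ctx") # ~91.6 GiB
```

So filling the full window is out of reach for most single-GPU setups, even before the weights themselves.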
On another note, this place is just super hostile! I didn’t think it would be, considering it’s the LocalLLaMA subreddit and we’re all here to support open source or freely available models.
This is harsher than the Twitter mob!
I’ll still release models, but sorry guys, not coming here again.
Sorry to hear that. This thread is pretty wild; almost every other model thread on LocalLlama has at most a few crazies, and they get downvoted. Your Synthia models are fairly popular, so the reactions you got seem pretty out of place to me.
do you have to download 71GB to try it?! :-)
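Pretty much: that 71GB is roughly what ~34B parameters weigh in fp16, and quantized GGUF/GPTQ downloads are a fraction of it. The parameter count here is my assumption:

```python
params = 34.4e9  # assumed Yi-34B parameter count
print(f"{params * 2 / 1e9:.0f} GB")  # 2 bytes/param in fp16 -> ~69 GB, close to the ~71GB of shards
```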
Tell me I’m going to need another GPU without telling me I’m going to need another GPU… Eeek.
When I built my gaming rig, I thought I wouldn’t need to upgrade for several years. Then AI came along and kicked my sandcastle into the surf.
My wallet is unhappy, and has already lost inches from the diet it has been put on.
How many tokens in your substack example?
Do you have examples of using the model for fiction at 16K-40K token lengths?

Thanks for the model, it’s really nice to have some Synthia magic on a Yi-34B 200K base.
Part of the generation from your suggested prompt:
The magnetic field of our planet is generated by an iron-nickel core that rotates like a dynamo, creating electric currents which in turn produce the magnetic force we experience as compass needles pointing northward when held still relative to this field’s direction over time periods measured in years rather than seconds or minutes because it varies slightly due to solar wind interactions with upper layers known collectively as “ionosphere.”
I found this particular output unintentionally hilarious because it reminds me a lot of the reddit comments I type out then delete because it’s just some overexplainy run-on gibberish.
I thought I saw a Tess-XL but it’s gone, now. What happened?
According to TheBloke the Sequence Length is 8192 ctx, so I’m assuming 8192 ctx is its default and it can extend up to 200k ctx via alpha_scale?
No, the base model itself is 200K: https://huggingface.co/01-ai/Yi-34B-200K
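You can sanity-check that without pulling the 71GB of weights; this only fetches config.json (trust_remote_code because the Yi repos shipped custom model code at release):

```python
from transformers import AutoConfig

# Downloads only config.json, not the weights.
cfg = AutoConfig.from_pretrained("01-ai/Yi-34B-200K", trust_remote_code=True)
print(cfg.max_position_embeddings)  # expected: 200000
```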
Almost the same syntax as Yi Capybara. Excellent.
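For anyone unfamiliar, the Vicuna-ish format they share looks roughly like this; exact system-prompt handling varies per model card, so treat the template as an assumption:

```python
# Assumed Vicuna-style template -- check each model card for the exact format.
def build_prompt(system: str, user: str) -> str:
    return f"SYSTEM: {system}\nUSER: {user}\nASSISTANT:"

print(build_prompt("You are Tess, a helpful assistant.",
                   "Summarize why Yi-34B 200K finetunes are merge-friendly."))
```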
I propose all Yi 34B 200K finetunes use Vicuna-ish prompt syntax, so they can ALL be merged into one hellish Voltron model.
The deed is done:
https://huggingface.co/brucethemoose/Capybara-Tess-Yi-34B-200K
Seems coherent in transformers, I’m gonna quant it to exl2 and test it out.
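For the curious: the real thing is typically done with mergekit, but a naive 50/50 linear merge of two finetunes of the same base boils down to this sketch (needs enough RAM to hold both fp16 models at once):

```python
import torch
from transformers import AutoModelForCausalLM

# Naive 50/50 linear weight merge -- a sketch, not the actual mergekit recipe.
a = AutoModelForCausalLM.from_pretrained(
    "migtissera/Tess-M-v1.0", torch_dtype=torch.float16)
b = AutoModelForCausalLM.from_pretrained(
    "NousResearch/Nous-Capybara-34B", torch_dtype=torch.float16)

sd_a, sd_b = a.state_dict(), b.state_dict()
# Works because both are finetunes of the same Yi-34B-200K base,
# so the state dicts have identical keys and shapes.
merged = {k: (sd_a[k] + sd_b[k]) / 2 for k in sd_a}

a.load_state_dict(merged)
a.save_pretrained("Capybara-Tess-linear-merge")
```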
Just wanted to come back and let you know I started using this last night, and it’s fantastic. I haven’t put it through much testing yet, but on initial use I’m very impressed with this model as a general-purpose AI assistant. It keeps to the Assistant’s more informal speech patterns while also answering questions well and keeping up with large context. Those are three checkboxes I’ve never been able to check at once. This praise won’t get much visibility since it’s an older thread, but I just wanted to let you know at least.
Testing it now, but it’s worse than 7B models on logic questions for me. Huge disappointment compared to Dolphin and Nous-Capybara, both also Yi finetunes and the best models I’ve tested so far. It just goes to show how much difference finetuning a base model can make.
Nice, did you manage to tell a difference between Dolphin and Nous-Capybara? Both are pretty close for me.
Nope, they’re both really good and very close to each other in my tests: https://docs.google.com/spreadsheets/d/1NgHDxbVWJFolq8bLvLkuPWKC7i_R6I6W/edit?usp=sharing&ouid=102314596465921370523&rtpof=true&sd=true
Thanks, I remember your tests; it’s great you’re still on it. So according to your tests, 34B models compete with GPT-3.5. I’m not too surprised. And Mistral-7B is not so far behind, what a beast!
Will you benchmark 70B models too?

Unfortunately I don’t have enough RAM/GPU, and I’m too broke right now to afford paying for extra! But I hope I will in the future.
More random feedback: you should put some combination of Yi, 34B, and/or 200K in the title.
No one tags anything on HF, so the only way to browse models is by title. I would have totally missed this in my Yi/34B searches if not for the Reddit post.
Yeah, it was only by luck that I stumbled onto this. Something like “Yi-34b-200k - Tess Medium” would work better.
500k context next? This is hilarious 😂
This model kicks ass. I strongly recommend trying it for roleplay. The 4-bit 32g act order GPTQ quant is on par with 70b models, so I can only imagine what higher-bit quants can do.
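If you want to try that same quant, loading a GPTQ through transformers looks roughly like this; the repo and branch names are guesses at TheBloke’s usual naming, so verify they exist (needs auto-gptq/optimum installed):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

repo = "TheBloke/Tess-M-v1.0-GPTQ"  # assumed repo name -- verify on HF

tok = AutoTokenizer.from_pretrained(repo)
model = AutoModelForCausalLM.from_pretrained(
    repo,
    device_map="auto",
    revision="gptq-4bit-32g-actorder_True",  # assumed branch for the 4-bit 32g act-order quant
)

prompt = "SYSTEM: You are Tess.\nUSER: Write a one-line greeting.\nASSISTANT:"
inputs = tok(prompt, return_tensors="pt").to(model.device)
print(tok.decode(model.generate(**inputs, max_new_tokens=64)[0],
                 skip_special_tokens=True))
```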
What makes this any different than the “base” Yi-34B-200k model?
Where can we see a description of what the model has been finetuned on (datasets used, LoRAs used, etc.) and/or your methods for doing so? I’m not finding any of this information in the model card or the Substack link.
I’m not sure why he’s being so vague about this model. He said it’s finetuned to be better at instruction following, I think.