Introducing Tess: Tess-M with 200K Context Length

migtissera@alien.top · 2 years ago

Introducing Tess: Tess-M with 200K Context Length

PMMeYourWorstThought@alien.top · 2 years ago

Fuck Yi and its license model.

Tiny_Arugula_5648@alien.top · 2 years ago

What’s the VRAM usage? A context that big can use an enormous amount…
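
For a rough sense of scale, here is a back-of-the-envelope sketch of the KV cache size alone. The shape numbers (60 layers, 8 key/value heads, head dimension 128) are assumptions based on the published Yi-34B config and are worth checking against the model's `config.json`. The function name `kv_cache_bytes` is made up for this example. It assumes an fp16 cache and ignores weights, activations and framework overhead.

```python
def kv_cache_bytes(n_tokens, n_layers, n_kv_heads, head_dim, bytes_per_value=2):
    """Estimate KV cache size in bytes for a decoder-only transformer.

    Keys and values are both cached for every layer, hence the factor of 2.
    bytes_per_value=2 assumes an fp16/bf16 cache (an assumption, not a given).
    """
    return 2 * n_layers * n_kv_heads * head_dim * bytes_per_value * n_tokens


# Assumed Yi-34B-style shape; verify against the model's config.json.
N_LAYERS, N_KV_HEADS, HEAD_DIM = 60, 8, 128

per_token = kv_cache_bytes(1, N_LAYERS, N_KV_HEADS, HEAD_DIM)
full_context = kv_cache_bytes(200_000, N_LAYERS, N_KV_HEADS, HEAD_DIM)

print(f"per token: {per_token / 1024:.0f} KiB")
print(f"200K tokens: {full_context / 1024**3:.1f} GiB")
```

Under those assumptions the cache alone comes to roughly 46 GiB at the full 200K tokens, on top of the weights. A quantized cache or a shorter context shrinks this proportionally.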

migtissera@alien.top · 2 years ago

Just on another note, this place is just super hostile! I didn’t think it would be, considering it’s the LocalLLaMA sub-reddit and we are all here to support open source or freely available models.

This is harsher than the Twitter mob!

I’ll still release models, but sorry guys, not coming here again.

llama_in_sunglasses@alien.top · 2 years ago

Sorry to hear that. This thread is pretty wild; almost every other model thread on LocalLlama has at most a few crazies and they get downvoted. Your Synthia models are fairly popular, so the reactions you got seem pretty out of place to me.

Creative_Bottle_3225@alien.top · 2 years ago

Do you have to download 71GB to try it?! :-)

CasimirsBlake@alien.top · 2 years ago

Tell me I’m going to need another GPU without telling me I’m going to need another GPU… Eeek.

Sabin_Stargem@alien.top · 2 years ago

When I built my gaming rig, I thought that I wouldn’t need to update for several years. Then AI came along and kicked my sandcastle into the surf.

My wallet is unhappy, and has already lost inches from the diet it has been put on.

IxinDow@alien.top · 2 years ago

How many tokens are in your substack example?
Do you have examples of using the model for fiction with a length of 16K-40K tokens?

llama_in_sunglasses@alien.top · 2 years ago

Thanks for the model, it’s really nice to have some synthia magic on a Yi-34B 200K base.

Part of the generation from your suggested prompt:

The magnetic field of our planet is generated by an iron-nickel core that rotates like a dynamo, creating electric currents which in turn produce the magnetic force we experience as compass needles pointing northward when held still relative to this field’s direction over time periods measured in years rather than seconds or minutes because it varies slightly due to solar wind interactions with upper layers known collectively as “ionosphere.”

I found this particular output unintentionally hilarious because it reminds me a lot of the reddit comments I type out then delete because it’s just some overexplainy run-on gibberish.

pseudonerv@alien.top · 2 years ago

I thought I saw a Tess-XL but it’s gone, now. What happened?

ReMeDyIII@alien.top · 2 years ago

According to TheBloke the Sequence Length is 8192 ctx, so I’m assuming 8192 ctx is its default and it can extend up to 200k ctx via alpha_scale?

migtissera@alien.top · 2 years ago

No, the base model itself is 200K: https://huggingface.co/01-ai/Yi-34B-200K

mcmoose1900@alien.top · 2 years ago

Almost the same syntax as Yi Capybara. Excellent.

I propose all Yi 34B 200K finetunes use Vicuna-ish prompt syntax, so they can ALL be merged into one hellish voltron model.

mcmoose1900@alien.top · 2 years ago

The deed is done:

https://huggingface.co/brucethemoose/Capybara-Tess-Yi-34B-200K

Seems coherent in transformers, I’m gonna quant it to exl2 and test it out.

SomeOddCodeGuy@alien.top · 2 years ago

Just wanted to come back and let you know I started using this last night, and this is fantastic. I haven’t put it through much testing yet, but just know that on initial use I’m very impressed by this model as a general purpose AI assistant. It’s keeping to the Assistant’s more informal speech patterns while also answering questions well and keeping up with large context. Those are 3 checkboxes I’ve never been able to check at once. This praise won’t get much visibility since it’s an older thread, but just wanted to let you know at least.

YearZero@alien.top · 2 years ago

Testing it now, but it’s worse than 7b models on logic questions for me. Huge disappointment compared to Dolphin and Nous-Capybara, which are both Yi finetunes and the best models I’ve tested so far. It just goes to show you how much difference finetuning a base model can make.

drifter_VR@alien.top · 2 years ago

Nice, did you manage to tell a difference between Dolphin and Nous-Capybara? Both are pretty close for me.

YearZero@alien.top · 2 years ago

Nope they’re both really good and very close to each other in my tests: https://docs.google.com/spreadsheets/d/1NgHDxbVWJFolq8bLvLkuPWKC7i_R6I6W/edit?usp=sharing&ouid=102314596465921370523&rtpof=true&sd=true

drifter_VR@alien.top · 2 years ago

Thanks, I remember your tests, it’s great you are still on it. So according to your tests, 34b models compete with GPT3.5. I am not too surprised. And Mistral-7b is not so far behind, what a beast!
Will you benchmark 70b models too?

YearZero@alien.top · 2 years ago

Unfortunately I don’t have enough ram/gpu, and too broke right now to afford paying for extra! But in the future I hope I will

mcmoose1900@alien.top · 2 years ago

More random feedback: you should put some combination of Yi, 34B, and/or 200K in the title.

No one tags anything on HF, so the only way to browse models is by title. I would have totally missed this in my Yi/34B searches if not for the Reddit post.

Sabin_Stargem@alien.top · 2 years ago

Yeah, it was only by luck that I stumbled onto this. Something like “Yi-34b-200k - Tess Medium” would work better.

f1kkz@alien.top · 2 years ago

500k context next? This is hilarious 😂

sophosympatheia@alien.top · 2 years ago

This model kicks ass. I strongly recommend trying it for roleplay. The 4-bit 32g act order GPTQ quant is on par with 70b models, so I can only imagine what higher-bit quants can do.

BangkokPadang@alien.top · 2 years ago

What makes this any different than the “base” Yi-34B-200k model?

Where can we see a description of what the model has been finetuned on (datasets used, LoRAs used, etc.) and/or your methods for doing so? I’m not finding any of this information in the model card or the substack link.

Slimxshadyx@alien.top · 2 years ago

I’m not sure why he is being so vague about this model. I think he said it’s fine-tuned to be better at instruct?