Interview with CEO of Mistral (Arthur Mensch)

DreamGenX@alien.top · 2 years ago

Found a live stream on YouTube, for anyone interested: https://www.youtube.com/watch?v=o35EY8I9PXU

DreamGenX@alien.top · 2 years ago

Interview with CEO of Mistral (Arthur Mensch)

DreamGenX@alien.top · 2 years ago

On top of what other said, make sure to include a few shot examples in your prompt, and consider using constrained decoding (ensuring you get valid json of whatever schema you provide, see pointers on how to do it with llama.cpp).

For few shotting chat models, append fake previous turns, like:

System: 
User: 
Assistant: 
...
User: 
Assistant: 
User:

DreamGenX@alien.top · 2 years ago

It’s inevitable people will game the system when it’s so easy, and the payoff can be huge. Not so long ago people could still get huge VC checks for showing off GitHub stars or benchmark numbers.

DreamGenX@alien.top · 2 years ago

DreamGen Opus 70B — Uncensored model for story telling and chat / roleplay

DreamGenX@alien.top · 2 years ago

Curious to hear what other UIs people use and for what purpose / what they like about each (like Oogabooga, or Kobold).

DreamGenX@alien.top · 2 years ago

I can recommend vLLM. Also offers OpenAI compatible API service, if you want that.

DreamGenX@alien.top · 2 years ago

Thank you so much for the kind feedback! If you have found some cool prompts, come share them with others on our discord.

DreamGenX@alien.top · 2 years ago

Yi-34B vs Yi-34B-200K on sequences <32K and <4K

DreamGenX@alien.top · 2 years ago

I hope it will be something tasty! :)

DreamGenX@alien.top · 2 years ago

The training data had example of up to 4096 tokens. The model should also work beyond that, but I did not do a deep analysis of degradation.

DreamGenX@alien.top · 2 years ago

It’s here!

DreamGenX@alien.top · 2 years ago

I agree, I hope I can make things cheaper with better utilization. You have to consider that a single GPU is not used 100% the time, so there’s a lot of waste. And due to lack of scale, I also do not get any special pricing on the GPUs. The more users, the closer the utilization will be to 100%, and the better GPU pricing. (For instance, I heard that on Google Cloud, enterprise customers can negotiate the on-demand GPU price down to the regular spot price for some of the GPUs)

DreamGenX@alien.top · 2 years ago

Wow, amazing, thanks for giving it a try GGUF and other quants are coming, so your computer should have an easier time soon! :)

What’s the maximum possible dead babies score? :D

DreamGenX@alien.top · 2 years ago

Thank you!

DreamGenX@alien.top · 2 years ago

Great news, the great /u/TheBloke is working on this!

https://preview.redd.it/uqebzbr1q8zb1.png?width=2175&format=png&auto=webp&s=46ab334fa4b2b3cabab7d36461f991edfd2e8a60

DreamGenX@alien.top · 2 years ago

I have been using the Python API client 1.0 preview version (which was just released) for some time with vLLM OpenAI compatible server and it worked well – at least I did not notice any issues.

DreamGenX@alien.top · 2 years ago

There was a bug on the website where the first time the “Continue” would not work if you did not refresh, should work now even though the editor is quite janky still, sorry for that :(

(can’t wait for AI to take over React from me :P)

DreamGenX@alien.top · 2 years ago

DreamGen Opus — Uncensored model for story telling and chat / RP