@kivathewolf

kivathewolf@alien.top · 1 year ago

I have heard that podcast and I don’t think he explicitly said that the Grok will be open sourced. He said open sourcing models with a 6 month delay is perhaps a good idea (paraphrasing). Yes, I would love to try Grok open source, but right now it’s more closed source than GPT. Only available to some users. I wouldn’t hold my breath for this one.

kivathewolf@alien.top · 1 year ago

While I have not tried this in azure, my understanding is that you can deploy a Linux vm with A100 in azure (T4or V100 may not work for all use cases, but will be a cheaper option). Once you have a Linux vm with GPU, you can choose how you would like to host the model(s). You can write some code and expose the LLM via an API ( I like Fast chat, but there are other options as well). Heck you can even use ooba if you like. Just make sure to check the license for what you use.

kivathewolf@alien.top · 1 year ago

This is really cool! Good choice on starting with the chat model and not the base model. They are much more friendly to alignment with a small dataset. In your post you mention you do QLorA in few mins. I am assuming that’s for a small dataset like <1000 samples? What’s your backend running on? I would love to learn how you are deploying and scaling this for multiple customers. Best of luck!

kivathewolf@alien.top · 1 year ago

Checkout fastChat api. Easy to deploy and you can scale it. It can also support an open AI format api.

kivathewolf@alien.top · 1 year ago

I believe you are using LoRa? How are you training? What library are you using? In my experience (which is limited) many libraries don’t set attention to 1 for the eos token. Thus the model is trained to ignore it. If you use the hugging face trainer library, you need to define your own mapping function in which you set the attention for the eos token to be 1. Make sure your dataset used for training also uses at the end of the response. If you do that, then you probably don’t need to mess with the attention. All these problems go away when you use an instruct model as it’s already trained to stop at the end. If you use the same prompt format in your fine tuning dataset, that will work well.

kivathewolf@alien.top · 1 year ago

Per the tokenizer.config for mistral instruct model the eos is . You can use the same. If you check the tokenizer file for the instruct base model, the is defined as a special token. So it will work fine for eos. Reg padding, the reason you define the padding is so that all your batches are of same fixed length during tuning. Define your dataset with <s> to start and use </s> to eos and pad to right.

Btw, why are you fine tuning the base model for text to sql? Won’t it be better to fine tune the instruct model for this? You can use the same prompt template as the instruct model uses. Good luck and let me know how it goes.

kivathewolf@alien.top · 1 year ago

So has anyone here tried this- train a LoRA adapter with say base Llama2 model and then merge the Lora adapter with say the Wizard model. As the wizard is a llama fine tuned model, will the LoRa weights merge? I might try it later as well :) If this works, then this is a way to solve your problem. As long as the model architecture doesn’t change, your specific adapter should be applicable even if the base model gets “outdated”.

kivathewolf@alien.top · 1 year ago

I like the analogy that Andrej Karpathy posted on X sometime back. LLM OS

Think of LLM as an OS. There are closed source OS like Windows and Mac, and then there are open source OS based on Linux. Each has its place. For most regular consumers, windows and mac are sufficient. However Linux has its place for all kinds of applications (from the Mars rover, to your raspberry pi home automation project). The LLMs may evolve in a similar fashion. For highly specific use cases, it maybe better to use a small LLM fine tuned for your application. In cases where data sovereignty is important, it’s not possible to use open AIs tools. Next, let’s say you have an application where u need an AI service and internet is not available. Local models are the only way you can go about.

It’s also important to understand that when you use GPT4, you aren’t using an LLM, but a full solution, where there’s the LLM, RAG, classic software functions (math), internet browsing and may be even other “expert LLMs”. When you download a model from Hugging face and run it, you are just using one piece of the puzzle. So yes, your results will not be comparable to GPT4. What open source gives you, is the ability to make a system like GPT4, but you need to do the work to get it there.