Asking for tips how to use base models instead of instruct/chat tuned models

noeda@alien.top · 2 years ago

Asking for tips how to use base models instead of instruct/chat tuned models

kindacognizant@alien.top · 2 years ago

What kind of sampler settings are you using? You can force models to get really out there in terms of creativity depending on what you use.

__SlimeQ__@alien.top · 2 years ago

If you’re not using a chat/instruct tuned model, you should be using the notebook. The input that the chat tab creates will be chat/instruct formatted.

noeda@alien.top · 2 years ago

I always use the Raw tab, even when chatting (I look up the template manually if I’m using it chat-way). I like to see exactly what is given to the model and what it generates back. Sometimes I use command line software when I’m not using the UI.

phree_radical@alien.top · 2 years ago

I think a base model is preferable in many cases for developers, particularly if instruction-following abilities don’t cut it, or you worry about instruction injection, or just want to make sure the text you get isn’t bent into the curves of the “helpful” fine-tuning distribution

It’s easy to recommend a base model for targeted generations that leverage its pattern-following ability. You get what you want after a number of examples, almost like fine-tuning examples. I went through my history for examples of few-shot completion: classification, rewrite sentence copying style, classify, basic Q&A example, fact check yes/no, rewrite copying style and sentiment, extract list of musicians, classify user intent, tool choice, rewrite copying style again, flag/filter objectionable content, detect subject changes, classify profession, extract customer feedback into json, write using specified words, few-shot cheese information, answer questions from context, classify sentiment w/ probabilities, summarize, replace X in conversation

Most of that is aimed at developers, though, and many of those use cases call for a temperature of 0
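To make the few-shot idea concrete, here is a minimal sketch of how such a prompt might be assembled for sentiment classification. The function name, the `Review`/`Sentiment` labels, and the example reviews are all made up for illustration; they are not taken from the linked examples.

```python
def build_few_shot_prompt(examples, query, input_label="Review", output_label="Sentiment"):
    """Format labeled examples plus one unanswered query as a completion prompt."""
    blocks = [
        f"{input_label}: {text}\n{output_label}: {label}"
        for text, label in examples
    ]
    # Leave the final label blank so the base model completes the pattern.
    blocks.append(f"{input_label}: {query}\n{output_label}:")
    return "\n\n".join(blocks)


# Hypothetical training-style examples; any consistent pattern works.
examples = [
    ("The battery died after two days.", "negative"),
    ("Setup took thirty seconds and it just works.", "positive"),
    ("Arrived on time, does what it says.", "positive"),
]

prompt = build_few_shot_prompt(examples, "The screen cracked in my pocket.")
print(prompt)
```

Sending this to a base model with temperature 0 and a stop sequence on the newline should, in most backends, return just the label. How you pass the stop sequence depends on your generator.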

For long-form writing, on the other hand, you’ve found some hindrances. First, results will benefit a great deal from longer context. Second, you’ll probably get some looping patterns you can avoid by increasing repetition penalties in your generator
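For anyone wondering what a repetition penalty does under the hood, here is a rough sketch of one common formulation: logits of tokens that already appeared are divided by the penalty if positive and multiplied by it if negative. This is an assumption about how your generator implements it; actual backends may differ, and the function name and toy numbers are illustrative only.

```python
def apply_repetition_penalty(logits, generated_ids, penalty=1.2):
    """Return a copy of logits with already-generated token ids made less likely."""
    out = list(logits)
    for tok in set(generated_ids):
        # Dividing a positive logit or multiplying a negative one both lower it.
        if out[tok] > 0:
            out[tok] /= penalty
        else:
            out[tok] *= penalty
    return out


# Toy vocabulary of 4 tokens; tokens 0 and 1 were already generated.
logits = [2.0, -1.0, 0.5, 3.0]
penalized = apply_repetition_penalty(logits, generated_ids=[0, 1, 0], penalty=2.0)
print(penalized)
```

A penalty of 1.0 leaves the logits unchanged; values above 1.0 push the model away from tokens it has already used.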

Finetunes for storywriting do seem like a good idea, I found at least this one

Capital-Alps5626@alien.top · 2 years ago

https://www.reddit.com/r/LocalLLaMA/comments/17yxoxv/local_llm_for_hot_dog_or_not_hot_dog_kind_of_fact/

Would you say your advice in this post is applicable to my post? I think I’m in this same camp. I don’t want to go through the hundreds of fine-tuned models. I just want to talk to the model with the kinds of things you’ve mentioned.

Then why do people fine-tune for instruction? Perhaps the answer to my question is how do you fine tune a model for instruction? Is there a document or steps?

AutomataManifold@alien.top · 2 years ago

That’s a good point about few-shot prompting: the big thing about GPT-3 and instruction training was that it allowed for zero-shot prompting (i.e., prompting with zero examples). But if we’re manually prompting a base model, there’s no reason not to provide those examples, and you get dramatically improved performance versus the same model with no examples.

a_beautiful_rhind@alien.top · 2 years ago

You can use them more for completion. At least you’re supposed to. Sometimes they work in instruct modes like alpaca anyway but will give extra outputs or not follow directions.
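As a sketch of what "instruct mode like alpaca" means in practice: the prompt is wrapped in the Alpaca template, and because a base model may keep going and invent further turns, the output is cut at the next template marker. The helper names and the sample output string are hypothetical.

```python
ALPACA_TEMPLATE = (
    "Below is an instruction that describes a task. "
    "Write a response that appropriately completes the request.\n\n"
    "### Instruction:\n{instruction}\n\n### Response:\n"
)


def as_alpaca(instruction):
    """Wrap a plain instruction in the Alpaca prompt format."""
    return ALPACA_TEMPLATE.format(instruction=instruction)


def trim_extra_turns(output, stop_markers=("### Instruction:", "### Response:")):
    """Cut model output at the first marker that signals an invented extra turn."""
    cut = len(output)
    for marker in stop_markers:
        idx = output.find(marker)
        if idx != -1:
            cut = min(cut, idx)
    return output[:cut].rstrip()


prompt = as_alpaca("Write the opening line of a dystopian story.")

# Made-up example of a base model running past the end of its answer.
raw_output = "The sirens never stopped.\n\n### Instruction:\nWrite more"
print(trim_extra_turns(raw_output))
```

Many backends accept stop strings directly, which achieves the same thing without post-processing.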

FullOf_Bad_Ideas@alien.top · 2 years ago

Yi-34B and Llama 2 70B are, in my opinion, pretty bad in their raw state. Llama 1 65B is pretty good raw. Llama 2 models are not actually raw bases: they clearly recognize instruction prompts and have refusals ingrained, so they’re not really base models. I am not aware of any non-instruct storywriting fine-tunes, but this sounds exciting. If I can find some small storywriting dataset, I can try to train Yi-34B or Mistral on it.

Base Yi-34B and Mistral get into repetitive patterns fast, and Llama 65B sometimes starts outputting Python code out of nowhere, but it should be your best bet for a raw storywriting model.

Inevitable-Highway85@alien.top · 2 years ago

Temperature, top_p, and penalty. Take notes as you tune them. Take a raw model, get the dataset structure, and fine-tune it. I have a 7B model that can communicate almost exactly like me using the recipe above.
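For reference, here is a small pure-Python sketch of how temperature and top_p typically interact when picking the next token. It is a simplified illustration under my own assumptions, not the code of any particular backend, and the function name and toy logits are made up.

```python
import math
import random


def sample_token(logits, temperature=1.0, top_p=1.0, rng=random):
    """Sample a token index using temperature scaling and nucleus (top-p) filtering."""
    if temperature <= 0:
        # Treat temperature 0 as greedy decoding: always pick the most likely token.
        return max(range(len(logits)), key=lambda i: logits[i])

    # Higher temperature flattens the distribution, lower temperature sharpens it.
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]  # subtract peak for numerical stability
    total = sum(exps)
    probs = [e / total for e in exps]

    # Keep the smallest set of top tokens whose cumulative probability reaches top_p.
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cumulative = [], 0.0
    for i in ranked:
        kept.append(i)
        cumulative += probs[i]
        if cumulative >= top_p:
            break

    # random.choices renormalizes the weights of the surviving tokens.
    return rng.choices(kept, weights=[probs[i] for i in kept], k=1)[0]


toy_logits = [1.0, 3.0, 2.0]
greedy = sample_token(toy_logits, temperature=0)
narrow = sample_token(toy_logits, temperature=1.0, top_p=0.1)
wild = sample_token(toy_logits, temperature=2.0, top_p=1.0, rng=random.Random(0))
print(greedy, narrow, wild)
```

Changing one setting at a time and noting the effect makes it much easier to tell which knob is responsible for a given change in the output.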

AnomalyNexus@alien.top · 2 years ago

tbh I think it’ll be a bad trade-off. What you lose in steerability is huge, while I’m not convinced you’ll get any gains in less boring/overly positive output. With an instruction model you can at least tell it to write something dystopian.

Re “more interesting outputs”: try jacking up the temperature.