Is anyone experimenting with non-instruction tuned models?

wojcech@alien.top · 1 year ago

Is anyone experimenting with non-instruction tuned models?

wojcech@alien.top · 1 year ago

Just to be clear: you aren’t doing fine-tuning here in the sense of gradient updates, you’re using the base model + ICL (in-context learning)?

phree_radical@alien.top · 1 year ago

Yep, basically taking a few samples from a dataset and turning them into a short text “document” with an obvious pattern, so the LLM will continue it.
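A minimal sketch of that idea, assuming (input, output) pairs and an "Input:/Label:" layout chosen here purely for illustration (the function name and field labels are hypothetical, not from any library). The document ends with an open "Label:" slot, so the most natural continuation for a base model is the answer:

```python
def build_few_shot_prompt(examples, query, input_label="Input", output_label="Label"):
    """Turn (input, output) pairs into a plain-text 'document' with a repeating pattern.

    The prompt ends right where the next output should go, so a base
    model's most likely continuation is the answer for `query`.
    """
    blocks = [f"{input_label}: {x}\n{output_label}: {y}" for x, y in examples]
    # Leave the final output slot open for the model to complete.
    blocks.append(f"{input_label}: {query}\n{output_label}:")
    return "\n\n".join(blocks)


examples = [
    ("The battery died after an hour.", "negative"),
    ("Setup took two minutes, works great.", "positive"),
]
prompt = build_few_shot_prompt(examples, "Screen is bright and sharp.")
print(prompt)
```

Swapping examples in or out of `examples` changes the behavior immediately, with no training step.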

Few-shot vs fine-tuning comparison:

Pros:

converges on the desired behavior with far fewer examples
dynamic: changes to the “dataset” take effect without modifying model weights
no need to worry about whether important information gets lost
can do things like averaging the logits of single-token classification problems across multiple inferences (to work around context length limitations)
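A rough sketch of the logit-averaging trick, assuming you have already run several prompts (e.g. each with a different subset of examples) and read off the next-token logit for each candidate label token. With Hugging Face `transformers`, that would typically be the last position of the model's `logits` output indexed at each label's token id; the numbers below are made up for illustration, and the function names are hypothetical:

```python
from statistics import mean


def average_label_logits(runs):
    """Average per-label next-token logits over several inferences.

    Each run is a dict mapping a candidate label (a single token)
    to the logit the model assigned it for that prompt.
    """
    labels = runs[0].keys()
    return {label: mean(run[label] for run in runs) for label in labels}


def predict(runs):
    """Pick the label with the highest averaged logit."""
    averaged = average_label_logits(runs)
    return max(averaged, key=averaged.get)


# Hypothetical logits from three prompts, each built from different few-shot examples.
runs = [
    {"positive": 2.1, "negative": 1.4},
    {"positive": 0.9, "negative": 1.2},
    {"positive": 1.8, "negative": 0.7},
]
print(predict(runs))
```

The second prompt on its own would have picked "negative"; averaging over all three smooths out that one run. Averaging softmax probabilities instead of raw logits is a common alternative.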

Cons:

needs context length, so you can’t provide too many examples, or examples that are too large
sometimes need “adversarial” examples to discourage the model from repeating text from other examples
models that are too small are worse at ICL
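One way to live with the context-length limit is to split the examples into groups that each fit a token budget, then run one prompt per group (and combine the results, e.g. by averaging logits as in the Pros list). A minimal sketch, assuming an "Input: …\nLabel: …" layout; `count_tokens` is a crude hypothetical stand-in that counts whitespace-separated words, so swap in your model's real tokenizer:

```python
def count_tokens(text):
    # Crude stand-in for a real tokenizer: counts whitespace-separated words.
    return len(text.split())


def pack_examples(examples, budget, reserve=0):
    """Greedily split (input, output) examples into groups whose formatted
    text fits within `budget` tokens, keeping `reserve` tokens free for the query."""
    limit = budget - reserve
    groups, current, used = [], [], 0
    for x, y in examples:
        cost = count_tokens(f"Input: {x}\nLabel: {y}")
        if cost > limit:
            raise ValueError(f"example too long for budget: {x!r}")
        # Start a new group when this example would overflow the current one.
        if current and used + cost > limit:
            groups.append(current)
            current, used = [], 0
        current.append((x, y))
        used += cost
    if current:
        groups.append(current)
    return groups


examples = [
    ("The battery died after an hour.", "negative"),
    ("Setup took two minutes, works great.", "positive"),
    ("Screen is bright and sharp.", "positive"),
]
groups = pack_examples(examples, budget=24, reserve=4)
print(len(groups))
```

Greedy packing keeps example order and is simple; it does not try to balance labels across groups, which may matter for classification.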