• 1 Post
  • 10 Comments
Joined 1 year ago
Cake day: October 30th, 2023

  • As I understand it, LLMs basically write the average pattern of a billion books, so when you add GPT-4 and GPT-3.5 output into the training mix, you're averaging the average, and things get boring very fast. As for model suggestions, Yi-34B-based ones look fine for literary purposes.

    I think being very specific and editing as you go (co-writing with the model) could help. Some LoRA training on specific books could also help it mimic a certain style.

    A higher temperature and a repetition penalty could help too; there's a rough sketch of those settings below.
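
    As a minimal sketch of those knobs with the Hugging Face transformers generate API (the model name, prompt, and exact values are placeholder assumptions, not recommendations):

    ```python
    from transformers import AutoModelForCausalLM, AutoTokenizer

    # Placeholder model; swap in whichever Yi-34B finetune (or smaller model) you run.
    model_name = "01-ai/Yi-34B-Chat"
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForCausalLM.from_pretrained(model_name, device_map="auto")

    prompt = "The lighthouse keeper opened the logbook and wrote:"
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

    # Higher temperature flattens the next-token distribution (more surprising
    # words); repetition_penalty > 1.0 discourages tokens that already appeared.
    output = model.generate(
        **inputs,
        do_sample=True,
        temperature=1.2,          # > 1.0 = more adventurous word choice
        repetition_penalty=1.15,  # mild; pushing it too high degrades grammar
        max_new_tokens=200,
    )
    print(tokenizer.decode(output[0], skip_special_tokens=True))
    ```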



  • Really cool, I'll check the video out. Since we've found an actually qualified person, though, let me ask a few layman questions; I hope you have time to answer them!

    First, sampling methods. Most of them look simple, but we still don't really know how to tune them. Do you think novel sampling methods, or specific combinations of existing ones, could improve output quality by a lot?

    For instance, beam search. Does quality improve linearly as you increase the beam width, or do returns diminish? (There's a quick comparison sketch below.)

    Do you think the ideal values for temperature, top_k, and top_p depend on the context, on the model, or on both?
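
    For the beam search question, this is the kind of comparison meant; a minimal sketch with transformers.generate, where the model and prompt are just placeholders:

    ```python
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tok = AutoTokenizer.from_pretrained("gpt2")  # placeholder causal LM
    model = AutoModelForCausalLM.from_pretrained("gpt2")
    inputs = tok("The experiment showed that", return_tensors="pt")

    # Widen the beam and compare outputs by eye (or by a quality metric).
    for beams in (1, 4, 16):
        out = model.generate(**inputs, num_beams=beams, max_new_tokens=40)
        print(beams, tok.decode(out[0], skip_special_tokens=True))
    ```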
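
    And for reference on the last question, a minimal sketch of what top_k and top_p actually do to the next-token distribution (plain PyTorch; the cutoff values are arbitrary examples):

    ```python
    import torch

    def filter_logits(logits: torch.Tensor, top_k: int = 50, top_p: float = 0.9) -> torch.Tensor:
        # Top-k: keep only the k highest-scoring tokens.
        if top_k > 0:
            kth_best = torch.topk(logits, top_k).values[..., -1, None]
            logits = logits.masked_fill(logits < kth_best, float("-inf"))
        # Top-p (nucleus): keep the smallest set of tokens whose cumulative
        # probability reaches top_p, always including the single best token.
        if top_p < 1.0:
            sorted_logits, sorted_idx = torch.sort(logits, descending=True)
            sorted_probs = torch.softmax(sorted_logits, dim=-1)
            # Drop a token if the mass *before* it already reached top_p.
            drop = sorted_probs.cumsum(dim=-1) - sorted_probs >= top_p
            sorted_logits = sorted_logits.masked_fill(drop, float("-inf"))
            logits = torch.full_like(logits, float("-inf")).scatter(-1, sorted_idx, sorted_logits)
        return logits

    # Temperature divides logits before softmax: < 1 sharpens, > 1 flattens.
    logits = torch.randn(1, 32000)  # fake vocab-sized scores
    probs = torch.softmax(filter_logits(logits) / 0.8, dim=-1)
    next_token = torch.multinomial(probs, num_samples=1)
    ```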