
  • Let me start off by saying I haven’t gotten this right yet. But still…

    When AutoGPT went viral, the thing everyone started talking about was vector DBs and how they can magically extend the context window. That idea wasn’t very well informed, and the implementations have been lacking.

    It turns out that merely finding similar messages in the history and dumping them into the context is not enough. While this may sometimes give you a valuable nugget, most of the time it will just fill the context with repetitive garbage.

    What you really need for this to work, imo, is a structured narrative around finding the data, reading it, and reporting it. LLMs respond extremely poorly to random, disconnected dialogue; they don’t know what to do with it. So for one thing, you’ll need a reasonable amount of pre-context for each data point so that the bot can even understand what’s being talked about. But now this is prohibitively long: 4 or 5 matches on your search and your context is probably full. So you’ll need to do some summarizing before squeezing it into the live conversation, which means your request takes at least 2x longer, and then you need to weave the result into your chat context in as natural a way as possible. There’s a rough sketch of that pipeline at the end of this comment.

    Honestly, RAG as a task is so weird that I no longer expect any general model to be capable of it, especially not 7B/13B ones. Even GPT-4 can only barely do it. I think with a very clever dataset somebody could make an effective RAG LoRA, but I’ve yet to see one.
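
    To make that concrete, here is a minimal sketch of the flow I mean, assuming a sentence-transformers embedder and an OpenAI-compatible chat endpoint for the summarizer. The prompts, window sizes, and function names are illustrative, not a tested recipe.

    ```python
    # Sketch of: retrieve similar messages -> attach pre-context -> summarize ->
    # weave the summary into the live chat as narration instead of raw matches.
    import numpy as np
    from sentence_transformers import SentenceTransformer
    from openai import OpenAI

    embedder = SentenceTransformer("all-MiniLM-L6-v2")
    client = OpenAI()  # or any OpenAI-compatible local server

    def retrieve(history: list[str], query: str, k: int = 3, pre_context: int = 2):
        """Find the k most similar past messages, each with a little lead-in context."""
        vecs = embedder.encode(history + [query], normalize_embeddings=True)
        sims = vecs[:-1] @ vecs[-1]
        top = np.argsort(sims)[::-1][:k]
        snippets = []
        for i in sorted(top):
            start = max(0, i - pre_context)  # pre-context so the match is intelligible
            snippets.append("\n".join(history[start : i + 1]))
        return snippets

    def summarize(snippets: list[str], query: str) -> str:
        """Compress the retrieved snippets so they fit the live context."""
        joined = "\n---\n".join(snippets)
        resp = client.chat.completions.create(
            model="gpt-4o-mini",  # stand-in; any capable model works
            messages=[{
                "role": "user",
                "content": f"Summarize only what is relevant to: {query}\n\n{joined}",
            }],
        )
        return resp.choices[0].message.content

    def build_prompt(system: str, recent_turns: list[dict], memory: str) -> list[dict]:
        """Weave the summary in as recalled narrative rather than dumped search hits."""
        recall = {"role": "system",
                  "content": f"(You recall the following from earlier conversations: {memory})"}
        return [{"role": "system", "content": system}, recall, *recent_turns]
    ```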







  • my personal lora does this just because it was trained on actual human conversations. it’s super unnatural for people to try answering just any off-the-wall question; most people will just go “lmao” or “idk, wtf”, and if you methodically strip that from the data (like most instruct datasets do), it makes the bots act weird as hell
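
    to be concrete, this is the kind of filter i mean; the thresholds and field names below are made up, it’s just to show the idea:

    ```python
    # Illustration of typical instruct-dataset cleaning: dropping short or
    # low-content replies. Aggressive versions of this remove exactly the
    # "lmao" / "idk" turns that make conversation data feel human.
    LOW_CONTENT = {"lmao", "lol", "idk", "wtf", "ok", "k", "nice", "same"}

    def is_low_content(reply: str, min_words: int = 4) -> bool:
        words = [w.strip(",.!?") for w in reply.lower().split()]
        return len(words) < min_words or all(w in LOW_CONTENT for w in words)

    def clean(pairs: list[dict]) -> list[dict]:
        # pairs look like {"prompt": ..., "reply": ...}; dropping every
        # low-content reply gives a tidier dataset, but a model trained on it
        # loses the casual turns that make real conversation feel natural
        return [p for p in pairs if not is_low_content(p["reply"])]
    ```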





  • Mistral and Llama 2 (and the original Llama) are foundation models, meaning all of their weights were trained from scratch. Almost anything worth using is a derivative of one of these three foundation models. They are really expensive to train.

    Just about everything else is a LoRA fine-tune on top of one of them. A LoRA only trains a small set of low-rank adapter weights, on the order of 1% of the model. Functionally speaking, the important part of these fine-tunes is the additional data they were trained on, and that training can be done on top of any underlying model.

    So OpenHermes is a LoRA tune on top of Mistral, and is an open-source offshoot of Nous Hermes, which is an instruction dataset for giving good, smart answers (or something like that) in a given instruction format.
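
    For reference, attaching a LoRA to a frozen base model looks roughly like this with Hugging Face transformers + peft; the model name, rank, and target modules here are just placeholder choices, not a recommendation.

    ```python
    # Minimal sketch of a LoRA fine-tune setup on top of a foundation model.
    from transformers import AutoModelForCausalLM
    from peft import LoraConfig, get_peft_model

    base = AutoModelForCausalLM.from_pretrained("mistralai/Mistral-7B-v0.1")

    lora = LoraConfig(
        r=64,                                 # adapter rank
        lora_alpha=16,
        target_modules=["q_proj", "v_proj"],  # only these layers get adapters
        lora_dropout=0.05,
        task_type="CAUSAL_LM",
    )

    model = get_peft_model(base, lora)
    model.print_trainable_parameters()  # typically on the order of ~1% of the base weights
    ```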



  • i’ve been having some success just asking gpt4 to write a quiz over each chapter, and then write a scenario in which my character answers those questions. this was pretty easy to automate and basically costs a few dollars per book. I had to use gpt-3.5-16k for a few very long chapters, and the quality is noticeably worse.
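
    here’s roughly what that automation looks like, assuming the OpenAI python client; the prompts, file layout, and character name are simplified stand-ins, not the exact ones I use.

    ```python
    # Rough sketch of the per-chapter "quiz + answer scenario" automation.
    import json
    from pathlib import Path
    from openai import OpenAI

    client = OpenAI()
    CHARACTER = "Alice"  # placeholder persona

    def chapter_to_dialogue(chapter_text: str, model: str = "gpt-4") -> str:
        quiz = client.chat.completions.create(
            model=model,
            messages=[{"role": "user",
                       "content": f"Write a short quiz covering this chapter:\n\n{chapter_text}"}],
        ).choices[0].message.content
        scenario = client.chat.completions.create(
            model=model,
            messages=[{"role": "user",
                       "content": f"Write a scene in which {CHARACTER} is asked these questions "
                                  f"and answers them correctly, as natural dialogue:\n\n{quiz}"}],
        ).choices[0].message.content
        return scenario

    if __name__ == "__main__":
        out = [chapter_to_dialogue(p.read_text()) for p in sorted(Path("chapters").glob("*.txt"))]
        Path("dataset.json").write_text(json.dumps(out, indent=2))
    ```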


  • if you’re making a lora, training on wikipedia directly will pretty much make it output text that looks like wikipedia, which is to say it will (probably) be worse at chatting.

    a strategy i’ve been using lately is to get gpt4 to make a conversation in my chosen format *about* each chapter of my “textbook”; i can automate this with pretty good results and it’s done in about 10 minutes. It does kind of work: it’ll at least get the bot to talk about the topics I chose, but as far as actually comprehending the information it’s referencing… it’s bad. It gets better as I increase the rank, but that takes a lot of VRAM; I can only get to around rank 256 before it dies.
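
    to put a rough number on the rank/VRAM tradeoff: LoRA adds r × (d_in + d_out) trainable params per adapted matrix, so the adapter (and its optimizer state) grows linearly with rank. the dimensions below assume a Mistral-7B-ish model adapting only q_proj and v_proj, so treat them as illustrative.

    ```python
    # Back-of-envelope LoRA parameter count: each adapted matrix of shape
    # (d_out, d_in) gains r * (d_in + d_out) trainable params. Dimensions assume
    # a Mistral-7B-like model (32 layers, hidden 4096, GQA kv dim 1024) adapting
    # only q_proj and v_proj; adjust for your actual target modules.
    LAYERS = 32
    SHAPES = [(4096, 4096), (1024, 4096)]  # (d_out, d_in) for q_proj, v_proj

    def lora_params(rank: int) -> int:
        return LAYERS * sum(rank * (d_in + d_out) for d_out, d_in in SHAPES)

    for r in (16, 64, 256):
        # fp16 weights + Adam states make the real VRAM cost several times this,
        # on top of the frozen base model and activations
        print(f"rank {r:>3}: {lora_params(r) / 1e6:.1f}M adapter params")
    ```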