Point me towards some basic dataset preparation tips for LLMs?

ArtifartX@alien.top · 2 years ago

Point me towards some basic dataset preparation tips for LLMs?

__SlimeQ__@alien.top · 2 years ago

If you’re making a LoRA, training directly on Wikipedia will pretty much make it output text that looks like Wikipedia, which is to say it will (probably) be worse at chatting.

A strategy I’ve been using lately is to get GPT-4 to write a conversation in my chosen format *about* each chapter of my “textbook”. I can automate this with pretty good results, and it’s done in about 10 minutes. It does kind of work: it’ll at least get the bot to talk about the topics I chose, but as far as actually comprehending the information it’s referencing… it’s bad. It gets better as I increase the LoRA rank, but that takes a lot of VRAM. I can only get to a rank of around 256 before the run dies.
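
For anyone wanting to try the chapter-to-conversation idea, here is a rough sketch of how the automation could look. It is not my exact script. The `USER:`/`BOT:` format, the prompt wording, and every function name are assumptions made up for illustration, and `generate` is a placeholder for whatever call you use to reach GPT-4 (it just takes a prompt string and returns the model's text).

```python
import json
from typing import Callable, Dict, Iterable, List

# Hypothetical prompt; swap in your own chat format and instructions.
PROMPT_TEMPLATE = (
    "Write a conversation between USER and BOT about the text below. "
    "Put every turn on its own line, starting with 'USER:' or 'BOT:'.\n\n"
    "TEXT:\n{chapter}"
)


def build_prompt(chapter: str) -> str:
    """Wrap one chapter of source text in the generation prompt."""
    return PROMPT_TEMPLATE.format(chapter=chapter.strip())


def parse_turns(raw: str) -> List[Dict[str, str]]:
    """Keep only lines that look like 'USER: ...' or 'BOT: ...'."""
    turns = []
    for line in raw.splitlines():
        speaker, sep, text = line.partition(":")
        speaker = speaker.strip().upper()
        if sep and speaker in ("USER", "BOT") and text.strip():
            turns.append({"role": speaker.lower(), "text": text.strip()})
    return turns


def make_dataset(
    chapters: Iterable[str], generate: Callable[[str], str]
) -> List[Dict]:
    """Generate one conversation per chapter; skip replies with < 2 turns."""
    records = []
    for index, chapter in enumerate(chapters):
        turns = parse_turns(generate(build_prompt(chapter)))
        if len(turns) >= 2:
            records.append({"chapter": index, "turns": turns})
    return records


def write_jsonl(records: List[Dict], path: str) -> None:
    """Write one JSON object per line, a common input format for trainers."""
    with open(path, "w", encoding="utf-8") as handle:
        for record in records:
            handle.write(json.dumps(record, ensure_ascii=False) + "\n")
```

You would call `make_dataset(chapters, generate)` with your own list of chapter strings and your own model wrapper, then `write_jsonl(records, "train.jsonl")`. Keeping the model call behind `generate` means you can test the parsing with a canned reply before spending any API credits.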

Hey_You_Asked@alien.top · 2 years ago

please share!!