Honato2@alien.top to LocalLLaMA@poweruser.forum
What percent of your usage of LLMs are closed-source ones (GPT, Claude, etc.) and what percent are open-source ones (Llama, Mistral, etc.)? Pick the answer that's closest to you.
1 year ago
Unless I have an idea for a Python script, I'm using open-source models. Local models put too many bugs in scripts, so coding locally is just a pain in the ass.
The leaderboards are pretty much useless. Trickery and training for the leaderboard kind of ruin the whole point of them, so I run my own informal test instead.
First, I have the model do some weird RP shit: namely, impersonating the Macho Man Randy Savage and cutting a promo on a random subject. If it does well, it gets 1 point. If it fails, -3 points.
Next is trying to hold a conversation, with the same scoring system. If it stays coherent, it passes. Bonus points if it stays in character the entire time.
Lastly, some simple coding tasks. If the code works out of the box, 3 points; if it needs endless bug fixing, -5.
Points also get added or taken away arbitrarily, on a whim.
Impersonation and cutting promos work pretty well, with the bonus perk of: who the fuck would ever train a model to pass that test? It's a benchmark random enough that a model can pass it, yet nobody trains for it. Also, it's usually pretty entertaining.
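For anyone who wants to tally this the same way, here's a rough sketch of the scoring described above. The point values (+1/-3 for the promo and the conversation, +3/-5 for coding) come from the post; the size of the in-character bonus (BONUS_IN_CHARACTER = 1), applying that bonus only when the conversation stays coherent, and treating coding as a simple pass/fail are assumptions, since the post doesn't pin those down. The `whim` field stands in for the arbitrary adjustments.

```python
from dataclasses import dataclass

# Point values from the post.
PROMO_PASS, PROMO_FAIL = 1, -3
CONVO_PASS, CONVO_FAIL = 1, -3
CODE_WORKS, CODE_NEEDS_FIXES = 3, -5
# Assumption: the post only says "bonus points", so this value is hypothetical.
BONUS_IN_CHARACTER = 1


@dataclass
class Result:
    promo_ok: bool             # did the Randy Savage promo land?
    convo_coherent: bool       # did the conversation stay coherent?
    stayed_in_character: bool  # did it keep character the whole time?
    code_works: bool           # did the code run out of the box? (assumed pass/fail)
    whim: int = 0              # arbitrary points added or taken away


def score(r: Result) -> int:
    """Sum the points for one model's run through the test."""
    total = PROMO_PASS if r.promo_ok else PROMO_FAIL
    total += CONVO_PASS if r.convo_coherent else CONVO_FAIL
    # Assumption: the bonus only counts if the conversation passed.
    if r.convo_coherent and r.stayed_in_character:
        total += BONUS_IN_CHARACTER
    total += CODE_WORKS if r.code_works else CODE_NEEDS_FIXES
    return total + r.whim


if __name__ == "__main__":
    print(score(Result(True, True, True, True)))  # prints 6
```

A model that nails everything scores 6 under these assumptions, and one that flunks everything scores -11, so the coding section swings the total the most.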