[D] The Status of Open Source Code LLMs

RealAGIFan@alien.top · 1 year ago

[D] The Status of Open Source Code LLMs

Disastrous_Elk_6375@alien.top · 1 year ago

This suggests that judging a model based on a single benchmark might not provide the full picture.

Duh… This has been a recurring problem with all these “benchmark leaderboards”. It turns out that “training on the testing set is all you need”…

Caffeine_Monster@alien.top · 1 year ago

I haven’t got round to trying the xwin coder models, but the precursor 70b chat model was extremely impressive when compared against both chat GPT 3.5 and 4.

koolaidman123@alien.top · 1 year ago

If you look at something like evolinstruct data its so similar to humane al itd be a surprise if models trained on that data (or other synthetic data) dont perform well

As a rule of thumb i only generally trust base models (even then its iffy) on benchmarks and for finetuned models only by using it