@CKtalon

CKtalon@alien.top · 1 year ago

Although for inferencing, memory bandwidth is the most important, FLOPS still matter. APUs are just too slow, so the bottleneck will get shifted to calculating all those matrix operations (provided there’s high bandwidth designed for APUS like Apple which I doubt so)

CKtalon@alien.top · 1 year ago

You are probably talking about fine tuning then (pre)training a model. There are models that were trained for coding like codellama and all the variants. You could probably train on the library’s code but I doubt you will get much out of it. Perhaps the best way is to create some instruction data based on the library (either manually or synthetic) and fine tune on that.

CKtalon@alien.top · 1 year ago

No one has figured out the plateau yet as more data = longer training = more expensive. Currently it seems like you can keep training with more data. Companies are pretty much training on ‘all of the internet’ data to get the LLM ‘cleverness’. Not just Shakespeare.

About deciding the size of the model, there is the Chinchilla scaling law which provides the compute optimal point given a compute budget, ie. 2T on a 7b vs 0.5T on a 13B, the former would be better (made up number). There is also the consideration of the costs of serving the model together with the training cost and the accuracy required.

CKtalon@alien.top · 1 year ago

Does your motherboard even have 8 PCIe slots? You’ll be needing server boards. Even workstation boards typically support only 7 (and can’t fit them all due to size)

CKtalon@alien.top · 1 year ago

Once you use one of the significance tests, you’ll start seeing that increased parameters don’t give a significant improvement given the increased number of parameters, but we are at a point where accuracy is more important than that.