I have access to a single 80 GB A100 GPU and would like to train an LLM with a GPT-like architecture from scratch. Does anyone know how to calculate the maximum model size that will fit?
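A common back-of-the-envelope approach (this is a rule of thumb, not an exact answer) is to count bytes of optimizer and model state per parameter: with Adam in mixed precision you typically budget about 16 bytes per parameter (fp16 weights + fp16 gradients + fp32 master weights + two fp32 Adam moments), then reserve a chunk of memory for activations, which scale with batch size and sequence length. A minimal sketch, where the 16-bytes-per-param figure and the 30% activation reserve are assumptions you should tune for your setup:

```python
# Rough training-memory rule of thumb (assumed, not exact):
#   2 B fp16 weights + 2 B fp16 grads + 4 B fp32 master weights
#   + 8 B Adam moments (m and v, fp32) = ~16 bytes per parameter,
# plus activations, which depend on batch size and sequence length.

def max_trainable_params(gpu_mem_gb: float,
                         bytes_per_param: int = 16,
                         activation_fraction: float = 0.3) -> float:
    """Estimate the largest parameter count that fits in GPU memory."""
    # Set aside a fraction of memory for activations and overhead.
    usable_bytes = gpu_mem_gb * 1e9 * (1 - activation_fraction)
    return usable_bytes / bytes_per_param

# For a single 80 GB A100:
params = max_trainable_params(80)
print(f"~{params / 1e9:.1f}B parameters")  # ~3.5B parameters
```

So without tricks like gradient checkpointing, CPU offloading, or 8-bit optimizers, a few billion parameters is roughly the ceiling; those techniques can push it considerably higher at the cost of throughput.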
This question might come off as stupid, but it’s really something I’m curious about:
I 100% see why someone would want to take a current state-of-the-art open model and fine-tune it on their own data. What I don’t see is why someone would want to train their own model from scratch. Can you explain it?