MrTacobeans@alien.top to LocalLLaMA@poweruser.forum • NVidia H200 achieves nearly 12,000 tokens/sec on Llama2-13B with TensorRT-LLM
1 year ago

I don’t think that will come from Nvidia. It’s going to take in-memory compute to get anywhere near that level of efficiency. First samples of these SoCs are nowhere near the memory capacity needed even for small models. These types of accelerators will likely come from Intel/ARM/RISC-V/AMD before Nvidia ships one.
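For context on why memory (not compute) is the wall here, a rough back-of-envelope sketch. The numbers are assumptions on my part: FP16 weights for Llama2-13B and the published ~4.8 TB/s HBM3e bandwidth figure for the H200.

```python
# Back-of-envelope: per-stream decode speed is bounded by how fast
# the GPU can stream the model's weights from memory each token.
params = 13e9                 # Llama2-13B parameter count
bytes_per_param = 2           # assumption: FP16 weights
weight_bytes = params * bytes_per_param   # ~26 GB read per decoded token (batch 1)

bandwidth = 4.8e12            # bytes/s; H200 HBM3e spec-sheet figure (assumption)

single_stream_tps = bandwidth / weight_bytes
print(f"~{single_stream_tps:.0f} tokens/s per stream")  # prints ~185

# So the ~12,000 tok/s headline number has to be aggregate throughput
# across a large batch, where one weight read is amortized over many
# concurrent sequences -- not single-user latency.
```

Which is the point: squeezing more out of this shape of hardware means moving compute closer to (or into) the memory, not just fatter HBM stacks.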
Even the body of this post reads like a heavily manicured AI rewrite of whatever was originally written. To me it feels like fraud of the highest degree. The website being basically unusable, gray text on a white background, was enough to write this whole thing off as complete meh.
The flood of LLMs is already happening, and your models aren’t beating Mistral or Yi. Infusing a persona into an LLM is borderline trivial at this point (see the sketch below). What are you actually adding to the sea of models?
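To back up the "borderline trivial" claim, a minimal sketch: for most chat-tuned models, a "persona" is just a system prompt. The model ID and persona text here are placeholders; this assumes a model whose chat template supports a system role and that `accelerate` is installed for `device_map="auto"`.

```python
# Minimal sketch: "infusing a persona" via a system prompt.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "HuggingFaceH4/zephyr-7b-beta"  # example; any system-role chat model works
tok = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

messages = [
    # The entire "persona" lives in this one system message.
    {"role": "system", "content": "You are Captain Byte, a grumpy retired sysadmin "
                                  "who answers tersely and distrusts the cloud."},
    {"role": "user", "content": "How should I back up my files?"},
]

# Render the chat into the model's prompt format and generate.
inputs = tok.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
out = model.generate(inputs, max_new_tokens=200)
print(tok.decode(out[0][inputs.shape[-1]:], skip_special_tokens=True))
```

No fine-tuning, no dataset, one string. That's the bar a new "persona model" release has to clear.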