Unless you’re doing this as a business it’s going to be massively cost prohibitive; we’re talking hundreds of thousands of dollars of hardware. If it is a business, you’d better start talking to cloud vendors, because GPUs are an incredibly scarce resource right now.
No, absolutely not… not the way you described it. The issue isn’t RAM, it’s the number of calculations that need to be done. With GPUs you have to load the data into VRAM, and that VRAM is only available for that GPU’s calculations; it’s not a shared memory pool. So if you load data into the P40, only the P40 can use it for its calculations.
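A minimal PyTorch sketch of what “not a shared memory pool” means in practice (assumes a box with two CUDA devices; think of cuda:1 as the P40):

```python
import torch

# Hypothetical two-GPU box: cuda:0 is the fast card, cuda:1 is the P40.
a = torch.randn(4096, 4096, device="cuda:0")
b = torch.randn(4096, 4096, device="cuda:1")

# This fails: the tensors live in separate VRAM pools, and a kernel running
# on one GPU can't read the other card's memory directly.
try:
    c = a @ b
except RuntimeError as e:
    print(e)  # "Expected all tensors to be on the same device..."

# You have to copy the data across the PCIe bus first.
c = a @ b.to("cuda:0")
```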
Yes, you can run the model across multiple GPUs. But if one of those cards is very slow and just has lots of VRAM, the layers you offload to it will be processed slowly. No, there’s no way around that; VRAM only keeps the weights readily available so you’re not constantly loading and unloading them, it doesn’t make the calculations themselves any faster.
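If you do split layers across mismatched cards, something like llama-cpp-python lets you control the ratio so the slow card carries less of the work per token. A rough sketch; the model path and split ratios here are just placeholders:

```python
from llama_cpp import Llama

# Assumed setup: GPU 0 is a fast card, GPU 1 is the slower P40.
# tensor_split biases more layers onto the fast card.
llm = Llama(
    model_path="./models/model-q4_k_m.gguf",  # placeholder path
    n_gpu_layers=-1,                          # offload all layers to GPU
    tensor_split=[0.75, 0.25],                # ~75% of layers on GPU 0, ~25% on GPU 1
)

out = llm("Q: Why is the sky blue? A:", max_tokens=64)
print(out["choices"][0]["text"])
```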
Looks like they already have issues… they should be using LLMs to automate the moderation…
What’s the VRAM usage? A context that big can eat an enormous amount just for the KV cache…
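Rough back-of-the-envelope for the KV cache, using assumed Llama-2-7B-ish shapes (32 layers, 32 KV heads, head_dim 128, fp16 cache):

```python
def kv_cache_gib(n_layers, n_kv_heads, head_dim, context_len, bytes_per_elem=2):
    """Approximate KV-cache size: keys + values, for every layer and position."""
    return 2 * n_layers * n_kv_heads * head_dim * context_len * bytes_per_elem / 2**30

print(kv_cache_gib(32, 32, 128, 32_768))  # ~16 GiB just for the cache at 32k context
print(kv_cache_gib(32, 32, 128, 4_096))   # ~2 GiB at 4k
```

Models with grouped-query attention shrink this a lot (fewer KV heads), but it still scales linearly with context length.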
Their needle-in-a-haystack test isn’t very compelling. Sure, no test is flawless, but it’s just a random out-of-context fact dropped at different points in the context window, and there are a lot of reasons other than context handling why a model might fail to retrieve that.
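For anyone who hasn’t seen it, the test boils down to something like this (a toy sketch; the needle text, filler, and model call are all placeholders):

```python
NEEDLE = "The secret passphrase is 'blue pineapple'."          # arbitrary out-of-context fact
FILLER = "The quick brown fox jumps over the lazy dog. " * 5_000  # stand-in haystack text

def build_prompt(depth_fraction: float) -> str:
    """Bury the needle at a given fractional depth inside the haystack."""
    pos = int(len(FILLER) * depth_fraction)
    doc = FILLER[:pos] + "\n" + NEEDLE + "\n" + FILLER[pos:]
    return doc + "\n\nQuestion: What is the secret passphrase?"

for depth in (0.0, 0.25, 0.5, 0.75, 1.0):
    prompt = build_prompt(depth)
    # answer = chat(prompt)  # call whatever model you're evaluating here
    # print(depth, "blue pineapple" in answer.lower())
```

A miss at some depth could be retrieval, instruction following, tokenization weirdness around the insert, or just the fact being so out of place the model treats it as noise.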
Go to Hugging Face and look at the multitude of datasets that have already been prepped, and read whatever documentation and papers have been published for them. Then go through the data yourself and get a sense of what it looks like and how it’s structured.
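The `datasets` library makes the poking around easy; the dataset name here is just an example, swap in whatever you’re interested in:

```python
from datasets import load_dataset

# Example instruction-tuning dataset; any hub dataset works the same way.
ds = load_dataset("databricks/databricks-dolly-15k", split="train")

print(ds)           # number of rows and column names
print(ds.features)  # schema: field names and types
print(ds[0])        # inspect one record to see how it's structured
```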
Textai is fantastic!!
You should also check out the Coral TPU boards; they’re much more power-efficient, though only for small int8-quantized TensorFlow Lite models, not anything LLM-sized.
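Running on the Edge TPU looks roughly like this; the model path is a placeholder and the .tflite has to be int8-quantized and compiled for the Edge TPU first:

```python
import numpy as np
import tflite_runtime.interpreter as tflite

# Placeholder: an int8 .tflite model compiled with the Edge TPU compiler.
interpreter = tflite.Interpreter(
    model_path="model_edgetpu.tflite",
    experimental_delegates=[tflite.load_delegate("libedgetpu.so.1")],
)
interpreter.allocate_tensors()

inp = interpreter.get_input_details()[0]
out = interpreter.get_output_details()[0]

# Dummy input just to show the invoke cycle.
interpreter.set_tensor(inp["index"], np.zeros(inp["shape"], dtype=inp["dtype"]))
interpreter.invoke()
print(interpreter.get_tensor(out["index"]).shape)
```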
One of those cases where proving something can be done doesn’t make it useful. This has to be one of the least efficient ways to run inference. Like the people who got Doom running on an HP printer: great, you did it, but it’s the worst possible version.