If Nvidia isn’t pushing consumer GPUs past 24GB for the RTX 50 series, that will probably factor into the open source community keeping models below roughly 40B parameters. I don’t know the exact cutoff point. A lot of people with 12GB VRAM can run quantized 13B models, but you could also run a 7B at 8-bit with a 16k context size, as the rough math below shows. Either way, it gets increasingly difficult to run larger contexts with larger models.
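Back-of-envelope sketch of that tradeoff, assuming Llama-style architectures (7B: 32 layers / 4096 hidden dim, 13B at 4-bit: 40 layers / 5120 hidden dim) and a full FP16 KV cache; real usage varies with the backend, activation buffers, and GQA:

```python
def weights_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate weight footprint in GB (billions of params x bytes each)."""
    return params_billion * bits_per_weight / 8

def kv_cache_gb(ctx: int, n_layers: int, hidden: int,
                bytes_per_elem: int = 2) -> float:
    """FP16 KV cache: keys + values stored for every layer and token."""
    return 2 * n_layers * ctx * hidden * bytes_per_elem / 1e9

# 13B at 4-bit, 4k context: ~6.5 GB weights + ~3.4 GB cache = ~9.9 GB
print(weights_gb(13, 4) + kv_cache_gb(4096, 40, 5120))

# 7B at 8-bit, 16k context: ~7.0 GB weights + ~8.6 GB cache = ~15.6 GB
# (GQA models like Mistral cut the cache ~4x, bringing this nearer 9 GB)
print(weights_gb(7, 8) + kv_cache_gb(16384, 32, 4096))
```

The point: the KV cache scales with context length times layer count times hidden size, so pushing both model size and context at once blows past 12GB fast.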
Some larger open models are being released, but there won’t be much of a community around them to train those huge models on a bunch of datasets and nail the ideal finetune.
That’s v3-1; there’s already a v3-2:
https://huggingface.co/TheBloke/OpenHermes-2.5-neural-chat-7B-v3-2-7B-GGUF
https://huggingface.co/TheBloke/OpenHermes-2.5-neural-chat-7B-v3-2-7B-AWQ
They were added 11 hours ago
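If you want to try the GGUF, here’s a minimal llama-cpp-python sketch (pip install llama-cpp-python). The filename is a placeholder; use whichever quant level from the repo fits your VRAM:

```python
from llama_cpp import Llama

llm = Llama(
    model_path="openhermes-2.5-neural-chat-7b-v3-2.Q4_K_M.gguf",  # example quant filename
    n_ctx=4096,        # context window; raise it if you have VRAM to spare
    n_gpu_layers=-1,   # offload all layers to the GPU
)

out = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Hello!"}],
    max_tokens=128,
)
print(out["choices"][0]["message"]["content"])
```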