I wouldn't even mind: loading a 23B model takes 8 seconds and then it generates at 28 t/s on a 3090 Ti. Lower parameter counts go even faster.
That's heaps faster than loading a 70B model and waiting 6 minutes for it to generate a single reply at 1.1 t/s, for something that might not even be correct because you didn't prompt it right.