How to minimize model inference costs?
keklsh@alien.top to LocalLLaMA@poweruser.forum · English · 1 year ago · 5 comments
keklsh@alien.top (OP) · 1 year ago:
Nah, do a simple calculation: a 3090 rents for about $0.22/hr, and a single card doesn't even have enough VRAM to run 70B at 4-bit, so you need at least two ($0.44/hr). Generating 1M tokens at 20 t/s takes ~13.9 hours, which works out to roughly $6/million tokens.
That’s extremely expensive compared to the API price.
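If you want to plug in your own numbers, here's the back-of-envelope math as a quick Python sketch. The GPU count, rental price, and throughput are just the assumptions from above; swap in your own:

```python
# Back-of-envelope rental cost per million generated tokens.
# Assumptions from the comment above: 2x 3090 at $0.22/hr each,
# ~20 tokens/s of generation throughput.

def cost_per_million_tokens(gpu_count: int,
                            usd_per_gpu_hour: float,
                            tokens_per_second: float) -> float:
    """Dollars of GPU rental burned per 1M generated tokens."""
    hours_per_million = 1_000_000 / tokens_per_second / 3600
    return hours_per_million * usd_per_gpu_hour * gpu_count

if __name__ == "__main__":
    # 70B at 4-bit is ~40 GB of weights, more than one 24 GB 3090 holds,
    # so price in at least two cards.
    print(f"${cost_per_million_tokens(2, 0.22, 20):.2f}/M tokens")  # ~$6.11
```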