I have a query that costs around 300 tokens, and since 1,000 tokens cost 0.06 USD, that works out to roughly 0.02 USD per request.
Let's say I deployed a local LLaMA model on RunPod, on one of the cheaper machines. Would that request be cheaper than running it through GPT-4?
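The comparison depends on your request volume. A minimal sketch of the break-even math, assuming the $0.06/1k-token GPT-4 price stated above and a hypothetical RunPod hourly rate (the $0.50/hour figure is a placeholder, not actual RunPod pricing):

```python
# Rough cost comparison between a per-token API and a rented GPU.
# GPT-4 price is taken from the post above; the GPU rate is an assumption.

GPT4_PRICE_PER_1K = 0.06    # USD per 1,000 tokens (as stated above)
TOKENS_PER_REQUEST = 300
GPU_HOURLY_RATE = 0.50      # USD/hour -- placeholder, check RunPod's actual pricing

# API cost is linear in tokens
gpt4_cost_per_request = TOKENS_PER_REQUEST / 1000 * GPT4_PRICE_PER_1K
print(f"GPT-4 cost per request: ${gpt4_cost_per_request:.4f}")  # $0.0180

# The GPU costs the same per hour regardless of load, so the rental only
# wins if you push enough requests through it per hour.
breakeven = GPU_HOURLY_RATE / gpt4_cost_per_request
print(f"Break-even: ~{breakeven:.0f} requests/hour on the rented GPU")
```

So a rented GPU only becomes cheaper per request once your sustained throughput exceeds the break-even rate; for occasional queries, paying per token wins.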
Can't you use ChatGPT 3.5 for free? It would be the cheapest option and would surely beat any 70B model you can find on random websites.