Following the release of Dimensity 9300 and Snapdragon 8 Gen 3 phones, I expect LLMs running on mobile phones to grow in popularity, since quantized 3B or 7B models can already run on high-end phones from the last five years. But even though it's possible, there are a few concerns, including power consumption and storage size. I've seen posts about successfully running LLMs on mobile devices, but I seldom see people discussing future trends. What are your thoughts?
I know the question here is about running LLMs on mobile, but I think that framing builds in too many assumptions.
The future of LLM technology is as follows:
This loop is going to get faster and faster, and once it's generally accessible, you're no longer concerned with which LLMs run on your phone; you're instead concerned with which specific subtasks can be designed to run on your phone and how to assemble them to fit your application's specific needs. At the end of the day, you are not going to need to ask an AGI to fill out API calls.