I know the question here is about running LLMs on mobile, but I think that builds in too many assumptions.
The future of LLM technology looks like this:

1. Large models learn to do a new task.
2. Specific tasks get broken down into foundational subtasks.
3. Foundational subtasks are distilled into memoized code, hardcoded transformers, and traditional code.
4. You no longer use a large model for that subtask; instead you use a highly specialized module that fits on a toaster.
This loop is going to get faster and faster, and once it's generally accessible, you're no longer concerned with which LLMs run on your phone; you're concerned with which specific subtasks can be made to run on your phone and how to assemble them to fit your application's specific needs. At the end of the day, you are not going to need to ask an AGI to fill out API calls.
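To make the pattern concrete, here's a minimal sketch of what that routing might look like in practice. Everything here is hypothetical: `call_large_model`, `normalize_date`, and the registry are placeholder names, not any real library's API.

```python
# Sketch: route a subtask to a distilled, specialized handler when one
# exists; fall back to the expensive large model only for unseen tasks.
from functools import lru_cache
from typing import Callable, Dict
from datetime import datetime


def call_large_model(prompt: str) -> str:
    """Placeholder for an expensive large-model call (fallback path)."""
    raise NotImplementedError("wire this to whatever large model you use")


@lru_cache(maxsize=4096)
def normalize_date(text: str) -> str:
    """A 'foundational subtask' distilled into plain, memoized code."""
    return datetime.strptime(text.strip(), "%d %B %Y").date().isoformat()


# Registry of specialized modules that fit on a toaster.
DISTILLED: Dict[str, Callable[[str], str]] = {
    "normalize_date": normalize_date,
}


def run_subtask(name: str, payload: str) -> str:
    """Use the distilled module if it exists, else fall back to the LLM."""
    handler = DISTILLED.get(name)
    if handler is not None:
        return handler(payload)  # cheap, local, deterministic
    return call_large_model(f"{name}: {payload}")  # expensive fallback
```

Over time the registry grows and the fallback path shrinks, which is exactly the loop described above.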