With high-end Android phones now packing upwards of 24GB of RAM, I think there’s huge potential for an app like this. It would be amazing to have something as powerful as a hypothetical future Mistral 13B running natively on smartphones!
You could interact with it privately without an internet connection. The convenience and capabilities would be incredible.
People are always building things like this, but the smaller models are kinda pointless.
Smaller models are the future of smartphones. Everyone will be running 10B models on their phones by 2025; those are more than enough for writing emails, doing translations, and just asking questions, and a lot more useful than Siri and Alexa.
Well, I’ve just tested a few models for my workflows and found that only 70B cuts it.
For now, but you’ll have 13B models as good as today’s 70B models by the end of next year.
The direction I took was to start making a Kivy app that connects to an LLM API at home via OpenVPN. I have Ooba and llama.cpp API servers that I can point the Android app at, so it works on old or new phones and runs at the speed of the server.
The downsides are that you need a static IP address or DDNS for the VPN to connect to, and cell reception can cause issues.
I have a static IP to my house, but a person could put the API server in the cloud with a static IP if they wanted to do things similarly.
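For anyone who wants to try the same route, here’s a minimal sketch of the client side, assuming the stock llama.cpp example server and a made-up OpenVPN address (10.8.0.1 is just an illustration). Stdlib-only on purpose, so it packages cleanly with Buildozer:

```python
# Minimal client sketch for a llama.cpp example server reachable over the VPN.
# The address/port below are assumptions for illustration; the server exposes
# POST /completion and returns the generated text in the "content" field.
import json
import urllib.request

API_URL = "http://10.8.0.1:8080/completion"  # hypothetical home server over OpenVPN

def ask(prompt: str, n_predict: int = 128) -> str:
    payload = json.dumps({"prompt": prompt, "n_predict": n_predict}).encode()
    req = urllib.request.Request(API_URL, data=payload,
                                 headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req, timeout=120) as resp:
        return json.loads(resp.read())["content"]

if __name__ == "__main__":
    print(ask("Write a one-line greeting."))
```

The Kivy UI is just a text box wired to that `ask()` call; all the heavy lifting stays on the server.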
A normal person would not be able to do that. The first people to put an Oobabooga-style app for Android and iPhone on the store at $15 will have my money for sure, and probably a million other people’s too.
🤔 hmmm… I have some ideas to test…
Why isn’t anyone building an Oobabooga-like app?
you spoke the sacred words so here i am
I’m dreaming of an S24 Ultra with an app that lets me run a hypothetical future Mistral 13B at 15 tokens/sec with TTS. A person can dream.
Apple is literally doing this stuff with their ML framework built into devices… but for tool applications, not a chatbot.
It’s a lot of work. Phones use a different OS and a different processor instruction set. The latter can be a big pain, especially if you’re really dependent on low-level optimizations.
I also feel that -most- people who would choose a phone over a PC for this kind of thing would rather just use a high-quality, easily accessible commercial option (ChatGPT, etc.) instead of a homebrew option that requires some work to get running. So demand for such a thing is pretty low.
I’m not so sure. ChatGPT has privacy issues, and a small but completely uncensored model has value too. There is a market for this: convenience and privacy.
Check out Ollama. They have links on their GitHub page to projects using it, and there’s an Android app that I believe runs locally on the phone. It uses llama.cpp.
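And if you’d rather point an app at Ollama running on a box at home, its REST API is about as simple as it gets. The model name below is just an example; anything you’ve pulled with `ollama pull` works:

```python
# Quick sketch of calling Ollama's REST API (it listens on port 11434 by default).
# POST /api/generate with stream=False returns one JSON object whose "response"
# field holds the full generated text.
import json
import urllib.request

def ollama_generate(prompt: str, model: str = "mistral") -> str:
    payload = json.dumps({"model": model, "prompt": prompt, "stream": False}).encode()
    req = urllib.request.Request("http://localhost:11434/api/generate",
                                 data=payload,
                                 headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]

print(ollama_generate("Why is the sky blue?"))
```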
It’s not just RAM, you also need the processing power. Phones can’t do *good* LLMs yet.
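Rough numbers back that up: on-device token generation is mostly memory-bandwidth-bound, because producing each new token means streaming all the weights through the chip. A quick back-of-envelope, where both figures are ballpark assumptions rather than specs for any particular phone:

```python
# Back-of-envelope: decode speed is roughly memory bandwidth / model size,
# since generating each token reads the whole weight file once.
# Both numbers are ballpark assumptions, not measured specs.
model_bytes = 7.0e9   # ~7 GB for a 13B model at 4-bit quantization
phone_bw    = 50.0e9  # ~50 GB/s LPDDR5-class bandwidth, shared with everything else

print(f"upper bound: ~{phone_bw / model_bytes:.1f} tokens/sec")  # ~7 tokens/sec
```

So even before thermals and battery come into it, the 15 tokens/sec dream above is optimistic for a 13B model on current phones.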
If you watch the ChatGPT voice chat mode closely on Android, what it does is listen with a local voice model (whisper.cpp), then answer quickly and generically, LOCALLY, for the first response/paragraph. While that’s happening, it’s sending what you asked to the servers, where the real text processing takes place. By the time your phone has run the simple local model and read a short first sentence to you, it has mostly gotten the full paragraphs of text back from the server and can read those. Even then, you still notice a slight delay.
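That overlap trick is straightforward to mimic. Here’s a minimal sketch of the pattern described above; the three helpers are stand-in stubs (hypothetical, not real APIs) just to show the concurrency:

```python
# Sketch of the "answer fast locally while the server thinks" pattern.
# All three helpers are hypothetical stubs standing in for the real pieces.
import asyncio

async def local_quick_reply(question: str) -> str:
    await asyncio.sleep(0.3)               # small on-device model: fast but shallow
    return "Good question, let me think..."

async def server_full_reply(question: str) -> str:
    await asyncio.sleep(2.0)               # round trip to the big remote model
    return f"Here is a detailed answer to: {question}"

async def speak(text: str) -> None:
    print(f"[TTS] {text}")
    await asyncio.sleep(len(text) * 0.01)  # reading aloud takes time too

async def answer(question: str) -> None:
    # Fire off the server request immediately...
    server_task = asyncio.create_task(server_full_reply(question))
    # ...and fill the silence with the local model's quick reply while it runs.
    await speak(await local_quick_reply(question))
    # By the time that's been read aloud, the full answer has (mostly) arrived.
    await speak(await server_task)

asyncio.run(answer("What causes tides?"))
```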