With high-end Android phones now packing upwards of 24GB of RAM, I think there’s huge potential for an app like this. It would be amazing to have something as powerful as a future Mistral 13B model running natively on smartphones!

You could interact with it privately without an internet connection. The convenience and capabilities would be incredible.
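For a rough sense of scale, here is a back-of-the-envelope sketch (assumed bits-per-weight, not measured numbers) of how much RAM the weights of a 13B-parameter model would need at common quantization levels:

```python
# Back-of-the-envelope RAM estimate for a 13B-parameter model's weights
# at common quantization levels (rounded; excludes KV cache and OS overhead).
PARAMS = 13e9  # 13 billion parameters

for name, bits_per_weight in [("FP16", 16), ("8-bit", 8), ("4-bit", 4)]:
    gib = PARAMS * bits_per_weight / 8 / 1024**3
    print(f"{name}: ~{gib:.1f} GiB")
# FP16: ~24.2 GiB, 8-bit: ~12.1 GiB, 4-bit: ~6.1 GiB
```

At 4-bit quantization, the weights of a 13B model would fit comfortably inside 24GB of RAM, which is what makes the idea plausible.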

    • Winter_Tension5432@alien.topOPB · 1 year ago

      Smaller models are the future of smartphones. Everyone will be running 10B models on their phones by 2025; these are more than enough for writing emails, doing translations, and just asking questions, and a lot more useful than Siri and Alexa.

  • SlowSmarts@alien.topB · 1 year ago

    The direction I took was to start making a Kivy app that connects to an LLM API at home via OpenVPN. I have Ooba and llama.cpp API servers that I can point the Android app to. So it works on old or new phones and runs at the speed of the server.

    The downsides are that you need a static IP address or DDNS for the VPN to connect to, and cell reception can cause issues.

    I have a static IP at my house, but you could also host the API server in the cloud with a static IP if you wanted to do something similar.
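    A minimal sketch of that kind of setup, assuming a llama.cpp server (`./llama-server`) reachable over the VPN; the address, port, and app details below are placeholders, not SlowSmarts' actual configuration:

    ```python
    # Minimal Kivy client that POSTs a prompt to a llama.cpp server's
    # /completion endpoint over the VPN. API_URL is a placeholder.
    import requests
    from kivy.app import App
    from kivy.uix.boxlayout import BoxLayout
    from kivy.uix.button import Button
    from kivy.uix.label import Label
    from kivy.uix.textinput import TextInput

    API_URL = "http://10.8.0.1:8080/completion"  # API server on the home VPN

    class LLMClientApp(App):
        def build(self):
            root = BoxLayout(orientation="vertical")
            self.prompt = TextInput(hint_text="Ask something...")
            self.output = Label(text="")
            send = Button(text="Send", size_hint_y=0.2, on_release=self.ask)
            for widget in (self.prompt, send, self.output):
                root.add_widget(widget)
            return root

        def ask(self, *_):
            # Blocking call kept short for clarity; a real app would use a
            # background thread or Kivy's UrlRequest to keep the UI responsive.
            resp = requests.post(
                API_URL,
                json={"prompt": self.prompt.text, "n_predict": 256},
                timeout=120,
            )
            self.output.text = resp.json().get("content", "")

    if __name__ == "__main__":
        LLMClientApp().run()
    ```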

    • Winter_Tension5432@alien.topOPB · 1 year ago

      A normal person would not be able to set that up. The first people who create an Oobabooga app for Android and iPhone and put it on the store at $15 will have my money for sure, and probably the money of a million other people too.

  • MrOogaBoga@alien.topB · 1 year ago

    Why isn’t anyone building an Oogabooga-like app

    you spoke the sacred words so here i am

    • Winter_Tension5432@alien.topOPB · 1 year ago

      I am dreaming of an S24 Ultra with an app that lets me run a hypothetical future Mistral 13B at 15 tokens/sec with TTS. A person can dream.

  • a_beautiful_rhind@alien.topB · 1 year ago

    Apple is literally doing this stuff with their ML framework built into devices… but for tool applications, not a chatbot.

  • BlackSheepWI@alien.topB · 1 year ago

    It’s a lot of work. Phones use a different OS and a different processor instruction set. The latter can be a big pain, especially if you’re really dependent on low-level optimizations.

    I also feel that *most* people who would choose a phone over a PC for this kind of thing would rather just use a high-quality, easily accessible commercial option (ChatGPT, etc.) instead of a homebrew option that requires some work to get running. So demand for such a thing is pretty low.

    • Winter_Tension5432@alien.topOPB · 1 year ago

      I’m not so sure. ChatGPT has privacy issues, and a small but completely uncensored model has value too. There is a market for this: convenience and privacy.

  • Nixellion@alien.topB · 1 year ago

    Check Ollama. They have links on their GitHub page to projects using it, and they have an Android app that I believe runs locally on the phone. It uses llama.cpp.
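    For reference, Ollama also exposes a simple REST API on localhost (port 11434 by default), so pointing a script or app at it is straightforward. A quick sketch, where the model name is just an example of something you would pull first:

    ```python
    # Quick sketch of calling a local Ollama instance via its REST API.
    # Assumes you've already run something like `ollama pull mistral`.
    import requests

    resp = requests.post(
        "http://localhost:11434/api/generate",
        json={
            "model": "mistral",
            "prompt": "Draft a short, polite email declining a meeting.",
            "stream": False,  # return a single JSON object instead of a stream
        },
        timeout=300,
    )
    print(resp.json()["response"])
    ```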

  • _Lee_B_@alien.topB · 1 year ago

    It’s not just RAM; you also need the processing power. Phones can’t run *good* LLMs yet.

    If you watch the ChatGPT voice chat mode closely on Android, what it does is listen with a local voice model (whisper.cpp), then answer quickly and generically LOCALLY for the first response/paragraph. While that’s happening, it’s sending what you asked to the servers, where the real text processing takes place. By the time your phone has run the simple local model and read that first simple sentence to you, it has MOSTLY gotten the full paragraphs of text back from the server and can read those. Even then, you still notice a slight delay.
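    A toy illustration of that hybrid pattern, not how ChatGPT actually works internally; both functions below are hypothetical stand-ins for a small on-device model and the remote API round trip:

    ```python
    # Toy sketch of the "fast local filler + slower remote answer" pattern
    # described above. local_quick_reply and fetch_full_answer are
    # hypothetical stand-ins, not real ChatGPT internals.
    from concurrent.futures import ThreadPoolExecutor
    import time

    def local_quick_reply(question: str) -> str:
        # Stand-in for a small on-device model producing a filler sentence.
        return f"Let me think about '{question}'..."

    def fetch_full_answer(question: str) -> str:
        # Stand-in for the round trip to the remote LLM API.
        time.sleep(2.0)
        return f"Here is a detailed answer to '{question}'."

    def answer(question: str) -> None:
        with ThreadPoolExecutor(max_workers=1) as pool:
            future = pool.submit(fetch_full_answer, question)  # start remote call
            print(local_quick_reply(question))                 # speak this right away
            print(future.result())                             # then the full answer

    answer("Why does the delay still show up at the end?")
    ```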