Has Anyone Successfully Utilized the Neural Networks API on Android for LLMs with EdgeTPU?

dewijones92@alien.top · 1 year ago

Has Anyone Successfully Utilized the Neural Networks API on Android for LLMs with EdgeTPU?

GlobalRevolution@alien.top · 1 year ago

Yup, it definitely will help speed up inference on models you can get working.

My personal recommendation is to start with something like PyTorch Mobile or TensorFlow Lite (whichever you prefer). The main benefit is that you can take a model in PyTorch and compile it down to a representation that will use the NN API.
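
To make the "compile it down" step a bit more concrete, here is a rough sketch of the usual PyTorch Mobile export flow: trace the model to TorchScript, run the mobile optimizer, and save it for the lite interpreter. The helper name export_for_mobile and the tiny stand-in model in the usage comment are made up for illustration, and this only covers the generic mobile export. The NNAPI-specific conversion is a separate, prototype step that is not shown here, so treat this as a starting point rather than a full recipe.

```python
def export_for_mobile(model, example_input, out_path):
    """Trace a PyTorch model and save it in the lite-interpreter format.

    Hypothetical helper for illustration. It does NOT perform the
    NNAPI-specific conversion, only the generic PyTorch Mobile export.
    """
    # Imported lazily so the helper can be defined without PyTorch installed.
    import torch
    from torch.utils.mobile_optimizer import optimize_for_mobile

    model.eval()  # disable dropout / batch-norm updates before tracing

    # Record the ops executed for this example input as a TorchScript graph.
    traced = torch.jit.trace(model, example_input)

    # Apply mobile-oriented graph optimizations (e.g. op fusion).
    optimized = optimize_for_mobile(traced)

    # Write the file that the Android lite interpreter loads.
    optimized._save_for_lite_interpreter(out_path)
    return out_path


# Usage sketch (assumes PyTorch is installed; the model is a stand-in):
#   import torch
#   model = torch.nn.Sequential(torch.nn.Linear(4, 2), torch.nn.ReLU())
#   export_for_mobile(model, torch.rand(1, 4), "tiny_model.ptl")
```

The saved .ptl file is what you would then bundle into an Android app, as the demo apps do.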

You can pretty quickly use the examples in this repo to try out running a language model like BERT. It will also show you the process of converting a model and running it on your phone.

https://github.com/pytorch/android-demo-app

If you’re going after maximum performance on a particular model, then it might make more sense to learn the NN API directly and try to build it yourself. Personally, I would probably try to work with the open source community to add an NN API backend to llama.cpp.

https://github.com/ggerganov/llama.cpp/issues/2687