Hello fellow llamas!!!
Here is what I am hacking on…
I am exploring new ways to build generative AI foundational models without the costs and resources of traditional math-centric training. I am trying to lower the bar for anyone looking to build and share models that are:
- task-trained - models are trained on only the datasets required for very specific task(s), explicitly overfitting to known use case(s) instead of generalizing/underfitting and effectively searching the entire internet before responding
- modular - because each model only knows its smaller, task-trained dataset(s), it should hopefully respond faster than today's large general-purpose models
- device-native - models target constrained environments without GPU clusters or excess RAM/CPU/storage/connectivity
- open source - since the source weights are public domain, the derived intelligence should be public domain too
- weight-derived - a new type of foundational model (blog: https://matlok.ai/ docs: https://bampe-weights.readthedocs.io/en/latest/)
I believe some math/stats proofs may still be missing (see: smooth brain), but I want to push this modular, Lego-block-like approach in hopes of reaching parity with a new generation of foundational models. One of my fundamental assumptions is that if I substantially reduce the training corpus, a smaller, overfit model will hopefully respond faster than a traditionally-trained large language model. The initial, slimmer model-building process should also hopefully run on IoT devices and plug into existing distributed architectures (device-native).
What are you doing next? The initial use case
I need help with a good initial use case (please let me know if you have better ones!). My current best idea of the week (well, of the last 3 days): this approach and knowledge system for assembling weight-derived models should be shared so we can ensure concepts like an "ethical watermark" for Asimov's Laws of Robotics are always present in pre-trained AI model weights, verifiable with cosine similarity searches. As the approach matures, we should be able to audit and report on what these models know, and I think we need a community-driven project to tackle it.
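To make the watermark idea a bit more concrete, here is a minimal sketch of what a cosine-similarity weight audit could look like. The reference signature, the 0.9 threshold, and the audit_weights helper are hypothetical placeholders, not the bampe-weights API:

```python
# Hypothetical sketch: scan every tensor in a model for a known "watermark"
# weight signature using cosine similarity. The reference vector and the
# threshold are made-up placeholders.
import numpy as np
from safetensors.numpy import load_file

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    # Small epsilon guards against division by zero on all-zero tensors
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

def audit_weights(model_path: str, reference: np.ndarray, threshold: float = 0.9):
    """Return (tensor_name, score) pairs whose weights resemble the reference."""
    matches = []
    for name, tensor in load_file(model_path).items():
        flat = tensor.astype(np.float32).flatten()
        n = min(flat.size, reference.size)  # compare equal-length slices
        score = cosine_similarity(flat[:n], reference[:n])
        if score >= threshold:
            matches.append((name, score))
    return matches
```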
tl;dr
It’s early days, but I believe we can reuse existing AI tensor weights, complemented with smaller “fine-tuning”-sized datasets, to build small, fast, high-quality generative models.
PoC repository:
https://github.com/matlok-ai/bampe-weights
Inputs
Extracted tensor weight from a GPT2 model.safetensors file.
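As a rough sketch of the input side, here is how one tensor can be pulled out of a GPT2 model.safetensors file with the safetensors library. The tensor name below follows the GPT2 checkpoint layout; adjust it if your file prefixes names with "transformer.":

```python
# Minimal sketch: extract a single attention weight tensor from a GPT2
# model.safetensors file.
from safetensors import safe_open

with safe_open("model.safetensors", framework="pt") as f:
    print(list(f.keys()))  # every tensor name stored in the file
    # One attention projection matrix; GPT2's c_attn fuses Q/K/V, so the
    # shape is (768, 3 * 768) = (768, 2304) for the small model.
    weight = f.get_tensor("h.0.attn.c_attn.weight")
    print(weight.shape)
```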
Outputs
Predicted weight-derived file for use in a new type of foundational generative AI model
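And a hedged sketch of the output side: persisting a predicted tensor as its own weight-derived .safetensors file. predict_weights() here is a hypothetical stand-in for the actual prediction step in the bampe-weights repo:

```python
# Sketch: derive a new tensor from an extracted one and save it as a
# standalone weight-derived file. predict_weights() is a placeholder for
# the real bampe-weights prediction step.
import torch
from safetensors.torch import load_file, save_file

def predict_weights(source: torch.Tensor) -> torch.Tensor:
    # Placeholder: a real implementation would predict new weights from the
    # source tensor instead of copying it.
    return source.clone()

source = load_file("model.safetensors")["h.0.attn.c_attn.weight"]
derived = predict_weights(source)
save_file({"h.0.attn.c_attn.weight": derived}, "weight-derived.safetensors")
```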
Thanks for the help, guidance, and assistance keeping up with the insane speed of this ecosystem!
Reach out if you want more info - my email is in my profile.
This sounds absolutely crazy, something that both should never work and also should work, and I don’t know how to feel about it. It’s an interesting idea at least.