Hello fellow llamas!!!
Here is what I am hacking on…
I am exploring new ways to build generative AI foundational models without the costs and resources of traditional math-centric training. I am trying to lower the bar for anyone looking to build and share models that are:
- task-trained - models are trained on only the datasets required for very specific task(s), explicitly overfitting to known use case(s) instead of generalizing/underfitting and effectively searching the entire internet before responding
- modular - because each model only knows its smaller, task-trained dataset(s), it should hopefully respond faster than today's large general-purpose models
- device-native - models target constrained environments without GPU clusters or excess RAM/CPU/storage/connectivity
- open source - since the source weights are public domain, the derived intelligence should be public domain too
- weight-derived - a new type of foundational model (blog: https://matlok.ai/ docs: https://bampe-weights.readthedocs.io/en/latest/)
I believe some math/stats proofs may still be missing (see: smooth brain), but I want to push this modular, Lego-block-like approach in hopes of reaching parity with a new generation of foundational models. One of my fundamental assumptions is that if I substantially reduce the training corpus, a smaller, overfit model will hopefully respond faster than a traditionally-trained large language model. The initial, slimmer model-building process should also hopefully run on IoT devices and plug into existing distributed architectures (device-native).
What are you doing next? The initial use case
I need help with a good initial use case (please let me know if you have better ones!). My current best idea of the week (well, of the last 3 days): this approach and knowledge system for assembling weight-derived models should be shared so we can ensure concepts like an "ethical watermark" for Asimov's Laws of Robotics are always present in pre-trained AI model weights, verifiable with cosine similarity searches. As the approach matures, we should be able to audit and report on what these models know, and I think we need a community-driven project to tackle it.
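To make the watermark idea a bit more concrete, here is a minimal sketch of what a cosine-similarity weight audit could look like. The reference signature, the 0.9 threshold, and the audit_weights helper are hypothetical placeholders, not the bampe-weights API:

```python
# Hypothetical sketch: scan every tensor in a model for a known "watermark"
# weight signature using cosine similarity. The reference vector and the
# threshold are made-up placeholders.
import numpy as np
from safetensors.numpy import load_file

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    # Small epsilon guards against division by zero on all-zero tensors
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

def audit_weights(model_path: str, reference: np.ndarray, threshold: float = 0.9):
    """Return (tensor_name, score) pairs whose weights resemble the reference."""
    matches = []
    for name, tensor in load_file(model_path).items():
        flat = tensor.astype(np.float32).flatten()
        n = min(flat.size, reference.size)  # compare equal-length slices
        score = cosine_similarity(flat[:n], reference[:n])
        if score >= threshold:
            matches.append((name, score))
    return matches
```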
tl;dr
It’s early days, but I believe we can reuse existing AI tensor weights, complemented with smaller “fine-tuning”-sized datasets, to build small, fast, high-quality generative models.
PoC repository:
https://github.com/matlok-ai/bampe-weights
Inputs
Extracted tensor weight from a GPT2 model.safetensors file.
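As a rough sketch of the input side, here is how one tensor can be pulled out of a GPT2 model.safetensors file with the safetensors library. The tensor name below follows the GPT2 checkpoint layout; adjust it if your file prefixes names with "transformer.":

```python
# Minimal sketch: extract a single attention weight tensor from a GPT2
# model.safetensors file.
from safetensors import safe_open

with safe_open("model.safetensors", framework="pt") as f:
    print(list(f.keys()))  # every tensor name stored in the file
    # One attention projection matrix; GPT2's c_attn fuses Q/K/V, so the
    # shape is (768, 3 * 768) = (768, 2304) for the small model.
    weight = f.get_tensor("h.0.attn.c_attn.weight")
    print(weight.shape)
```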
Outputs
Predicted weight-derived file for use in a new type of foundational generative AI model
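And a hedged sketch of the output side: persisting a predicted tensor as its own weight-derived .safetensors file. predict_weights() here is a hypothetical stand-in for the actual prediction step in the bampe-weights repo:

```python
# Sketch: derive a new tensor from an extracted one and save it as a
# standalone weight-derived file. predict_weights() is a placeholder for
# the real bampe-weights prediction step.
import torch
from safetensors.torch import load_file, save_file

def predict_weights(source: torch.Tensor) -> torch.Tensor:
    # Placeholder: a real implementation would predict new weights from the
    # source tensor instead of copying it.
    return source.clone()

source = load_file("model.safetensors")["h.0.attn.c_attn.weight"]
derived = predict_weights(source)
save_file({"h.0.attn.c_attn.weight": derived}, "weight-derived.safetensors")
```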
Thanks for the help, guidance, and assistance keeping up with the insane speed of this ecosystem!
Reach out if you want more info - my email is in my profile.
This sounds absolutely crazy, something that both should never work and also should work, and I don’t know how to feel about it. It’s an interesting idea at least.