I don’t know of an alternative, but I did some experimenting with it. I kinda rewrote large parts of it, and I also used a custom build of the llama.cpp DLLs. I’m pretty sure it’ll still work with the newest llama.cpp build, though you might need to update some native calls if they’ve been expanded or renamed.
My changes are at https://github.com/TheTerrasque/LLamaSharp/tree/feature/clblast - I haven’t documented it much, but the git history may help.
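For a sense of what “updating a native call” means here, this is a rough C# sketch and not the exact code from my fork: the DLL name and the `llama_backend_init` rename are just illustrative assumptions, so check the actual llama.cpp headers for the build you’re using.

```csharp
using System.Runtime.InteropServices;

// Hypothetical sketch: when llama.cpp renames or changes a native entry point,
// the matching P/Invoke declaration has to be updated to follow it.
internal static class NativeApi
{
    // The native library name depends on how you build/ship llama.cpp.
    private const string LibraryName = "llama";

    // Example of a rename: older builds exposed llama_init_backend,
    // newer ones call it llama_backend_init, so the binding follows suit.
    [DllImport(LibraryName, EntryPoint = "llama_backend_init",
               CallingConvention = CallingConvention.Cdecl)]
    public static extern void llama_backend_init();
}
```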
70B? Use a Q4 quant with llama.cpp and offload some of the layers to the GPU.
You might need to run Linux to keep system RAM usage low enough.
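Roughly what that looks like from LLamaSharp (untested against my fork; the model path, context size, and layer count are placeholders, and property names may differ between versions):

```csharp
using LLama;
using LLama.Common;

// Sketch: load a 70B Q4 GGUF and offload part of the model to the GPU.
// GpuLayerCount sets how many layers llama.cpp puts in VRAM; the rest
// stays in system RAM, which is why RAM headroom matters.
var parameters = new ModelParams("models/llama-2-70b.Q4_K_M.gguf") // placeholder path
{
    ContextSize = 4096,
    GpuLayerCount = 40 // tune to whatever fits in your VRAM
};

using var weights = LLamaWeights.LoadFromFile(parameters);
using var context = weights.CreateContext(parameters);
```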