Just tried this out on my Windows machine and got this:
```
warning: couldn't find nvcc (nvidia c compiler) try setting $CUDA_PATH if it's installed
warning: GPU offload not supported on this platform; GPU related options will be ignored
warning: you might need to install xcode (macos) or cuda (windows, linux, etc.) check the output above to see why support wasn't linked
```
So I can't use my GPU the way I can with standard llama.cpp… and I don't want to install anything; I'd like a portable solution that I can just copy onto my external SSD…
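For what it's worth, if the CUDA Toolkit is installed, the first warning hints that pointing $CUDA_PATH at it before launching may let GPU support link. A minimal sketch for a Windows cmd session; the install path and version below are assumptions, and your-model.llamafile is a placeholder:

```
rem Point llamafile at the CUDA Toolkit (assumed default install path and version; use your own)
set "CUDA_PATH=C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.2"

rem Re-run with GPU offload requested; -ngl sets how many layers to offload to the GPU
your-model.llamafile -ngl 35
```

That of course still requires CUDA to be installed locally, so it doesn't solve the fully portable, copy-to-SSD case above.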
WizardLM (WizardLM-70b-v1.0.Q8_0 when quality is needed, WizardLM-30B Q5_K_M when speed is needed).