I know how to install and set it up; I tried a couple of 70B-parameter models… it did not go well. What would you say is currently the best uncensored model for my machine? I know the answer changes over time, but you get what I mean.
If you want a GPU with 12GB of VRAM to do most of the work, 70B is way too big. You should be looking at 13B models: MythoMax, Tiefighter, CausalLM (actually 14B), etc. Mistral 7B merges and fine-tunes (Dolphin, OpenHermes, etc.) are decent too.
KoboldCpp is probably the easiest, most intuitive Windows option to get up and running with GPU support. Enable CuBLAS and offload layers to the GPU — around 41/43 layers for 13B models, 35/35 for 7B models. That works well on my RTX 4070 Ti, which has the same amount of VRAM.
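If you'd rather skip the GUI launcher, the same settings can be passed on the command line — a rough sketch, assuming a recent KoboldCpp build (the model filename here is just a placeholder; check `--help` for your version's exact flags):

```shell
# Launch KoboldCpp with CuBLAS and partial GPU offload for a 13B model.
# "mythomax-l2-13b.Q6_K.gguf" is a hypothetical filename - use whatever you downloaded.
python koboldcpp.py mythomax-l2-13b.Q6_K.gguf \
  --usecublas \
  --gpulayers 41    # ~41 of 43 layers on GPU for 13B; use 35 for 7B models
```

If you run out of VRAM, lower `--gpulayers` until it loads; each layer kept on the CPU costs some speed but frees GPU memory.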
I usually use 6-bit quantized versions of 13B models and 8-bit quantized versions of 7B models. Maybe try lower-bit versions if a slower GPU causes a performance hit (I doubt it will be noticeable).
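As a quick sanity check that a given quant fits in 12GB, the back-of-envelope rule is parameters (in billions) × bits per weight ÷ 8 ≈ gigabytes for the weights alone — this ignores context/KV-cache and runtime overhead, so leave headroom:

```shell
# Rough size of quantized weights only (no KV cache or overhead):
#   size_GB ~= params_in_billions * bits_per_weight / 8
awk 'BEGIN {
  printf "13B @ 6-bit: %.2f GB\n", 13 * 6 / 8
  printf " 7B @ 8-bit: %.2f GB\n",  7 * 8 / 8
}'
```

That's why 6-bit 13B (~9.75 GB) is about the ceiling for a 12GB card with layers fully offloaded, and why 70B doesn't stand a chance.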
Download models from TheBloke if possible. There's no need to hand your email address over to CausalLM and others who ask for it when he doesn't require it.