The GGUF one has 140 layers, more than the 128 the textgen UI supports. With the slider capped at 128, the other 12 layers stay on the CPU, which may explain the slowness (check your terminal output while the model loads to see how many layers actually went to the GPU). You can, however, manually edit the source code and raise the maximum value of the n_gpu_layers slider (just grep for it).
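If you don't have grep handy (e.g. on Windows), a small Python helper can do the same search. This is a rough sketch, not part of the UI. It assumes the UI is a local Python checkout and that the slider is defined in a `.py` file that mentions `n_gpu_layers`. The function name `find_occurrences` is hypothetical. It is roughly equivalent to `grep -rn n_gpu_layers --include=*.py .`.

```python
import sys
from pathlib import Path


def find_occurrences(root, needle="n_gpu_layers", suffix=".py"):
    """Return (path, line_number, line) for every line under `root` that contains `needle`.

    Hypothetical helper: only files ending in `suffix` are searched.
    """
    hits = []
    for path in sorted(Path(root).rglob(f"*{suffix}")):
        try:
            text = path.read_text(encoding="utf-8", errors="replace")
        except OSError:
            continue  # skip unreadable files instead of aborting the whole search
        for lineno, line in enumerate(text.splitlines(), start=1):
            if needle in line:
                hits.append((str(path), lineno, line.strip()))
    return hits


if __name__ == "__main__":
    # Usage: python find_slider.py /path/to/textgen-ui-checkout
    root = sys.argv[1] if len(sys.argv) > 1 else "."
    for path, lineno, line in find_occurrences(root):
        print(f"{path}:{lineno}: {line}")
```

Look for the match that defines the slider's upper bound (something like a `maximum` argument set to 128). Raise it to at least 140, then restart the UI.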