Most quantized models on the Hub use GPTQ, AWQ, or similar techniques. Those are optimized for inference and run faster than load_in_4bit. load_in_4bit uses the bitsandbytes library and is more useful when you want to train LoRAs on a limited amount of VRAM.
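If you want the bitsandbytes route, here's a minimal sketch of loading a model in 4-bit with transformers. It assumes transformers, accelerate, and bitsandbytes are installed and a CUDA GPU is available; the model ID is just a placeholder.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "meta-llama/Llama-2-7b-hf"  # placeholder, swap in whatever you're using

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,                      # quantize weights to 4-bit at load time
    bnb_4bit_quant_type="nf4",              # NormalFloat4, the usual choice for QLoRA-style training
    bnb_4bit_compute_dtype=torch.bfloat16,  # do the actual compute in bf16
)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=bnb_config,
    device_map="auto",  # let accelerate place layers across available devices
)
```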
Hard to say. You’d probably be better off trying a model that’s been fine-tuned for use as an assistant. It also helps to add guidance in a system prompt, assuming you pick an instruction fine-tuned one. I’d be surprised if that failed, but try not to judge the models too harshly if their views align with an average of the training data. In my (admittedly limited) experience, none of the models are ‘woke’ as you say. They’re very average, which makes sense given what they were trained on. Perhaps you will find that human bias is a user error, not a model error.
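For what it’s worth, here’s roughly what steering a model with a system prompt looks like in code. This continues from the 4-bit loading sketch above and assumes the model’s tokenizer ships a chat template; the prompt text is just an illustration of the kind of guidance you might add.

```python
messages = [
    {"role": "system", "content": "You are a concise, neutral assistant."},
    {"role": "user", "content": "Summarize the trade-offs of 4-bit quantization."},
]

# Render the conversation with the model's own chat template
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,  # append the assistant-turn marker
    return_tensors="pt",
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=200)
# Decode only the newly generated tokens, skipping the prompt
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```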