I had the exact same observations and concerns in my projects. While developing a VLM, I confirmed through testing that LoRA/adapters can lead to significantly better training efficiency and improved robustness, as OP suggested. While developing a 3D diffusion model, however, I found that LoRA offers minimal advantages, and simply fine-tuning a smaller model performed better (larger batches help significantly with diffusion models).
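For anyone unfamiliar with why LoRA cuts training cost so much: instead of updating a full weight matrix W, you train a low-rank correction B @ A on top of the frozen W. Here's a minimal NumPy sketch of that idea (the dimensions and rank are just illustrative, not from any specific model):

```python
import numpy as np

# LoRA idea in miniature: freeze W (d_out x d_in), train only the
# low-rank factors B (d_out x r) and A (r x d_in), with r << d.
d_out, d_in, r = 512, 512, 8
rng = np.random.default_rng(0)
W = rng.standard_normal((d_out, d_in))      # frozen pretrained weight
A = rng.standard_normal((r, d_in)) * 0.01   # trainable down-projection
B = np.zeros((d_out, r))                    # trainable up-projection, init 0
                                            # so the adapter starts as a no-op

x = rng.standard_normal(d_in)
y = W @ x + B @ (A @ x)                     # forward pass with adapter added

full_params = d_out * d_in                  # what full fine-tuning updates
lora_params = r * (d_out + d_in)            # what LoRA updates
print(lora_params / full_params)            # prints 0.03125
```

So at rank 8 you're training about 3% of the parameters of that layer, which is where the efficiency win comes from; whether that constrained update is expressive enough is exactly what seems to differ between my VLM and diffusion cases.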