koi691337@alien.top to Machine Learning@academy.garden • [R] Orca 2: Teaching Small Language Models How to Reason • 1 year ago

> Then you could have the language model generate imagined user responses and optimize the reward signal on the imagined user responses

Wouldn’t this just amount to the model sort of overfitting to noise?