How is that a problem? The entire point of training is to memorize and generalize from the training data.
Learning English is not simply memorizing a billion sample sentences.
The problem is that we want the model to learn to compose sentences on its own, not regurgitate word sequences that already appear verbatim in the training set.
This paper tackles the difficult problem of detecting how much of an LLM's success is due to rote memorization.
Maybe more importantly: how much parameter capacity and how many training resources are wasted on it?
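As a toy illustration of what "detecting rote memorization" can mean in its crudest form (this is not the paper's method; the function name and the whitespace tokenization are assumptions for the sketch), one can measure what fraction of the n-grams in a generated text also appear word for word in the training corpus:

```python
def memorized_ngram_fraction(corpus, generated, n=3):
    """Fraction of n-grams in `generated` found verbatim in `corpus`.

    Both arguments are plain strings. Tokenization here is naive
    whitespace splitting; a real evaluation would use the model's
    own tokenizer and a corpus-scale index instead of a Python set.
    """
    corpus_toks = corpus.split()
    gen_toks = generated.split()
    # All n-grams that occur anywhere in the training corpus.
    corpus_ngrams = {
        tuple(corpus_toks[i:i + n])
        for i in range(len(corpus_toks) - n + 1)
    }
    # All n-grams in the generated text, in order.
    gen_ngrams = [
        tuple(gen_toks[i:i + n])
        for i in range(len(gen_toks) - n + 1)
    ]
    if not gen_ngrams:
        return 0.0
    hits = sum(g in corpus_ngrams for g in gen_ngrams)
    return hits / len(gen_ngrams)


corpus = "the cat sat on the mat"
# Pure regurgitation: every trigram is in the corpus.
print(memorized_ngram_fraction(corpus, "the cat sat on the mat"))      # → 1.0
# Partly novel: only the opening trigram matches.
print(memorized_ngram_fraction(corpus, "the cat sat somewhere else"))
```

A high score flags regurgitation rather than novel composition; the hard part the paper addresses is distinguishing this from the overlap any fluent English text unavoidably has with a billion-sentence training set.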