[R] "It's not just memorizing the training data" they said: Scalable Extraction of Training Data from (Production) Language Models

wojcech@alien.top · 1 year ago

DigThatData@alien.top · 1 year ago

it’s possible to “overfit” to a subset of the data. generalization error going up is a symptom of “overfitting” to the entire dataset. memorization is functionally equivalent to locally overfitting, i.e. generalization error going up in a specific neighborhood of the data. you can have a global reduction in generalization error while also having neighborhoods where generalization gets worse.
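A toy sketch of the "local overfitting" picture above, in case it helps. Everything here is an illustrative assumption rather than anything from the paper: the target function is y = x on [0, 1], the "memorizer" is a 1-nearest-neighbour lookup over a training set with one mislabelled point at x = 0.5, and the baseline just predicts the mean training label. The memorizer should come out far better globally while being much worse in the small neighborhood around the memorized bad label.

```python
# Toy illustration: global generalization error can drop while a local
# neighborhood gets worse. All names and numbers are hypothetical.

def true_fn(x):
    # Assumed ground-truth function for this sketch.
    return x

def make_train(n=101, outlier_index=50, outlier_label=1.0):
    # Evenly spaced training inputs with correct labels, except for one
    # corrupted label (at x = 0.5 with the defaults) that can be memorized.
    xs = [i / (n - 1) for i in range(n)]
    ys = [true_fn(x) for x in xs]
    ys[outlier_index] = outlier_label
    return xs, ys

def predict_1nn(xs, ys, q):
    # 1-nearest-neighbour lookup: returns the stored label of the closest
    # training input, i.e. pure memorization of the training set.
    nearest = min(range(len(xs)), key=lambda i: abs(xs[i] - q))
    return ys[nearest]

def mse(predict, points):
    # Mean squared error against the true function on the given points.
    return sum((predict(x) - true_fn(x)) ** 2 for x in points) / len(points)

xs, ys = make_train()
mean_label = sum(ys) / len(ys)

def memorizer(q):
    return predict_1nn(xs, ys, q)

def baseline(q):
    # Underfit reference model: always predicts the mean training label.
    return mean_label

# Dense held-out grid, plus the small neighborhood around the bad label.
test_points = [i / 2000 for i in range(2001)]
local_points = [x for x in test_points if abs(x - 0.5) <= 0.004]

global_mem = mse(memorizer, test_points)
global_base = mse(baseline, test_points)
local_mem = mse(memorizer, local_points)
local_base = mse(baseline, local_points)

print("global MSE  memorizer:", global_mem, " baseline:", global_base)
print("local  MSE  memorizer:", local_mem, " baseline:", local_base)
```

The comparison against a mean predictor is only there to give "gets worse" a reference point; a real analysis would compare checkpoints or model sizes of the same network.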

Hostilis_@alien.top · 1 year ago

"Memorization is functionally equivalent to locally overfitting."

Uh, no it is not. Memorization and overfitting are not the same thing. You are certainly capable of memorizing things without degrading your generalization performance (I hope).

seraphius@alien.top · 1 year ago

On most tasks, memorization would be overfitting, but I think whether something counts as “overfitting” depends on the task and on what you need the model to generalize to. As long as accurate predictions are being made for new data, it doesn’t matter that it can cough up the old.