[R] "It's not just memorizing the training data" they said: Scalable Extraction of Training Data from (Production) Language Models

wojcech@alien.top · 1 year ago


gwern@alien.top · 1 year ago

It’s not surprising at all. The more sample-efficient a model is, the more it can learn a datapoint in a single shot. And that they are often that sample-efficient has been established by tons of previous work.

The value of this work is that it shows that what looked like memorized data from a secret training corpus really is memorized data, by checking it against an Internet-wide corpus. Without that check, it’s very hard to tell whether an output is simply a confabulation.
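To make the corpus check concrete, here is a minimal sketch of the idea: flag a generation as likely memorized only if it shares a long verbatim token window with a reference corpus. This is not the paper's implementation. As far as I understand it, the authors match against a far larger corpus with a suffix array and a longer window; the hashed n-gram set, the whitespace tokenizer, the toy corpus, the window size `k=5`, and the function names below are all assumptions for illustration.

```python
def tokenize(text):
    # Assumption: whitespace tokens; a real check would use the model's tokenizer.
    return text.split()


def build_ngram_index(corpus_docs, k):
    """Collect every k-token window that occurs in the reference corpus."""
    index = set()
    for doc in corpus_docs:
        toks = tokenize(doc)
        for i in range(len(toks) - k + 1):
            index.add(tuple(toks[i:i + k]))
    return index


def find_verbatim_spans(generated, index, k):
    """Return the k-token windows of `generated` that appear verbatim in the corpus."""
    toks = tokenize(generated)
    return [
        " ".join(toks[i:i + k])
        for i in range(len(toks) - k + 1)
        if tuple(toks[i:i + k]) in index
    ]


# Toy reference corpus (hypothetical stand-in for an Internet-scale one).
corpus = ["the quick brown fox jumps over the lazy dog near the river bank"]
index = build_ngram_index(corpus, k=5)

# One output that overlaps the corpus, one that only resembles it.
memorized = find_verbatim_spans(
    "he said the quick brown fox jumps over a fence", index, k=5
)
confabulated = find_verbatim_spans(
    "a slow red fox walks under the busy bridge", index, k=5
)
```

With a short window like this, common phrases will match by chance, so the window length has to be long enough that a verbatim hit is implausible as a coincidence.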

People have been posting screenshots of this stuff on Twitter for ages, but it’s usually been impossible to tell if it was real data or just made up. There are similar issues with extracting prompts: you can ‘extract a prompt’ all you like, but is it the actual prompt? Without some detail like the ‘current date’ timestamp always being correct, it’s hard to tell if what you are getting has anything to do with the actual hidden prompts. (In some cases, it obviously didn’t, because it was telling the model to do impossible things or describing commands/functionality it didn’t have.)

zalperst@alien.top · 1 year ago

The sample efficiency you mention is an empirical observation; that doesn’t make it unsurprising. Why should a single small, noisy step of gradient descent allow you to immediately memorize the data? I think that’s fundamentally surprising.

gwern@alien.top · 1 year ago

No, I still think it’s not that surprising even taking it as a whole. Humans memorize things all the time after a single look. (Consider, for example, image recognition memory.) If a NN can memorize entire datasets after a few epochs using ‘a single small noisy step of gradient descent over 1-4 million tokens’ on each datapoint once per epoch, why is saying that some of this memorization happens in the first epoch so surprising? (If it’s good enough to memorize given a few steps, then you’re just haggling over the price, and 1 step is well within reason.) And there is usually not that much intrinsic information in any of these samples, so if an LLM has done a good job of learning generalizable representations of things like names or phone numbers, it doesn’t take up much ‘space’ inside the LLM to encode yet another slight variation on a human name. (If the representation is good, a ‘small’ step covers a huge amount of data.)

Plus, you are overegging the description: it’s not like it’s memorizing 100% of the data on sight, nor is the memorization permanent. (The estimates from earlier papers are more like 1% get memorized at the first epoch, and OP estimates they could extract 1GB of text from GPT-3/4, which sounds roughly consistent.) So it’s more like, ‘every once in a great while, particularly if a datapoint was very recently seen or simple or stereotypical, the model can mostly recall having seen it before’.

zalperst@alien.top · 1 year ago

I appreciate your position, but I don’t think your intuition holds here. For instance, biological neural nets very likely use a learning algorithm that is qualitatively different from backpropagation.

zalperst@alien.top · 1 year ago

I appreciate that it’s possible to find a not-illogical explanation (logical would entail a real proof), but it remains surprising to me.

ThirdMover@alien.top · 1 year ago

“Humans memorize things all the time after a single look.”

I think what’s going on in humans there is a lot more complex than something like a single SGD step updating some weights. Generally, if you do memorize something, you consciously replay it in your head several times.