That’s because that’s what LLMs are trained on: random comments from people on the internet, including troll posts and jokes, which the LLM takes as factual most of the time.
Remember when Google trained its AI on Reddit comments and it put out incredibly stupid answers, like adding glue to your pizza sauce to keep the cheese from sliding off?
https://www.reddit.com/r/LinusTechTips/comments/1czj9rx/google_ai_gives_answers_they_find_on_reddit_with/
Or that one time it suggested people should eat a small rock every day because it was fed an article from The Onion?
The old saying “garbage in, garbage out” fits LLMs extremely well. Considering the amount of data being fed into these models, it’s almost impossible to sanitize it all, and LLMs are nowhere close to being able to discern jokes, trolling, or sarcasm.
Oh yeah, it also came out that some researchers used LLMs to post Reddit comments for an experiment. So yeah, LLMs are being fed other LLM content too. It’s pretty much a human-centipede situation.
But yeah, I wouldn’t trust these models with anything but the simplest of tasks, and even there I’d be pretty circumspect about what they give me.
Do you subscribe to the idea that LLMs will degrade over time after recycling their own shit for several years, like a GIF/JPEG re-encoded for the umpteenth time?
Honestly? Yeah. The training data matters; that’s why all these AI companies are hunting for data generated by humans. Feeding them LLM-generated data would most likely end up as nonsense pretty fast.
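You can actually see the mechanism in a dead-simple statistical toy (purely illustrative, not an actual LLM, and the numbers here are made up): fit a Gaussian to some samples, then keep refitting to samples drawn from your own previous fit. Finite-sample estimation error compounds every generation and the distribution collapses, same idea as re-saving a JPEG over and over:

```python
# Toy "model collapse" sketch (assumed parameters, not a real LLM):
# each generation, "train" (fit a Gaussian) on samples produced by
# the previous generation's model, then sample from the new fit.
import numpy as np

rng = np.random.default_rng(0)
mu, sigma = 0.0, 1.0   # generation 0: the "human-made" distribution
n = 100                # finite training set per generation

for gen in range(1, 1001):
    samples = rng.normal(mu, sigma, n)          # previous model's output
    mu, sigma = samples.mean(), samples.std()   # refit on it (MLE)
    if gen % 200 == 0:
        print(f"gen {gen:4d}: sigma = {sigma:.5f}")
```

Run it and sigma shrivels toward zero: the model forgets the tails first, then ends up producing near-constant output. That’s the rough intuition behind the model-collapse papers people cite about training on synthetic data.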