Does OpenAI ToS prohibit generating datasets for open source LLMs?

Divniy@alien.top · 1 year ago

Does OpenAI ToS prohibit generating datasets for open source LLMs?

mcmoose1900@alien.top · 1 year ago

Well, it's ML land. No one cares! Just drop your paper and move on!

If OpenAI paid their license violation debt over their history, they would probably fold. And they’re above average.

Wise-Paramedic-4536@alien.top · 1 year ago

This is just so messed up! OpenAI trains their models without asking anyone, and then has the nerve to shut down others trying to do the same. Total hypocrisy, right? Ugh, frustrating as heck!

AdamEgrate@alien.top · 1 year ago

Yeah. They want people to believe that if it’s made by a human it is fair use for training models, but if it’s made by an AI it’s not.

9wR8xO@alien.top · 1 year ago

This is vague on purpose. Let's assume someone makes a 7B model on par with GPT-4 using OAI output as training data; then the clause would most likely hold up in a lawsuit. Such a model can run on almost any device, even phones, so people with weak PCs might use it instead of GPT-4.

If someone uses OAI output to make a 120B model that is worse than GPT-3.5, then it will not apply.

So it all depends on the circumstances. The broader the ToS, the better they can protect their business. The most important aspect is most likely whether someone uses OAI data to train their own model and then uses it commercially to make money.

Trollolo80@alien.top · 1 year ago

Eh, fuck them. They use other people's data to create their model, data they themselves probably do not own, fucking greedy assholes.

gthing@alien.top · 1 year ago

This is like Elon saying we should all stop training LLMs while placing an order for 10,000 GPUs.

“It would be great for me if you wouldn’t do this.”

VancityGaming@alien.top · 1 year ago

What if you develop a model for personal use and it gets leaked?

mrdevlar@alien.top · 1 year ago

You scrape everything without asking for consent and then people use your outputs without your consent.

Shocked Pikachu Face

Nathanielmhld@alien.top · 1 year ago

It’s actually pretty permissive, even in the language they use here in the ToS. By the time any model someone builds is actually competitive directly with OpenAI… it’ll be too late for them to stop you. Plus most people aren’t building models just to create a ChatGPT clone.

IamFuckinTomato@alien.top · 1 year ago

Does Mistral also have something like this in their ToS?

I’m planning on using Mistral-generated data to fine-tune a model.