• 36 Posts
  • 157 Comments
Joined 9 months ago
Cake day: July 25th, 2024

  • You see this on GitHub already. People publish paper results and manuals along with a few files and treat that as if it were open source. And this isn’t limited to LLMs: people with CNN papers, crawlers, and other projects likewise put a handful of files and their results on GitHub and call it open source. I think this is a clash between current scientific-community thinking plus Big Tech on one side, and the Free Software and Free Culture movements on the other.

    Additionally, you can’t expect something Microsoft/Meta touches to remain untainted for long.

  • I think it’s important to come up with other ways of generating synthetic data that don’t rely on distilling other models. Translating documents, OCRing old documents, and using digital twins to train visual models come to mind; a rough sketch of the rendering idea follows below. I’ve never successfully trained a text-related model myself, but I think the quality of the original text is critical to how the model will perform.
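As a minimal sketch of that last idea, the snippet below renders real corpus text onto plain images to produce (image, ground-truth text) pairs, a simple digital-twin-style source of synthetic visual training data that doesn’t involve distilling another model. It assumes Pillow is installed; the corpus lines, image size, and font are placeholders, not anything from the original comment.

```python
from PIL import Image, ImageDraw, ImageFont

def render_line(text, width=640, height=48):
    """Render one line of real text onto a plain white background."""
    img = Image.new("RGB", (width, height), color="white")
    draw = ImageDraw.Draw(img)
    # Placeholder font; a document-like font would bring the "twin" closer to real scans.
    font = ImageFont.load_default()
    draw.text((8, 12), text, fill="black", font=font)
    return img

# Placeholder corpus: any human-written text of known quality works here.
corpus = [
    "All human beings are born free and equal in dignity and rights.",
    "Old scanned documents can be OCRed into fresh training text.",
]

# Each (image, text) pair is a synthetic training example for a visual text model.
dataset = [(render_line(line), line) for line in corpus]
dataset[0][0].save("sample_0.png")
```

Degrading the rendered images (blur, noise, warping) would move them closer to real scans, while the ground-truth text stays exact, which is in line with the point about the original text’s quality mattering most.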


  • Not only do the big players extract data from the common citizen, they also force information upon them. AI will make people exchange knowledge with one another less, and will concentrate all the “talk” and information in the hands of a few. I think this is a big problem, especially as we near the quantum-computation era. How can individuals and smaller organizations possibly compete on AI quality in that scenario? But maybe hardware power won’t be the greatest force in Artificial Intelligence.