  • I’m interested to see how model-based RL could work for reasoning.

    Instead of training a model to predict data and then fine-tuning it with RL to be a chatbot, you use RL as the primary training objective and train the predictive data model (the world model) as a side effect. This lets your pretraining objective be the actual objective you care about, so your reward function could penalize issues like hallucination or prompt injection.

    I haven’t seen any papers using model-based RL for language modeling yet, but it’s starting to work well in more traditional RL domains like game playing (DreamerV3, TD-MPC2).
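
    Here’s a minimal sketch of what that could look like for token sequences. Everything here (the toy sizes, the GRU world model, using plain REINFORCE) is my own assumption for illustration, not a published method: the world model is fit to real data as a side effect, while the policy’s only training signal is reward on rollouts imagined inside the world model.

    ```python
    # Hypothetical sketch: Dreamer-style model-based RL over tokens.
    # The world model is fit to data as a side effect; the policy is
    # trained purely on imagined rollouts (REINFORCE, for simplicity).
    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    VOCAB, HIDDEN, HORIZON = 100, 64, 8

    class WorldModel(nn.Module):
        """Recurrent model predicting the next token and a scalar reward."""
        def __init__(self):
            super().__init__()
            self.embed = nn.Embedding(VOCAB, HIDDEN)
            self.rnn = nn.GRUCell(HIDDEN, HIDDEN)
            self.next_token = nn.Linear(HIDDEN, VOCAB)
            # The reward head is where penalties for hallucination,
            # prompt injection, etc. could be trained in.
            self.reward = nn.Linear(HIDDEN, 1)

        def step(self, h, tok):
            h = self.rnn(self.embed(tok), h)
            return h, self.next_token(h), self.reward(h).squeeze(-1)

    # The policy acts on the world model's latent state.
    policy = nn.Sequential(nn.Linear(HIDDEN, HIDDEN), nn.ReLU(),
                           nn.Linear(HIDDEN, VOCAB))
    world = WorldModel()
    wm_opt = torch.optim.Adam(world.parameters(), lr=1e-3)
    pi_opt = torch.optim.Adam(policy.parameters(), lr=1e-3)

    def train_step(real_tokens):  # real_tokens: (batch, seq_len) int64
        # 1) Side effect: fit the world model with next-token prediction.
        h = torch.zeros(real_tokens.size(0), HIDDEN)
        wm_loss = 0.0
        for t in range(real_tokens.size(1) - 1):
            h, logits, _ = world.step(h, real_tokens[:, t])
            wm_loss = wm_loss + F.cross_entropy(logits, real_tokens[:, t + 1])
        wm_opt.zero_grad(); wm_loss.backward(); wm_opt.step()

        # 2) Primary objective: RL on rollouts imagined in the world model.
        h = torch.zeros(real_tokens.size(0), HIDDEN)
        log_probs, rewards = [], []
        for _ in range(HORIZON):
            dist = torch.distributions.Categorical(logits=policy(h))
            tok = dist.sample()
            log_probs.append(dist.log_prob(tok))
            with torch.no_grad():  # the world model acts as the environment
                h, _, r = world.step(h, tok)
            rewards.append(r)
        ret = torch.stack(rewards).sum(0)  # total imagined reward per rollout
        pi_loss = -(torch.stack(log_probs).sum(0) * ret).mean()
        pi_opt.zero_grad(); pi_loss.backward(); pi_opt.step()
    ```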



  • This seems pretty sketchy. Lots of angry words, but few details.

    Most of this has nothing to do with sexual abuse, but is rather family drama over their dad’s will. She says that Sam and his lawyer were able to delay or withhold money she was supposed to inherit, but doesn’t really provide details. There’s not enough information here to judge the accuracy of her claims.

    The sexual abuse allegedly happened when she was 4 and he was 13, but she didn’t remember it until some kind of flashback in 2020.

    “Technological abuse - [I experienced] Shadowbanning across all platforms except onlyfans and pornhub.”

    Sam is certainly well-connected within the tech industry, but I’m doubtful that he could get that many platforms to ban her. Also, her posts seem to be up and visible right now.


  • One key difference is that they are not trained with end-to-end optimization but rather with a hand-crafted learning rule. This rule has strong inductive biases that work well for small datasets with pre-extracted features, like tabular data.

    Their big disadvantage (and this applies to logical/symbolic approaches in general) is that they don’t work well with raw data, even on easy datasets like CIFAR-10. The world is too messy for perfect logical rules; neural networks are able to capture this complexity, but simpler models struggle to do so.

    “statistical”

    Note that learning is a fundamentally statistical process, so Tsetlin Machines are also statistics-based.
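
    For a concrete picture of that hand-crafted rule, here’s a minimal sketch of the two-action Tsetlin automaton that Tsetlin Machines are built from (the toy environment and parameters are invented for illustration): rewards push the state deeper into the current action’s half, penalties push it toward the other action, and no gradients are involved.

    ```python
    # A single two-action Tsetlin automaton: the building block of
    # Tsetlin Machines. Learning is a hand-crafted state-transition
    # rule, not gradient descent.
    import random

    class TsetlinAutomaton:
        def __init__(self, n_states_per_action=100):
            self.n = n_states_per_action
            # States 1..n choose action 0; states n+1..2n choose action 1.
            self.state = random.choice([self.n, self.n + 1])  # start at boundary

        def action(self):
            return 0 if self.state <= self.n else 1

        def reward(self):
            # Reinforce the current action: move away from the boundary.
            if self.action() == 0:
                self.state = max(1, self.state - 1)
            else:
                self.state = min(2 * self.n, self.state + 1)

        def penalize(self):
            # Weaken the current action: move toward (and maybe across) it.
            if self.action() == 0:
                self.state += 1
            else:
                self.state -= 1

    # Toy environment: action 1 pays off 80% of the time, action 0 only 20%.
    ta = TsetlinAutomaton()
    for _ in range(10_000):
        p_win = 0.8 if ta.action() == 1 else 0.2
        ta.reward() if random.random() < p_win else ta.penalize()
    print("converged to action:", ta.action())  # almost always 1
    ```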



  • All the real datasets we care about are “special” in that they are the output of complex systems. We don’t actually want to model the data; we want to model the underlying system.

    Many of these systems are computationally as complex as programs, and so can only be perfectly modeled by another program. This means that modeling can be viewed as the process of analyzing the output of a program to create another program that emulates it.

    Given infinite compute, I would brute force search the space of all programs, and find the shortest one that matches the original system for all inputs and outputs. Lacking infinite compute, I would use an optimization algorithm like gradient descent to find an approximate solution.

    You can see the link to Kolmogorov Complexity here, and why modeling is said to be equivalent to compression.
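
    As a toy illustration of that brute-force search (the tiny expression language and helper names here are invented for this example), you can enumerate programs in order of increasing size and return the first one that agrees with the system on every test input:

    ```python
    # Shortest-program search over a tiny expression DSL (leaves x, 1, 2
    # and operators +, *). Enumeration goes smallest-first, so the first
    # match is the shortest program in this language.
    from itertools import count, product

    LEAVES = ["x", "1", "2"]
    OPS = ["+", "*"]

    def programs_of_size(size):
        """Yield every expression tree with exactly `size` leaves, as a string."""
        if size == 1:
            yield from LEAVES
            return
        for left in range(1, size):
            for op, l, r in product(OPS,
                                    programs_of_size(left),
                                    programs_of_size(size - left)):
                yield f"({l} {op} {r})"

    def shortest_match(system, inputs):
        """Shortest program agreeing with `system` on all test inputs."""
        for size in count(1):
            for prog in programs_of_size(size):
                if all(eval(prog, {"x": x}) == system(x) for x in inputs):
                    return prog

    # The "system" we only observe through its input/output behavior:
    target = lambda x: x * x + 2 * x + 1
    print(shortest_match(target, inputs=range(-5, 6)))
    # prints a 4-leaf expression such as (((x + 2) * x) + 1)
    ```

    Because the enumeration goes in order of size, the first match is the shortest program in this language that reproduces the observed behavior: a very restricted analogue of Kolmogorov complexity.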