  • I’m interested to see how model-based RL could work for reasoning.

    Instead of training a model to predict data and then fine-tuning it with RL to be a chatbot, you use RL as the primary training objective and train the predictive data model (the world model) as a side effect. This lets your pretraining objective be the actual objective you care about, so your reward function could penalize issues like hallucination or prompt injection.

    I haven’t seen any papers using model-based RL for language modeling yet, but it’s starting to work well in more traditional RL domains like game playing (DreamerV3, TD-MPC2).
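
    Here’s a minimal sketch of what that could look like for token sequences. Everything here (the toy sizes, the GRU world model, using plain REINFORCE) is my own assumption for illustration, not a published method: the world model is fit to real data as a side effect, while the policy’s only training signal is reward on rollouts imagined inside the world model.

    ```python
    # Hypothetical sketch: Dreamer-style model-based RL over tokens.
    # The world model is fit to data as a side effect; the policy is
    # trained purely on imagined rollouts (REINFORCE, for simplicity).
    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    VOCAB, HIDDEN, HORIZON = 100, 64, 8

    class WorldModel(nn.Module):
        """Recurrent model predicting the next token and a scalar reward."""
        def __init__(self):
            super().__init__()
            self.embed = nn.Embedding(VOCAB, HIDDEN)
            self.rnn = nn.GRUCell(HIDDEN, HIDDEN)
            self.next_token = nn.Linear(HIDDEN, VOCAB)
            # The reward head is where penalties for hallucination,
            # prompt injection, etc. could be trained in.
            self.reward = nn.Linear(HIDDEN, 1)

        def step(self, h, tok):
            h = self.rnn(self.embed(tok), h)
            return h, self.next_token(h), self.reward(h).squeeze(-1)

    # The policy acts on the world model's latent state.
    policy = nn.Sequential(nn.Linear(HIDDEN, HIDDEN), nn.ReLU(),
                           nn.Linear(HIDDEN, VOCAB))
    world = WorldModel()
    wm_opt = torch.optim.Adam(world.parameters(), lr=1e-3)
    pi_opt = torch.optim.Adam(policy.parameters(), lr=1e-3)

    def train_step(real_tokens):  # real_tokens: (batch, seq_len) int64
        # 1) Side effect: fit the world model with next-token prediction.
        h = torch.zeros(real_tokens.size(0), HIDDEN)
        wm_loss = 0.0
        for t in range(real_tokens.size(1) - 1):
            h, logits, _ = world.step(h, real_tokens[:, t])
            wm_loss = wm_loss + F.cross_entropy(logits, real_tokens[:, t + 1])
        wm_opt.zero_grad(); wm_loss.backward(); wm_opt.step()

        # 2) Primary objective: RL on rollouts imagined in the world model.
        h = torch.zeros(real_tokens.size(0), HIDDEN)
        log_probs, rewards = [], []
        for _ in range(HORIZON):
            dist = torch.distributions.Categorical(logits=policy(h))
            tok = dist.sample()
            log_probs.append(dist.log_prob(tok))
            with torch.no_grad():  # the world model acts as the environment
                h, _, r = world.step(h, tok)
            rewards.append(r)
        ret = torch.stack(rewards).sum(0)  # total imagined reward per rollout
        pi_loss = -(torch.stack(log_probs).sum(0) * ret).mean()
        pi_opt.zero_grad(); pi_loss.backward(); pi_opt.step()
    ```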



  • This seems pretty sketchy. Lots of angry words, but few details.

    Most of this has nothing to do with sexual abuse, but is rather family drama over their dad’s will. She says that Sam and his lawyer were able to delay or withhold money she was supposed to inherit, but doesn’t really provide details. There’s not enough information here to judge the accuracy of her claims.

    The sexual abuse allegedly happened when she was 4 and he was 13, but she didn’t remember it until some kind of flashback in 2020.

    “Technological abuse - [I experienced] Shadowbanning across all platforms except onlyfans and pornhub.”

    Sam is certainly well-connected within the tech industry, but I’m doubtful that he could get that many platforms to ban her. Also, her posts seem to be up and visible right now.


  • One key difference is that they are not trained with end-to-end optimization but rather with a hand-crafted learning rule. This rule has strong inductive biases that work well for small datasets with pre-extracted features, like tabular data.

    Their big disadvantage (and this applies to logical/symbolic approaches in general) is that they don’t work well with raw data, even on easy datasets like CIFAR-10. The world is too messy for perfect logical rules; neural networks are able to capture this complexity, but simpler models struggle to do so.

    “statistical”

    Note that learning is a fundamentally statistical process, so Tsetlin Machines are also statistics-based.
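
    For a concrete picture of that hand-crafted rule, here’s a minimal sketch of the two-action Tsetlin automaton that Tsetlin Machines are built from (the toy environment and parameters are invented for illustration): rewards push the state deeper into the current action’s half, penalties push it toward the other action, and no gradients are involved.

    ```python
    # A single two-action Tsetlin automaton: the building block of
    # Tsetlin Machines. Learning is a hand-crafted state-transition
    # rule, not gradient descent.
    import random

    class TsetlinAutomaton:
        def __init__(self, n_states_per_action=100):
            self.n = n_states_per_action
            # States 1..n choose action 0; states n+1..2n choose action 1.
            self.state = random.choice([self.n, self.n + 1])  # start at boundary

        def action(self):
            return 0 if self.state <= self.n else 1

        def reward(self):
            # Reinforce the current action: move away from the boundary.
            if self.action() == 0:
                self.state = max(1, self.state - 1)
            else:
                self.state = min(2 * self.n, self.state + 1)

        def penalize(self):
            # Weaken the current action: move toward (and maybe across) it.
            if self.action() == 0:
                self.state += 1
            else:
                self.state -= 1

    # Toy environment: action 1 pays off 80% of the time, action 0 only 20%.
    ta = TsetlinAutomaton()
    for _ in range(10_000):
        p_win = 0.8 if ta.action() == 1 else 0.2
        ta.reward() if random.random() < p_win else ta.penalize()
    print("converged to action:", ta.action())  # almost always 1
    ```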



  • All the real datasets we care about are “special” in that they are the output of complex systems. We don’t actually want to model the data; we want to model the underlying system.

    Many of these systems are computationally as complex as programs, and so can only be perfectly modeled by another program. This means that modeling can be viewed as the process of analyzing the output of a program to create another program that emulates it.

    Given infinite compute, I would brute force search the space of all programs, and find the shortest one that matches the original system for all inputs and outputs. Lacking infinite compute, I would use an optimization algorithm like gradient descent to find an approximate solution.

    You can see the link to Kolmogorov Complexity here, and why modeling is said to be equivalent to compression.
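
    As a toy illustration of that brute-force search (the tiny expression language and helper names here are invented for this example), you can enumerate programs in order of increasing size and return the first one that agrees with the system on every test input:

    ```python
    # Shortest-program search over a tiny expression DSL (leaves x, 1, 2
    # and operators +, *). Enumeration goes smallest-first, so the first
    # match is the shortest program in this language.
    from itertools import count, product

    LEAVES = ["x", "1", "2"]
    OPS = ["+", "*"]

    def programs_of_size(size):
        """Yield every expression tree with exactly `size` leaves, as a string."""
        if size == 1:
            yield from LEAVES
            return
        for left in range(1, size):
            for op, l, r in product(OPS,
                                    programs_of_size(left),
                                    programs_of_size(size - left)):
                yield f"({l} {op} {r})"

    def shortest_match(system, inputs):
        """Shortest program agreeing with `system` on all test inputs."""
        for size in count(1):
            for prog in programs_of_size(size):
                if all(eval(prog, {"x": x}) == system(x) for x in inputs):
                    return prog

    # The "system" we only observe through its input/output behavior:
    target = lambda x: x * x + 2 * x + 1
    print(shortest_match(target, inputs=range(-5, 6)))
    # prints a 4-leaf expression such as (((x + 2) * x) + 1)
    ```

    Because the enumeration goes in order of size, the first match is the shortest program in this language that reproduces the observed behavior: a very restricted analogue of Kolmogorov complexity.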