[D] - What is the latest in tree-based approaches for LLMs? Has there been any significant research using RL for this?

30299578815310@alien.top · 1 year ago

m98789@alien.top · 1 year ago

Q*, haven’t you heard?

residentmouse@alien.top · 1 year ago

Great question, curious about the answer myself.

I think it’s pretty cool that just iteratively reusing an LLM without additional training, i.e., chaining prompts, improves quality in most of these methods. I’ve seen quite a few papers along these lines (e.g., System 2 Attention).
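To make "chaining prompts" concrete, here is a minimal sketch of a generic draft / critique / revise loop. It is not the method of any specific paper. The `refine` helper, the prompt wording, and `fake_llm` are all made up for illustration; in practice `llm` would be whatever text-in, text-out call you already have.

```python
from typing import Callable


def refine(question: str, llm: Callable[[str], str], rounds: int = 2) -> str:
    """Reuse the same frozen model several times: draft, critique, revise."""
    answer = llm(f"Question: {question}\nAnswer:")
    for _ in range(rounds):
        critique = llm(
            f"Question: {question}\nDraft: {answer}\n"
            "List any mistakes in the draft:"
        )
        answer = llm(
            f"Question: {question}\nDraft: {answer}\nCritique: {critique}\n"
            "Rewrite the answer, fixing the mistakes:"
        )
    return answer


def fake_llm(prompt: str) -> str:
    # Hypothetical stand-in so the sketch runs; it only counts its calls.
    fake_llm.calls += 1
    return f"response #{fake_llm.calls}"


fake_llm.calls = 0

final = refine("What is 17 * 24?", fake_llm, rounds=2)
print(fake_llm.calls, final)
```

No weights change anywhere in this loop; each round just costs two extra model calls.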

The Promptbreeder paper has some benchmarking of these methods & proposes an interesting evolutionary prompting strategy.

But like you, I’ve been looking / waiting for papers that specifically explore finetuning the model “nodes”, perhaps with LoRA, or with a meta-network or hypernetwork.

30299578815310@alien.top · 1 year ago

Yeah, I’ve also noticed that two ways of implementing trees are being researched.

One is at the sequence / thought level, like Tree of Thoughts / Chain of Thought, where the model talks to itself in order to find the best solution. The other is at the decoding / token level, where the tree is used to search for the optimal next set of tokens. In principle you could combine the two and have nested trees.
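A rough sketch of the thought-level variant, written as a plain beam search over partial solutions. Everything here is a toy assumption: `propose` and `score` are hypothetical stand-ins for "sample a few next thoughts from the LLM" and "rate a partial solution" (e.g. with a value model or self-evaluation), and the task is just to pick digits that sum to a target.

```python
from typing import Callable, List


def tree_search(
    root: str,
    propose: Callable[[str], List[str]],
    score: Callable[[str], float],
    beam_width: int = 2,
    depth: int = 3,
) -> str:
    """Breadth-first search that keeps only the best `beam_width` nodes per level."""
    frontier = [root]
    for _ in range(depth):
        # Expand every node on the frontier into its children.
        candidates = [child for state in frontier for child in propose(state)]
        if not candidates:
            break
        # Prune: keep the highest-scoring partial solutions.
        candidates.sort(key=score, reverse=True)
        frontier = candidates[:beam_width]
    return max(frontier, key=score)


TARGET = 7


def propose(state: str) -> List[str]:
    # Toy stand-in for sampling next thoughts: append one more digit.
    return [state + d for d in "123"]


def score(state: str) -> float:
    # Toy stand-in for a value estimate: closer to the target sum is better.
    return -abs(TARGET - sum(int(c) for c in state))


best = tree_search("", propose, score)
print(best, score(best))
```

The token-level variant has the same shape, with `propose` returning candidate next tokens and `score` being something like cumulative log-probability. Nesting the two would mean calling a token-level search inside `propose`.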

But yeah, I think AlphaGo-style self-learning is what is really missing here. In principle, even without a tree, nothing stops us from putting an LLM in an environment where it gets positive feedback from rewards (like solving math problems), and then just letting it rip.
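As a toy illustration of the "reward from solving math problems" loop, here is a minimal REINFORCE sketch. To be clear about the assumptions: the "policy" is a tiny table of softmax logits per problem, standing in for an LLM; the environment is a hypothetical exact-match checker for single-digit addition; and `train` and its parameters are invented for this example.

```python
import math
import random


def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def train(problems, n_answers, epochs=200, lr=0.5, seed=0):
    rng = random.Random(seed)
    # One row of logits per problem: a stand-in for model parameters.
    logits = {p: [0.0] * n_answers for p in problems}
    for _ in range(epochs):
        for a, b in problems:
            probs = softmax(logits[(a, b)])
            # Sample an answer from the current policy.
            action = rng.choices(range(n_answers), weights=probs)[0]
            # Environment feedback: 1 if the sampled answer is correct, else 0.
            reward = 1.0 if action == a + b else 0.0
            # REINFORCE update: d/d logit_k of log pi(action) = 1[k == action] - probs[k].
            for k in range(n_answers):
                grad = (1.0 if k == action else 0.0) - probs[k]
                logits[(a, b)][k] += lr * reward * grad
    return logits


problems = [(a, b) for a in range(3) for b in range(3)]
policy = train(problems, n_answers=5)
greedy = {p: max(range(5), key=lambda k: policy[p][k]) for p in problems}
print(greedy)
```

With a real LLM the sampled "action" would be a whole generated solution and the update would go through the model's weights, but the feedback loop has the same structure: sample, check, reinforce what was rewarded.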