Curious on sota on this topic.
Im familiar with CoT and Tree of Thoughts, but those don’t seem to train the model to excel at using the tree, they just rely on the pretrained model already being good at it. The model has no way to improve its tree-use over time.
Is anybody actively training models to be optimized for tree-based decoding?
Q* haven’t you heard?
Great question, curious in the answer myself.
I think it’s pretty cool that just iteratively reusing an LLM without additional training, i.e chaining prompts, improves quality in most of these methods. I see quite a few of these papers (e.g System 2 Attention).
The Promptbreeder paper has some benchmarking of these methods & proposes an interesting evolutionary prompting strategy.
But like you I’ve been looking / waiting for the papers that explore specifically finetuning the model “nodes”, using LoRA perhaps, or with a meta network or hyper network.
Yeah, I also notice there are two types of ways to implement trees being researched.
One is at a sequence / thought level, like tree of thoughts / chain of thoughts, where the model talks to itself in order to find the best solution. The other is at the decoding / token level, where the tree is used to search for the optimal next set of tokens. In principle you could put these both together and have nested trees.
But yeah I think the alpha-go style self-learning is what is really missing here. In principle, even without a tree, nothing stops us from putting an LLM in an environment where it gets positive feedback from rewards (like solving math problems), and then just let it rip.