Curious on sota on this topic.
Im familiar with CoT and Tree of Thoughts, but those don’t seem to train the model to excel at using the tree, they just rely on the pretrained model already being good at it. The model has no way to improve its tree-use over time.
Is anybody actively training models to be optimized for tree-based decoding?
Q* haven’t you heard?