Well, it could be argued that the “optimal response”, i.e. the one that more or less minimizes that “energy”, is already settled by the LLM on the first iteration. Further iterations aren’t adding any useful information; if anything, each one is another occasion to veer off the optimal response.
For example, if the prompt is “what is the Statue of Liberty”, the LLM’s first output token is going to be “the”, but it kinda already “knows” that the next ones are going to be “Statue of Liberty”.
So to me LLMs already “choose” a response path from the first token.
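To make that concrete, here is a toy sketch. It assumes the “energy” of a response can be read as its negative log-probability under the model, which is my interpretation rather than anything established above. The probability table and the function names are invented for illustration; this is not how a real LLM is queried.

```python
import math
from itertools import product

VOCAB = ["the", "a", "statue", "torch"]

def next_token_probs(prefix):
    """Hypothetical next-token distribution for a two-token response (made-up numbers)."""
    table = {
        (): {"the": 0.7, "a": 0.3},
        ("the",): {"statue": 0.9, "torch": 0.1},
        ("a",): {"statue": 0.5, "torch": 0.5},
    }
    return table[prefix]

def energy(seq):
    """Assumed energy: negative log-probability of the whole sequence."""
    total = 0.0
    for t, token in enumerate(seq):
        p = next_token_probs(tuple(seq[:t])).get(token, 0.0)
        if p == 0.0:
            return math.inf  # impossible continuation under the toy table
        total -= math.log(p)
    return total

def greedy(length):
    """Pick the most likely token at each step, never looking ahead."""
    seq = []
    for _ in range(length):
        probs = next_token_probs(tuple(seq))
        seq.append(max(probs, key=probs.get))
    return seq

def exhaustive_min(length):
    """Search every sequence of the given length for the lowest energy."""
    return min(product(VOCAB, repeat=length), key=energy)

print(greedy(2), list(exhaustive_min(2)))
```

In this table the greedy path and the global minimum coincide, because the first token already carries most of the probability mass of the best full response. That is by construction here; nothing guarantees it for an arbitrary distribution.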
Conversely, an LLM that tried to find a minimum energy for the whole response wouldn’t necessarily stop hallucinating. There is nothing in the training of a model that says that “I don’t know” has a lower “energy” than a wrong answer…
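A tiny sketch of that last point, again assuming energy means negative log-probability. The three candidate responses and their probabilities are hypothetical numbers I made up, not measurements from any model.

```python
import math

# Hypothetical probabilities a model might assign to whole responses
# for a question it has no reliable knowledge about (invented values).
response_probs = {
    "plausible but wrong answer": 0.55,
    "correct answer": 0.30,
    "I don't know": 0.15,
}

def energy(p):
    """Assumed energy: negative log-probability of a response."""
    return -math.log(p)

# A whole-response minimizer just picks the most probable candidate.
best = min(response_probs, key=lambda r: energy(response_probs[r]))
print(best)
```

The minimizer returns the fluent wrong answer, because the objective only ranks responses by probability and has no term for truth or for admitting ignorance.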