More to that - at this point, it feels to me, that arenas are getting too focused on fitting user preferences rather than actual model quality.
In reality I prefer different model, for different things, and quite often it's because model X is tuned to return more of my preference - e.g. Gemini tends to be usually the best in non-english, chatgpt works better for me personally for health questions, ...
In reality I prefer different model, for different things, and quite often it's because model X is tuned to return more of my preference - e.g. Gemini tends to be usually the best in non-english, chatgpt works better for me personally for health questions, ...