One person on Discord has called this 'taking the idea of self-consistency forward to ensemble model usage'. I guess this is, technically, what this approach is about :)
Yes, the only issue is token usage, which is obviously higher since we are sampling more of the solution space. But it's a fair trade-off to get GPT-4.5-level intelligence out of GPT-4.
Probably an even higher jump, since the models each have some amount of unique training data and fact-check each other toward a more common "truth", so hallucinations get weeded out.
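The idea above can be sketched as plain self-consistency voting, just spread across different models instead of repeated samples of one model. A minimal sketch (the model wrappers here are hypothetical stand-ins, not a real API):

```python
from collections import Counter

def ensemble_answer(question, models):
    """Self-consistency across an ensemble: ask every model the same
    question, then keep the answer the majority agrees on.
    `models` is a list of callables mapping a question to an answer
    string (assumption: each wraps one underlying LLM)."""
    answers = [model(question) for model in models]
    best, count = Counter(answers).most_common(1)[0]
    # Also return the agreement ratio, as a rough confidence signal.
    return best, count / len(answers)

# Toy usage with stand-in "models" (assumptions, for illustration only):
models = [lambda q: "42", lambda q: "42", lambda q: "41"]
answer, agreement = ensemble_answer("meaning of life?", models)
# answer == "42", agreement == 2/3
```

Outlier answers (hallucinations a single model would commit to) simply lose the vote, which is the intuition behind the "fact-checking each other" effect.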
TL;DR & DIY: I asked GPT-4 this prompt just now: "Cluster the top 10 categories of complaints by the users, and describe each category with a few adjectives/nouns in order of importance."
Crisp or too critical?
1. Documentation: lacking, inadequate, outdated
2. Code quality: simple, awkward, suboptimal
3. Production readiness: experimental, unreliable, limited
4. Monetization: unclear, risky, potentially detrimental to open-source
5. Community support: misinformation, poor communication, fragmented
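If you want to try the DIY part yourself, the prompt can be sent via any OpenAI-compatible chat endpoint. A minimal sketch of the request payload (the helper name and temperature value are my own assumptions; only the payload shape follows the standard `/v1/chat/completions` format):

```python
import json

# The clustering prompt from above.
PROMPT = ("Cluster the top 10 categories of complaints by the users, "
          "and describe each category with a few adjectives/nouns "
          "in order of importance.")

def build_request(model="gpt-4"):
    """Hypothetical helper: assemble an OpenAI-style chat request dict.
    POST this as JSON to /v1/chat/completions with your API key."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": PROMPT}],
        "temperature": 0.7,  # assumption: allow some sampling variety
    }

body = json.dumps(build_request())  # ready to send with requests/curl
```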
Very interesting to follow the chain on the console. Very good at breaking down multi-part questions, way better than Google Assistant - and then it uses G to search. Thanks for showing the way.
https://big-agi.com/static/kimi-k2.5-less-censored.jpg