Because everything past GPT-3.5 has been pretty unremarkable? Doubt anyone in the world would be able to tell the difference in a blind test between 4.0, 4o, 4.5 and 4.1.
I would absolutely take you up on a blind test between 4.0 and 4.5; the improvement is significant.
And while I do want your money, we can just look at LMArena, which uses blind testing to arrive at an Elo-based score and shows 4.0 at 1318 and 4.5 at 1438. Under the standard Elo formula, that 120-point gap means 4.5 is about twice as likely as 4.0 to be judged better on an arbitrary prompt (roughly a 67% win rate), and the difference is more significant on coding and reasoning tasks.
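For anyone who wants to check the arithmetic, here is a minimal sketch. It assumes the conventional Elo expected-score formula (logistic curve, base 10, 400-point scale); the function name and the `scale` parameter are mine, and LMArena's exact fitting procedure may differ in the details.

```python
def elo_win_probability(rating_a: float, rating_b: float, scale: float = 400.0) -> float:
    """Expected probability that A is preferred over B under the standard Elo model.

    Assumption: base-10 logistic curve with a 400-point scale, the usual Elo convention.
    """
    return 1.0 / (1.0 + 10.0 ** ((rating_b - rating_a) / scale))


# Scores quoted in the comment above: 4.5 at 1438, 4.0 at 1318.
p = elo_win_probability(1438, 1318)
odds = p / (1.0 - p)  # how many times more likely 4.5 is to win than to lose

print(f"win probability: {p:.3f}")
print(f"odds in favor:   {odds:.2f}")
```

That comes out to a win probability of about 0.666 and odds just under 2 to 1, so "about twice as likely" is the right reading of a 120-point gap.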
Well, word on the street is that the OSS models released this week were Meta-style benchmaxxed and their real-world performance is incredibly underwhelming.
Because?