Agree that some of the claims are forward-looking. The messiness of the real-world and real-user fact checks. No ground-truth verdicts are provided or used in the study though. It only measures the level of agreement between the selected models, not which one is right on which claim. I.e. none of the claims is actually labelled.
I think the problem isn't that people don't care. It's that checking is expensive. "Only $15" isn't trivial when there's tons of claims floating around. And even when you do it people return with complaints and you'll have to redo it (see the other comments here).
Well, they never made the jump to Python 3. But shipping 2.7 interpreters in 2024 was quite an achievement on its own. So their users already know this pain. And from my experience in academia, python 2.7 and java 8 will probably be used for another 20 years before the last machine running that stuff burns out.
reply