brokensegue · 2026-05-30T17:37:19 1780162639

Then maybe caveat your posts with that

brokensegue · 2026-05-28T14:23:00 1779978180

yeah i really don't like the corpus of statements and it makes me doubt lenz. consider

> “Artificial intelligence will cause widespread job loss among software engineers.”

https://lenz.io/c/ai-software-engineers-job-loss-impact-05e4...

this is a statement about the future. who knows? dataset also includes

> Robots will not replace human teachers in schools in the near future.

or

> Papua New Guinea has very few female members of parliament.

what counts as very few?

> “Taurine supplementation supports mood and emotional health in humans.”

why is this labeled as misleading? i'm not even sure when I'm supposed to use the misleading label

> Anaximander was the first scientist in recorded history.

this is a judgement call as the term scientist didn't exist.

the claims that feel actually solidly answerable seem to have much better LLM performance

kostaj · 2026-05-28T15:21:19 1779981679

Agree that some of the claims are forward-looking. That's the messiness of real-world fact checks submitted by real users. No ground-truth verdicts are provided or used in the study, though. It only measures the level of agreement between the selected models, not which one is right on which claim, i.e. none of the claims are actually labelled.
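
To make "agreement without ground truth" concrete, here is a minimal sketch of a pairwise agreement rate. The model names, the labels, and the `pairwise_agreement` helper are all made up for illustration; this is not the study's code and may not match the metric it actually uses.

```python
from itertools import combinations


def pairwise_agreement(verdicts):
    """Return the share of claims on which each pair of models gives the same label.

    verdicts: dict mapping model name -> list of labels, one per claim,
    with every list in the same claim order. No "correct" label is needed.
    """
    rates = {}
    for a, b in combinations(sorted(verdicts), 2):
        pairs = list(zip(verdicts[a], verdicts[b]))
        rates[(a, b)] = sum(x == y for x, y in pairs) / len(pairs)
    return rates


# Hypothetical verdicts from three hypothetical models on four claims.
verdicts = {
    "model_a": ["true", "false", "misleading", "true"],
    "model_b": ["true", "misleading", "misleading", "true"],
    "model_c": ["false", "false", "misleading", "true"],
}

rates = pairwise_agreement(verdicts)
for pair, rate in rates.items():
    print(pair, rate)
```

Note that nothing here says which model is right on any claim; a pair can agree at a high rate and still both be wrong.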

brokensegue · 2026-05-28T15:58:58 1779983938

were you involved in making the study? your bio says you work for them so you should probably indicate that in your comments.

lack of agreement when there is no singular correct answer (or any answer at all) isn't a useful metric

I ran into a lot of these kinds of issues when working on the Citation Needed WMF project (and related extensions). Truth is so often very nuanced.

simonw · 2026-05-28T16:16:38 1779984998

They introduced themselves as the study author here: https://news.ycombinator.com/item?id=48307887#48307899

brokensegue · 2026-05-28T16:25:58 1779985558

ah. I missed that.

brokensegue · 2026-05-21T17:06:49 1779383209

Most jeeps never go off-road

brokensegue · 2026-05-21T12:08:04 1779365284

I think the problem isn't that people don't care. It's that checking is expensive. "Only $15" isn't trivial when there are tons of claims floating around. And even when you do it, people come back with complaints and you have to redo it (see the other comments here).

brokensegue · 2026-05-13T15:52:14 1778687534

jython has been basically unmaintained for quite some time

sigmoid10 · 2026-05-13T16:01:33 1778688093

Well, they never made the jump to Python 3. But shipping 2.7 interpreters in 2024 was quite an achievement on its own. So their users already know this pain. And from my experience in academia, python 2.7 and java 8 will probably be used for another 20 years before the last machine running that stuff burns out.

brokensegue · 2026-04-27T20:40:44 1777322444

Their variable cost is (basically) the number of tokens. They increased that. I don't get how that saves them money

brokensegue · 2026-04-07T13:41:31 1775569291

You can freeze and concentrate a substance without chemically altering it

brokensegue · 2026-04-07T03:37:23 1775533043

none of that is censorship

brokensegue · 2026-03-24T15:06:20 1774364780

I think b/w film has a different grain than color film, so it isn't identical to color converted to grayscale.

brokensegue · 2026-03-23T18:10:42 1774289442

many people do not think the science against them is credible. foods containing those things are staples of many diets.