Or: a stopped clock is right twice a day; a mis-prompted LLM is wrong 19 times out of 20, but only because we handed it the wrong instruction sheet.

Procedural error in testing perhaps? I'm not familiar with the methodology for GPQA.


