I love some of the interpretations there. For example, "Fig. 10: Only Sonnet-3.5 can count the squares in a majority of the images," when that model simply returns "4" for every question and happens to be right.
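
To make the point concrete, here is a rough sketch of how a constant answer can score a "majority" without any counting going on. The function names and the label list are made up for illustration; they are not the benchmark's actual data or the paper's evaluation code.

```python
from collections import Counter


def constant_baseline_accuracy(labels, answer):
    """Fraction of items on which always answering `answer` is correct."""
    return sum(1 for y in labels if y == answer) / len(labels)


def best_constant_answer(labels):
    """Most common label, and the accuracy of always guessing it."""
    answer, count = Counter(labels).most_common(1)[0]
    return answer, count / len(labels)


# Hypothetical ground-truth square counts, invented for illustration only.
labels = [4, 4, 3, 4, 5, 4, 2, 4]

print(constant_baseline_accuracy(labels, 4))  # 0.625
print(best_constant_answer(labels))           # (4, 0.625)
```

If a model's accuracy is no better than the best constant answer, the headline number says more about the label distribution than about whether the model can count.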

