
viraptor on July 10, 2024, on: Vision language models are blind

I love some of the interpretations there. For example, "Fig. 10: Only Sonnet-3.5 can count the squares in a majority of the images.", when that model simply returns "4" for every question and happens to be right.