This ^, the models are getting larger and we don't appear to have scratched the ...

zarzavat · on June 6, 2024

While vision-vision models are certainly cool, I don’t think that they are as economically valuable as vision-speech or text-text. Humans don’t have vision output.

Computation may be increasing, but that is a statement about the short term, not the long term.

If we want to predict the future, then what we care about is: how many capabilities can you fit on a phone-sized computer? And I believe that the answer is: a lot.