imtringued on July 10, 2024 | on: Vision language models are blind

Ok, but most of the data is just captions for images. You're going to have to invest some time into building this dataset at your own expense.