How is the stability? A few years ago I could barely run The Great Green Mouse Hotel because of emulation flaws causing the CPU to do weird things and either crash apps or the Windows 98 itself, or the emulator.
Assuming they are independent events, OP is not more nor less likely to win the lottery now that before running in the collision. I actually have more question if you claim the events in question are NOT independent!
What is the difference between this and using normal OCR and then running that output through a LLM? It seems such a bazooka way to kill a fly to me using a modelime Qwen.
For this example, perhaps not much, but a similar query also works. given this image[0] I queried qwen3.5-35b-a3b "where is waldo", and after thinking some got this back:
---
Waldo is located in the *center* of the image.
Here is how to spot him:
1. Look towards the middle of the beach scene.
2. Find the *red and white striped lifeguard stand/booth* on the right side of the center area.
3. Just to the left of that booth, there is a man standing up wearing his signature *red and white striped shirt*, blue jeans, and red hat with a pom-pom. He also has a camera around his neck.
---
Which is not generally something OCR can do. If you instead ask "how many horses are in this image?" you get this response:
---
Based on a careful look at the image, there are *3* equine animals (which appear to be donkeys or ponies used for beach rides).
1. One is near the top left, being ridden by someone in yellow shorts.
2. Right next to it is another one, being ridden by someone in blue and yellow.
3. On the far right edge of the image (near the water), there is a third grey animal being ridden by someone wearing a hat.
--
Now is this all anything you can't do with more boring machine learning? sure, but there's something incredibly convenient about how generic LLMs are. You don't need to train anything, just point the LLM at an image and ask.
For most tasks I agree. However once you've done your OCR you already have lost a lot of positional and context information, so for some tasks it might not be good enough.
If you have scanned PDFs that follow a template, like an invoice from a repeat supplier, then yeah OCR is definitely the way to go.
reply