Hacker News | mrkn1's comments

I have not. In my experience it takes a few seconds for a 1024x1024 image with a medium density of text, FWIW. Feel free to try it on a few test images; the model is pretty small and fast, but yeah, no formal evals on CPU.

It should support 109 languages. More info here: https://huggingface.co/PaddlePaddle/PaddleOCR-VL

No, simply, my laptop only has a CPU.

I haven't. But the evals of the underlying model are published here, including on Omnibench. https://huggingface.co/PaddlePaddle/PaddleOCR-VL-1.5

For 100% local CPU fact checking, I made this: https://news.ycombinator.com/item?id=48301003

Why should I trust this without a paper, benchmark or at least a human-written README?

A lot of my queries are summarize/explain/fact check, and these are covered 100% on my CPU locally [0], which reduces my reliance on frontier models.

[0] https://news.ycombinator.com/item?id=48301003


The link in your HN post is taking me to your list of Show HN posts. I wasn't able to find your GitHub.


Just released a new version that implements your ideas.

Wow, excellent news! This will definitely become a daily driver for me. Thanks!

Thank you for being thorough.

clipboard: right now, clipboard input is treated like any other source, so the text gets written to ./textsnaps/clipboard_ocr.txt and stdout just prints that path. Nothing goes back to the clipboard in this version (stay tuned).

portability: agreed, and it's a small change. textsnap already looks for the checksum manifest next to the script before falling back to the cache, so extending it should be easy. I'll make a note for the next version.
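To make the two behaviors above concrete, here is a rough sketch. This is not textsnap's actual code: the function names, the `<source>_ocr.txt` naming pattern, and the manifest filename `checksums.txt` are all assumptions for illustration. It only shows the shape of "write the text to a file and hand back the path" and "check next to the script first, then the cache".

```python
from pathlib import Path
from typing import Optional


def save_ocr_text(text: str, source_name: str, out_dir: Path) -> Path:
    """Write OCR text to <out_dir>/<source_name>_ocr.txt and return that path.

    Hypothetical helper: the naming pattern is inferred from the
    clipboard_ocr.txt example, not taken from textsnap itself.
    """
    out_dir.mkdir(parents=True, exist_ok=True)
    out_path = out_dir / f"{source_name}_ocr.txt"
    out_path.write_text(text, encoding="utf-8")
    return out_path


def find_manifest(
    script_dir: Path,
    cache_dir: Path,
    name: str = "checksums.txt",  # assumed filename, purely illustrative
) -> Optional[Path]:
    """Return the first manifest found, preferring the script's directory.

    Lookup order: next to the script first, then the cache directory.
    Returns None when neither location has the file.
    """
    for directory in (script_dir, cache_dir):
        candidate = directory / name
        if candidate.is_file():
            return candidate
    return None
```

With this shape, a caller would print the path returned by `save_ocr_text(text, "clipboard", Path("./textsnaps"))` to stdout, and making the tool portable amounts to adding more directories to the lookup tuple in `find_manifest`.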


Great question. I'm not familiar with docling-serve, but from what I gathered they're pretty different beasts. Docling is a heavier pipeline (it actually uses a GPU). textsnap is the opposite: single-file CLI, small VLM running on plain CPU cores, one command, no server. The tradeoff is that CPU decode is sequential, so it's slower on dense pages, and it OCRs one image rather than doing full layout.

If docling-serve is already meeting your needs, it's probably not an upgrade. But it installs in one command, so I'd love to hear how it stacks up on your images if you end up trying it.


Thanks! yapsnap is audio to text, and textsnap is image to text. Both have been daily use cases for me for a while. And yes, the feedback on yapsnap encouraged me to also release textsnap on GitHub.

Oh, I didn't even notice it earlier: you are also the author of yapsnap, hence the similarity.

I loved the simplicity in both. They both work, without the bloat.

