I have not. In my experience it takes a few seconds for a 1024x1024 image with a medium density of text, FWIW. Feel free to try it on a few test images; the model is pretty small and fast, but yeah, no formal evals on CPU.
clipboard: right now clipboard input is treated like any other source, so the text gets written to ./textsnaps/clipboard_ocr.txt, and stdout just prints that path. Nothing goes back to the clipboard in this version (stay tuned).
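If you want the text itself rather than the path, a small wrapper can read the file that stdout points to. This is only a sketch: `read_ocr_result` is a hypothetical helper, not part of textsnap, and it assumes stdout contains nothing but the output path (possibly with a trailing newline).

```python
from pathlib import Path


def read_ocr_result(stdout_text: str) -> str:
    """Return the OCR text from the file whose path was printed to stdout.

    Hypothetical helper: assumes stdout holds only the output path.
    """
    # Strip the trailing newline that a printed path normally carries.
    out_path = Path(stdout_text.strip())
    return out_path.read_text(encoding="utf-8")
```

You would pass it whatever you captured from the command's stdout, for example the `stdout` attribute of a `subprocess.run(..., capture_output=True, text=True)` result.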
portability: agreed, and it's a small change. textsnap already looks for the checksum manifest next to the script before falling back to the cache, so extending it should be easy. I'll make a note for the next version.
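For anyone curious what that lookup order amounts to, here is a rough sketch. It is not textsnap's actual code: the function name, the `checksums.txt` file name, and the directory arguments are all made up for illustration. The only thing taken from the description above is the order, script directory first, cache second.

```python
from pathlib import Path
from typing import Optional


def find_manifest(
    script_dir: Path,
    cache_dir: Path,
    name: str = "checksums.txt",  # hypothetical file name
) -> Optional[Path]:
    """Return the first existing manifest, preferring the script directory."""
    # Check next to the script first, then fall back to the cache.
    for base in (script_dir, cache_dir):
        candidate = base / name
        if candidate.is_file():
            return candidate
    # Neither location has a manifest.
    return None
```

Extending this to other files would mostly mean calling the same search with a different `name`, which is why the change should be small.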
Great question. I'm not familiar with docling-serve, but they're pretty different beasts from what I gathered. Docling is a heavier pipeline (it actually uses a GPU). textsnap is the opposite: single-file CLI, small VLM running on plain CPU cores, one command, no server. The tradeoff is that CPU decode is sequential, so it's slower on dense pages, and it OCRs one image rather than doing full layout.
If docling-serve is already meeting your needs, textsnap is probably not an upgrade. But it installs in one command, so I'd love to hear how it stacks up on your images if you end up trying it.
Thanks! yapsnap is audio to text, and textsnap is image to text. Both have been daily use cases for me for a while. And yes, the feedback on yapsnap encouraged me to release textsnap on GitHub too.