
I still think it is just a matter of time until scrapers catch up. More and more scrapers spin up a full-blown Chromium.


It seems inevitable, but in the meantime, that's vastly more expensive than running curl in a loop. In fact, it may be expensive enough that it cuts bot traffic down to a level I no longer care about defending against. Googlebot, for example, had been crawling my stuff for years without breaking the site. If every bot were like that, I wouldn't care.


Serious question: in 2026, can you actually have a successful crawler with just curl? I just had to create one for a customer - for their own site - and nothing would have worked without using Chromium.


Probably not for most sites. Example of a site where it'd likely work: a blog made with a static site generator. Example of one where it wouldn't: darn near anything made with React.


It works for the majority of things a text-mining scraper would care to scrape. That covers not just static sites but also any CMS like WordPress, as well as many JS apps that have server-side rendering. SPA-only sites aren't that common anymore, especially for things like blogs, news and text-based social media.


Cool, if they're running full-blown Chromium, maybe the next step can be mining Bitcoin on any pages served to bots.


Even that functions as a sort of proof of work, requiring a commitment of compute resources that is table stakes for individual users but multiplies the cost of making millions of requests.
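To make the cost argument concrete, here is a minimal sketch of a hash-based proof of work. It is only an illustration of the general idea, not how Anubis or any particular tool implements it; the function names, the challenge string, and the difficulty value are all assumptions made up for this example.

```python
import hashlib
from itertools import count


def solve(challenge: str, difficulty: int) -> int:
    """Client side: search for a nonce whose SHA-256 hex digest
    starts with `difficulty` zero characters (hypothetical scheme)."""
    prefix = "0" * difficulty
    for nonce in count():
        digest = hashlib.sha256(f"{challenge}{nonce}".encode()).hexdigest()
        if digest.startswith(prefix):
            return nonce


def verify(challenge: str, nonce: int, difficulty: int) -> bool:
    """Server side: a single hash is enough to check the client's work."""
    digest = hashlib.sha256(f"{challenge}{nonce}".encode()).hexdigest()
    return digest.startswith("0" * difficulty)


if __name__ == "__main__":
    # Illustrative values only; a real deployment would pick its own.
    challenge = "example-challenge"
    difficulty = 3  # each extra hex zero multiplies expected work by 16
    nonce = solve(challenge, difficulty)
    print(nonce, verify(challenge, nonce, difficulty))
```

The asymmetry is the point: solving takes on the order of 16^difficulty hashes, while verifying takes one. A single visitor pays that once per challenge, but a crawler making millions of requests pays it every time it is challenged.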


AFAIK you can bypass it with curl because there's an explicit whitelist for it, no need for a headful browser.


Well, it's a race, just like security. And as long as Anubis is out in front, all looks bright.



