> The weird thing about this is that the only company I've seen doing problemati...

jedberg · on March 25, 2020

> I'm pretty sure that the webmasters responsible for this are using user agent string blocking as a naive attempt to block bots from scraping their site, but that assumes that the bots that they want to block actually send an accurate user agent string in the first place.

That is exactly what they are doing, and it works really well.

We blocked user agents with "lib" in them at Reddit for a long time.

Any legit person building a legit bot would know to fake the agent string.

The script kiddies would just go away. It drastically reduced bot traffic when we did that. Obviously some of the malicious bot writers know to fake their agent string too, and we had other mitigations for that.

But sometimes the simplest solutions solve the majority of issues.

adwww · on March 25, 2020

> Any legit person building a legit bot would know to fake the agent string.

What? That's totally backwards. Anyone using a bot to do things that might get blocked by publishers fakes the string; bots with legit purposes should really show who or what they are.

xiongchiamiov · on March 25, 2020

It actually is encouraging people to have useful user agents. By default most people end up with a user agent that's something like "libcurl version foo.bar.baz", which isn't actually a description of who or what they are; given the prevalence of curl, it really just tells you that it's a program that uses HTTP.
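
For illustration, here is a minimal sketch of sending a descriptive user agent from Python's standard library instead of the default one (which looks like "Python-urllib/3.x"). The agent string, contact URL, and the helper name `build_request` are hypothetical, made up for this example:

```python
import urllib.request

# Hypothetical agent string: says what the program is and how to reach its author.
DESCRIPTIVE_AGENT = "example-offline-reader/1.0 (+https://example.com/contact)"

def build_request(url: str, user_agent: str = DESCRIPTIVE_AGENT) -> urllib.request.Request:
    """Build a request that carries an explicit User-Agent header."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})

# No network traffic happens here; the request is only constructed.
req = build_request("https://example.com/")
```

Passing `req` to `urllib.request.urlopen` would then send that header rather than the library default.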

jedberg · on March 25, 2020

We only blocked agent strings with "lib" in them. You could change the agent to "WebScraperSupreme.com" and it would have been fine (and in fact some people did do that).
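
Roughly, the rule amounts to a substring check. This is only a sketch of the idea, not the actual Reddit code: the function name `is_blocked`, the case-insensitive match, and letting a missing header through are all assumptions.

```python
from typing import Optional

def is_blocked(user_agent: Optional[str]) -> bool:
    """Return True when the User-Agent contains "lib"."""
    # Assumption: a missing or empty header is allowed through;
    # the thread does not say how that case was handled.
    if not user_agent:
        return False
    # Assumption: the match ignores case.
    return "lib" in user_agent.lower()
```

Under these assumptions, default strings such as "libcurl version foo.bar.baz" or "Python-urllib/3.8" are rejected, while "WebScraperSupreme.com" passes.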

NotSammyHagar · on March 25, 2020

Yes, perhaps. But it caused problems for regular users like this fellow. I have also tried various "download via script" approaches to save web pages for offline use. I thought I had a problem on my end; I never realized I could have been getting blocked.

naasking · on March 26, 2020

Hard to argue with the economics of that mitigation though. The abuse:legitimate use ratio is probably pretty high. Getting rid of user agent strings will bring back the scaling problems, so those should probably be addressed directly.