> This wouldn’t be a problem if these companies offered an API to access their d...

chatmasta · on Dec 23, 2019

> Sorry, but you're talking about something you have no idea of.

This kind of personal attack seems unnecessary, and by the way, I probably know a lot more than you assume about this particular topic.

Regardless, if you had an API, you could charge for it. It wouldn't cover 100% of the scrapers, but it would cover some.

paranoidrobot · on Dec 23, 2019

My intent was not to attack you, but to respond to your assumptions about the business model, and your comparisons to geoblocking and LinkedIn's anti-crawling stuff.

There were APIs. I won't go into the business relationships that existed over their use.

There were still scrapers, regardless.

Moru · on Dec 23, 2019

We also had problems with scrapers using our data. We contacted lawyers, got advice to plant extra messages to get evidence. They then took care of it and the problems have slowly gone away. Cache now works much better. Not fun when you spend a lot of time and money to accumulate data and someone thinks they should have it for free.

chatmasta · on Dec 23, 2019

If you put the data on the internet, and it’s accessible at a public address, it’s free. If you don’t want people (or robots) to access it, don’t make it public.

You might be interested in the case of LinkedIn vs HiQ [0], which is setting precedent for protecting scraping of public data.

Based on the fact that you “inserted special messages,” it sounds like the people scraping your site may have been republishing the data. That is a separate issue that in some cases can violate copyright. But in that case, it’s not the scraping of the data that is the problem, so much as it is republishing the data outside the bounds of fair use.

I am of the strong belief that if you make your data publicly available to users, you should expect bots to scrape it too. If your infrastructure is setup in a way that makes traffic from those bots expensive, that’s your problem. The solution is not to sue people or send them letters. You can mitigate it with infrastructure changes like aggressive caching, or you can charge for access for everyone, not just bots. IMO, it’s especially wrong if you allow google to scrape your data, but try to stop every other bot from doing the same.

[0] https://www.eff.org/deeplinks/2019/09/victory-ruling-hiq-v-l...

paranoidrobot · on Dec 23, 2019

> You can mitigate it with infrastructure changes like aggressive caching

Rate data has a very limited validity period. Customers get super super pissed (and assume you're scamming them) if when they click through they find out that the hotel/flight/whatever that on the previous page you had said was $200, is now either $250 or sold out. Customers, and the local authorities also tend to get lawyers involved if it happens (in their eyes) too frequently without a good explanation.

It's expensive to get that rate data, because unless you have your own inventory, you have to go out to third party APIs to request that rate for the search parameter tuple which has a specific checkin/checkout dates. When you're searching larger cities - where you might have thousands of hotels - that can be an insanely large number of API calls to return rates.

Most places (including my former employer) don't have a problem issue with scrapers, so long as they didn't abuse the platform to the point that it was causing a ton of extra load. When you have someone who spins up huge numbers of connections at once, that's when we have to do something about it.

> you can charge for access for everyone

That's implicit in the purchase process.

It's like if there's a little cafe that provides free water and tables to sit at on their balcony. That works out for them because it attracts customers. Not everyone might buy something, but most do.

Then someone who runs a dog walking business decides to make that a stop on their walk with 20 dogs. Their dogs eat all the treats, run around the balcony, while the walker sits at the table and drinks the water. Meanwhile, customers are annoyed that there's now 20 barking dogs running around and so they leave.

The business is well within their rights to tell the dog walker to leave and not return without also blocking others who aren't abusing the system.