There's a reason for that. Twitter offers an API that provides a massive stream of tweets. They used to offer it in various sizes, leading up to what was called the "firehose". Now there's only the streaming API with filtering (which delivers a very small but constant random sample of tweets) and the firehose, which is all of them; you have to pay for that through a third party partnered with Twitter. When you're doing large-scale data mining, the firehose can be invaluable, albeit expensive (pricing varies, but it runs anywhere from $300 to $6,000 per month). But bot makers don't always have that kind of scratch, so they just consume the public API, which is generally good enough.
My bot is listening to this and it's really awesome! I was surprised to learn how many people on Twitter post 'i am bored': up to 300,000 per day! With that amount of data, it's readily possible to build robust ML systems.
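For anyone curious how a bot might tally a phrase like that, here's a minimal sketch in Python. It assumes the tweets have already arrived from the stream as (timestamp, text) pairs, and `count_phrase_per_day` is a hypothetical helper, not part of any Twitter library. It uses a plain case-insensitive substring match, which is a simplification and not how the streaming API's own filtering matches phrases.

```python
from collections import Counter
from datetime import datetime, timezone
from typing import Iterable, Tuple

# Hypothetical shape of a received tweet: (creation time, tweet text).
# A real bot would get these from the streaming API instead of a list.
Tweet = Tuple[datetime, str]


def count_phrase_per_day(tweets: Iterable[Tweet], phrase: str) -> Counter:
    """Count tweets per UTC day whose text contains `phrase`.

    Simple case-insensitive substring match; this approximates, but does
    not reproduce, the streaming API's own filter semantics.
    """
    needle = phrase.lower()
    per_day: Counter = Counter()
    for created_at, text in tweets:
        if needle in text.lower():
            # Bucket by calendar day in UTC.
            per_day[created_at.astimezone(timezone.utc).date()] += 1
    return per_day


if __name__ == "__main__":
    # Made-up sample tweets, purely for illustration.
    sample = [
        (datetime(2014, 3, 1, 9, 0, tzinfo=timezone.utc), "ugh I am bored"),
        (datetime(2014, 3, 1, 17, 30, tzinfo=timezone.utc), "I AM BORED at work"),
        (datetime(2014, 3, 2, 8, 15, tzinfo=timezone.utc), "not bored today"),
        (datetime(2014, 3, 2, 22, 45, tzinfo=timezone.utc), "i am bored again"),
    ]
    for day, n in sorted(count_phrase_per_day(sample, "i am bored").items()):
        print(day, n)
```

Counting per day rather than keeping every tweet keeps memory flat no matter how long the bot listens; if you want training data for an ML system, you'd store the matching texts as well.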