Why does this need to be real-time? If their daily budget is $1000, you can still wait quite a bit and then apply increments in aggregate (e.g. hourly). What's more, it sounds like the customers aren't inter-connected - it hardly seems like a complex distributed problem.
The impressions may be sparse, e.g. say you're retargeting CEOs (demographic information you're getting from a DSP) who have visited your website in the last month (via a pixel you drop) who are in New York City (via a geoIP DB).
So, fine, a probabilistic model might work well. And you might decide to bid on 100% of impressions. And you might decide that you have to bid $200 CPM to win -- which you're OK doing, because they're sparse.
And then say that FooConf happens in NYC, and your aggressive strategy of bidding $200 CPM 100% of the time blows out your budget.
Often you can charge the customer you're acting on behalf of your actual spend plus X%, up to their campaign threshold. So you really want to ensure that you spend as much as possible, without spending too much. Pacing is hard. Google AdWords, for example, only promises to hit your budget +/- 20% over a 1-month period.
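To make the pacing problem concrete, here's a minimal sketch of one common idea - throttle auction participation against a simple time-proportional spend target. The function name and the damping factor are made up for illustration; this is not anyone's production algorithm.

    import random

    def should_bid(spent_today, daily_budget, seconds_into_day, day_seconds=86400):
        """Return True to enter this auction, False to skip it."""
        if spent_today >= daily_budget:
            return False  # hard stop: budget exhausted
        # Target: spend should track the fraction of the day that has elapsed.
        target_spend = daily_budget * (seconds_into_day / day_seconds)
        if spent_today <= target_spend:
            return True   # behind pace: bid on everything we're eligible for
        # Ahead of pace: participate with a probability that shrinks as the
        # overshoot grows (10x is an arbitrary damping factor).
        overshoot = (spent_today - target_spend) / daily_budget
        return random.random() > min(1.0, 10 * overshoot)

Even a controller this crude keeps the FooConf scenario above from burning the whole day's budget in an hour, because participation gets throttled as soon as spend runs ahead of the clock.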
I'm really not seeing what you gain out of running fine-grained control all the time here. Even if it were vital for a customer that you hit a budget target exactly, you could dynamically change the granularity of control as you got closer. If anything, predictive modeling would give you better budget use when you do have the flexibility than granular adjustment would. (I don't know much about the area, though, and am just going with your description of the problem here.)
A better example is frequency capping. Ever watch something on Hulu and see the same ad 4 times in a twenty-minute commercial? Or, even worse, back to back?
With a real-time data stack you can avoid the duplicated ad a good percent of the time. Better experience for buyers, for publishers, and for users.
Store what in the cookie? Every ad the user has seen across the entire web, along with how many times they've seen it in the last N minutes/hours/days/weeks?
A cookie won't fit all that data, and a more traditional database generally won't work. In-memory k/v stores like Redis won't work due to data size (TBs of data). HBase/Cassandra/etc. sort of work, with latency in the 5ms range. That's fairly expensive within a 90ms SLA, but you can make it work. It does limit the amount of work you are able to do.
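To give a sense of what that lookup has to do on every bid request, here's a rough sketch of a per-user, per-campaign frequency-cap check. The dict stands in for whatever store can actually hold TBs of counters at single-digit-ms read latency; the cap and window values are invented.

    import time

    CAP = 3          # hypothetical: max impressions per user per campaign
    WINDOW = 3600    # hypothetical: within a one-hour window

    store = {}       # key -> (window_start, count); stand-in for the real k/v store

    def _key(user_id, campaign_id):
        return f"{user_id}:{campaign_id}"

    def under_cap(user_id, campaign_id, now=None):
        """Check the counter before bidding; must fit inside the bid-path latency budget."""
        now = now or time.time()
        window_start, count = store.get(_key(user_id, campaign_id), (now, 0))
        if now - window_start > WINDOW:
            count = 0            # window expired, counter effectively resets
        return count < CAP

    def record_impression(user_id, campaign_id, now=None):
        """Bump the counter once we know the ad was actually served."""
        now = now or time.time()
        key = _key(user_id, campaign_id)
        window_start, count = store.get(key, (now, 0))
        if now - window_start > WINDOW:
            window_start, count = now, 0
        store[key] = (window_start, count + 1)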
We (AdRoll) have been very happy with DynamoDB for use cases like this. It works fine with ~500B keys, while maintaining very low, and consistent, latencies.
Even then we need a fast lookup. 4 ads means 4 advertisers, 4 creatives, 4 potential clicks, imps, mids/ends (video), conversions, etc. There are also multiple cookies depending on which exchange the ad came from. Having to map them on the fly requires a fast and large datastore.
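A hedged sketch of what the per-request work might look like against DynamoDB - two round trips, one to map the exchange cookie to an internal user id and one to atomically bump an impression counter. This is not AdRoll's actual schema; the table, key, and attribute names are invented.

    import boto3

    dynamo = boto3.resource("dynamodb")
    mapping_table = dynamo.Table("cookie_mappings")   # hypothetical table names
    caps_table = dynamo.Table("frequency_caps")

    def resolve_user(exchange, exchange_cookie_id):
        """Map an exchange-specific cookie to our internal user id (one round trip)."""
        resp = mapping_table.get_item(
            Key={"exchange": exchange, "cookie_id": exchange_cookie_id})
        return resp.get("Item", {}).get("internal_user_id")

    def bump_impressions(user_id, campaign_id):
        """Atomically increment the per-user, per-campaign impression count."""
        resp = caps_table.update_item(
            Key={"user_id": user_id, "campaign_id": campaign_id},
            UpdateExpression="ADD imp_count :one",
            ExpressionAttributeValues={":one": 1},
            ReturnValues="UPDATED_NEW",
        )
        return int(resp["Attributes"]["imp_count"])

At roughly 5ms per call, those two lookups alone eat a noticeable slice of a 90ms bid SLA, which is the point being made above.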
> A better example is frequency capping. Ever watch something on Hulu and see the same ad 4 times in a twenty-minute commercial? Or, even worse, back to back?
Yeah, but when that happens I usually don't think, "oh hey, they're lacking an optimal in-memory distributed database solution."
I think, well... their engineers suck. Or they don't care. Pick one.
edit: His point is vague, so there is nothing technical to respond to. I am very much interested in a good technical example - but the things mentioned so far are by all appearances relatively straightforward and linear, hence lack of effort or bad engineering are the only reasonable assumptions left.
I don't get your point here. He's explaining why it works the way it does. You're saying you don't think about it as a user. That doesn't invalidate or even respond to his point.
Volume and latency requirements make it more difficult to track individuals on the web. It's an easier problem to solve in 50ms when it's only a million individuals rather than a couple hundred million.
phamilton touched on another good example - the budget can be implicitly set per-user via a frequency cap. If you see the user and have a chance to bid on an impression for them, the odds are good you'll see them again within two seconds -- and winning both auctions means you've overspent by 100%. Oops.
Perhaps also worth pointing out that in RTB, quite often you won't know whether you've actually won the auction until a few seconds later (when the exchange calls you back with a win notification, or you get it directly via a pingback from the user's device).
During that delay you might have actually already processed new bid requests (auctions) for the same user.
Depending on the order's characteristics and how much you're willing to deviate from target - especially when observed within a small time window - the above poses additional challenges w.r.t. overspending.
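One common way to keep that in check is to reserve budget for in-flight bids and only settle (or release) the reservation once the win/loss notification arrives or a timeout fires. A rough sketch, with invented names, assuming the exchange eventually reports a clearing price:

    import threading, time

    class CampaignBudget:
        def __init__(self, daily_budget):
            self.daily_budget = daily_budget
            self.spent = 0.0
            self.pending = {}        # bid_id -> (bid_price, submitted_at)
            self.lock = threading.Lock()

        def try_reserve(self, bid_id, price):
            """Reserve `price` for an outgoing bid; refuse if it could overspend."""
            with self.lock:
                reserved = sum(p for p, _ in self.pending.values())
                if self.spent + reserved + price > self.daily_budget:
                    return False
                self.pending[bid_id] = (price, time.time())
                return True

        def settle(self, bid_id, won, clearing_price=None):
            """Called on the win/loss notification: convert or release the reservation."""
            with self.lock:
                price, _ = self.pending.pop(bid_id, (0.0, None))
                if won:
                    self.spent += clearing_price if clearing_price is not None else price

        def expire(self, max_age=10.0):
            """Release reservations for bids we never heard back about."""
            with self.lock:
                now = time.time()
                stale = [b for b, (_, t) in self.pending.items() if now - t > max_age]
                for bid_id in stale:
                    del self.pending[bid_id]

Reserving at the full bid price is conservative (second-price auctions usually clear lower), so in practice you'd trade some of that safety back for better budget utilization - which is exactly the pacing tension described further up the thread.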