What is a CDN? How do CDNs work? (animeshgaitonde.medium.com)
166 points by animesh371g on Feb 8, 2023 | hide | past | favorite | 60 comments


Does anyone know of more technically in-depth presentations / blog posts about the inner workings of CDNs? I'm thinking of stuff like:

- How do they distribute a huge load between multiple servers in a DC (some kind of layer 2/3 load balancing I don't know of, probably?)

- How is the cached data distributed between POPs?

- The whole TLS termination thing is probably an interesting aspect as well. The sensitive key material is needed on every node / POP, but you probably don't want all keys just lying around in clear text on disks? Some kind of distributed HSM-thingy?

- Just practically, how do you manage all these machines. Something like Ansible, or something more like Nomad/K8S?

Perhaps the fly.io people could write something about this (if they haven't already :-))


Out of curiosity I checked Gilbert Held, A Practical Guide to Content Delivery Networks, 2nd ed., CRC Press, 2010 [1]. However, it offers more of a historical overview: it presents general information on things like TCP/IP and the Cache-Control header, but also the Akamai High Definition network and how it uses Flash. Things get old so fast in this area. Probably also outdated, but interesting nonetheless, is Abhishek Verma et al., Large-scale cluster management at Google with Borg, 2015 [2] (see Section 7, Related Work, for more references). John Wilkes seems to be involved in multiple articles showing some of the Google internals, such as Borg: the Next Generation, 2020 [3].

[1] https://www.routledge.com/A-Practical-Guide-to-Content-Deliv...

[2] https://research.google/pubs/pub43438 https://www.youtube.com/watch?v=0W49z8hVn0k

[3] https://research.google/pubs/pub49065 https://research.google/people/JohnWilkes


thanks for sharing. this is really helpful.


I work for a CDN

To answer your questions:

The load is distributed by custom, special-purpose load balancers. They route each request to the correct server based on information in the request. Requests are routed to a particular server instead of a random one because we want requests for the same content to go to the same set of servers, so that we don't cache more copies of content than necessary, allowing more total content to be cached in a POP.

The server that actually serves the content then uses Direct Server Return to bypass the load balancer when returning the content to the client. Since the content returned is a lot larger than the request for it, the load balancers can handle far more requests than if the entire connection flow had to pass through them.
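As a hedged illustration of the "same content, same servers" affinity described above, here is a minimal consistent-hash ring in Python. The server names, virtual-node count, and hash choice are all made up for the sketch; a real balancer would work on packets, not strings:

```python
import hashlib
from bisect import bisect

# Hypothetical cache servers in one POP.
SERVERS = ["cache-01", "cache-02", "cache-03", "cache-04"]

# Build a hash ring with virtual nodes so load spreads evenly.
RING = sorted(
    (int(hashlib.md5(f"{s}#{v}".encode()).hexdigest(), 16), s)
    for s in SERVERS
    for v in range(100)
)

def pick_server(url: str) -> str:
    """Route requests for the same URL to the same server."""
    h = int(hashlib.md5(url.encode()).hexdigest(), 16)
    idx = bisect(RING, (h, "")) % len(RING)
    return RING[idx][1]

# The same object always maps to the same cache node:
assert pick_server("/video/abc.mp4") == pick_server("/video/abc.mp4")
```

The virtual nodes matter: without them, removing one server would reshuffle a large share of the keyspace instead of only its own slice.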

Cached content can be distributed to each POP in a few ways. The simplest is for each POP to individually request the content from origin, which means the origin sees one request per piece of content per POP. The other main way is to make one POP the gateway: other POPs use that gateway as their origin, so the content is cached once at the gateway and then served to all the other POPs. This way, the customer origin only gets a single request per piece of content, from the gateway POP.
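The gateway pattern can be sketched with toy in-memory caches; the POP names, path, and content strings below are invented for illustration:

```python
# Toy model of the "gateway POP" fill pattern: edges fill from the
# gateway, and only the gateway ever talks to the customer origin.
caches = {"edge-a": {}, "edge-b": {}, "gateway": {}}
origin_requests = []  # what the customer origin actually sees

def fetch_from_origin(path):
    origin_requests.append(path)
    return f"content-of-{path}"

def get(pop, path):
    if path in caches[pop]:
        return caches[pop][path]          # cache hit at this POP
    if pop == "gateway":
        body = fetch_from_origin(path)    # only the gateway hits origin
    else:
        body = get("gateway", path)       # edges fill from the gateway
    caches[pop][path] = body
    return body

get("edge-a", "/img/logo.png")
get("edge-b", "/img/logo.png")
assert origin_requests == ["/img/logo.png"]  # origin saw one request
```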

TLS termination is a challenge for exactly the reason you give. The keys are encrypted for distribution, and only decrypted and loaded into a server's memory if that server gets requests for the corresponding domain name. It is complicated.

Managing the machines is a big part of the work a CDN does. I work on one of the teams that works on part of that management. Lots of custom software, asset management, config distribution, etc. I would write more but I need to get my daughter to school!



I spent a couple years in CDN & DNS land. Every meaningful CDN I know of is bespoke, but they all take very similar approaches to common problems.

There are good publications and papers from Fastly and Facebook circa 2012 to 2016 at events like USENIX. Akamai had some seminal works in the early to mid 2000s. Oh, and Google's Maglev, and I think Cloudflare talks about IP fragmentation and similar problems where you need persistence but don't get a 5-tuple.

In short: DNS and the EDNS Client Subnet extension tell the CDN's name servers the client's network location. Map the client subnet and network topology to a nearby POP. Return those addresses in the A record response.

If you don't get EDNS0, use a weighted result based on the resolver's client population.

The A record IPs get the client to the best available POP considering load, latency, bandwidth, cost, customer domains, etc.
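The subnet-to-POP mapping step can be sketched like this; the subnets, POP addresses, and the idea of a static table are all invented (a real CDN derives this mapping from latency measurements and topology data):

```python
import ipaddress

# Hypothetical mapping from client networks to the nearest POP's
# virtual IPs; a real map is built from measurement, not hardcoded.
POP_MAP = {
    ipaddress.ip_network("203.0.113.0/24"): ["198.51.100.10"],  # "Sydney"
    ipaddress.ip_network("192.0.2.0/24"):   ["198.51.100.20"],  # "Frankfurt"
}
DEFAULT_POP = ["198.51.100.30"]

def answer(client_subnet: str) -> list[str]:
    """Return A records for the POP nearest the EDNS client subnet."""
    net = ipaddress.ip_network(client_subnet)
    for prefix, addrs in POP_MAP.items():
        if prefix.supernet_of(net):       # client falls inside this prefix
            return addrs
    return DEFAULT_POP                    # no match: fall back to a default
```

Without an EDNS client subnet, the same lookup would have to run against the resolver's address instead, which is the weighted-guess case mentioned above.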

At the POP you play network games to get the TCP session to the first layer of CDN hosts. I'll call them “layer 1” or L1. You almost definitely DO NOT run a full proxy with traffic 100% in:out at this layer. It's simply too wasteful. Instead you can use some combination of ARP, ECMP, or “layer 3 switching” to get TCP flows distributed among the available L1 hosts. These hosts will almost always share those initial “virtual” IP addresses amongst them. The L1 hosts will terminate TCP/TLS and probably parse the HTTP request to determine the customer domain, customer cache rules, etc. The L1 hosts will have a local “hot” cache of popular objects, with lots of duplication between hosts to distribute load. Probably a 50-80% hit rate here to immediately return the result to the client.

If the L1 is a miss, it will use a consistent hash function to map the object (think URI plus any customer rules) to an L2 host. The L2 is a segmented cache, maximizing unique bytes stored for content in the “middle” of the popularity distribution. Implementation varies here: “L2” could map to a single L1 host which will have that object, to a single host in a dedicated L2 fleet, or to another POP or a shared regional cache. Expect another 50-80% hit rate at this L2.

If your L2 doesn't have the object, you now need to go back to the origin. You want connection pooling here to save the long-distance TCP setup. Use more consistent hashing as necessary, and insert the object into the L2 cache on the way back.
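Plugging example numbers from the hit-rate ranges above into the two-layer hierarchy shows why so little traffic reaches the origin; the 70% figures are just a midpoint of the quoted 50-80% range:

```python
# Fraction of requests surviving each cache layer, assuming a 70%
# hit rate at both L1 and L2 (illustrative, within the 50-80% range).
l1_hit, l2_hit = 0.7, 0.7

to_l2 = 1 - l1_hit                 # requests that miss the L1 hot cache
to_origin = to_l2 * (1 - l2_hit)   # requests that also miss L2

assert abs(to_origin - 0.09) < 1e-9   # only ~9% of requests hit origin
```

At the 80% end of the range, the origin would see just 4% of requests, which is why adding even one more cache tier pays off so quickly.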

There are optimizations to make at every stage. Check on probabilistic structures, network encapsulation, cumulative distributions, and network mapping/latencies.


Re: TLS, you don't require private keys on every L1. You can proxy session setup to a smaller, more restricted fleet holding the certs, and pass back the negotiated session. IIRC there are public proposals/examples of this. You'll probably also want to use a session cache and session tickets here as well.

On the other side you can extend the “depth” of your cache hierarchy with additional layers, like L3 etc. Think of doing things like having regional caches that your local/edge caches read from. Or store very large media objects in fewer locations with more density. Or centralize all origin requests through 1 or 2 specific locations to minimize the number of requests to the origin. I seem to recall that Akamai made a TON of money on this “net storage” layer historically.

I also forgot to mention TCP Fast Open and precomputed/cached TCP session values like cwnd. You'll want those to avoid slow start and bandwidth probing. Again, lots of optimizations at every step.


> Just practically, how do you manage all these machines. Something like Ansible, or something more like Nomad/K8S?

Most of the CDN providers predate all those things. Most are custom implementations with varying degrees of documentation and features.

> How is the cached data distributed between POP's?

I've seen two implementations and both do roughly the same thing. A request comes in, and if the content isn't in cache the POP checks a regional POP preconfigured for that region, as if it were filling from origin. If it's not there either, the regional server gets it from origin. That fills the cache at both the regional POP and the local one. In one implementation the regional POP was just config: a standard POP that also served requests from other POPs. In the other it was a separate tier serving only regional requests.


As someone mentioned, internally they use a high-performance server like nginx, which provides most of these things, such as TLS termination, out of the box. How the machines are managed and scaled depends on the CDN provider. For example, Akamai would use an altogether different approach than the one employed by Azure CDN.


Nginx handles TLS termination, but OP brought up a good point: how do you distribute private keys to thousands of nginx nodes without compromising them?


A central store (something like Vault) with access keys for retrieving the certificates/keys, a small cache on the instances themselves (maybe a few hours), and monitoring with alerts on how often the central store is accessed.
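That pattern might look roughly like the sketch below. The store interface (a `.fetch(domain)` method), the TTL, and the counting stand-in are all hypothetical, not any real Vault client API:

```python
import time

class CertCache:
    """Sketch: fetch key material from a central secret store on
    demand, and keep it in memory only for a bounded time."""

    def __init__(self, store, ttl_seconds=4 * 3600):
        self.store = store
        self.ttl = ttl_seconds
        self._cache = {}  # domain -> (expiry, key_material)

    def key_for(self, domain):
        now = time.monotonic()
        hit = self._cache.get(domain)
        if hit and hit[0] > now:
            return hit[1]                  # still cached in memory
        key = self.store.fetch(domain)     # audited central-store access
        self._cache[domain] = (now + self.ttl, key)
        return key

class CountingStore:
    """Stand-in for the central store, counting accesses: this is the
    signal the monitoring/alerting above would watch."""
    def __init__(self):
        self.fetches = 0
    def fetch(self, domain):
        self.fetches += 1
        return f"key-material-for-{domain}"

store = CountingStore()
cache = CertCache(store)
cache.key_for("example.com")
cache.key_for("example.com")
assert store.fetches == 1  # second request served from the local cache
```

Keeping the TTL short bounds how long a compromised edge node can keep serving a key after the store revokes its access.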


But there must be something before nginx as well, right? Something like BGP to route traffic to the closest POP? And when a single POP gets traffic that's a multiple of what even the beefiest machine can handle, something in front of the machines must load balance it, I guess?


That’s done by DNS. The DNS server examines your DNS query and sends you back an IP address answer that’s “close” to the requesting IP address.

For example, if you are in country X, then even though there are dozens or hundreds of POP addresses for the name you requested, you’ll be answered with one for country X.


This can also be done with BGP. Anycast BGP for example is way better suited for this use-case.


BGP is an alternative to DNS but it is not better suited, it depends.

DNS gives the CDN provider more control and precision than BGP. BGP might be good enough if you have a few POPs around the world, but at the scale/distribution of Akamai, DNS is better suited.


I think all major CDNs use some form of anycasting as it's pretty essential that you own your IP space. DNS-based load balancing can also be finicky as you'll have to deal with recursive resolvers, so a combination of anycasting and DNS-based load balancing probably works best.


How does DNS give you more control than BGP?

One is an actual routing system that tells routers where to send data, the other is a name translation system with multiple layers of caches outside of your control.

DNS is a layer above as BGP is still used to actually navigate to the listed IP, and any large CDN will own that IP space and announce their own routes anyway.


With BGP (especially anycast) you don't have direct influence over where a request lands. You can steer traffic with techniques like AS-path prepending or per-session priorities (I'm not an expert in BGP), but ultimately it's not the CDN that decides where the request will be routed. It's decided by the routers of the client's ISP and the backbone networks, each making decisions itself (shortest AS path from its point of view, plus BGP policies set by network operators), not by you. You can't, for example, split traffic between DCs in specific proportions (like 30% here and 70% there). You can't split by anything other than network properties, like forwarding requests to the region that has the best chance of having the content in cache.

With DNS and dynamic responses you are directing requests to a specific DC, even a specific server, on almost every request. It may be dedicated to this traffic type (live streaming handled differently than static images, etc). Your DNS server can take the hostname ("www.google.com") into consideration; BGP doesn't even know the hostname in the URL. If you wanted to do that with BGP you would need to place specific content on a dedicated /24 subnet, which is impractical given how scarce IPv4 addresses are.

BGP doesn't even consider network latency or current network load. The CDN knows the load on its machines and network links, and where given content is placed. The bottleneck may be storage, network, or CPU processing, different for different sites and content types. They need to direct traffic on a per-request basis considering all of this, plus at least the hostname from the URL. That's why DNS comes first.
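The 30/70 split that the comment says BGP cannot express is trivial for an authoritative DNS server returning dynamic answers. A hedged sketch, with made-up DC addresses and weights:

```python
import random

# Hypothetical weighted split: 30% of answers point at one DC's
# address, 70% at the other's. BGP anycast cannot express this.
WEIGHTS = {"198.51.100.10": 0.3, "198.51.100.20": 0.7}

def weighted_answer(rng=random):
    """Pick the A record to return for one DNS query."""
    r, acc = rng.random(), 0.0
    for addr, weight in WEIGHTS.items():
        acc += weight
        if r < acc:
            return addr
    return addr  # guard against float rounding at the top of the range
```

Over many queries the traffic converges on the configured proportions, and the weights can be adjusted at any time without touching routing.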


> DNS is a layer above as BGP is still used to actually navigate to the listed IP, and any large CDN will own that IP space and announce their own routes anyway.

Yeah, I'm not sure what parent is on about, BGP and DNS are not alternatives to each other, the internet relies on both of them but at different layers. Without BGP packets wouldn't know how to be routed and without DNS they wouldn't know where to be routing to, they are complementary.


Sure, Internet communication relies on both and you cannot swap one for the other. This thread is about load balancing, and both can be used for load balancing (basically, DNS resolving to an anycast IP vs. DNS resolving to unicast IPs).


What's the difference between an anycast IP and a unicast IP?

Answer: Nothing. They are just IPs.

Using a combination of Anycast and DNS is going to give the best control over steering http traffic. Particularly if you own a few prefixes and can do clever addressing tricks.


Anycast doesn't let you adjust traffic flows gradually to shed load: you end up breaking connections. DNS is more flexible but isn't always honored.


> That’s done by DNS. The DNS server examines your DNS query and sends you back an IP address answer that’s “close” to the requesting IP address.

Is this really true? Last time I was dealing with load balancing and DNS queries, DNS was simply round-robining the replies, giving you back essentially a random record from the configured set. So if you have three A records with different IPs, each query gives you back one of them, but not depending on location.

Maybe things have changed since I last dealt with it, but the DNS ecosystem doesn't tend to move very fast so I'm doubtful...


The DNS server doesn't have to return the same responses to every query. It can geolocate the address making the request and use that to determine the response.

https://easydns.com/features/geo-dns/

(Having said that, I tried my toy site on cloudflare free tier from the UK and it gave me San Francisco IPs, so presumably they only do this for large enough customers)


> It can geolocate the address making the request and use that to determine the response

It can, but it tends to be a premium feature of specific DNS providers, not a global/by-default feature of DNS as efitz seems to be alluding to.

DNSimple supports it for example, but only on their "Professional" plan (and they call it "Regional Records") while others like Gandi don't support it at all.


If CDN providers are dealing with BGP you'd think they would also run their own DNS servers. I think you can do it with BIND views, from memory.


I think that all of Cloudflare's IPs are GeoIP'd to their main office, like how Google's are. You can see the POP you're hitting here: https://<your url>/cdn-cgi/trace under colo. I think that Cloudflare use Anycast instead of geodns though.


BGP Anycast (possibly in combination with DNS depending on geolocation).

Basically you announce the same IP address from multiple locations, and BGP chooses the best* route to reach that address.

* Note that "best" does not necessarily mean "fastest", not even "fewest hops". BGP has many knobs by which routing decisions can be manipulated, but by default it chooses the route where traffic has to traverse the smallest number of "Autonomous Systems" (roughly equivalent to "organizational entities").


A couple of years ago it always bugged me that CDNs took 15 minutes to several hours to recognise changes, much like DNS. This made them useless for real-time websites.

I wonder if things have changed since then


Much like DNS, it depends on the TTL. But yeah, if you're trying to do real-time stuff, it's best not to cache that with out-of-the-box stock CDN products, but rather have your own caching. Then for the rest of the endpoints/assets, you can use a default CDN.

Something some people seem to miss, though, is to leverage the ETag header properly, so the client and CDN can serve fresh content automatically when it exists, or the cached content otherwise. It's not that tricky, but somehow many seem to not even know about it.
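The ETag revalidation flow is small enough to sketch end to end. This is a toy server-side view with made-up content; the header names are the standard HTTP ones:

```python
import hashlib

def etag_for(body: bytes) -> str:
    # Strong validator derived from the content itself.
    return '"' + hashlib.sha256(body).hexdigest()[:16] + '"'

def serve(body: bytes, if_none_match=None):
    """Return (status, headers, payload), honoring If-None-Match."""
    tag = etag_for(body)
    if if_none_match == tag:
        return 304, {"ETag": tag}, b""   # cached copy is still fresh
    return 200, {"ETag": tag}, body

status, headers, _ = serve(b"<html>v1</html>")                # first fetch
revalidated, _, _ = serve(b"<html>v1</html>", headers["ETag"])  # 304
changed, _, _ = serve(b"<html>v2</html>", headers["ETag"])      # content moved on
assert (status, revalidated, changed) == (200, 304, 200)
```

The win is the 304 path: the client (or the CDN revalidating upstream) pays only for headers, not the body, while still always getting fresh content when it changes.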


One of the reasons Fastly got used by online newspapers was their fast purge. No newspaper wanted to use CDNs for breaking news if purging took 20 minutes to complete. It was one of the advantages Fastly had that allowed them to get customers from other CDNs. According to https://developer.fastly.com/learning/concepts/purging/ they do single-object purges in 150ms globally.


> It was one of the advantages Fastly had that allowed them to get customers from other CDNs.

That’s a neat fact and perhaps an advantage usually reserved for new entrants into an entrenched market. Presumably Fastly benefited, at least in some part, from technological innovation that existing companies could not apply as easily.


Most of the time you change URLs rather than content, though that may not apply to real-time scenarios.

Some CDNs provide distributed key-value stores where you can push data in seconds; you can also deploy your own code in minutes (for real-time changes you would push data, not code).

Some providers expose cache purge APIs so you can clear cached data in seconds.

They also allow serving cached content while asynchronously revalidating with the origin, so subsequent users get a fresh version.

With proper configuration/architecture they are a great way to scale your real-time website.
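The serve-stale-while-revalidating behavior mentioned above maps onto the standard `Cache-Control` directives. A small sketch of the decision an edge cache makes, with illustrative numbers:

```python
def cache_decision(age, max_age, swr):
    """Interpret Cache-Control: max-age=<max_age>,
    stale-while-revalidate=<swr> for an object of the given age (seconds)."""
    if age <= max_age:
        return "fresh"                   # serve straight from cache
    if age <= max_age + swr:
        return "stale-while-revalidate"  # serve stale, refresh async
    return "fetch"                       # must block on the origin

# e.g. Cache-Control: max-age=60, stale-while-revalidate=300
assert cache_decision(30, 60, 300) == "fresh"
assert cache_decision(120, 60, 300) == "stale-while-revalidate"
assert cache_decision(1000, 60, 300) == "fetch"
```

In the middle window users never wait on the origin; the refresh happens in the background after the stale copy has already been sent.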


It depends on the TTL set on the cached content in the CDN. If the TTL is several hours, the CDN won't show recent content. Given a TTL set appropriately for the content, the CDN will behave correctly.


The problem comes down to how quickly configuration data can be pushed to the edge, and modern CDNs like Cloudflare and Fastly respond to changes within seconds, or even milliseconds.


It is changing. I work for a CDN on a team focused on speeding up config propagation. The modern expectation is global distribution of new configs in seconds.


And the time you spend waiting to see the logs: some aggregate them as files attached to emails sent from the POPs.


The CDN jargon became less magical when I realized CloudFlare was just a tuned and managed NGINX-as-a-service.

Edit: I didn't even realize the F in Cloudflare has since been lowercased: https://blog.cloudflare.com/end-of-the-road-for-cloudflare-n....


It's more like Cloudflare forked nginx a long time ago, and is meanwhile in the very slow (like, decade-long) process of replacing it entirely.

The Cloudflare Workers Runtime∗, for instance, is built directly around V8; it does not use nginx or any other existing web server stack. Many new features of Cloudflare are in turn built on Workers, and much of the old stack built on nginx is gradually being migrated to Workers. https://workers.dev https://github.com/cloudflare/workerd

In another part of the stack, there is Pingora, another built-from-scratch web server focused on high-performance proxying and caching: https://blog.cloudflare.com/how-we-built-pingora-the-proxy-t...

Even when using nginx, Cloudflare has rewritten or added big chunks of code, such as implementing HTTP/3: https://github.com/cloudflare/quiche And of course there is a ton of business logic written in Lua on top of that nginx base.

Though arguably, Cloudflare's biggest piece of magic is the layer 3 network. It's so magical that people don't even think about it, it just works. Seamlessly balancing traffic across hundreds of locations without even varying IP addresses is, well, not easy.

I could go on... automatic SSL certificate provisioning? DDoS protection? etc. These aren't nginx features.

So while Cloudflare may have gotten started being more-or-less nginx-as-a-service, I don't think you can really call it that anymore.

∗ I'm the tech lead for Cloudflare Workers.


From the article all I gathered was that you’re essentially right.

In more general terms:

A distributed reverse proxy with sensible defaults for caching.

Even though it doesn’t sound like much, it’s pretty cool and useful.

One of the things in ops that can be hard and expensive is any kind of high availability. Say you’re restarting your origin server regularly after pushing new code.

Depending on your use case, it might just be enough to have exactly two well provisioned web servers (VM/dedicated) and maybe one DB server. Then put a CDN in front of it and voila, high availability reads.


Much of the ops work related to high availability is also abstracted away now, since the cloud is extensively used and managed solutions are employed instead of bare-metal deployments. The era before cloud was radically different; now we don't hear the word downtime so often.


If you care mostly about reads, the above setup can be operationally simpler and likely more cost effective and agnostic than a full suite of Cloud products.


Yes, it depends on the product being built and the traffic it is going to serve. If it's not a high-traffic website, we can definitely bypass the cloud. In my stint at a startup, we were serving content directly via nginx instead of a CDN. Our scope was limited to a single country, and the use case was also simple: just displaying a transaction page with a few images to customers.


That is correct. Internally, it's a high-performance web server returning static content at speed. In addition, websites get extra functionality, such as DDoS protection, out of the box by leveraging CDNs.


Well, it was that once. It's certainly got a lot more features now.


It's free bandwidth as well. That's a fairly huge part of the deal.


I'm betting they don't just use NGINX, but also specialized hardware.


Cloudflare has an article on introducing specialist hardware to their stack [0]

[0] https://blog.cloudflare.com/asics-at-the-edge/


Not the same type of specialized "CDN Hardware" that was implied in the parent, this is just "big network" customization at the packet forwarding layer.


They don’t use NGINX anymore


With anycast IPs too - that's the other critical component.


No mention of Coral CDN:

https://wiki.opensourceecology.org/wiki/Coral_CDN

https://en.wikipedia.org/wiki/Coral_Content_Distribution_Net...

https://github.com/morganestes/coralcdn

Just a few years ago, you could append .nyud.net to any domain and fetch it through the Coral cache instantly. But it's apparently been quietly swept under the rug.

Caching is such a basic thing to do that I'm concerned that the current crop of CDNs will mostly be used for mass surveillance. I also worry that VPNs are used for similar purposes by spy agencies.

IMHO the static web should have been distributed from the start. It should have been https everywhere. We should have kept cookies instead of trying to wedge in security on the frontend with OAuth, which is like a leaky sieve in comparison. We should have had Subresource Integrity (SRI) and been able to load scripts and other sensitive files from these caches fearlessly.


Could someone explain to me, as someone relatively clueless in web matters: I've heard an opinion recently that CDNs are not useful any more. Why would that be the case?


Maybe this was mentioned in the context of embedding third-party resources via CDNs? Until recently people would often include e.g. jQuery from a CDN host, and given that many websites use the same jQuery script, browsers would already have that file in the cache, making the site load faster since the browser can use the local copy instead of downloading it again. Recently browsers started partitioning caches at the domain level, i.e. each second-level domain has its own cache, so even if jQuery is cached for foo.com, the browser won't use that cache when bar.com requests the exact same file. The reason for this is that people started abusing browser caches for tracking.

That said CDNs are still very useful for a range of functionality such as geographic distribution of content, reducing load to origin servers, DDoS protection and many other applications.


CDNs are definitely still useful, maybe more so as more of the world gets access to the Internet. Ultimately everything we do online is transmitted as 1s and 0s over some sort of wire, and the best we can do at moving those 1s and 0s is the speed of light. Since distance is a factor and light only moves so fast, it is better to have your content closer to your users to get around the speed-of-light problem.

Likewise, while servers can handle a lot of load these days, they can't handle infinite load, and it's nice to spread that work out amongst many machines: many hands make light work. A CDN solves both of these problems by holding and serving copies of your content around the world, near your consumers and off your server. If you host on a server in the US but serve a customer in Europe, you can avoid crossing an ocean by using a CDN.

As a front-end developer, the static bits of a modern website can be almost entirely hosted on a CDN, leaving only much smaller JSON data that has to go back to origin, and even then only sometimes, if you design your content for caching. You can also do clever things like keep bad guys out of your network by letting the CDN take the brunt of the load, sometimes dropping it altogether so it never nears your origin.
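The speed-of-light point can be made concrete with a quick lower-bound calculation. The distances are rough round numbers; light in fiber travels at about two thirds of c, roughly 200,000 km/s:

```python
# Rough physical lower bound on round-trip time, ignoring all
# server and router processing time.
C_FIBER_KM_S = 200_000  # light in fiber, about 2/3 of c

def min_rtt_ms(distance_km: float) -> float:
    """Best-case round trip over fiber for a given one-way distance."""
    return 2 * distance_km / C_FIBER_KM_S * 1000

transatlantic = min_rtt_ms(6500)  # ~6500 km: 65 ms before any work happens
nearby_pop = min_rtt_ms(50)       # 50 km to an edge POP: 0.5 ms
```

A TLS handshake takes multiple round trips, so that 65 ms floor gets paid several times over on every cold connection, which is exactly the cost a nearby POP removes.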


> I've heard an opinion recently that CDNs are not useful any more.

In the past devs would use a CDN delivered asset in the hope that it was already in the user's cache from a visit to a different website that uses the same CDN, meaning it wouldn't need to be downloaded again. That meant you got a little speed boost.

Today browsers partition their caches so you can be absolutely sure the user doesn't have any of your website's assets in their cache if they've not visited before. This actually makes CDNs more important rather than less, because you want to deliver everything from the closest possible location to the user. That means you need a CDN.

Essentially the focus of CDNs has shifted away from Content, and now it's all more about Network.


It's more useful than ever. For example, the entire Vercel product is deployed on CDNs for static files and Lambdas for functions.

I'm pretty sure all static hosts use a CDN for all their customers' files as well.


until we figure out how to break the speed of light, having data physically closer to the user will always matter


A CDN is a network of servers that distributes content from an origin server throughout the world by caching content close to where each end user is accessing the internet via a web enabled device. The content they request is first stored on the origin server and is then replicated and stored elsewhere as needed.


This is a copy-pasted definition that has floated around the web but was most likely found on Akamai or Quora.

Note that the original post is a link to an article, not a request for help.



