- Pre-zeroing a page only takes 80ns on a modern CPU. vm_fault overhead
  in general is at least ~1 microsecond.
- Pre-zeroing a page leads to a cold-cache case at the time of use, forcing
  the fault source (e.g. a userland program) to fetch the data from main
  memory when it, most likely, uses the faulted page immediately, reducing
  performance.
- Zeroing the page at fault time is actually more efficient because it does
  not require any reads from DRAM and leaves the cache hot.
- Multiple synth and build tests show that active idle-time zeroing of
  pages actually reduces performance somewhat, and incidental allocations
  of already-zeroed pages (from page-table tear-downs) do not affect
  performance in any meaningful way.
I expect OpenBSD to continue doing this anyway, though, because of the possibility that someone reboots the system into a new OS (presumably designed just for this purpose) and reads whatever was left in RAM. Of course, programs that deal with encryption zero memory before returning it (it is hard to make sure the compiler doesn't optimize this otherwise-useless work away), but most other programs that handle secrets are not so well written and will leave sensitive information lying around.
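To make that parenthetical concrete, here is a minimal C sketch (not from the thread; get_key()/use_key() are hypothetical stand-ins, stubbed out so it compiles) of why a final memset() of secrets can silently vanish, plus the usual workarounds:

    /*
     * Minimal sketch (not from the thread): why a final memset() of secrets
     * can silently disappear. get_key()/use_key() are hypothetical stand-ins
     * for real crypto code, stubbed out so this compiles on its own.
     */
    #include <string.h>

    static void get_key(char *buf, size_t len) { memset(buf, 0xAB, len); }
    static void use_key(const char *buf, size_t len) { (void)buf; (void)len; }

    void bad_wipe(void)
    {
        char key[32];
        get_key(key, sizeof key);
        use_key(key, sizeof key);
        /* key is never read again, so this is a dead store the compiler
         * may legally remove ("dead store elimination"). */
        memset(key, 0, sizeof key);
    }

    /*
     * Workarounds: explicit_bzero() (OpenBSD, glibc >= 2.25) and the C11
     * Annex K memset_s() are specified never to be optimized away. A
     * portable fallback is calling memset through a volatile function
     * pointer, so the compiler cannot prove the call is side-effect free.
     */
    static void *(*const volatile memset_v)(void *, int, size_t) = memset;

    void good_wipe(void)
    {
        char key[32];
        get_key(key, sizeof key);
        use_key(key, sizeof key);
        memset_v(key, 0, sizeof key);   /* survives optimization */
    }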
The starting point is that there is stale, useless data in RAM. Then a user-mode program requests an empty page, and usually when it does this it wants to use the page immediately.(1) Using non-polluting (non-temporal) writes, you spend main-memory bandwidth twice: once for clearing the page, and again for bringing the page back into the cache immediately afterwards when the program uses it.
Using writes that just allocate new, cleared dirty lines in cache (like AMD's CLZERO), you avoid both the write (which will happen later, when the lines are evicted from the cache, probably after the program has used them) and the read, because the lines are now all in the cache.
(1) And on Linux this is trivially true, because Linux only allocates and clears the page when it is first accessed.
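Roughly, in C on x86 (SSE2 assumed; AMD's CLZERO itself is not shown), the two kinds of zeroing being contrasted look something like this:

    /*
     * Sketch of the two zeroing strategies contrasted above, assuming x86
     * with SSE2. `page' must be page-aligned.
     *
     * zero_page_nontemporal(): the kind of non-polluting store a background
     * zeroing thread would use; it bypasses the cache, so the page is cold
     * when a program later touches it and has to come back from DRAM.
     *
     * zero_page_cached(): what fault-time zeroing effectively does; ordinary
     * stores allocate the lines in cache, leaving the page hot for the use
     * that typically follows immediately after the fault.
     */
    #include <emmintrin.h>   /* SSE2 intrinsics */
    #include <string.h>

    #define PAGE_SIZE 4096

    void zero_page_nontemporal(void *page)
    {
        __m128i zero = _mm_setzero_si128();
        char *p = page;
        for (size_t i = 0; i < PAGE_SIZE; i += 16)
            _mm_stream_si128((__m128i *)(p + i), zero);  /* cache-bypassing */
        _mm_sfence();   /* order the non-temporal stores */
    }

    void zero_page_cached(void *page)
    {
        memset(page, 0, PAGE_SIZE);   /* lines end up resident in cache */
    }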
I don't follow. Who is "you" here? The user-mode program, or the kernel-mode zero-page thread (or whatever its name is)? I'm talking about the zero-page thread here, which is zeroing pages in the background long before any thread has requested access. Those threads do not want to evict anything from the cache. This seems like exactly what we want.
The issue is that zeroing pages in the background is a pessimization that should never be done. The user-mode program that allocates some memory is typically not going to be able to use only writes that allocate new cache lines without reading memory. So, to compare the two systems:
Your system: memory is released, the kernel clears it in the background, wasting write bandwidth (which might not matter for anything except power if the system was idle at the time), and when the user-mode program starts using it, every new line it writes to will trigger a spurious read.
Modern Linux: memory is released, the kernel lets it lie, not using any power or bandwidth to do anything with it, until a user-mode program allocates it and touches the page. Then the kernel picks up the page and writes the entire page with zeros, using whatever idiom on that CPU allows it to just allocate the page in cache without reading it from RAM. This is really fast, faster than a single memory fetch. The user-mode program can then use it directly without having to fetch anything from DRAM.
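You can watch this lazy allocate-and-zero-on-first-touch behaviour from userspace. Here is a small Linux-only sketch (my own illustration, not kernel code) that mmaps anonymous memory and compares resident set size before and after touching the pages:

    /*
     * Linux-only sketch: anonymous mmap costs almost no resident memory
     * until the pages are first touched; each first touch faults in a page
     * the kernel zeroes on the spot.
     */
    #define _DEFAULT_SOURCE   /* for MAP_ANONYMOUS */
    #include <stdio.h>
    #include <sys/mman.h>

    static long resident_pages(void)
    {
        long size = 0, resident = -1;
        FILE *f = fopen("/proc/self/statm", "r");
        if (f) {
            if (fscanf(f, "%ld %ld", &size, &resident) != 2)
                resident = -1;
            fclose(f);
        }
        return resident;
    }

    int main(void)
    {
        const size_t len = 64UL << 20;   /* 64 MiB */
        char *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        if (p == MAP_FAILED) { perror("mmap"); return 1; }

        printf("RSS after mmap:  %ld pages\n", resident_pages());

        /* First touch: each write below faults in a freshly zeroed page. */
        for (size_t i = 0; i < len; i += 4096)
            p[i] = 1;

        printf("RSS after touch: %ld pages\n", resident_pages());
        munmap(p, len);
        return 0;
    }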
> This is really fast, faster than a single memory fetch.
Nit: it's faster than a page fault (so fault + zeroing is pretty much the same as just fault).
According to the recent-ish Latency Numbers, a main-memory reference is ~100ns (variable by architecture and by local vs. remote DRAM), which is about the same as zeroing a page, at least with respect to the DragonFly numbers I posted above.
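For anyone who wants to sanity-check that comparison on their own machine, here is a rough, cache-hot microbenchmark sketch (illustrative only; it measures zeroing a page that stays cache-resident, which is the favourable case, not a rigorous recreation of fault-time conditions):

    #include <stdio.h>
    #include <string.h>
    #include <time.h>

    #define PAGE_SIZE 4096
    #define ITERS     1000000

    int main(void)
    {
        static char page[PAGE_SIZE];
        struct timespec t0, t1;
        long sink = 0;

        clock_gettime(CLOCK_MONOTONIC, &t0);
        for (int i = 0; i < ITERS; i++) {
            memset(page, i & 1, PAGE_SIZE);
            sink += page[i % PAGE_SIZE];  /* read back so the memset can't be elided */
        }
        clock_gettime(CLOCK_MONOTONIC, &t1);

        double ns = (t1.tv_sec - t0.tv_sec) * 1e9 + (t1.tv_nsec - t0.tv_nsec);
        printf("~%.0f ns per 4 KiB fill (cache-hot), sink=%ld\n", ns / ITERS, sink);
        return 0;
    }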
DragonFly actually removed it two years ago: http://lists.dragonflybsd.org/pipermail/commits/2016-August/...