It’s fun to consider how C devolves into assembler. In my mind, C and its derivatives dissolve into 68K assembler as I’m writing or debugging. Thinking about code this way lets me get a feel for how all the bits fit together.
It seems like a lost art to think that way. It’s disturbing to me how many candidates couldn’t write Hello World and compile it from the command line.
Everyone should spend some time with godbolt.org or better, the -save-temps compiler flag, to see how changes affect your generated code. Right now. I’ll wait. (Shakes cane at kids)
In the kernel developer world, on the other hand, it's still very common to think about how the C code one writes translates into assembly. (Honestly, I think C should not be used much outside developing kernels anyway, and even there it's just legacy, but that's just personal opinion.)
But it's rough, and dangerous. Optimizers do a lot these days, and I really mean a lot. Besides completely mangling your program order, which includes shoving entire blocks of code into places that you might not have guessed, they also do such things as leveraging undefined behavior for optimizations (what the article is partly about), or replacing entire bits of code with function calls. (A compiler might expand your memcpy() call into inline code, or turn your hand-written copy loop into a memcpy() call; the latter can be especially surprising.)
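To make the memcpy() point concrete, here is a minimal sketch. The name copy_bytes is made up for illustration. Nothing in the source calls a library function, yet an optimizing compiler may recognize the loop and emit a call to memcpy() anyway, depending on the compiler, version and flags.

```c
#include <stddef.h>

/* Hypothetical helper: a plain byte-by-byte copy loop.
 * `restrict` promises the two buffers do not overlap, which is what
 * allows a compiler to treat the loop as equivalent to memcpy(). */
void copy_bytes(unsigned char *restrict dst,
                const unsigned char *restrict src,
                size_t n)
{
    for (size_t i = 0; i < n; i++)
        dst[i] = src[i];
}
```

Building this with optimization on and reading the assembly (godbolt.org or -save-temps) shows whether your particular compiler made the substitution.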
If you care about the assembly representation of your C code (which kernel developers often do), you will spend a lot of time with the "volatile" keyword, compiler barriers, and some obscure "__attribute__"s.
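A rough sketch of what that looks like in practice, assuming GCC or Clang (the inline asm barrier and __attribute__ are compiler extensions, not standard C). The names fake_status_reg, poll_until_set and publish are invented for the example; a real driver would point at an actual device register.

```c
#include <stdint.h>

/* GCC/Clang-specific compiler barrier: emits no instructions, but the
 * "memory" clobber stops the compiler from moving memory accesses
 * across it. It does NOT order anything at the CPU level. */
#define compiler_barrier() __asm__ __volatile__("" ::: "memory")

/* Hypothetical stand-in for a memory-mapped status register. */
static volatile uint32_t fake_status_reg;

/* noinline keeps this function visible as its own symbol in the
 * generated assembly, which helps when reading compiler output. */
__attribute__((noinline))
uint32_t poll_until_set(uint32_t mask, unsigned max_spins)
{
    for (unsigned i = 0; i < max_spins; i++) {
        /* volatile: the load is performed on every iteration instead
         * of being hoisted out of the loop. */
        uint32_t v = fake_status_reg;
        if (v & mask)
            return v;
    }
    return 0; /* gave up: mask never became set */
}

void publish(uint32_t *payload, uint32_t value)
{
    *payload = value;
    compiler_barrier();   /* keep the payload store ahead of the flag store */
    fake_status_reg = 1;
}
```

On a multi-core system you would typically also need a hardware memory barrier or atomics; the compiler barrier only constrains what the compiler may reorder.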
But I agree, even with those caveats in mind, it's a very useful skill to imagine your C code as what it translates to (even if that representation is just a simplified model of what the compiler will actually do).
that is a poor way to handle UB as it introduces bugs (which are UB themselves). If a compiler detects UB, it should flag an error so the source code gets changed. compilers (or any software really) should never be maliciously compliant.
That's not what I mean. The C standard contains some rules that exist for the sole purpose of providing better optimization, and some of these rules give rise to undefined behavior. The compiler leverages undefined behavior by allowing optimizations to not have to care about code that exhibits such undefined behavior.
If compilers did not take advantage of this, then a lot of behavior would not have to be undefined in the first place. Undefined behavior isn't conjured up from a magical place, it was deliberately specified for a reason.
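Signed integer overflow is a commonly cited case. A minimal sketch (both function names are made up): because overflow of a signed int is undefined, the compiler is allowed to assume x + 1 never wraps, so the first check may be folded down to "return false" when optimizing. The second version asks the same question without ever overflowing.

```c
#include <limits.h>
#include <stdbool.h>

/* Relies on wraparound. For x == INT_MAX the addition is undefined
 * behavior, so an optimizer may assume it cannot happen and reduce
 * the whole function to `return false`. */
bool next_overflows_naive(int x)
{
    return x + 1 < x;
}

/* Defined for every input: no arithmetic that can overflow. */
bool next_overflows_checked(int x)
{
    return x == INT_MAX;
}
```

Whether a given compiler actually folds the naive version depends on the compiler and the optimization level, so it is worth checking the generated assembly rather than taking this on faith.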
The subject of the linked article, strict aliasing, is a prime example of exactly that: Surprisingly strict rules for aliasing, giving compilers the opportunity to better optimize code that follows these rules, at the risk of breaking code that does not follow the rules in arbitrary and perhaps unintuitive ways.
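A small sketch of both sides of that trade, with invented function names. In store_both, the aliasing rules let the compiler assume an int * and a float * do not refer to the same object, so it may return the constant 1 without reloading *i. float_bits shows the usual way to reinterpret bytes without breaking the rules, by copying them with memcpy(); it assumes float is 32 bits wide, which the static assertion checks.

```c
#include <stdint.h>
#include <string.h>

/* The compiler may assume *i and *f are different objects, because an
 * int may not be accessed through a float lvalue. It can therefore
 * return 1 directly instead of re-reading *i after the float store. */
int store_both(int *i, float *f)
{
    *i = 1;
    *f = 0.0f;
    return *i;
}

/* Aliasing-safe type punning: copy the object representation.
 * Compilers commonly turn this memcpy into a single register move. */
uint32_t float_bits(float value)
{
    _Static_assert(sizeof(float) == sizeof(uint32_t),
                   "sketch assumes a 32-bit float");
    uint32_t bits;
    memcpy(&bits, &value, sizeof bits);
    return bits;
}
```

Calling store_both with two pointers into the same storage (via a cast) is exactly the kind of code that can break in unintuitive ways once the optimizer acts on that assumption.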
Now, these particular rules are controversial, and the article acknowledges this:
>If you read this document, you may find to your horror an awful lot of C code, probably code you have written, is UB and therefore broken. However just because something is technically UB doesn’t mean compilers will take advantage of that and try to break your code. Most compilers want to compile your code and not try to break it. Given that almost no one understands these rules, compilers give programmers a lot of leeway. A lot of code that technically breaks these rules will in reality never cause a problem, because any compiler crazy enough to assume all code is always in 100% compliance with these rules would essentially be deemed broken by its users. If you are using fwrite to fill out a structure, you are just fine. No reasonable compiler would ever break that code. The issue is not that implementations don’t give users leeway, the issue is that it’s unclear how much leeway is given.
Nevertheless, there are many other rules that are much more readily accepted where similar things are taking place.
>The compiler leverages undefined behavior by allowing optimizations to not have to care about code that exhibits such undefined behavior.
that's pure maliciousness. if the programmer has written code that exhibits undefined behavior, it should be flagged as an error so it can be changed to code that does not exhibit undefined behavior.
programs need to have one unambiguous meaning, and it should be the meaning intended by the programmer. if meanings can be detected as ambiguous or as not what the programmer intended, that should be flagged, not magically swept under the carpet because it's "faster".
The compiler generally cannot know at compile time whether a program will run into undefined behavior; in the general case that question runs into the halting problem. For detecting undefined behavior at runtime, there’s UBSan. It’s good, but it makes things slower.
or to put it another way, divide by zero is undefined behavior. do you think it should be trapped? or just optimized away so the program can get more quickly back to defined behavior...
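For reference, the usual way to stay on defined ground is to check before dividing. A minimal sketch; checked_div is a made-up name, not a standard function. It rejects the two integer divisions that are undefined in C: a zero divisor, and INT_MIN / -1, whose result does not fit in an int.

```c
#include <limits.h>
#include <stdbool.h>

/* Returns true and stores the quotient in *out when the division is
 * well defined; returns false and leaves *out untouched otherwise. */
bool checked_div(int num, int den, int *out)
{
    if (den == 0)
        return false;                 /* division by zero */
    if (num == INT_MIN && den == -1)
        return false;                 /* quotient would overflow int */
    *out = num / den;
    return true;
}
```

Without such a check, what happens is up to the implementation: on x86 an integer divide by zero typically traps, but the standard does not promise that, and building with -fsanitize=undefined is one way to get a runtime report instead.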
I kind of share this feeling (I knew 68K assembler before learning C), but having spent ~30 years writing C, publishing some open source software in C, reading comp.lang.c and draft standards, as well as answering many C questions on Stack Overflow back when it was cool, let me tell you: it's not a good model any more (if it ever was). :)
C is specified against an abstract (not virtual) machine, and it matters.
All the talk about how undefined behaviors give the compiler the right to shuffle and/or remove code really breaks the analogy with assembler, where most things become Exactly What You Say.