If you have any experience with writing Asm you'll see how rigid calling convent...

mmozeiko · on Sept 4, 2016

> Incidentally, these 3 examples are also great at showing how compilers can be so very stupid at code generation. Observe that in all 3 cases, the return value in eax after calling foo is written to memory --- then immediately read from memory again, into the exact same register.

That's because the code in the article is compiled without optimizations. With optimizations enabled, the compiler does the right thing: https://godbolt.org/g/Lc7giO
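To see this yourself, here is a minimal sketch you can paste into godbolt or compile with `gcc -m32 -O0 -S` and again with `-O2 -S`. `foo` and `bar` are hypothetical stand-ins, not the article's code. At -O0 GCC typically gives every local its own stack slot, so the value in EAX after `call foo` is stored to `r`'s slot and immediately loaded back. At -O2 it normally just stays in EAX. The `noinline` attribute (GCC/Clang) keeps `foo` a real call even when optimizing, so the two call sites can be compared.

```c
/* noinline is a GCC/Clang attribute; it keeps foo an actual call at -O2
   so the code around the call site can be compared with the -O0 output. */
#if defined(__GNUC__)
#define NOINLINE __attribute__((noinline))
#else
#define NOINLINE
#endif

/* Hypothetical callee: returns the larger of its two arguments. */
NOINLINE int foo(int a, int b)
{
    return a > b ? a : b;
}

/* Hypothetical caller. At -O0, r lives in a stack slot: foo's result in EAX
   is stored there and then reloaded before the addition. */
int bar(int x)
{
    int r = foo(x, 1);
    return r + 1;
}
```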

bjourne · on Sept 4, 2016

You certainly can do it differently, but I really doubt you can do it better. :) The common wisdom learned from the "calling convention wars" (there are many more than just cdecl, stdcall and fastcall) was that it just doesn't matter all that much. The same amount of work has to be done; the only thing that changes is whether the caller or the callee does it.

For example, if your convention mandates that the callee must preserve RAX-RDX, then the callee must push/pop those registers if it wants to use them, which leads to redundant push/pops when the caller isn't using those registers. But if the callee is free to clobber them, then the caller must push/pop any of them it still needs after the call, even if the callee doesn't use them, leading to the exact same number of redundant push/pops!
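The symmetry argument can be made concrete with a toy model. This is an illustrative assumption, not how any real compiler decides: each register is one bit in a mask, and a save/restore pair is "redundant" when it protects a register that was not both live in the caller and used by the callee. The function names are hypothetical.

```c
/* Toy model: each register is one bit in a mask. Not how any real compiler works. */
typedef unsigned regset;

enum { RAX = 1u << 0, RBX = 1u << 1, RCX = 1u << 2, RDX = 1u << 3 };

/* Number of registers in a set (portable popcount). */
static int count_regs(regset s)
{
    int n = 0;
    for (; s != 0; s >>= 1)
        n += (int)(s & 1u);
    return n;
}

/* Callee-saved convention: the callee pushes/pops every covered register it uses.
   A pair is redundant when the caller had nothing live in that register. */
static int redundant_if_callee_saves(regset caller_live, regset callee_uses, regset covered)
{
    return count_regs(callee_uses & covered & ~caller_live);
}

/* Caller-saved convention: the caller pushes/pops every covered register it needs
   after the call. A pair is redundant when the callee never touched that register. */
static int redundant_if_caller_saves(regset caller_live, regset callee_uses, regset covered)
{
    return count_regs(caller_live & covered & ~callee_uses);
}
```

With `covered = RAX|RBX|RCX|RDX`, a callee that uses all four, called from a caller with nothing live, wastes 4 pairs under the callee-saved rule and 0 under the caller-saved rule. Swap the two masks and the numbers swap, so in this model neither convention wins in general.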

Narishma · on Sept 4, 2016

By "doing it better", I don't believe the parent is saying to create another, "better" calling convention, but rather to use no convention at all.

userbinator · on Sept 4, 2016

Exactly. What I see from the "calling convention wars" is not that "it just doesn't matter all that much", but that there is no single convention that is optimal in all cases. Some functions need to use more registers than others; some arguments are used very early in the function and their values are not needed after that (prefer these in a register), while others may only be used after a bunch of computation that needs many registers (these might be better left on the stack). Some instructions, like multiply and divide, require specific registers: does your function start with a multiply or divide, and is one of the arguments the multiplicand/dividend? Use AX, EAX, or EDX:EAX for that one.
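To make the divide case concrete, here is a sketch using GCC/Clang extended inline asm; the helper name `div_edx_eax` is hypothetical. The 32-bit DIV instruction only accepts its dividend in EDX:EAX and always leaves the quotient in EAX and the remainder in EDX, so the `"a"`/`"d"` output constraints (and the `"0"`/`"1"` inputs tied to them) force the compiler to move the operands into exactly those registers, whatever registers the calling convention delivered them in. Outside GCC-compatible compilers on x86 it falls back to plain C.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: divide the 64-bit value hi:lo by d, return the quotient
   and store the remainder in *rem. */
static uint32_t div_edx_eax(uint32_t hi, uint32_t lo, uint32_t d, uint32_t *rem)
{
    /* DIV faults on a zero divisor or on a quotient wider than 32 bits. */
    assert(d != 0 && hi < d);
#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
    uint32_t q, r;
    /* %0 = q (EAX), %1 = r (EDX), %2 = lo tied to EAX, %3 = hi tied to EDX,
       %4 = divisor in any other register or in memory. */
    __asm__("divl %4"
            : "=a"(q), "=d"(r)
            : "0"(lo), "1"(hi), "rm"(d)
            : "cc");
    *rem = r;
    return q;
#else
    /* Portable fallback: same arithmetic, no register constraints. */
    uint64_t n = ((uint64_t)hi << 32) | lo;
    *rem = (uint32_t)(n % d);
    return (uint32_t)(n / d);
#endif
}
```

If `hi` and `lo` arrive in other registers (ECX and EDX under fastcall, say), the compiler has to emit extra moves to satisfy these constraints, which is exactly the kind of shuffling described above.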

The short examples in the article illustrate "used early and not needed afterwards": in cdecl/stdcall the caller writes the arguments to memory, only for the callee to read them straight back. Setting aside those extra memory accesses, even fastcall isn't optimal in this case: it uses ECX and EDX, when what's really needed is for one of the arguments to be in EAX, since it may become the return value. In my "optimised" fastcall above, you can see I had to spend an extra mov instruction just to get the return value in the right place; otherwise it would be two instructions (cmp eax, ecx | cmovge eax, ecx). All this useless data movement, for what? Just to conform to some arbitrary convention. These may be small things, but they can add up.
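GCC and Clang do expose one knob for this on 32-bit x86: the `regparm(N)` function attribute passes the first N integer arguments in EAX, EDX and ECX, in that order. The sketch below is hypothetical (the article's exact function isn't reproduced here, so it uses a plain max). With `regparm(2)`, `a` already sits in EAX, the return register, so an optimizing compiler targeting a cmov-capable CPU can in principle get away with a compare, a conditional move and `ret`, with no extra mov. On other targets the attribute is compiled out and this is ordinary C.

```c
/* regparm(N) only means something on 32-bit x86 with GCC/Clang. */
#if defined(__GNUC__) && defined(__i386__)
#define REGPARM2 __attribute__((regparm(2)))
#else
#define REGPARM2 /* no effect on other targets */
#endif

/* Hypothetical example: with regparm(2) on i386, a arrives in EAX and b in EDX.
   The result is also returned in EAX, so no argument has to be copied into EAX first. */
REGPARM2 int max_regparm(int a, int b)
{
    return a > b ? a : b;
}
```

Because `regparm` changes the calling convention, every caller must see the attribute on the declaration it calls through. A plain prototype elsewhere would make the caller pass the arguments on the stack while the callee reads them from registers.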