Compilation speed should be part of the benchmarks, since it's critical for WebAssembly. The fairest way to compare runtime speeds would be to use the Clang optimization setting whose compilation speed is closest to the JIT's.
As an example, the article mentions suboptimal register allocation, but without a compile-time comparison there's no way to know whether there really is a simple way to improve the WebAssembly implementations.
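To make the suggestion above concrete, here is a minimal sketch of picking the Clang optimization level whose compile time is closest to a JIT's. The function name `closest_opt_level` and all timings are hypothetical placeholders, not measurements from the article.

```python
def closest_opt_level(jit_compile_s, clang_compile_s):
    """Return the Clang -O level whose compile time is nearest the JIT's.

    jit_compile_s: JIT compile time in seconds (assumed measured elsewhere).
    clang_compile_s: dict mapping optimization flag -> compile time in seconds.
    """
    return min(
        clang_compile_s,
        key=lambda level: abs(clang_compile_s[level] - jit_compile_s),
    )


# Placeholder numbers for illustration only; real values would come from
# timing the same benchmark program under each setting.
clang_times = {"-O0": 0.8, "-O1": 1.5, "-O2": 2.4, "-O3": 2.6}

level = closest_opt_level(1.0, clang_times)
print(level)
```

The runtime comparison would then be made against the binary built at that level rather than against the most optimized build.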
It's true that Clang can't compete with the JITs in question on speed. Even at the settings with the most comparable quality of generated code, Clang will be much slower, because its internal design is primarily geared towards maximum optimization potential at high optimization levels, with lower optimization levels as more of an afterthought.
But that's beside the point. The paper isn't suggesting switching to Clang or to the algorithms Clang is using. Rather, it's treating Clang's output as an approximation of 'optimal' code generation for the given C code. There are various reasons to compare it to WebAssembly JITs:
- For one, the paper identifies specific reasons the JIT output is slower, which shows potential areas of improvement they could focus on.
- The comparison also provides a sort of upper bound on how much the JITs theoretically could be improved. It's only a weak upper bound; the bound on code quality achievable at the compilation speed a JIT requires is lower and would be more relevant, but there's no way to measure that.
- It also indicates the potential of adding a higher tier to the JIT that optimizes very hot code using slower algorithms.
- And finally, most entertainingly, the benchmarks help answer age-old questions like "can WebAssembly replace native code?" :) Or, at least, "for what applications can WebAssembly replace native code and have acceptable performance?"...
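The "higher tier" idea from the list above can be illustrated with a toy model: start with a version that is cheap to produce, count calls, and swap in a more expensive optimized version once the function is hot. This is only a sketch; the class `TieredFunction`, the `hot_threshold` parameter, and the stand-in functions are hypothetical and do not describe any particular engine.

```python
class TieredFunction:
    """Toy two-tier execution: baseline first, optimized once hot."""

    def __init__(self, baseline, optimize, hot_threshold=1000):
        self.impl = baseline            # cheap-to-produce implementation
        self.optimize = optimize        # callable returning the optimized one
        self.hot_threshold = hot_threshold
        self.calls = 0
        self.tier = "baseline"

    def __call__(self, *args):
        self.calls += 1
        # Pay the slow "compilation" cost only after the function proves hot.
        if self.tier == "baseline" and self.calls >= self.hot_threshold:
            self.impl = self.optimize()
            self.tier = "optimized"
        return self.impl(*args)


def slow_square(x):
    # Stand-in for unoptimized code: repeated addition instead of a multiply.
    return sum(x for _ in range(x))


f = TieredFunction(slow_square, lambda: (lambda x: x * x), hot_threshold=3)
results = [f(4) for _ in range(5)]
```

In this model, functions that run only a few times never pay the optimization cost, which is why a tier like this would not hurt startup time.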
The problem is that the compilation budget is much smaller when it affects the perceived startup time of the application. For a performance-critical application, it is much easier to throw resources at offline compilation than at online compilation.
Do typical WebAssembly implementations cache the result of compilation?