
Summary:

> Root Cause Analysis and Advice for Implementers:

> We conduct a forensic analysis with the aid of performance counter results to identify the root causes of this performance gap. We find the following results: (1) code compiled to WebAssembly yields more loads and stores than native code (2.1× more loads and 2× more stores in Chrome; 1.6× more loads and 1.7× more stores in Firefox). We attribute this to reduced availability of registers, a sub-optimal register allocator, and a failure to effectively exploit a wider range of x86 addressing modes; (2) increased code sizes lead to more instructions being executed and more L1 instruction cache misses; and (3) generated code has more branches due to safety checks for overflow and indirect function calls.

A surprisingly large amount of this boils down to "x86 needs more registers", but there's quite a lot of detail here. It validates my personal experience that Chrome doesn't do a very good job relative to Firefox. For example:

> Code generated by Firefox has 1.15× more branch instructions retired and 1.21× more conditional branch instructions retired than native code, while code generated by Chrome has 4.13× branch instructions retired and 5.06× more conditional branch instructions retired.

> Chrome executes 2.9× more instructions and Firefox executes 1.53× more instructions on average than native code.

> On average, Chrome suffers from 3.88× more L1 instruction cache misses than native code, and Firefox suffers from 1.8× more L1 instruction cache misses than native code.

Overall, it was always obvious that Chrome's deficit compared to Firefox was just engineering work, but what is new is figuring out how difficult the native code deficit would be to close. Some aspects are quite straightforward; browsers probably need to develop a more aggressive tier-2 WASM JIT, now that both browsers tested have a tier-1 baseline compiler that is very fast to compile. This should include:

1. a better register allocator,

2. better peephole optimizations,

3. sophisticated loop optimizations.

Unavoidable slowdowns might include:

1. register pressure from reserved registers,

2. stack overflow checks,

3. function table bounds checks.
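
To make item 3 concrete, here is a rough sketch in C of the checks an indirect call conceptually has to pass before the jump: an index bounds check, a null-slot check, and a signature check. All the names here (`table_entry`, `checked_call_indirect`, `sig_id`) are made up for illustration; this is not how any particular engine lays out its tables, just the shape of the extra branches.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical types and names; real engines use their own layouts. */
typedef int32_t (*wasm_fn)(int32_t);

typedef struct {
    uint32_t sig_id; /* identifier for the function's type signature */
    wasm_fn  fn;     /* NULL models an uninitialized table slot */
} table_entry;

typedef enum {
    CALL_OK,
    TRAP_OUT_OF_BOUNDS,
    TRAP_NULL_ENTRY,
    TRAP_SIG_MISMATCH
} call_status;

/* Each early return below is a conditional branch that a direct native
 * call through a plain function pointer would not execute. */
call_status checked_call_indirect(const table_entry *table, uint32_t table_len,
                                  uint32_t index, uint32_t expected_sig,
                                  int32_t arg, int32_t *result)
{
    if (index >= table_len) {
        return TRAP_OUT_OF_BOUNDS;   /* function table bounds check */
    }
    const table_entry *entry = &table[index];
    if (entry->fn == NULL) {
        return TRAP_NULL_ENTRY;      /* slot was never initialized */
    }
    if (entry->sig_id != expected_sig) {
        return TRAP_SIG_MISMATCH;    /* callee type must match the call site */
    }
    *result = entry->fn(arg);
    return CALL_OK;
}

/* Tiny callee used only to exercise the sketch. */
int32_t add_one(int32_t x) { return x + 1; }
```

With a two-entry table whose first slot holds `add_one` under signature 7, calling index 0 with signature 7 succeeds, while index 5, an empty slot, or signature 8 each take one of the trap paths. A JIT can sometimes hoist or merge these checks, but in the general case at least some of the branches remain.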

———

Is this line a typo? I can't make sense of it; I read 1.5 and 1.9 respectively from the table. (There's another mistyped sentence starting ‘Clang,’ but it is not a major issue.)

> On average WebAssembly in Firefox runs at 1.9× over native code and in Chrome runs at 1.75× over native code.



This isn't very surprising. C compilers have developed all kinds of low-level optimizations over decades, trying to extract the last bits of performance they could. In contrast, JIT compilers for JS focused on recovering missing type information through type inference and other low-hanging fruit.

It's also the case that there's a lot more pressure on JS JITs to produce code fast, so running multiple passes of low-level optimizations was not desirable in a web context. This is somewhat bad news for WASM: better low-level optimizations could mean longer compilation times. However, AFAIK, Chrome has been working on caching compiled code, so you may only have to compile WASM when it changes (or when you reinstall/update your browser).


Yes, a mistake on their part. They have now changed it to properly show that WASM is slower (not faster) than native code. Now it reads: "applications compiled to WebAssembly run slower by an average of 50% (Firefox) to 89% (Chrome)"



