
And the smart assembly programmer laughs at the C programmer. There are a couple dozen string-oriented x86 instructions that I've never seen a C or C++ compiler produce. You could easily get a 2x speedup on strings by hand-writing clever x86 with SSE. In fact, I'm surprised no one has made an STL with lots of inline assembly.


Most of what I know predates SSE; does that have a long track record of being fast? I know REP MOVSB was originally fast, and then CPU vendors decided it was rarely used and did it in (slower) microcode, and then architectural changes made it fast again sometimes depending on alignment.


REP MOVS is still a microcode loop, but it will copy entire cachelines (usually 64 bytes) at once if it can. The fact that it is a tiny instruction (2 bytes) and runs in microcode means that it doesn't consume instruction fetch bandwidth while it's running, and occupies only a tiny amount of the instruction cache.
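For anyone who wants to try this, here is a minimal sketch of a byte copy built on REP MOVSB, using GCC/Clang extended inline assembly. The name `copy_rep_movsb` is made up for illustration, the code assumes the direction flag is clear (which the usual x86 calling conventions guarantee on function entry), and it falls back to `memcpy` on other compilers or architectures. Whether it beats your libc's `memcpy` depends on the CPU and the copy size, so measure before relying on it.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not from the thread): copy n bytes with REP MOVSB. */
static void *copy_rep_movsb(void *dst, const void *src, size_t n)
{
    assert(n == 0 || (dst != NULL && src != NULL));
#if (defined(__GNUC__) || defined(__clang__)) && \
    (defined(__x86_64__) || defined(__i386__))
    void *d = dst;
    const void *s = src;
    /* (R/E)DI = destination, (R/E)SI = source, (R/E)CX = byte count.
       The instruction modifies all three, hence the "+" (read-write)
       constraints. The "memory" clobber tells the compiler that memory
       is read and written behind its back. */
    __asm__ volatile("rep movsb"
                     : "+D"(d), "+S"(s), "+c"(n)
                     :
                     : "memory");
#else
    /* Portable fallback so the sketch still builds elsewhere. */
    memcpy(dst, src, n);
#endif
    return dst;
}
```

The encoded instruction is just the two bytes F3 A4, which is the small code footprint mentioned above.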


You'd get worse results, because 1) most of those instructions are slower on modern hardware anyway, and 2) inline assembly is an optimization barrier for modern compilers, so even if you make one function 0.1% faster, it's not worth it if you've slowed down the rest of the program by 1%.
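A rough sketch of the optimization-barrier point, assuming GCC or Clang on x86 (both function names are made up for illustration). The compiler knows what `strlen` means, so a call like `len_builtin("hello")` can typically be folded to the constant 5 at `-O2`. It cannot see inside the `asm` statement, so `len_asm("hello")` has to actually run the REPNE SCASB scan, and the "memory" clobber also keeps the compiler from caching values across it.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical: plain library call; the compiler understands its semantics. */
static size_t len_builtin(const char *s)
{
    return strlen(s);
}

/* Hypothetical: the same job with the REPNE SCASB string instruction. */
static size_t len_asm(const char *s)
{
#if (defined(__GNUC__) || defined(__clang__)) && \
    (defined(__x86_64__) || defined(__i386__))
    const char *p = s;
    size_t count = (size_t)-1; /* (R/E)CX: upper bound on bytes to scan */
    /* Compare AL (0) against successive bytes at (R/E)DI until a match. */
    __asm__ volatile("repne scasb"
                     : "+D"(p), "+c"(count)
                     : "a"(0)
                     : "memory", "cc");
    /* The scan stops with p pointing one byte past the terminating NUL. */
    return (size_t)(p - s) - 1;
#else
    /* Portable fallback so the sketch still builds elsewhere. */
    return strlen(s);
#endif
}
```

Both functions return the same value; the difference is in what the optimizer is able to do with the call site, which you can check by comparing the generated assembly for the two.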



