
It's an SMT-based solver that generates fast "single-lane" SIMD calculations for AVX2. (Doing different things in multiple lanes, and targeting AVX-512, SSE, ARM, or general-purpose registers, are all on my to-do list.) Branch-free code only. It can handle sequences of up to 4 or 5 instructions, including generating constants such as 128-bit tables for PSHUFB (treating PSHUFB as a single-lane operation in which every lane looks up a table).
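
To make the "every lane looks up a table" view of PSHUFB concrete, here is a minimal scalar sketch in Python. It is not the solver's code; the function names and the popcount table are hypothetical, chosen only for illustration. It models the documented per-byte behavior: a 16-byte (128-bit) table, indexed by the low 4 bits of each index byte, with the output zeroed when the index's high bit is set.

```python
def pshufb_lane(table: bytes, index: int) -> int:
    """Scalar model of one byte lane of PSHUFB (illustrative helper, not a real API)."""
    if len(table) != 16:
        raise ValueError("PSHUFB table must be exactly 16 bytes (128 bits)")
    if index & 0x80:
        # High bit of the index byte set: the lane is zeroed.
        return 0
    # Otherwise only the low 4 bits select a table entry.
    return table[index & 0x0F]


def pshufb(table: bytes, indices: bytes) -> bytes:
    """Apply the same table lookup independently to every byte lane."""
    return bytes(pshufb_lane(table, i) for i in indices)


# Hypothetical example of a constant a search might produce:
# the population count of each 4-bit value.
POPCOUNT_NIBBLE = bytes(bin(n).count("1") for n in range(16))


def popcount_bytes(data: bytes) -> bytes:
    """Per-byte popcount built from two nibble lookups and an add."""
    lo = pshufb(POPCOUNT_NIBBLE, bytes(b & 0x0F for b in data))
    hi = pshufb(POPCOUNT_NIBBLE, bytes(b >> 4 for b in data))
    return bytes(l + h for l, h in zip(lo, hi))
```

For instance, `popcount_bytes(bytes([0xFF, 0xA5]))` looks up both nibbles of each byte and sums them. The sequence is mask, shift, two lookups, and an add, which is roughly the size of instruction sequence described above.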


Sounds very cool! I recently wondered about the possibility of superoptimizing vectorized code, so glad to hear about it!

Would you like to chat about opportunities to do analogous work for GPU instruction sets?

I work at a startup making high-performing GPU software and compilers, incidentally including a regex engine! (We can't quite match HyperScan on the CPU, but support capture groups and very high throughput on GPUs.) We also have several other interesting projects, and would like to start a superoptimizer at some point.

P.S. After reading your blog, one of our engineers said: "We see you like PSHUFB. So do we."



