That looks like fantastic stuff. I just want to point out that the 15ms delay looks similar to the 60Hz vsync interval (16.7ms). If you are updating the screen once per token, maybe that’s causing a sync somehow?
Nah, that's not it. I measure the CPU & GPU work separately, and the 15ms gap happens between the kernel invocations. It also happens when I don't print out any text.
Thanks for the idea though! I treat it as the first community contribution :D