I don’t distrust Rust; I think it is great for some things. But I write code to enjoy myself, and Swift hits the spot between safety and satisfaction for me.
The CUDA moat is real for general-purpose computing and for researchers who want a Swiss Army knife, but when it comes to well-known deployments, for either training or inference, the amount of functionality you need from a chip is quite limited.
You do not need most of CUDA, or most of the GPU’s functionality, so dedicated chips make sense. It was great to see this theory put to the test: the original llama.cpp stack showed just what you needed, the tiny llama.c showed how little that actually was, and more recently a small team of engineers at Apple put together MLX.
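To make that concrete: nearly all of the FLOPs in a transformer decode step come down to a plain matrix-vector multiply. Here is a minimal sketch in C, in the spirit of llama.c’s matmul (an illustration, not its actual code):

    #include <stdio.h>

    /* out[i] = sum_j W[i][j] * x[j], with W stored row-major:
       d rows of n floats. Loops like this dominate inference. */
    static void matmul(float *out, const float *x, const float *w, int n, int d) {
        for (int i = 0; i < d; i++) {
            float val = 0.0f;
            for (int j = 0; j < n; j++) {
                val += w[i * n + j] * x[j];
            }
            out[i] = val;
        }
    }

    int main(void) {
        /* toy 2x3 example: W * x */
        float w[6] = {1, 2, 3, 4, 5, 6};
        float x[3] = {1, 1, 1};
        float out[2];
        matmul(out, x, w, 3, 2);
        printf("%.1f %.1f\n", out[0], out[1]); /* prints 6.0 15.0 */
        return 0;
    }

Once your workload is essentially this loop plus a softmax and a few elementwise ops, a dedicated chip only has to do a handful of things extremely well.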
Absolutely agreed on needing only specific parts of the chip and tailoring to that. But my point is bigger: even if you build a specific chip, you still need engineers who understand the full picture.
They do have such a dedicated chip: the MAIA 100, an in-house design from the era of transformers, and that is what is being discussed in the interview.
I missed that; it’s been a few years since I’ve paid attention to MS hardware, and it is very possible that my thoughts are out of date. I left MS with a rather bad taste in my mouth. I’m checking out the info on that chip, and what I am seeing is a little light on details: just TPUs and fast interconnects.
What I’ve found: MAIA 200, the next version, is having issues due to brain drain, and MAIA 300 is to be an entirely new architecture, so its status is rather uncertain.
I think a big reason MS invested so heavily in OpenAI was to have a marquee customer push cultural change through the org, which was a necessary decision. If that eventually yields a useful chip I will be impressed; I hope it does.