Hacker News

We are getting into a debate between particulars and universals. Calling 'unified memory' VRAM is quite a generalization. Whatever the case, we can tell from stock prices that whatever this VRAM is, it's nothing compared to NVIDIA.

Anyway, we were trying to run a 70B model on a MacBook (can't remember which M model) at a Fortune 20 company, and it never became practical. We were trying to compare strings of ~200 characters each, so about 400 characters plus a pre-prompt.

I can't imagine this being reasonable on a 1T model, let alone the 400B models of DeepSeek and Llama.



With 32B active parameters, Kimi K2.5 will run faster than your 70B model.
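A rough back-of-envelope sketch of why, assuming decode is memory-bandwidth bound (each generated token reads every active weight once). The 800 GB/s bandwidth and 4-bit quantization are illustrative assumptions, not measurements from this thread, and the result is an upper bound rather than a benchmark:

```python
def decode_tokens_per_second(active_params_billion, bytes_per_param, bandwidth_gb_s):
    """Upper bound on decode speed if every active weight is read once per token."""
    gb_read_per_token = active_params_billion * bytes_per_param
    return bandwidth_gb_s / gb_read_per_token


# Assumed values for illustration only.
BANDWIDTH_GB_S = 800.0   # hypothetical unified-memory bandwidth
BYTES_PER_PARAM = 0.5    # hypothetical 4-bit quantization

dense_70b = decode_tokens_per_second(70, BYTES_PER_PARAM, BANDWIDTH_GB_S)
moe_32b_active = decode_tokens_per_second(32, BYTES_PER_PARAM, BANDWIDTH_GB_S)

print(f"dense 70B upper bound:      {dense_70b:.1f} tok/s")
print(f"32B-active MoE upper bound: {moe_32b_active:.1f} tok/s")
print(f"ratio: {moe_32b_active / dense_70b:.2f}x")
```

Under these assumptions the ratio depends only on active parameter counts (70/32), not on the bandwidth you plug in. The full parameter set still has to fit in memory; only the per-token read is smaller.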


Here's a video of a previous 1T K2 model running with MLX on a pair of Mac Studios: https://twitter.com/awnihannun/status/1943723599971443134 - performance isn't terrible.


Is there a catch? I was not getting anything like this on a 70B model.

EDIT: oh, it's a marketing account and the program never finished... who knows the validity.


I don't think Awni should be dismissed as a "marketing account" - they're an engineer at Apple who's been driving the MLX project for a couple of years now, and they've earned a lot of respect from me.


Given how secretive Apple is, oh my, it's a super duper marketing account.


Jeff Geerling and a few others also got access to similarly specced Mac clusters. They replicated this performance.

The tooling involved has improved significantly over the past year.



