Hacker News: uhlo's comments

Okay now add a small model that decides how much effort is needed in each inference step and we are good to go


Yes! That would be awesome. Especially since there are ~32*6 independent effort settings for every single token.

I tested the most basic implementation, with a flat effort setting for all the multiplications, but I bet the results could be pushed even further with such an approach. Or even by just doing some ML to figure out which layers/matrices need more effort and which need less.
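To make the idea concrete, here is a toy sketch of what a per-multiplication "effort" knob could look like: approximate a matrix-vector product using only the largest-magnitude weights in each row. This is a hypothetical illustration (the function name, numbers, and top-k scheme are made up for the example), not the project's actual implementation:

```python
# Toy "effort" knob: approximate W @ x using only the top `effort`
# fraction of each row's weights, ranked by magnitude.
# Hypothetical sketch for illustration, not the project's real code.

def approx_matvec(weights, x, effort):
    """Approximate a matrix-vector product, keeping only the
    largest-|w| entries in each row (effort in (0, 1])."""
    out = []
    for row in weights:
        k = max(1, round(effort * len(row)))
        # indices of the k largest-magnitude weights in this row
        top = sorted(range(len(row)), key=lambda i: abs(row[i]), reverse=True)[:k]
        out.append(sum(row[i] * x[i] for i in top))
    return out

W = [[0.9, 0.01, -0.8],
     [0.02, 1.1, 0.03]]
x = [1.0, 2.0, 3.0]

print(approx_matvec(W, x, 1.0))   # full effort: exact product
print(approx_matvec(W, x, 0.34))  # low effort: dominant weights only
```

A flat setting applies one `effort` value everywhere; the per-layer/per-matrix idea would instead pick a different value for each of the ~32*6 multiplications, tuned by some small model or offline ML.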


If this is within reach, it sounds like the joke might be worth exploring?


Great work! One thing: the Hugging Face link doesn't seem to work... I get a 404.


It should work now :)

