Hacker News: uhlo's comments

Okay now add a small model that decides how much effort is needed in each inference step and we are good to go


Yes! That would be awesome. Especially since there are ~32*6 independent effort settings for every single token.

I tested the most basic implementation, with a flat effort setting for all the multiplications, but I bet the results could be pushed even further with such an approach. Or even by just doing some ML to figure out which layers/matrices need more effort and which need less.
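To make the idea concrete, here is a toy sketch of what a per-multiplication "effort" knob could look like: approximate a matrix-vector product using only the largest-magnitude weights in each row. This is a hypothetical illustration (the function name, numbers, and top-k scheme are made up for the example), not the project's actual implementation:

```python
# Toy "effort" knob: approximate W @ x using only the top `effort`
# fraction of each row's weights, ranked by magnitude.
# Hypothetical sketch for illustration, not the project's real code.

def approx_matvec(weights, x, effort):
    """Approximate a matrix-vector product, keeping only the
    largest-|w| entries in each row (effort in (0, 1])."""
    out = []
    for row in weights:
        k = max(1, round(effort * len(row)))
        # indices of the k largest-magnitude weights in this row
        top = sorted(range(len(row)), key=lambda i: abs(row[i]), reverse=True)[:k]
        out.append(sum(row[i] * x[i] for i in top))
    return out

W = [[0.9, 0.01, -0.8],
     [0.02, 1.1, 0.03]]
x = [1.0, 2.0, 3.0]

print(approx_matvec(W, x, 1.0))   # full effort: exact product
print(approx_matvec(W, x, 0.34))  # low effort: dominant weights only
```

A flat setting applies one `effort` value everywhere; the per-layer/per-matrix idea would instead pick a different value for each of the ~32*6 multiplications, tuned by some small model or offline ML.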


If this is within reach, it sounds like the joke might be worth exploring?


Great work! One thing: the Hugging Face link doesn't seem to work... I get a 404.


It should work now :)

