My unvalidated theory is that this comes down to the coding model's training objective: Tetris is fundamentally an optimization problem with delayed rewards. Some models seem to aggressively over-optimize toward near-term wins (clearing lines quickly), which looks good early but leads to brittle board states and catastrophic failures later. Others appear to learn more stable heuristics like board smoothness, height control, and long-term survivability, even if that sacrifices short-term score.
That difference in objective bias shows up very clearly in Tetris but is much harder to notice in typical coding benchmarks. Just a theory, though, based on reviewing the game results and logs.
Answered this in a comment above! It's not turn-based or visual-layout-based, since LLMs aren't trained that way. The representation is a JSON structure, and the LLMs plug in algorithms and keep optimizing them as the game state evolves.
Thanks for the clarification! Kind of reminds me of Brian Moore's AI clocks, which use several LLMs to continuously generate HTML/CSS analog clocks for comparison.
Curious how the token economics compare here to a standard agent loop. It seems like if you're using the LLM as a JIT to optimize the algorithm as the game evolves, the context accumulation would get expensive fast even with Flash pricing.
Thanks for all the questions! More details on how this works:
- Each model starts with an initial optimization function for evaluating Tetris moves.
- As the game progresses, the model sees the current board state and updates its algorithm—adapting its strategy based on how the game is evolving.
- The model continuously refines its optimizer, deciding when it needs to re-evaluate and when it should implement the next optimization function.
- The model generates updated code, executes it to score all placements, and picks the best move.
- The reason I reframed this as a coding problem is that Tetris is an optimization game by nature. At first I did try asking LLMs where to place each piece at every turn, but models are just terrible at visual reasoning. What LLMs are great at, though, is coding.
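To make the loop above concrete, here's a rough sketch of what one round of "generate an evaluator, score all placements, pick the best move" might look like. To be clear, this is my own illustrative stand-in, not the project's actual code: the function names (`score_board`, `best_move`), the board encoding (list of rows, 1 = filled, row 0 at the top), and the heuristic weights are all assumptions. In the real system the evaluator body would be LLM-generated and rewritten as the game evolves.

```python
# Hypothetical sketch of the harness loop. The evaluator below is a
# hand-written stand-in for the LLM-generated optimization function;
# the weights and helper names are illustrative, not from the project.

def column_heights(board):
    """Height of each column; board is a list of rows, 1 = filled, row 0 = top."""
    rows, cols = len(board), len(board[0])
    heights = []
    for c in range(cols):
        h = 0
        for r in range(rows):
            if board[r][c]:
                h = rows - r  # first filled cell from the top sets the height
                break
        heights.append(h)
    return heights

def count_holes(board):
    """Empty cells with at least one filled cell somewhere above them."""
    rows, cols = len(board), len(board[0])
    holes = 0
    for c in range(cols):
        seen_block = False
        for r in range(rows):
            if board[r][c]:
                seen_block = True
            elif seen_block:
                holes += 1
    return holes

def score_board(board):
    """Heuristic score (higher is better): penalize height, holes, bumpiness."""
    heights = column_heights(board)
    bumpiness = sum(abs(a - b) for a, b in zip(heights, heights[1:]))
    return -0.5 * sum(heights) - 2.0 * count_holes(board) - 0.2 * bumpiness

def best_move(candidate_boards):
    """Score every reachable placement's resulting board and pick the best."""
    return max(candidate_boards, key=score_board)
```

The key design point is that the model only has to write and revise `score_board`; the harness enumerates candidate placements and runs the evaluator over all of them, so no visual reasoning is ever required.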
Before I began the test, I thought the agents would be much better at this task than most humans -- after all, they should have better, more stateful memory than us. The results are intriguing.
Here are the scores from 10 attempts:
OpenAI operator: 5, 5, 6, 5, 5, 4, 6, 5, 5, 5
Anthropic computer use agent: 7, 9, 6 (rate limited), 12, 9, 7, 9, 11, 12, 6 (rate limited)