Hacker Newsnew | past | comments | ask | show | jobs | submitlogin

I think this is doable. Say we assign a win rate W(S) to each board state S, and let W(S, A) denote the win rate after taking action A from state S. Since the transition is probabilistic, we can write:

W(S, A) = P(good) * (1 - W(S_good)) + P(bad) * (1 - W(S_bad)) + (1 - P(good) - P(bad)) * (1 - W(S))

And obvisouly:

W(S) = max(W(S, A), foreach A in Actions)

max() is troublesome, but we can replace it with a >= sign:

W(S) >= (W(S, A), forall A in Actions)

And then we can expand W(S, A) and move W(S) in each of the inequalities to the left hand side. After all that we will have a bunch of linear inequalities which can be optimized with linear programming. I think the objective would just be:

<del>maximize</del> minimize W(empty_board)



There is an easier way to solve each recursion! I just wrote a blog post on it: https://louisabraham.github.io/articles/probabilistic-tic-ta...


Yep, this is what I ended up doing as well! With how the game generate boards, the player that goes first always have a ~5% advantage. Since players switch hands each around they should have 50% win rate if both play optimally.

In practice, playing against author's AI I barely get ~60% win rate (small caveat, I count ties as 0.5 to both players). What about yours?

Edit: nvm I saw you did the same with ties.


I think you have an error in the equation defining V(s).

You have component n_c * V(s) for the 'nothing happened' case, but I don't think that's correct. If you rolled that nothing happens the turn still passes to your opponent, so I think it should be n_c * V'(s).


oh RIGHT. Gotta fix it


Turns out linear programming is not fast... Takes about 90 minutes to find the optimal solution for any board configuration.




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: