Project · 2026
AlphaZero from scratch
A small and readable implementation of AlphaZero
A small, readable implementation of AlphaZero (Silver et al., Science 2018), built up one stage at a time, plus a polished web UI for actually playing the games. We start with tic-tac-toe: the algorithm is identical across games, but tic-tac-toe trains in minutes on a laptop. Then we reuse the same code (and the same UI) for harder games.
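The search never looks inside the rules, so reusing the same code mostly comes down to every game exposing the same small interface. Below is a minimal sketch of such an interface for tic-tac-toe. It assumes a hypothetical design: the class name `TicTacToe` and the methods `legal_moves`, `play`, `winner` and `is_terminal` are illustrative, not this project's actual API.

```python
from typing import List, Optional, Tuple


# Hypothetical interface: these method names are illustrative, not this project's API.
class TicTacToe:
    """Immutable board of 9 cells: +1 (X), -1 (O), 0 (empty). X moves first."""

    LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8), (0, 3, 6),
             (1, 4, 7), (2, 5, 8), (0, 4, 8), (2, 4, 6)]

    def __init__(self, board: Optional[Tuple[int, ...]] = None, player: int = 1):
        self.board = board if board is not None else (0,) * 9
        self.player = player  # +1 or -1: whose turn it is

    def winner(self) -> Optional[int]:
        """+1 or -1 for three in a row, 0 for a draw, None if the game is ongoing."""
        for a, b, c in self.LINES:
            if self.board[a] != 0 and self.board[a] == self.board[b] == self.board[c]:
                return self.board[a]
        return 0 if 0 not in self.board else None

    def is_terminal(self) -> bool:
        return self.winner() is not None

    def legal_moves(self) -> List[int]:
        if self.is_terminal():
            return []
        return [i for i, cell in enumerate(self.board) if cell == 0]

    def play(self, move: int) -> "TicTacToe":
        # Returning a new state instead of mutating keeps tree search simple:
        # every tree node can hold its own state without copying or undoing moves.
        if move not in self.legal_moves():
            raise ValueError(f"illegal move {move}")
        board = list(self.board)
        board[move] = self.player
        return TicTacToe(tuple(board), -self.player)


g = TicTacToe()
for m in [0, 3, 1, 4, 2]:  # X ends up with the whole top row
    g = g.play(m)
print(g.winner())  # 1
```

In this sketch, a harder game only needs to implement the same four methods; the search and training code can stay game-agnostic.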
The big idea in three sentences
- One neural network looks at a board and outputs a policy (which moves look good) and a value (who is likely to win).
- Monte Carlo Tree Search (MCTS) uses that network to look ahead, and the visit counts it produces form a better policy than the raw network output.
- The network trains on its own self-play games (the policy toward the search's move distribution, the value toward the final result), bootstrapping from random play to superhuman strength with no human game data.
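The last two bullets meet in one loop: a search that asks the network for priors and a value at its leaves, and returns visit counts that become the next training target. Below is a minimal sketch of that search using a PUCT selection rule with a fixed `c_puct` (the AlphaZero paper lets this coefficient grow slowly with the visit count). It rests on several assumptions: all names (`run_mcts`, `simulate`, `Node`, `uniform_evaluator`) and the evaluator signature are hypothetical, `uniform_evaluator` stands in for an untrained network with uniform priors and a value of 0, and the root Dirichlet noise and move-selection temperature used during self-play are left out.

```python
import math
from typing import Callable, Dict, List, Optional, Tuple

Board = Tuple[int, ...]  # 9 cells: +1 (X), -1 (O), 0 (empty)
# Hypothetical evaluator signature: (board, player to move) -> (priors, value).
Evaluator = Callable[[Board, int], Tuple[Dict[int, float], float]]

LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8), (0, 3, 6),
         (1, 4, 7), (2, 5, 8), (0, 4, 8), (2, 4, 6)]


def winner(board: Board) -> Optional[int]:
    """+1 or -1 for three in a row, 0 for a draw, None if the game is ongoing."""
    for a, b, c in LINES:
        if board[a] != 0 and board[a] == board[b] == board[c]:
            return board[a]
    return 0 if 0 not in board else None


def legal_moves(board: Board) -> List[int]:
    return [] if winner(board) is not None else [i for i, c in enumerate(board) if c == 0]


def uniform_evaluator(board: Board, player: int) -> Tuple[Dict[int, float], float]:
    # Stand-in for an untrained network: uniform priors and a neutral value.
    moves = legal_moves(board)
    return {m: 1.0 / len(moves) for m in moves}, 0.0


class Node:
    def __init__(self, prior: float):
        self.prior = prior      # P(s, a): evaluator's prior for the move into this node
        self.visits = 0         # N(s, a)
        self.value_sum = 0.0    # W(s, a), from the view of the player who made that move
        self.children: Dict[int, "Node"] = {}

    def q(self) -> float:
        return self.value_sum / self.visits if self.visits else 0.0


def simulate(node: Node, board: Board, player: int,
             evaluate: Evaluator, c_puct: float) -> float:
    """One simulation from `node`; returns the value from `player`'s point of view."""
    w = winner(board)
    if w is not None:
        value = float(w * player)  # terminal: -1 if the previous mover won, 0 if drawn
    elif not node.children:
        # Leaf: expand it with the evaluator instead of a random rollout.
        priors, value = evaluate(board, player)
        for move, p in priors.items():
            node.children[move] = Node(p)
    else:
        # PUCT selection: Q(s, a) + c_puct * P(s, a) * sqrt(N(s)) / (1 + N(s, a)).
        sqrt_n = math.sqrt(node.visits)

        def score(item: Tuple[int, Node]) -> float:
            child = item[1]
            return child.q() + c_puct * child.prior * sqrt_n / (1 + child.visits)

        move, child = max(node.children.items(), key=score)
        next_board = board[:move] + (player,) + board[move + 1:]
        # The child returns a value for the opponent, so flip its sign.
        value = -simulate(child, next_board, -player, evaluate, c_puct)
    node.visits += 1
    node.value_sum -= value  # stored from the view of the player who moved into node
    return value


def run_mcts(board: Board, player: int, evaluate: Evaluator = uniform_evaluator,
             num_simulations: int = 200, c_puct: float = 1.5) -> Dict[int, float]:
    root = Node(prior=1.0)
    for _ in range(num_simulations):
        simulate(root, board, player, evaluate, c_puct)
    total = max(sum(c.visits for c in root.children.values()), 1)
    # Normalized visit counts: the improved policy that becomes the training target.
    return {m: c.visits / total for m, c in root.children.items()}


# X (+1) holds cells 0 and 1 and can win at 2; O (-1) threatens 5.
board = (1, 1, 0, -1, -1, 0, 0, 0, 0)
policy = run_mcts(board, player=1)
print(max(policy, key=policy.get))  # 2
```

Even with the untrained stand-in evaluator, exact terminal values at the leaves are enough for the search to concentrate its visits on X's immediate win at cell 2. In a full training loop along these lines, each self-play position would be stored with this visit-count policy as its policy target and the eventual game result, seen from the player to move, as its value target.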