David S. Barbera

Project · 2026

AlphaZero from scratch

A small and readable implementation of AlphaZero

A small, readable implementation of AlphaZero (Silver et al., Science 2018), built up one stage at a time, plus a polished web UI for actually playing the games. We start with tic-tac-toe: the algorithm is identical across games, but tic-tac-toe trains in minutes on a laptop. We then reuse the same code (and the same UI) for harder games.
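One way to let the same code serve every game is to have the search and training loop talk to a game only through a small interface. Below is a minimal sketch of such an interface for tic-tac-toe. The class name `TicTacToe` and its methods (`legal_moves`, `play`, `winner`, `is_terminal`) are hypothetical, chosen for illustration rather than taken from the project's code.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

# The eight winning lines on a 3x3 board (cells numbered 0-8, row-major).
LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8),   # rows
         (0, 3, 6), (1, 4, 7), (2, 5, 8),   # columns
         (0, 4, 8), (2, 4, 6)]              # diagonals


@dataclass(frozen=True)
class TicTacToe:
    """Hypothetical game interface: search and training only call these methods."""
    board: Tuple[int, ...] = (0,) * 9   # +1 = X, -1 = O, 0 = empty
    to_move: int = 1                    # +1 if X moves next, -1 if O

    def legal_moves(self) -> List[int]:
        return [i for i, cell in enumerate(self.board) if cell == 0]

    def play(self, move: int) -> "TicTacToe":
        # Return a new state instead of mutating, so search branches never interfere.
        if self.board[move] != 0:
            raise ValueError(f"cell {move} is already taken")
        cells = list(self.board)
        cells[move] = self.to_move
        return TicTacToe(tuple(cells), -self.to_move)

    def winner(self) -> Optional[int]:
        # +1 or -1 if that player has three in a row, 0 for a draw, None if ongoing.
        for a, b, c in LINES:
            total = self.board[a] + self.board[b] + self.board[c]
            if abs(total) == 3:
                return total // 3
        return 0 if 0 not in self.board else None

    def is_terminal(self) -> bool:
        return self.winner() is not None


if __name__ == "__main__":
    game = TicTacToe()
    for move in [0, 3, 1, 4, 2]:   # X completes the top row
        game = game.play(move)
    print(game.winner())           # 1
```

Under this design, a harder game would implement the same four methods and nothing else in the pipeline would change. Immutable states are a deliberate choice: tree nodes can hold positions without one branch's moves leaking into another.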

The big idea in three sentences

  1. One neural network looks at a board and outputs a policy (which moves look good) and a value (who is likely to win).
  2. Monte Carlo Tree Search (MCTS) uses that network to look ahead, producing a better policy than the raw network gives on its own.
  3. The network trains on its own self-play games (the search's move probabilities as policy targets, the final results as value targets), bootstrapping from random play to superhuman play with no external data.
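The three steps can be sketched in code. Below is a minimal MCTS using a PUCT-style selection rule (as in AlphaGo Zero and AlphaZero), written against a compact tic-tac-toe board so it runs on its own. As a simplification it uses a constant exploration weight `c_puct`, and a hypothetical `uniform_evaluator` (uniform priors over legal moves, value 0) stands in for the trained network. All names and defaults here (`run_mcts`, `select_child`, `Node`, `c_puct=1.5`, 200 simulations) are illustrative assumptions, not the project's code.

```python
import math
from typing import Callable, Dict, List, Optional, Tuple

Board = Tuple[int, ...]  # 9 cells, row-major: +1 = X, -1 = O, 0 = empty
# Stand-in for the network: (board, player to move) -> (move priors, value for that player).
Evaluator = Callable[[Board, int], Tuple[Dict[int, float], float]]
LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8), (0, 3, 6),
         (1, 4, 7), (2, 5, 8), (0, 4, 8), (2, 4, 6)]


def legal_moves(board: Board) -> List[int]:
    return [i for i, c in enumerate(board) if c == 0]


def winner(board: Board) -> Optional[int]:
    # +1 / -1 for three in a row, 0 for a full-board draw, None if still in play.
    for a, b, c in LINES:
        if board[a] != 0 and board[a] == board[b] == board[c]:
            return board[a]
    return 0 if 0 not in board else None


def play(board: Board, move: int, player: int) -> Board:
    cells = list(board)
    cells[move] = player
    return tuple(cells)


def uniform_evaluator(board: Board, player: int) -> Tuple[Dict[int, float], float]:
    # Hypothetical placeholder "network": uniform prior and a neutral value estimate.
    moves = legal_moves(board)
    return {m: 1.0 / len(moves) for m in moves}, 0.0


class Node:
    def __init__(self, prior: float):
        self.prior = prior      # P(s, a): prior for the move that leads to this node
        self.visits = 0         # N(s, a)
        self.value_sum = 0.0    # W(s, a), scored for the player who made that move
        self.children: Dict[int, "Node"] = {}

    def q(self) -> float:
        return self.value_sum / self.visits if self.visits else 0.0


def select_child(node: Node, c_puct: float) -> Tuple[int, Node]:
    # PUCT score: Q(s, a) + c_puct * P(s, a) * sqrt(N(s)) / (1 + N(s, a))
    sqrt_n = math.sqrt(node.visits)
    return max(node.children.items(),
               key=lambda mc: mc[1].q() + c_puct * mc[1].prior * sqrt_n / (1 + mc[1].visits))


def run_mcts(board: Board, player: int, evaluate: Evaluator,
             num_simulations: int = 200, c_puct: float = 1.5) -> Dict[int, float]:
    assert winner(board) is None, "search needs a non-terminal position"
    assert num_simulations > 1, "the first simulation only expands the root"
    root = Node(prior=1.0)
    for _ in range(num_simulations):
        node, b, to_move, path = root, board, player, [root]
        # 1. Selection: follow PUCT down through already-expanded nodes.
        while node.children:
            move, node = select_child(node, c_puct)
            b, to_move = play(b, move, to_move), -to_move
            path.append(node)
        # 2. Expansion and evaluation; `value` is from the view of `to_move`.
        result = winner(b)
        if result is None:
            priors, value = evaluate(b, to_move)
            node.children = {m: Node(p) for m, p in priors.items()}
        else:
            value = float(result * to_move)  # -1 if the previous mover just won, 0 for a draw
        # 3. Backup: each node is scored for the player who moved into it (the opponent
        #    of whoever is to move there), so the sign flips at every ply.
        for n in reversed(path):
            n.visits += 1
            n.value_sum -= value
            value = -value
    # The improved policy is the normalised visit count over the root's moves.
    total = sum(c.visits for c in root.children.values())
    return {m: c.visits / total for m, c in root.children.items()}


if __name__ == "__main__":
    # X holds cells 0 and 1, O holds 3 and 4; X to move can win at cell 2.
    position = (1, 1, 0, -1, -1, 0, 0, 0, 0)
    pi = run_mcts(position, player=1, evaluate=uniform_evaluator)
    print(max(pi, key=pi.get))  # expected: 2, the immediate win
```

In the full algorithm, the evaluator would be the network from step 1, and the returned visit distribution would serve as the policy target in step 3, with the eventual game result as the value target.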
