How to Solve Board Games

  • Go is about depth and spatial understanding
  • Poker is about probabilities and reading your opponents’ intent
  • Diplomacy is about negotiation and backstabbing

AlphaZero Architecture

AlphaZero doesn’t use human games as training data; it’s trained entirely by playing against itself. Its precursor, AlphaGo, used both human game data and all sorts of hand-crafted features, which were subsequently removed in AlphaZero. This is a big deal because it means chess amateurs can build cutting-edge chess engines, whereas in the past you needed to pair programmers with chess grandmasters.
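To make the self-play idea concrete, here is a toy sketch that learns a simple take-away game purely from games against itself, with a lookup table of position values standing in for AlphaZero’s neural network. The game, the names, and the learning rule here are all illustrative, not taken from AlphaZero:

```python
import random

# Toy game: players alternately remove 1 or 2 stones from a pile;
# whoever takes the last stone wins. Piles that are multiples of 3
# are losing for the player to move.
values = {}  # pile size -> estimated value for the player to move

def choose(pile, eps=0.1):
    options = [m for m in (1, 2) if m <= pile]
    if random.random() < eps:
        return random.choice(options)  # explore occasionally
    # exploit: move to the pile that looks worst for the opponent
    return min(options, key=lambda m: values.get(pile - m, 0.0))

def self_play(pile=10, lr=0.1):
    history, player = [], 0
    while pile > 0:
        history.append((pile, player))
        pile -= choose(pile)
        player ^= 1
    winner = player ^ 1  # the player who took the last stone
    for p, who in history:
        target = 1.0 if who == winner else -1.0
        values[p] = values.get(p, 0.0) + lr * (target - values.get(p, 0.0))

random.seed(0)
for _ in range(2000):
    self_play()
print(values[3] < 0 < values[4])
```

No human games appear anywhere: the value table is bootstrapped entirely from the outcomes of the program’s own games, which is the same principle AlphaZero applies with a deep network and tree search.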

Game Tree

When you annotate a chess game, the typical representation is a list of moves. A naive way to pick moves from such a representation would be:

  1. Sort all actions by score
  2. Pick the highest ranking action
  3. Repeat until win
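The loop above can be sketched on a toy game (the game and the scoring function are made up for illustration, not real chess):

```python
# Toy game: start at 0, each action adds 1-3, you "win" at exactly 10.
def legal_moves(state):
    return [a for a in (1, 2, 3) if state + a <= 10]

def score(action):
    return action  # naive heuristic: a bigger step is "better"

state = 0
while state != 10:
    best = max(legal_moves(state), key=score)  # steps 1-2: rank actions, take the top one
    state += best                              # step 3: repeat until the win condition
print(state)  # prints 10
```

The problem, of course, is that a purely greedy score ignores what the opponent will do next, which is why we need to look ahead in the game tree.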

Monte Carlo Tree Search

Let’s say we’re given some board position in the middle of a game. Who is winning?

  • n is the number of simulations started at the node under consideration
  • The square root expression is large when a node hasn’t been explored much — exploration
  • c is a constant that balances exploration and exploitation — during training we want c to be close to 1, and during testing we want c to be close to 0
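The bullets above describe a UCB-style selection rule. A minimal sketch of it (this is the classic UCB1 form; AlphaZero’s actual PUCT variant additionally weights the exploration term by the network’s prior probability for the move):

```python
import math

def ucb_score(q, n_child, n_parent, c=1.0):
    """q: average value of the child node; n_*: visit counts; c: exploration constant."""
    if n_child == 0:
        return float("inf")  # always try unvisited children first
    # exploitation term + exploration term (large when n_child is small)
    return q + c * math.sqrt(math.log(n_parent) / n_child)

# A rarely visited child gets a big exploration bonus, so it can outrank
# a well-explored child with a better average value:
print(ucb_score(0.2, 1, 100) > ucb_score(0.5, 90, 100))  # prints True
```

Setting c near 0 collapses the rule to pure exploitation, which is what you want when actually playing rather than training.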

Actor Critic Network

AlphaZero uses a Reinforcement Learning algorithm to guide the tree search process.

  1. Actor: a policy function which outputs a probability distribution over the available actions
  2. Critic: a value function which outputs the value of an action V(a) ∈ [-1,1]
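A minimal sketch of such a two-headed network’s interface, with random weights and illustrative shapes (this is not chess-alpha-zero’s actual architecture, just the input/output contract):

```python
import numpy as np

rng = np.random.default_rng(0)

class ActorCritic:
    """Two heads over shared input features: a move distribution and a value."""
    def __init__(self, n_features=64, n_actions=10):
        self.w_policy = rng.normal(size=(n_features, n_actions)) * 0.01
        self.w_value = rng.normal(size=(n_features,)) * 0.01

    def forward(self, board_features):
        logits = board_features @ self.w_policy
        policy = np.exp(logits) / np.exp(logits).sum()  # actor: distribution over moves
        value = np.tanh(board_features @ self.w_value)  # critic: scalar in [-1, 1]
        return policy, value

net = ActorCritic()
p, v = net.forward(rng.normal(size=64))
print(p.sum(), v)
```

During search, the actor’s probabilities prioritize which branches to explore, and the critic’s value replaces random playouts when evaluating a leaf.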

Code Deep Dive

Unfortunately, I couldn’t find a Julia library which implements AlphaZero for Chess, so the walkthrough below uses the Python project chess-alpha-zero.

  • All the files in the main directory can be safely ignored
  • data/model holds the trained model as an h5 and a json file, which you can load to play against the model
  • notebooks/ has an example of how to use chess-alpha-zero


Takes care of instantiating and maintaining the Actor Critic network.


Takes care of the Monte Carlo Tree Search algorithm.
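To see what such a module implements, here is a compact MCTS on a toy take-away game (take 1 or 2 stones, taking the last stone wins), showing the four phases: selection, expansion, simulation, and backpropagation. All names are illustrative, not from chess-alpha-zero:

```python
import math, random

class Node:
    def __init__(self, pile):
        self.pile, self.children = pile, {}  # children: move -> Node
        self.n, self.w = 0, 0.0              # visit count, total value

def moves(pile):
    return [m for m in (1, 2) if m <= pile]

def rollout(pile):
    """Random playout; value from the perspective of the player to move."""
    turn = 0
    while pile > 0:
        pile -= random.choice(moves(pile))
        turn ^= 1
    return 1 if turn == 1 else -1  # the player who took the last stone won

def mcts(root_pile, iters=2000, c=1.4):
    root = Node(root_pile)
    for _ in range(iters):
        node, path = root, [root]
        # 1. selection: walk down fully expanded nodes via UCB
        while node.pile > 0 and len(node.children) == len(moves(node.pile)):
            node = max(node.children.values(),
                       key=lambda ch: -ch.w / ch.n
                       + c * math.sqrt(math.log(node.n) / ch.n))
            path.append(node)
        # 2. expansion: add one untried child
        if node.pile > 0:
            m = random.choice([m for m in moves(node.pile)
                               if m not in node.children])
            node.children[m] = Node(node.pile - m)
            node = node.children[m]
            path.append(node)
        # 3. simulation: random playout from the new node (terminal = loss for mover)
        value = rollout(node.pile) if node.pile > 0 else -1
        # 4. backpropagation: alternate perspective at each level up the tree
        for nd in reversed(path):
            nd.n += 1
            nd.w += value
            value = -value
    return max(root.children, key=lambda m: root.children[m].n)

random.seed(0)
print(mcts(4))  # leaving a pile of 3 puts the opponent in a losing position
```

AlphaZero keeps the same skeleton but replaces the random rollout in phase 3 with the critic’s value and biases phase 1 with the actor’s move probabilities.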

Next Steps

  • Game Changer — if you really enjoy Chess, this is a book that analyzes the chess games AlphaZero played. I’ve analyzed chess games for most of my life, and it definitely felt like AlphaZero was playing a new kind of chess. A sort of Tal on steroids.
  • Get AlphaZero working on another, not-so-popular board game that you love. If you also end up programming the board game engine from scratch, you’re bound to find a couple of researchers interested in benchmarking their algorithms on your game.
  • AlphaGo.jl — a Julia implementation with very clean code. This is an AlphaGo, not an AlphaZero, implementation, which means you’ll see a bit more code regarding how to load human game data and board featurization.
  • Leela-zero — an open-source project attempting to replicate the results that Google got. The compute required to make AlphaZero work was data-center big, and Leela-zero will show you how to scale everything I talked about to a distributed setting. It’s also notoriously difficult to find good hyperparameters in Reinforcement Learning, so any compute donations would be highly appreciated.
  • Read Deep Learning and the Game of Go — this is my favorite introduction-to-Machine-Learning textbook. It goes over all the main components you need to deploy a fully working Reinforcement Learning system, including a Go engine, a Go web service you can play against, and a data loader to bootstrap training.
  • DeepStack — a project that solves simplified variants of Poker. The main idea is to use No Regret learning, a way of updating your past beliefs based on the new cards your opponents played. DeepStack also uses ideas from Game Theory more explicitly, while AlphaZero does not. I’ve been sitting on a draft blog post on this topic for a couple of months, so if you’d really like to read it, please let me know!


