Reinforcement Learning: A Declarative Paradigm for Game AI

A Retrospective on my work at

Mark Saroufim
Nov 16, 2019

How I started working on Yuri

When I left Microsoft, I had a rough idea that I wanted to work in the gaming space, but not on exactly what. I had about 2 years to figure something out, which I thought was ample time.

I started working on my own video game “1982”. The setting was a personal one for me: the Lebanese Civil War. I became obsessed with understanding the bloody events my parents had to live through and was reading all the military and historical journals I could find.

I wanted to take the different tactics that the warlords and politicians used and turn them into a Civil War Simulator. Mechanically, the game was quite different from the traditional insurgency setting: you had to deal with multiple parties, militarily on the streets and diplomatically in parliament. The game was a hybrid between a turn-based strategy game, where you select your actions, and a real-time simulation that would influence how citizens perceived you.

In retrospect, the scope was hopelessly large, but I still learned a ton from it.

One major annoyance I had when I was working on “1982” was that it was really challenging to balance my units. Is 3 health for a unit too much? Is generating $100 per business too little? People often suggest play-testing, but it was time-consuming to get my friends to make time to play my still unfun and ugly game. I started playing against myself, but it was boring because my game still wasn’t fun, and it was hard not to favor one side over the other.

What was the point of Yuri?

I thought I’d program some simple AI to play against. But every time I made a design change, I needed to go update my AI, and over time this was making me furious.

Reinforcement Learning had always appealed to me personally because I love robots and I love video games, but I had never pursued it professionally.

A single general-purpose algorithm can learn to master a game better than any human, as AlphaZero did for Chess, Shogi and Go. So if you wanna get really good at a game, you don’t need a coach, you don’t need books, you can just practice against the best.

So the idea of Yuri was to hook a Reinforcement Learning engine to a new game and then train the agent automatically whenever a design change was made.
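To make “train the agent whenever a design change was made” concrete, here is a minimal sketch of the trigger, assuming the game’s balance parameters live in a plain dictionary. The keys `unit_health` and `business_income` (borrowed from the balancing questions above), the `RetrainOnChange` class and the `train_agent` callback are all hypothetical, not part of Yuri. It fingerprints the design and only launches a training run when the fingerprint changes.

```python
import hashlib
import json
from typing import Callable, Optional


def design_fingerprint(design: dict) -> str:
    # Serialize with sorted keys so the same design always hashes identically.
    blob = json.dumps(design, sort_keys=True).encode("utf-8")
    return hashlib.sha256(blob).hexdigest()


class RetrainOnChange:
    """Hypothetical helper: retrain the agent only when the design changes."""

    def __init__(self, train_agent: Callable[[dict], None]):
        self.train_agent = train_agent  # hypothetical training-job launcher
        self.last_fingerprint: Optional[str] = None

    def maybe_retrain(self, design: dict) -> bool:
        fingerprint = design_fingerprint(design)
        if fingerprint == self.last_fingerprint:
            return False  # nothing changed, keep the current agent
        self.train_agent(design)  # e.g. kick off a self-play training run
        self.last_fingerprint = fingerprint
        return True


if __name__ == "__main__":
    runs = []
    trainer = RetrainOnChange(train_agent=runs.append)
    trainer.maybe_retrain({"unit_health": 3, "business_income": 100})
    trainer.maybe_retrain({"business_income": 100, "unit_health": 3})  # same design
    trainer.maybe_retrain({"unit_health": 4, "business_income": 100})  # balance tweak
    print(len(runs))  # 2
```

In a real pipeline the callback would enqueue a job on a training server rather than run in-process.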

Which led to a few more ideas:

  1. Automatic Difficulty Adjustment
  2. Dropping in bots that play like the players who just got disconnected
  3. A guided trainer that’s always slightly better than you, to help you go pro
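Ideas 1 and 3 both boil down to picking an opponent of the right strength. One hypothetical way to do it: keep snapshots of the agent from different points in training, give each an Elo rating, and match the player against the snapshot they are expected to beat slightly less than half the time. The checkpoint names, ratings and the 0.4 target below are made up for illustration; only the expected-score formula is the standard Elo one.

```python
from typing import Dict


def expected_score(player_rating: float, opponent_rating: float) -> float:
    # Standard Elo expected score of the player against the opponent.
    return 1.0 / (1.0 + 10 ** ((opponent_rating - player_rating) / 400.0))


def pick_sparring_partner(
    player_rating: float,
    checkpoint_ratings: Dict[str, float],
    target_win_rate: float = 0.4,
) -> str:
    """Pick the (hypothetical) agent checkpoint closest to the target win rate.

    A target below 0.5 means the opponent is slightly stronger than the player.
    """
    return min(
        checkpoint_ratings,
        key=lambda name: abs(
            expected_score(player_rating, checkpoint_ratings[name]) - target_win_rate
        ),
    )


if __name__ == "__main__":
    checkpoints = {"step_1k": 900.0, "step_10k": 1250.0, "step_50k": 1600.0}
    print(pick_sparring_partner(1200.0, checkpoints))  # step_10k
```

As the player’s rating rises, the same function naturally hands them a stronger snapshot, which is the automatic difficulty adjustment in its simplest form.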

Great! I felt like this could be a business.

As a next step, I needed to find a few large pilot customers to fund my rent while I kept working on a cheap self-serve solution for indie devs.

But there was a problem

  • Large companies: Wouldn’t return my calls
  • Medium companies: Couldn’t justify spending $10K on server costs for RL training
  • Small companies: Couldn’t pay me for what amounted to time-consuming consulting work

However, I still believe there is an underserved niche for Reinforcement Learning in Game Development.

My articles have helped me reach many interesting researchers, potential customers and potential co-founders who share my views. The benefit of the Reinforcement Learning approach to Game AI is especially apparent when you compare it to the standard solution: Decision Trees.

Yes, Reinforcement Learning is expensive, but it’ll save you a lot of time

Reinforcement Learning vs Decision Tree


  • Reinforcement Learning: Declarative, need to spend money on servers
  • Decision Tree: Procedural, need to spend money on programmers and domain experts

But what do I mean by declarative? Reinforcement Learning doesn’t seem to share anything with something like SQL.

The main advantage of Reinforcement Learning is that you can build a superhuman AI that can beat world experts at a certain game without being an expert at that game yourself.

You don’t need to tell a bot HOW to beat a game, you only need to tell it WHAT it means to beat a game.

As a game designer using Reinforcement Learning, you first need to set up a generic RL interface (see my article on Unity ML-Agents for more detail).
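What “a generic RL interface” means in practice: the game exposes a way to start an episode and a way to apply one action and report back the new observation, the reward and whether the episode is over. The sketch below is a toy, loosely modelled on the common Gym-style `reset`/`step` convention; it is not the Unity ML-Agents API, and `LineWorld` with its parameters is hypothetical.

```python
import random
from typing import Tuple


class LineWorld:
    """Toy game behind a generic RL interface: reset() and step(action).

    The agent starts at position 0 and wins by reaching `goal` within
    `max_steps` moves. Action 0 moves left, action 1 moves right.
    """

    def __init__(self, goal: int = 3, max_steps: int = 10):
        self.goal = goal
        self.max_steps = max_steps
        self.position = 0
        self.steps = 0

    def reset(self) -> int:
        self.position = 0
        self.steps = 0
        return self.position  # the observation is just the position

    def step(self, action: int) -> Tuple[int, float, bool]:
        self.position += 1 if action == 1 else -1
        self.steps += 1
        won = self.position == self.goal
        done = won or self.steps >= self.max_steps
        reward = 1.0 if won else 0.0  # says WHAT winning is, not HOW to win
        return self.position, reward, done


if __name__ == "__main__":
    env = LineWorld()
    obs, done, total = env.reset(), False, 0.0
    while not done:
        obs, reward, done = env.step(random.choice([0, 1]))  # random policy
        total += reward
    print("episode return:", total)
```

Any training algorithm that speaks this interface can be swapped in for the random policy without touching the game code.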

Then all you need to do is specify a reward function, which encodes the goal you’d like your agent to work towards. You have a lot of flexibility in setting those goals, but often just setting the goal to win is enough. Say you’re training an agent to play Chess as black: the reward function only needs to check whether black won, lost or drew the game.
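A minimal sketch of such a reward, assuming the game engine reports the outcome using standard PGN result strings (“1-0” white wins, “0-1” black wins, “1/2-1/2” draw, “*” game in progress). The function name and the +1/-1/0 scale are illustrative choices, not a prescribed API.

```python
def reward_for_black(result: str) -> float:
    """Sparse terminal reward for an agent playing black (hypothetical helper).

    `result` is a PGN result string: "1-0", "0-1", "1/2-1/2" or "*".
    """
    if result == "0-1":
        return 1.0   # black won
    if result == "1-0":
        return -1.0  # black lost
    return 0.0       # draw, or the game is still going


if __name__ == "__main__":
    for outcome in ["0-1", "1-0", "1/2-1/2", "*"]:
        print(outcome, reward_for_black(outcome))
```

Note that nothing in it knows how pieces move or what a good position looks like.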

A reward like this involves no domain knowledge of Chess, yet an agent trained against it can, with enough training, easily crush the best Chess players.

Compare this approach to writing a competitive decision tree from scratch, where you would need to engineer hundreds of different features and heuristics to maybe get something working.

Reinforcement Learning avoids the need for experts and extensive feature engineering.

I was often met with skepticism online about the feasibility of Yuri, and the criticism generally fell into a few buckets:

  1. It’s too different from what I’m used to
  2. It’s too expensive
  3. It’s too slow
  4. It’s unreliable
  5. It’s too hard to interpret and debug; how am I supposed to make updates?
  6. It removes the craftsmanship; I need to make my AI fun

Let’s go through them one by one.

It’s too different from what I’m used to

I agree, switching costs are real and distracting. But adopting declarative AI programming is an investment and not just an expense.

It’s too expensive

Yes, you’re now paying for storage and compute, and it’s expensive, but this is an expense on the order of $10,000.

By comparison, hiring a traditional AI engineer to handcraft a decision tree for you will cost at least $100,000 a year. You will probably also need to pay to interview a few experts at your game and spend real dev hours consolidating their feedback.

There are also important areas of research that are trying really hard to make Reinforcement Learning training cheaper. My money is on Differentiable Physics Engines.

It’s too slow

Compared to the time it would take to design a great decision tree, Reinforcement Learning is fast. Dueling DQNs and Prioritized Experience Replay are among the most important research advances of the past few years for making Reinforcement Learning scale.
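Both ideas fit in a few lines each. Below is a minimal pure-Python sketch of the dueling aggregation (a value stream plus a mean-centred advantage stream, as in Wang et al.) and of proportional prioritized sampling with importance-sampling weights (as in Schaul et al.). The `alpha` and `beta` defaults are illustrative, and a real implementation would compute the streams with a neural network and use a sum-tree to sample efficiently.

```python
import random
from typing import List, Tuple


def dueling_q_values(value: float, advantages: List[float]) -> List[float]:
    # Dueling head: Q(s, a) = V(s) + A(s, a) - mean over a' of A(s, a').
    mean_adv = sum(advantages) / len(advantages)
    return [value + a - mean_adv for a in advantages]


def per_probabilities(td_errors: List[float], alpha: float = 0.6, eps: float = 1e-6) -> List[float]:
    # Proportional prioritization: P(i) = p_i^alpha / sum_k p_k^alpha,
    # with priority p_i = |TD error| + eps so no transition gets zero chance.
    priorities = [(abs(d) + eps) ** alpha for d in td_errors]
    total = sum(priorities)
    return [p / total for p in priorities]


def per_sample(
    td_errors: List[float], batch_size: int, alpha: float = 0.6, beta: float = 0.4
) -> Tuple[List[int], List[float]]:
    probs = per_probabilities(td_errors, alpha)
    n = len(td_errors)
    indices = random.choices(range(n), weights=probs, k=batch_size)
    # Importance-sampling weights correct the bias of non-uniform sampling;
    # here they are normalized by the largest weight in the batch.
    weights = [(n * probs[i]) ** (-beta) for i in indices]
    max_w = max(weights)
    return indices, [w / max_w for w in weights]


if __name__ == "__main__":
    print(dueling_q_values(1.0, [0.5, -0.5, 0.0]))  # [1.5, 0.5, 1.0]
    print(per_sample([0.1, 2.0, 0.5], batch_size=4))
```

Transitions the agent got most wrong (large TD error) are replayed more often, which is where the sample-efficiency gain comes from.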

It’s unreliable/unstable

Here it’s important to separate the reliability of training from the reliability of inference.

Training: Yes, RL algorithms are sensitive to hyperparameters, but tuning them is being automated away by new research into AutoML, and we can expect the research community to publish more tricks here as interest in game simulators and robotics keeps increasing. Practitioners also often use Target Networks to make training more stable.
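A minimal sketch of the Target Network trick, assuming the network’s weights are flattened into a plain list of floats (a simplification; real code would operate on framework tensors). It shows both the periodic hard copy used in the original DQN and the soft, Polyak-averaged update common in later methods; `tau` and `gamma` are illustrative values.

```python
from typing import List


def hard_update(target: List[float], online: List[float]) -> None:
    # Copy the online weights into the target network (done every N steps).
    target[:] = online


def soft_update(target: List[float], online: List[float], tau: float = 0.005) -> None:
    # Polyak averaging: target <- tau * online + (1 - tau) * target.
    for i, (t, o) in enumerate(zip(target, online)):
        target[i] = tau * o + (1.0 - tau) * t


def td_target(reward: float, next_q_target: List[float], done: bool, gamma: float = 0.99) -> float:
    # The bootstrap target uses the slowly moving target network's Q-values,
    # so the regression target doesn't shift with every gradient step.
    return reward if done else reward + gamma * max(next_q_target)


if __name__ == "__main__":
    target, online = [0.0, 0.0], [1.0, 2.0]
    soft_update(target, online, tau=0.5)
    print(target)  # [0.5, 1.0]
    print(td_target(1.0, [2.0, 3.0], done=False, gamma=0.5))  # 2.5
```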

Inference/Playing: If the final agent can beat the best players ever, then it’s very reliable, even if the model has some probabilistic elements.

It’s too hard to interpret and debug; how am I supposed to make updates?

This is the area where Reinforcement Learning still has the most to improve, but I would argue that Decision Trees suffer from the same issues.

Let’s take DOTA as an example: a competitive, real-time multiplayer online battle arena (MOBA) game with 2 teams of 5 players each. Each player can choose among 117 heroes that all play very differently from each other.

Consider a community project to build better AI bots for DOTA, since the default ones kind of suck.

It has a script for each of the 117 heroes; I picked one at random and it was 550 lines long.

Q: Now suppose you add a new hero or item to the game, how should you change your decision trees?

A: In the worst case, you need to change every single file, because a single minor design change to one hero could affect the viability of the remaining 116.

The word “interpretable” is often thrown around loosely. While you can definitely understand how a small decision tree operates, once you have one with several hundred nodes, there’s no way you’re making sense of it. Humans are inherently bad at reasoning about complexity, which is why we’re so hooked on computers.

Compare the above approach with a single game agent that learns to play any hero: you input the various spells that a hero can cast, tell it that it should win, and then let self-play take care of the rest. If your game changes, you can bootstrap a new model from your old one.
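One common way to bootstrap, sketched here under simplifying assumptions: the policy’s final layer is a plain weight matrix with one row per action (for example, one per spell), the rest of the network is reused unchanged, and a new hero or spell only adds new actions. The function and its parameters are hypothetical. The new model starts from everything the old one learned and is then fine-tuned with self-play instead of trained from scratch.

```python
import random
from typing import List

Matrix = List[List[float]]


def expand_action_head(old_head: Matrix, num_new_actions: int, scale: float = 0.01, seed: int = 0) -> Matrix:
    """Warm-start a policy's output layer after new actions are added.

    Each row holds the weights that score one action. Rows for existing
    actions are copied unchanged; rows for new actions get small random
    weights so they start close to neutral.
    """
    rng = random.Random(seed)
    hidden_size = len(old_head[0])
    new_rows = [[rng.gauss(0.0, scale) for _ in range(hidden_size)] for _ in range(num_new_actions)]
    return [row[:] for row in old_head] + new_rows


if __name__ == "__main__":
    old = [[1.0, 2.0], [3.0, 4.0]]  # 2 existing actions, hidden size 2
    new = expand_action_head(old, num_new_actions=1)
    print(len(new), new[:2] == old)  # 3 True
```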

That said, Reinforcement Learning will still need a few more tools to become THE Game AI solution:

  1. Tools for tracking and correcting unstable training
  2. Tools to make an agent slightly worse in case it’s too good
  3. Tools to bootstrap training
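For the second tool, two simple knobs are often used: a softmax temperature over the agent’s action scores, and a probability of making a deliberate random “mistake”. A minimal sketch, assuming the trained agent exposes one score per legal action; the function and parameter names are hypothetical.

```python
import math
import random
from typing import List, Optional


def weakened_action(
    action_scores: List[float],
    temperature: float = 1.0,
    mistake_rate: float = 0.0,
    rng: Optional[random.Random] = None,
) -> int:
    """Choose an action from a trained agent's scores, optionally playing worse.

    temperature > 1 flattens the softmax so weaker moves are picked more often;
    mistake_rate is the chance of ignoring the agent and playing uniformly at random.
    """
    rng = rng or random.Random()
    if rng.random() < mistake_rate:
        return rng.randrange(len(action_scores))
    if temperature <= 0:
        # Zero temperature: always play the agent's best move.
        return max(range(len(action_scores)), key=lambda i: action_scores[i])
    max_s = max(action_scores)  # subtract the max for numerical stability
    weights = [math.exp((s - max_s) / temperature) for s in action_scores]
    return rng.choices(range(len(action_scores)), weights=weights, k=1)[0]


if __name__ == "__main__":
    scores = [0.1, 2.0, 0.5]
    print(weakened_action(scores, temperature=0))  # 1, the best move
    print(weakened_action(scores, temperature=5.0, mistake_rate=0.1))
```

Both knobs can be tuned per player, which pairs naturally with the difficulty-adjustment idea from the start of this article.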

Most of the excellent research that DeepMind, OpenAI and MSR have released has been algorithms, since the core value is actually in the infrastructure.

It removes the craftsmanship; I need to make my AI fun

Whether a game is fun or not is decided by its mechanics, not by its AI. If anything, a superhuman AI would help you find design flaws and discover dominant strategies that make all the others irrelevant.

Reinforcement Learning can be a design tool to help you build better games. It’s not a threat to your game design skills; it’s an expert that’s been trained to be amazingly good at your game, even if your game isn’t popular yet.

Next Steps

I hope this article helps you better understand what I was trying to do with my Reinforcement Learning research at and why I still believe that Reinforcement Learning will be a game changer for Strategy Games.

Check out my book The Robot Overlord Manual, where I’ve dedicated a couple of chapters to Reinforcement Learning.

While I haven’t managed to be massively successful in this space myself, many big industry projects have already validated this direction, such as Unity ML-Agents and Project Malmo.

Reinforcement Learning is still in its early days but I’m betting that it’ll be as popular and profitable as Business Intelligence has been.

And if you wanna just chat about Reinforcement Learning or Games please reach out!


Thank you to Stephane Mourani for carefully proofreading this article.